Oh, Tay, Can You See?

Microsoft put up a speech-bot named ‘Tay’ on Twitter last week, and it took less than twenty-four hours for it to become a sexist Nazi. While labelled as “artificial intelligence,” Tay did not actually understand what it was saying – it merely parroted the speech of other users. On 4chan’s /pol/ board, that includes a lot of dialog that most of us would consider inappropriate.

What distresses me is that Microsoft hoped to have Tay demonstrate the conversational skills of a typical teenager. Well, maybe it did!

In a recent dialog on the “liar Clinton,” I probed for specific proof, and received back the standard Fox News sound bites. When I described the Congressional hearings on Benghazi, the accuser had the grace to be chastened. This is typical of so much of our political dialog: people parrot sayings without availing themselves of the official forums in which real information is exchanged. The goal is to categorize people as “us” or “other,” justifying arrangements for the distribution of power that benefit the “us.”

Donald Trump is a master of this political practice. Apparently his campaign doesn’t do any polling. He simply puts up posts on Facebook, and works the lines that people like into his speeches.

So I worry: did Microsoft actually succeed in its demonstration? Most American teenagers don’t understand the realities of the Holocaust or the difficulties of living under a totalitarian regime. In that experiential vacuum, do they actually evolve dialog in the same way that Tay did – with the simple goal of “fitting in?”

Somewhat more frightening is that Donald Trump appears to employ algorithms not too different from Tay’s. For God’s sake, this man could be president of the most powerful country in the world! He’s got to have more going on upstairs than a speech bot!

Fortunately, many teenagers, when brought into dialog regarding offensive speech, actually appreciate receiving a grounding in fact. You’d hope that our politicians would feel the same.

CLR’ing Away the .NET

When Microsoft first began offering “component” technology to the world, it was a result of unmanaged programming. I don’t mean “unmanaged” in the technical sense (more on that later). I mean “unmanaged” in the original sense: after smashing its office suite competition by tying a full-featured release of Office to Windows 3.1, Microsoft found its development teams competing with each other. When Word users asked for limited spread-sheet capabilities, the Word developers began recreating Excel. When Outlook users asked for sophisticated text formatting for their e-mails, the development team began recreating Word.

Now this reflects two things. The first and most significant is that the code base created by the application teams was not modular. The Word team should have been able to lift the basic spread-sheet engine right out of the Excel code base. The second was that Microsoft had a schizophrenic developer base. Visual Basic developers enjoyed the features of an interpreted language that allowed them to modify code during execution, while Visual C++ developers enjoyed the benefits of high-speed execution. Unfortunately, those two worlds didn’t talk well to each other. C++ uses ‘0’ to locate the first item in a list, while VB uses ‘1.’
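
The off-by-one hazard in miniature – a trivial sketch, but it bit real interop code:

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> items = {10, 20, 30};
        // C++ counts from zero: the first element is items[0].
        std::cout << items[0] << "\n";   // prints 10
        // The VB side counted the same list from 1, so every index
        // passed across the language boundary had to be shifted.
        return 0;
    }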

Microsoft’s attempt to bridge these gulfs was its Component Object Model (COM). As a response to the duplication of code, it was a little heavy: COM enabled users to add an Excel spreadsheet in Word, but at the cost of starting a full instance of Excel in the background, doubling the memory requirements. By contrast, I saw a demonstration of IBM’s SOMobjects at a software engineering conference in 1997 that brought a spreadsheet into a text editor while adding only 3% to the memory burden.

At that conference, IBM touted a world in which a graduate student could write add-ins to popular business applications. This was obviously not in the interests of Microsoft, whose dominance of the office application market fueled its profits. That hostility was evident in its implementation of COM. When a new component is added to the operating system, it registers its services (in the “Windows Registry,” of course). Microsoft published its Office services so that developers of enterprise applications could automatically create Word documents and Excel spreadsheets. That should have meant that other teams could create alternative implementations of those interfaces. To hobble that strategy, Microsoft did not include a reverse lookup capability in its registry. In other words, if you wanted to let your user pick which dictionary to use for a spell-checker, there was no way to find out which installed components provided a “Dictionary” service. You had to walk the entire registry and ask each component in turn whether it was a “Dictionary.” This was not a cheap operation: when I tried it in 1998, it took almost a minute.
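
For the curious, the brute-force scan looked roughly like this. A sketch assuming COM is already initialized; the registry calls are the real Win32 APIs, but IID_IDictionary is a made-up interface ID standing in for whatever service you were hunting:

    #include <windows.h>
    #include <vector>

    // Hypothetical IID for a "Dictionary" service; illustrative only.
    static const IID IID_IDictionary =
        { 0xDEADBEEF, 0x0001, 0x0002, { 0, 1, 2, 3, 4, 5, 6, 7 } };

    std::vector<CLSID> FindDictionaries() {
        std::vector<CLSID> found;
        HKEY hClsid;
        if (RegOpenKeyExW(HKEY_CLASSES_ROOT, L"CLSID", 0, KEY_READ,
                          &hClsid) != ERROR_SUCCESS)
            return found;
        wchar_t name[64];
        for (DWORD i = 0; ; ++i) {
            DWORD len = 64;
            if (RegEnumKeyExW(hClsid, i, name, &len, nullptr, nullptr,
                              nullptr, nullptr) != ERROR_SUCCESS)
                break;                        // no more registered classes
            CLSID clsid;
            if (FAILED(CLSIDFromString(name, &clsid)))
                continue;                     // subkey isn't a CLSID
            // The expensive part: instantiate every component on the
            // machine just to ask it a yes/no question.
            IUnknown* unk = nullptr;
            if (SUCCEEDED(CoCreateInstance(clsid, nullptr, CLSCTX_ALL,
                                           IID_IUnknown, (void**)&unk))) {
                void* dict = nullptr;
                if (SUCCEEDED(unk->QueryInterface(IID_IDictionary, &dict))) {
                    found.push_back(clsid);
                    static_cast<IUnknown*>(dict)->Release();
                }
                unk->Release();
            }
        }
        RegCloseKey(hClsid);
        return found;
    }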

On top of this, Microsoft biased its component technology toward the VB developer, assuming that C++ developers were sophisticated enough to work around the inconsistencies. This was not a minor burden. What took three lines of code in VB could take a page in C++.
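
For the skeptical, here is roughly what the VB one-liner Set word = CreateObject("Word.Application") : word.Visible = True costs in raw C++ through IDispatch – a sketch with error handling omitted, which in production would double it again:

    #include <windows.h>

    void ShowWord() {
        CoInitialize(nullptr);
        CLSID clsid;
        CLSIDFromProgID(L"Word.Application", &clsid);
        IDispatch* word = nullptr;
        CoCreateInstance(clsid, nullptr, CLSCTX_LOCAL_SERVER,
                         IID_IDispatch, (void**)&word);
        // Look up the dispatch ID of the "Visible" property by name.
        OLECHAR* name = const_cast<OLECHAR*>(L"Visible");
        DISPID dispid;
        word->GetIDsOfNames(IID_NULL, &name, 1, LOCALE_USER_DEFAULT, &dispid);
        // Hand-pack VB's "True" into a VARIANT.
        VARIANT arg;
        VariantInit(&arg);
        arg.vt = VT_BOOL;
        arg.boolVal = VARIANT_TRUE;
        DISPID putid = DISPID_PROPERTYPUT;
        DISPPARAMS params = { &arg, &putid, 1, 1 };
        word->Invoke(dispid, IID_NULL, LOCALE_USER_DEFAULT,
                     DISPATCH_PROPERTYPUT, &params, nullptr, nullptr, nullptr);
        word->Release();
        CoUninitialize();
    }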

However, COM and its successor DCOM were the latest shiny toy, and many C++ developers flocked to the technology, even for pure C++ implementations. I was scandalized, because C++ had its own methods for creating reusable modules, methods that lay at the foundations of COM and DCOM underneath the cruft of type conversions and named method invocations. I finally found an article on MSDN warning that COM and DCOM should only be used for systems that are configured dynamically. This included, famously, Visual Basic, host to a rich market of third-party user interface controls (known as “ActiveX” controls). But Microsoft’s advice was not heeded by the industry, and even today I am repackaging COM components as dynamic-link libraries (DLLs) that publish native C++ objects.

I must admit that over time the work demanded of the C++ developer has moderated. Visual Studio can generate C++ interfaces from a COM “type library,” and lets users decorate a class declaration with attributes from which its tools automatically generate the COM wrappers that publish code to VB.
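
The type-library route looks like this today. The #import directive reads a registered type library and emits wrapper headers with reference-counted smart pointers, so the page of IDispatch plumbing above collapses back into a few lines; “MyComponent.tlb,” “Widget,” and “IWidgetPtr” are stand-in names for whatever the library actually defines:

    // Generates .tlh/.tli wrapper headers at compile time.
    #import "MyComponent.tlb" no_namespace

    void UseWidget() {
        IWidgetPtr widget(__uuidof(Widget));  // CoCreateInstance under the hood
        widget->DoSomething();                // failed HRESULTs surface as
    }                                         // _com_error exceptions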

Unfortunately, the field tilted against the C++ developer when Microsoft introduced its .NET technology. One of the major charges leveled against C++ over the years is that developers need to explicitly manage the resources consumed by their programs. Memory in particular is a bugaboo, and one of the major challenges of writing a complex C++ application is ensuring that memory doesn’t “leak.” This frustration was catered to by the creators of Java and other “managed” languages (including Microsoft’s C#). Unfortunately, managed languages encourage the fiction that memory is the only resource that developers need to manage – a fiction addressed explicitly in the documentation of the Java and C# classes that open network connections or access external components such as databases.
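
The irony is that C++ solved the general problem long ago with RAII: a destructor runs deterministically at scope exit, so file handles, sockets and database connections get exactly the same treatment as memory. A trivial sketch:

    #include <fstream>
    #include <string>

    std::string firstLine(const char* path) {
        std::ifstream file(path);   // constructor acquires the OS file handle
        std::string line;
        std::getline(file, line);
        return line;                // destructor closes the handle here,
    }                               // even if getline throws

A garbage collector, by contrast, only promises to reclaim memory eventually – hence the using/Dispose escape hatches in those managed class libraries.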

Be that as it may, Microsoft had to decide whether to support new technologies, such as HTML5 and XML, upon the foundations of the native machine architecture, or within the higher-level abstractions of the managed world. The overwhelming popularity of managed languages drove the choice. Microsoft no longer delivers full-featured libraries for C++ developers, who for a long time could only reach those features through the clumsy methods of COM programming.

This came to a head for my team last year when trying to implement a new feature that required parsing of XML files. A demonstration application was quickly written in C#, but as the effort to access that from our C++ code was prohibitive, we went looking for a third-party XML library. We couldn’t find one that did the job.

The lack of support for C++ libraries has created some distressing contradictions. C++ developers have always been proud to claim that code written in C++ runs twice as fast as code written in a managed language. Recent studies reveal, however, that processing a large file, such as sensor data produced by a networked device or log files from a cloud application, is dominated by the time it takes to read the file – and the C++ libraries appear to take twice as long to read the file as the C# libraries.
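
I hedge with “appear” because the defaults matter enormously here, and before blaming the language it’s worth timing a bulk read yourself. A minimal harness (my own sketch, not drawn from any of those studies) reads in large chunks, which in my experience is where the iostream defaults lose ground:

    #include <chrono>
    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: timeread <file>\n"; return 1; }
        auto start = std::chrono::steady_clock::now();
        std::ifstream file(argv[1], std::ios::binary);
        std::vector<char> buffer(1 << 20);      // read in 1 MiB chunks
        std::uint64_t total = 0;
        while (file.read(buffer.data(),
                         static_cast<std::streamsize>(buffer.size())))
            total += buffer.size();
        total += file.gcount();                 // the final partial read
        double seconds = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        std::cout << total << " bytes in " << seconds << " s\n";
        return 0;
    }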

Driven by this evidence to seek new methods for using .NET libraries in C++ code, I finally came upon Microsoft’s C++/CLI technology (often referred to by the runtime it targets, the CLR). In C++/CLI, the developer has direct control over whether each object is managed or unmanaged. This means that the speed of C++ execution can be leveraged when necessary, while also allowing access to the rich .NET libraries maintained by Microsoft. Originally C++/CLI was advanced as a technology for migrating C++ applications to the .NET libraries, but it turned out that there were too many inconsistencies between the run-time environments established for native C++ applications and .NET applications.
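
That control is visible right in the declarations. Compiled with /clr, a single C++/CLI source file can hold both kinds of type – a minimal sketch:

    class NativePoint {            // ordinary C++ object: stack or native heap
    public:
        double x;
        double y;
    };

    ref class ManagedLogger {      // .NET object: garbage-collected heap
    public:
        void Log(System::String^ message) {
            System::Console::WriteLine(message);
        }
    };

    void Demo() {
        NativePoint p;             // native speed, deterministic lifetime
        p.x = 1.0; p.y = 2.0;
        ManagedLogger^ logger = gcnew ManagedLogger();   // ^ is a GC handle
        logger->Log("one translation unit, two worlds");
    }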

But what about those of us with a million lines of code that run within the native C++ execution environment? Is there no bridge?

I am glad to report that we have found one: create a C++/CLI library that exports unmanaged objects using the classic DLL mechanisms that COM originally supplanted. The unmanaged objects wrap managed .NET components, and use C++/CLI facilities to convert between C++ and .NET data types.
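
Here is the shape of the bridge in a sketch. “XmlBridge” and its methods are names I’ve invented for illustration; the mechanics – gcroot (from vcclr.h) to hold a garbage-collected handle inside a native object, and marshal_as to convert the strings – are the standard C++/CLI tools:

    // --- XmlBridge.h: all a native client ever sees ---
    #include <string>

    class __declspec(dllexport) XmlBridge {
    public:
        XmlBridge();
        ~XmlBridge();
        std::string rootName(const std::string& xml);
    private:
        void* impl_;               // opaque pointer to the managed side
    };

    // --- XmlBridge.cpp: compiled with /clr ---
    #using <System.Xml.dll>
    #include <vcclr.h>             // gcroot
    #include <msclr/marshal_cppstd.h>
    using msclr::interop::marshal_as;

    namespace {
        struct Impl {
            gcroot<System::Xml::XmlDocument^> doc;  // GC handle, native struct
        };
    }

    XmlBridge::XmlBridge() : impl_(new Impl) {
        static_cast<Impl*>(impl_)->doc = gcnew System::Xml::XmlDocument();
    }

    XmlBridge::~XmlBridge() { delete static_cast<Impl*>(impl_); }

    std::string XmlBridge::rootName(const std::string& xml) {
        Impl* impl = static_cast<Impl*>(impl_);
        // marshal_as converts std::string <-> System::String^.
        impl->doc->LoadXml(marshal_as<System::String^>(xml));
        return marshal_as<std::string>(impl->doc->DocumentElement->Name);
    }

A purely native client just includes the header, links against the DLL’s import library, and calls rootName with ordinary std::strings; the CLR spins up behind the scenes on first use.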

I am certain that there are some constraints on this strategy, particularly when trying to integrate .NET and C++ components in user interfaces or attempting to ensure data consistency during simultaneous execution in both environments. But for simple operations that drop into the managed world temporarily, it seems to work just fine.

And I find some joy in coming full circle: with only a few lines of glue, I am once again able to write code as a C++ developer should, rather than as a second-class citizen in a market targeted at developers solving far simpler problems than I confront every day.

Wish You Were There

Google has recently announced a “photo location” service that will tell you where a picture was taken. They have apparently noticed that every tourist takes the same photos, and so if they have one photo tagged with location, they can assign that location to all similar photos.

I’m curious, as a developer, regarding the nature of the algorithms they use. As a climate change alarmist, I’m also worried about the energy requirements for the analysis. It turns out that most cloud storage is used to store our selfies (whether still or video). Over a petabyte a day is added to YouTube, with the amount expected to grow by a factor of ten by 2020. A petabyte is a million billion bytes. By contrast, the Library of Congress can be stored in 10 terabytes – one percent of what is uploaded daily to YouTube.

Whatever Google is doing to analyze the photos, there’s just a huge amount of data to process, and I’m sure that it’s a huge drain on our electrical grid. And this is just Google. Microsoft also touts the accumulation of images as a driver for growth of its cloud infrastructure. A typical data center consumes as much energy as a mid-size city. To reduce the energy costs, Microsoft is considering deploying its compute nodes in the ocean, replacing air conditioning with passive cooling by sea water.

But Google’s photo location service suggests another alternative. Why store the photos at all? Rather than take a picture and use Google to remind you where you were, why not tell Google where you were and have it generate the picture?

When I was a kid, the biggest damper on my vacation fun was waiting for the ladies to arrange their hair and clothing when it came time to take a photo. Why impose that on them any longer? Enjoy the sights, relax, be yourself. Then go home, dress for the occasion, and send up a selfie to a service that will embed you in a professional scenery photo, adjusting shadows and colors for the weather and lighting conditions at the time of your visit.

It might seem like cheating, but remember how much fun it was to stick your face in those cut-out scenes on the boardwalk when you were a kid? It’s really no different than that. And it may just save the world from the burdens of storing and processing the evidence of our narcissism.

Up in the Cloud

Information Systems, the discipline of organizing computer and software resources to facilitate decision-making and collaboration, is undergoing a revolution. The opportunity is created by cheap data storage and high-speed networking. The necessity is driven by the unpredictability of demand and the threat of getting hacked. These factors have driven the construction of huge data and compute centers that allow users to focus on business solutions rather than the details of managing and protecting their data.

As a developer, I find this proposition really attractive. I’m building a sensor network at home, and I’d like to capture the data without running a server full time. I’d also like to be able to draw upon back-end services such as web or database servers without having to install and maintain software that is designed for far more sophisticated operations.

The fundamental proposition of the cloud is to create an infrastructure that allows us as consumers to pay only for the data and software that we actually use. In concept, it’s similar to the shift from cooking on a wood-fired stove fed by the trees on our lot to cooking on an electric range. Once we shift to electricity, if we decide to open a restaurant, we don’t have to plan ahead ten years to be certain that we have enough wood; we just pay for more electricity. Similarly, if I want to develop a new solution for home heating control, I shouldn’t have to pay a huge amount of money for software licenses and computer hardware up front – that should be borne by the end-users. And, just as a chef probably doesn’t want to learn a lot about forestry, so I shouldn’t have to become an expert in the administration of operating systems, databases and web servers. Cloud services promise to relieve me of that worry.

It was in part to assess the reality of that promise that I spent the last two days at Microsoft’s Cloud Road Show in Los Angeles. What I learned was that, while it pursues the large corporate customers, Microsoft is still a technology-driven company, and so its people want to hear that they are also helping individual developers succeed.

But there were several amusing disconnects.

Satya Nadella took the helm at Microsoft following Steve Ballmer’s debacles with Windows 8 and Nokia. Ballmer was pursuing Apple’s vision of constructing a completely closed ecosystem of consumer devices and software. Nadella, who had headed the Azure cloud services effort, blew the top off of that plan, declaring that Microsoft would deliver solutions on any hardware and operating system that defined a viable market. Perversely, what I learned at the roadshow was that Microsoft is still very much committed to hardware – just not the kind of hardware you can carry on your person. Rather, it’s football fields stacked three-high with shipping containers full of server blades and disk drives, each facility drawing the power consumed by a small city. None of the containers belongs to a specific customer (in fact, the promise is that your data will be replicated across multiple containers). They are provisioned for the aggregate demand of an entire region, running everything from a WordPress blog to global photo-sharing services such as Pinterest.

This scale drives Microsoft to pursue enterprise customers. This is a threat to established interests – large data centers are not an exportable resource, and so provide a secure and lucrative source of employment for their administrators. But that security comes with the pressure of being a bottleneck in the realization of others’ ambitions and a paranoid mind-set necessary to avoid becoming the latest major data-breach headline. The pitch made at the roadshow was that outsourcing those concerns to Microsoft should liberate IT professionals to solve business problems using the operations analysis software offered with the Azure platform.

To someone entering this magical realm, however, the possibilities are dizzying. At a session on business analytics, when asked what analysis package would be best to use for those looking to build custom algorithms, the response was “whatever tool your people are familiar with.” This might include R (preferred by statistics professionals) or Python (computer science graduates) or SQL (database developers). For someone looking to get established, that answer isn’t comforting.

But it reveals something else: Microsoft is no longer in the business of promoting a champion – they are confident that they have built the best tools in the world (Visual Studio, Office, SharePoint, etc.). Their goal is to facilitate delivery of ideas to end customers. Microsoft also understands that this means long-term maintenance of tightly coupled ecosystems in which the introduction of a malfunctioning algorithm can cost tens of millions of dollars, and a virus billions.

But what about the little guy? I raised this point in private after a number of sessions. My vision of the cloud is seeded by my sons’ experience in hacker communities, replete with “how-to” videos and open-source software modules. I see this as the great hope for the future of American innovation. If a living space designer in Idaho can source production of a table to a shop in Kentucky with a solid guarantee of supply and pricing comparable to mass-produced models, then we enter a world in which furniture showrooms are a thing of the past, and every person lives in a space designed for their specific needs. As a consumer, the time and money that once would have been spent driving around showrooms and buying high-end furniture is invested instead in a relationship with our designer (or meal planner, or social secretary).

Or how about a “name-your-price” tool for home budgeting? If you’ve got eighty dollars to spend on electricity this July, what should your thermostat setting be? How many loads of laundry can you run? How much TV can you watch? What would be the impact of switching from packaged meals to home-cooked? Can I pre-order the ingredients from the store? Allocate pickup and preparation time to my calendar?

Development of these kinds of solutions is not yet affordable for the individual developer. The low-end service on Azure runs about $200 a month. From discussion, it appears that this is just about enough to run a Boy Scout Troop’s activity scheduling service. But I am certain that will change. Microsoft responded to the open-source “threat” by offering development tools and services for free to small teams. Their Azure IoT program allows one sensor to connect for free, with binary data storage at less than twenty dollars a month.

At breakfast on Wednesday, I shared some of these thoughts with a Microsoft solutions analyst focused on the entertainment industry. I ended the conversation with the admission that I had put on my “starry-eyed philosopher” personality. He smiled and replied “You’ve given me a lot to think about.” It was nice to spend some time with people that appreciate that.