When Microsoft first began offering “component” technology to the world, it was a result of unmanaged programming. I don’t mean “unmanaged” in the technical sense (more on that later). I mean “unmanaged” in the original sense: after smashing its office suite competition by tying a full-featured release of Office to Windows 3.1, Microsoft found its development teams competing with each other. When Word users asked for limited spreadsheet capabilities, the Word developers began recreating Excel. When Outlook users asked for sophisticated text formatting for their e-mails, the development team began recreating Word.
Now this reflects two things. The first and most significant is that the code base created by the application teams was not modular. The Word team should have been able to lift the basic spreadsheet engine right out of the Excel code base. The second was that Microsoft had a divided developer base. Visual Basic developers enjoyed the features of an interpreted language that allowed them to modify code during execution, while Visual C++ developers enjoyed the benefits of high-speed execution. Unfortunately, those two worlds didn’t talk well to each other. C++ indexes the first item of an array at 0, while VB starts counting at 1.
Microsoft’s attempt to bridge these gulfs was their Component Object Model. As a response to the duplication of code, it was a little heavy: COM enabled users to add an Excel spreadsheet in Word, but at the cost of starting a full instance of Excel in the background, doubling the memory requirements. By contrast, I saw a demonstration of IBM’s SOMobjects at a software engineering conference in 1997 that brought a spreadsheet into a text editor while adding only 3% to the memory burden.
At that conference, IBM touted a world in which a graduate student could write add-ins to popular business applications. This was obviously not in the interests of Microsoft, whose dominance of the office application market fueled its profits. This was evident in their implementation of COM. When adding a new component to the operating system, the component registers its services (in the “Windows Registry,” of course). Microsoft published its Office services so that developers of enterprise applications could automatically create Word documents and Excel spreadsheets. That should have meant that other teams could create alternative implementations of those interfaces. To hobble that strategy, Microsoft did not include a reverse lookup capability in its registry. In other words, if you wanted to let your user pick which dictionary to use for a spell-checker, there was no way to find out which installed components provided a “Dictionary” service. You had to walk the entire registry and ask each component in turn whether it was a “Dictionary.” This was not a cheap operation: when I tried it in 1998, it took almost a minute.
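For the curious, the brute-force search looked something like the Win32 sketch below. The IDictionary interface and its IID are hypothetical stand-ins (the GUID is a placeholder), and the caller is assumed to have initialized COM already. The sketch makes the cost obvious: every registered class has to be loaded and instantiated just to be asked a question.

    // A minimal sketch of the brute-force "reverse lookup" described above.
    // IID_IDictionary is hypothetical; the GUID below is a placeholder.
    // The caller is assumed to have called CoInitializeEx already.
    #include <windows.h>
    #include <vector>

    static const IID IID_IDictionary =
        { 0x12345678, 0x1234, 0x1234,
          { 0x12, 0x34, 0x12, 0x34, 0x12, 0x34, 0x56, 0x78 } };

    std::vector<CLSID> FindDictionaries()
    {
        std::vector<CLSID> found;
        HKEY clsidRoot;
        if (RegOpenKeyExW(HKEY_CLASSES_ROOT, L"CLSID", 0, KEY_READ, &clsidRoot)
                != ERROR_SUCCESS)
            return found;

        for (DWORD i = 0; ; ++i) {
            wchar_t name[64];
            DWORD len = 64;
            if (RegEnumKeyExW(clsidRoot, i, name, &len,
                              nullptr, nullptr, nullptr, nullptr) != ERROR_SUCCESS)
                break;                              // no more registered classes

            CLSID clsid;
            if (FAILED(CLSIDFromString(name, &clsid)))
                continue;                           // subkey was not a class id

            // The expensive part: load and instantiate every in-process server
            // just to ask whether it happens to implement our interface.
            IUnknown* unknown = nullptr;
            if (SUCCEEDED(CoCreateInstance(clsid, nullptr, CLSCTX_INPROC_SERVER,
                                           IID_IUnknown,
                                           reinterpret_cast<void**>(&unknown)))) {
                void* dict = nullptr;
                if (SUCCEEDED(unknown->QueryInterface(IID_IDictionary, &dict))) {
                    found.push_back(clsid);
                    static_cast<IUnknown*>(dict)->Release();
                }
                unknown->Release();
            }
        }
        RegCloseKey(clsidRoot);
        return found;
    }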
On top of this, Microsoft biased its component technology toward the VB developer, assuming that C++ developers were sophisticated enough to work around the inconsistencies. This was not a minor burden. What took three lines of code in VB could take a page in C++.
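To make the contrast concrete, here is a sketch of the raw C++ needed to match just the first two of VB’s three lines. All error handling is omitted for brevity, and even so it approaches the proverbial page.

    // What VB does in three lines:
    //
    //   Set app = CreateObject("Excel.Application")
    //   app.Visible = True
    //   app.Quit
    //
    // A sketch of the raw C++ equivalent of the first two lines.
    #include <windows.h>

    void ShowExcel()
    {
        CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED);

        // CreateObject("Excel.Application"):
        CLSID clsid;
        CLSIDFromProgID(L"Excel.Application", &clsid);
        IDispatch* app = nullptr;
        CoCreateInstance(clsid, nullptr, CLSCTX_LOCAL_SERVER,
                         IID_IDispatch, reinterpret_cast<void**>(&app));

        // app.Visible = True: first look up the dispatch id by name...
        OLECHAR* name = const_cast<OLECHAR*>(L"Visible");
        DISPID dispid;
        app->GetIDsOfNames(IID_NULL, &name, 1, LOCALE_USER_DEFAULT, &dispid);

        // ...then package the argument and invoke the property put.
        VARIANT arg;
        VariantInit(&arg);
        arg.vt = VT_BOOL;
        arg.boolVal = VARIANT_TRUE;
        DISPID putId = DISPID_PROPERTYPUT;
        DISPPARAMS params = { &arg, &putId, 1, 1 };
        app->Invoke(dispid, IID_NULL, LOCALE_USER_DEFAULT,
                    DISPATCH_PROPERTYPUT, &params, nullptr, nullptr, nullptr);

        app->Release();
        CoUninitialize();
    }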
However, COM and its successor DCOM were the latest shiny toys, and many C++ developers flocked to the technology, even for pure C++ implementations. I was scandalized, because C++ had its own methods for creating reusable modules, methods that lay at the foundations of COM and DCOM underneath the cruft of type conversions and named method invocations. I finally found an article on MSDN that warned that COM and DCOM should only be used for systems that were configured dynamically. This included, famously, Visual Basic, host to a rich market of third-party user interface controls (known as “ActiveX” controls). But Microsoft’s advice was not heeded by the industry, and even today I am repackaging COM components as dynamic-link libraries (DLLs) that publish native C++ objects.
I must admit that over time the work demanded of the C++ developer has moderated. Visual Studio can generate C++ interfaces from a COM “type library,” and it allows users to decorate a class declaration with attributes that let its tools automatically generate the COM wrappers that publish code to VB.
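For example, pointing the #import directive at a type library generates smart-pointer wrapper classes that hide most of the plumbing shown earlier. In this sketch, Dictionary.tlb, IDictionaryPtr, and Lookup are hypothetical names standing in for whatever the library actually publishes.

    // #import reads a COM type library and generates C++ wrapper classes.
    // "Dictionary.tlb" and the names derived from it are hypothetical.
    #import "Dictionary.tlb" no_namespace

    void LookUpWord()
    {
        // The generated smart pointer calls CoCreateInstance in its
        // constructor and Release in its destructor.
        IDictionaryPtr dict(__uuidof(Dictionary));
        _bstr_t definition = dict->Lookup(L"component");
    }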
Unfortunately, the field tilted against the C++ developer when Microsoft introduced its .NET technology. One of the major charges leveled against C++ over the years is that developers need to explicitly manage the resources consumed by their programs. Memory in particular is a bugaboo, and one of the major challenges of writing a complex C++ application is ensuring that memory doesn’t “leak.” The creators of Java and other “managed” languages (including Microsoft’s C#) catered to this frustration. Unfortunately, it encourages the fiction that memory is the only resource that developers need to manage, a fiction that is addressed explicitly in the documentation of Java and C# class libraries that open network connections or access external components such as databases.
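The C++ answer to the broader problem has been available all along: RAII ties every resource, not just memory, to an object’s lifetime. A minimal sketch, using a file handle rather than memory (the class is mine, for illustration):

    // RAII: the destructor releases the resource deterministically, whether
    // the scope exits normally or by exception. A garbage collector reclaims
    // memory eventually, but a file, socket, or database handle held by a
    // collected object stays open until finalization -- which is why the
    // Java and C# docs insist on explicit close()/Dispose() calls.
    #include <cstdio>

    class LogFile {
    public:
        explicit LogFile(const char* path) : file_(std::fopen(path, "a")) {}
        ~LogFile() { if (file_) std::fclose(file_); }  // deterministic release
        LogFile(const LogFile&) = delete;
        LogFile& operator=(const LogFile&) = delete;
        void Write(const char* line) { if (file_) std::fputs(line, file_); }
    private:
        std::FILE* file_;
    };

    void Record()
    {
        LogFile log("audit.log");
        log.Write("operation complete\n");
    }   // file closed here; no finalizer or using-block required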
Be that as it may, Microsoft had to decide whether to support new technologies, such as HTML5 and XML, upon the foundations of the machine architecture, or within the higher-level abstractions of the managed world. The overwhelming popularity of managed languages drove the choice. Microsoft no longer delivers full-featured libraries for C++ developers, who for a long time could access those features only through the clumsy methods of COM programming.
This came to a head for my team last year when trying to implement a new feature that required parsing of XML files. A demonstration application was quickly written in C#, but since the effort to access it from our C++ code was prohibitive, we went looking for a third-party XML library. We couldn’t find one that did the job.
The lack of support for C++ libraries has created some distressing contradictions. C++ developers have always been proud to claim that code written in C++ runs twice as fast as code written in a managed language. Recent studies reveal, however, that processing a large file, such as sensor data produced by a networked device or log files from a cloud application, is dominated by the time it takes to read the file. The C++ libraries appear to take twice as long to read the file as the C# libraries.
Driven by this evidence to seek new methods for using .NET libraries in C++ code, I finally came upon Microsoft’s C++/CLI, the dialect of C++ that targets the .NET Common Language Runtime (CLR). In C++/CLI, the developer has direct control over whether each object is managed or unmanaged. This means that the speed of native C++ execution can be leveraged when necessary, while also allowing access to the rich .NET libraries maintained by Microsoft. Originally C++/CLI was advanced as a technology for migrating C++ applications to the .NET libraries, but it turned out that there were too many inconsistencies between the run-time environments established for native C++ applications and .NET applications.
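Compiled with the /clr switch, managed and native types share a translation unit. A minimal sketch, using the .NET XML parser that sent us down this road (the function name is mine):

    // C++/CLI sketch, compiled with /clr: a native function that borrows
    // the .NET XML parser. marshal_as converts between native and managed
    // strings in both directions.
    #using <System.Xml.dll>
    #include <string>
    #include <msclr/marshal_cppstd.h>

    using namespace System::Xml;
    using namespace msclr::interop;

    std::string ReadRootName(const std::string& path)
    {
        // Managed object allocated on the CLR heap with gcnew...
        XmlDocument^ doc = gcnew XmlDocument();
        doc->Load(marshal_as<System::String^>(path));

        // ...and its result marshaled back into a native std::string.
        return marshal_as<std::string>(doc->DocumentElement->Name);
    }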
But what about those of us with a million lines of code that runs within the native C++ execution environment? Is there no bridge?
I am glad to report that we have found one. The trick is to create a C++/CLI library that exports unmanaged objects using the classic DLL methods that COM originally supplanted. The unmanaged objects wrap managed .NET components, and use C++/CLI marshaling to convert between C++ and .NET data types.
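A sketch of the pattern, with illustrative names of my own invention: the header exposes only native types, while the implementation file (compiled with /clr) holds the managed component through msclr::gcroot, so native clients never see a CLR type.

    // XmlBridge.h -- the only header native clients see; no CLR types leak.
    #pragma once
    #include <string>

    #ifdef XMLBRIDGE_EXPORTS
    #define XMLBRIDGE_API __declspec(dllexport)
    #else
    #define XMLBRIDGE_API __declspec(dllimport)
    #endif

    class XMLBRIDGE_API XmlReaderBridge {
    public:
        XmlReaderBridge();
        ~XmlReaderBridge();
        std::string RootElementName(const std::string& path);
    private:
        struct Impl;      // pimpl hides the managed handle from native callers
        Impl* impl_;
    };

    // XmlBridge.cpp -- compiled with /clr; wraps the managed component.
    #using <System.Xml.dll>
    #include <msclr/gcroot.h>
    #include <msclr/marshal_cppstd.h>

    struct XmlReaderBridge::Impl {
        // gcroot lets a native struct hold a reference into the managed heap.
        msclr::gcroot<System::Xml::XmlDocument^> doc;
    };

    XmlReaderBridge::XmlReaderBridge() : impl_(new Impl) {
        impl_->doc = gcnew System::Xml::XmlDocument();
    }

    XmlReaderBridge::~XmlReaderBridge() { delete impl_; }

    std::string XmlReaderBridge::RootElementName(const std::string& path) {
        using namespace msclr::interop;
        impl_->doc->Load(marshal_as<System::String^>(path));
        return marshal_as<std::string>(impl_->doc->DocumentElement->Name);
    }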
I am certain that there are some constraints on this strategy, particularly when trying to integrate .NET and C++ components in user interfaces or attempting to ensure data consistency during simultaneous execution in both environments. But for simple operations that drop into the managed world temporarily, it seems to work just fine.
And I find some joy in coming full circle: with only a few lines of glue code, I am once again able to write code as a C++ developer should, rather than as a second-class citizen in a market targeted at developers solving far simpler problems than I confront every day.