Writers such as Jon Udell, Tim O’Reilly, and Yochai Benkler have alluded to the rise of the “open infrastructure” where open source and likely peer-to-peer projects provide the network primitives that can be combined and built upon to create services competing with everything from Akamai’s content delivery to Amazon S3’s data storage.
OK, I’m on board. Love it. Stepping back for a moment, though, this is really a new description of an idea that’s been around for some time under different guises. Sun’s JXTA project, for example, has for years attempted to deliver precisely the type of network primitives that would characterize such an open infrastructure, even providing for the type of “service delivery network” Udell envisions. JXTA is a pain to set up, though, and the project is simply not as focused or coordinated as it could be. It also has not had a killer use case to put it over the top. I say this as someone who has participated actively in the JXTA community and as a genuine fan of the project.
The Globus project took this step into an open infrastructure years ago as well, emerging from the world of grid computing. Globus allows developers to use the grid for everything from data backup to distributed processing. New terminology can nudge old technologies over the hump into the mainstream consciousness, however, as we’ve seen in the last several years with AJAX. Could Globus be to “open infrastructure” what LightStreamer is to “AJAX”? It’s possible.
OK, so let’s forge ahead. What are the key components of an open infrastructure? In many ways, an open infrastructure resembles an open, networked operating system. Much like an operating system, the open infrastructure would provide access to CPU, disk space, memory, and network resources. The p2p/grid computing world offers all of these. To compete with the Googles and the Yahoos of the world, what Udell calls the “galactic clusters”, an open infrastructure system needs to leverage p2p’s resource pooling.
Projects like Globus emerged from the academic world, from the desire of researchers with access to supercomputers to share their processing power. Projects like SETI@home have demonstrated how much further this idea can go if we bring edge resources into the fold, creating the world’s largest supercomputer by networking hundreds of thousands of much less powerful computers. Neither SETI@home nor the more generalized BOINC quite meets the open infrastructure demand, however, since both address the specific use case of distributed processing with a central, coordinating node. To utilize all edge resources, we need a more generalized system that does not rely on this centralized coordination and that can fulfill any task.
The problem then becomes the heterogeneous nature of Internet hosts, particularly the fact that most nodes are behind Network Address Translators (NATs) or firewalls. NATs cut off a node’s resources, preventing it from contributing to the pool. That’s where the Session Initiation Protocol (SIP) steps in. Already a resounding success in VoIP, SIP, together with the associated STUN and ICE protocols, provides a robust and generalized way for two or more users to connect on the Internet regardless of their specific NAT or firewall configuration.
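To make the NAT piece a little more concrete, here is a minimal sketch of the first step a node behind a NAT typically takes with STUN: asking a server on the public Internet what address and port its traffic appears to come from. The message layout follows RFC 5389 (the magic cookie and the XOR-MAPPED-ADDRESS attribute), handles IPv4 only, and leaves out authentication and retransmission. The function names are my own, and `build_fake_response` is a hypothetical helper that stands in for a real server so the example runs offline.

```python
import os
import socket
import struct

# Constants from RFC 5389.
MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    # Header: message type, message length (0, no attributes), magic cookie.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return the (ip, port) the server saw, taken from XOR-MAPPED-ADDRESS."""
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    if packet[8:20] != transaction_id:
        raise ValueError("transaction ID mismatch")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            if value[1] != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Undo the XOR obfuscation applied by the server.
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = xaddr ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


def build_fake_response(transaction_id, ip, port):
    """Hypothetical stand-in for a STUN server, used only for the offline demo."""
    addr = struct.unpack("!I", socket.inet_aton(ip))[0]
    value = struct.pack("!BBHI", 0, 0x01,
                        port ^ (MAGIC_COOKIE >> 16), addr ^ MAGIC_COOKIE)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


if __name__ == "__main__":
    request, tid = build_binding_request()
    reply = build_fake_response(tid, "203.0.113.7", 54321)
    print(parse_binding_response(reply, tid))
```

In a real deployment the request would go out over UDP to a STUN server and the reply would come back on the same socket. The address the node learns this way is what it would advertise to peers, for example as a candidate in an ICE exchange signaled over SIP.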
So, an open infrastructure needs p2p to be most effective, and p2p needs SIP. Together, they open up the possibility of a generalized system where the collective bandwidth, CPU, memory, and disk space resources of every internetworked computer on the planet can be dynamically combined to perform arbitrary tasks. Other protocols and specifications such as XMPP, RDF, and SPARQL would also likely play vital roles in such a system, as would distributed hash tables, but I’ll get more into that later.
As Udell points out, the open infrastructure would closely parallel the pooling of knowledge resources we see in Wikipedia or the collaborative filtering of Slashdot. In this case, though, we’re collectively sharing the resources of the computers themselves. The computers are doing the collaborating.