Chris Holmes and Architectures of Participation

August 30, 2006

My good friend Chris Holmes’s recent Tech Talk to Google is now available on Google video. Chris’s work touches on a lot of things, but you can think of it as helping to implement an open standards and open source-based infrastructure for things like Google Maps and Google Earth. You should check out his thoughts.

I get all excited when Chris talks about open standards as a cornerstone of democracy. With the web changing rapidly, we all need to remember this lesson. The web itself was based on the simple open architecture of HTTP and HTML. Analogous standards exist for geographic data. Chris’s work focuses on expanding the web platform to also support geographic data, much as my work focuses on expanding the web platform to support P2P.

I’ll write more about “architectures of participation” in the future. While “Web 2.0” is a much catchier name, I think “architectures of participation” clears up a lot of the confusion surrounding these issues. I also think it digs deeper. A lot of the Web 2.0 thinking focuses on collaboration at the level of individual web sites. I have no problem with that, and I just love collaborative projects like Wikipedia. There’s a distinct lack of discussion, though, about how architectures of participation at the standards layer enable all of this, I think because more people understand web sites than the standards driving them.

Wikipedia would, of course, never exist if we didn’t have HTTP and HTML. HTTP and HTML are really quite simple standards, but look what they’ve enabled! Imagine what could happen if we really started growing the protocol layer of the web, integrating things like geographic standards and SIP into everyday web projects. What could collaborative projects do atop a more powerful infrastructure? I’m not sure, but it’s a question we should be taking a harder look at.


Daswani on LimeWire Complaint

August 29, 2006

Susheel Daswani has posted another excellent piece in his series about the recent RIAA action against LimeWire.  Susheel worked with me at LimeWire before moving on to study law.

The case will be fascinating to watch.  At LimeWire, we really did all wake up one day to find ourselves in the entertainment business, much to our displeasure.  As I have discussed, the Gnutella developer community was always much more interested in the technology.  Many of the most active members, like Serguei Osokine, Philippe Verdy, or Raphael Manfredi, had absolutely no commercial interest in Gnutella other than as a theoretical and technical exercise.  Susheel and I worked extensively with Serguei to write the current standards for distributed search on Gnutella.  That just always seemed so much more important than the copyright dust-up, and it’s sad that it’s come to this.


Skype and Click To Call

August 29, 2006

Om Malik posted a fascinating piece about eBay pushing Skype as the standard protocol for “click-to-call”, the process of clicking on a hyperlink to initiate a VoIP call.  As I mentioned last week, Skype’s push of its proprietary protocol for click-to-call is as if Yahoo had decided to introduce its own competing version of HTTP circa 1994.  Imagine if half of all hyperlinks started with “http:” while the other half started with “yahoo:”.  Every browser and every web server would have to implement both.  SIP is today’s HTTP.  It powers VoIP with the almost singular exception of Skype.  It’s well-architected and widely implemented in open source projects, just like HTTP was 10 years ago.

The picture gets uglier.  Skype’s protocol is proprietary, and eBay is pushing it as the standard in order to lock out all the other players.  Imagine if today we had only one web browser and one web server, both from a single company, because the underlying protocols were proprietary.  That would have set the Internet back years.

I predict this attempt will fail.  It ignores the importance of open protocols as the glue of the Internet, as the bedrock for the competition that makes it all work.  While the Internet is built on Apache and Linux, it’s also built on the IETF. 


BitTorrent: Old Technology in a New Box

August 21, 2006

The myth of BitTorrent goes something like this: Bram Cohen, hacker extraordinaire, realized circa 2001 that it would be more efficient to break files up into pieces and to download different pieces from different servers simultaneously. This would distribute the load across multiple servers, providing a more robust architecture for accessing the file. The trouble is, the practice was common well before BitTorrent came on the scene. Cohen simply wrote another implementation of a technology that had already become commonplace in the P2P community. The first implementation I know of was Justin Chapweske’s work on SwarmCast in 2000. As I remember it, Justin’s creativity pointed the way for us all.

Heck, we even released swarm downloading in LimeWire long before BitTorrent ever made a public release, as I first announced here. I wrote almost none of the downloading code, but my old LimeWire buddies Chris Rohrs and Sumeet Thadani have more of a claim to having “invented” swarm downloading than Bram Cohen. LimeWire’s also an open source project, and we were working on the swarming implementation as early as January of 2001, as you can see from the CVS logs. Cohen didn’t even start working on it at all until May of 2001. What’s more, it never occurred to us at LimeWire to think of it as a new idea because, well, it wasn’t.
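If you’ve never seen it spelled out, here’s a minimal sketch of what swarm downloading amounts to: split a file into byte ranges, fetch each range from a different peer in parallel, and write the pieces into place. The peer URLs, the file size, and the use of plain HTTP range requests are assumptions for illustration only, not LimeWire’s or BitTorrent’s actual wire protocols.

```java
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy swarm downloader: each peer serves the same file over HTTP,
 * and we pull a different byte range from each one in parallel.
 * Peer URLs and the file size are hypothetical.
 */
public class SwarmDownload {

    public static void main(String[] args) throws Exception {
        String[] peers = {
            "http://peer1.example.com/file.iso",
            "http://peer2.example.com/file.iso",
            "http://peer3.example.com/file.iso"
        };
        long fileSize = 30_000_000L; // assume we learned this from a search result
        long chunk = fileSize / peers.length;

        RandomAccessFile out = new RandomAccessFile("file.iso", "rw");
        out.setLength(fileSize);

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < peers.length; i++) {
            long start = i * chunk;
            long end = (i == peers.length - 1) ? fileSize - 1 : start + chunk - 1;
            String peer = peers[i];
            Thread t = new Thread(() -> fetchRange(peer, start, end, out));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
        out.close();
    }

    /** Download bytes [start, end] from one peer and write them at the right offset. */
    static void fetchRange(String peerUrl, long start, long end, RandomAccessFile out) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(peerUrl).openConnection();
            conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                long offset = start;
                int read;
                while ((read = in.read(buf)) != -1) {
                    synchronized (out) {          // one writer at a time
                        out.seek(offset);
                        out.write(buf, 0, read);
                    }
                    offset += read;
                }
            }
        } catch (Exception e) {
            e.printStackTrace(); // a real client would retry the range with another peer
        }
    }
}
```

Real swarming clients verify each piece and reassign ranges as connections fail, but the core idea really is that small.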

Why do I care? It’s just that it keeps coming up, most recently in the O’Reilly e-mail from a couple of days ago seeking ETech 2007 participants, where they describe “BitTorrent’s use of sufficiently advanced resource locators and fragmented files” as the type of new innovation they’re looking for. I was a history major in college (in addition to computer science), so these things matter to me. Cohen himself perpetuates the myth, most blatantly on the BitTorrent web site where it says: “While it wasn’t clear it could be done, Bram wanted to enable effective swarming distribution – transferring massive files from server to client with the efficiency of peer-to-peer – reliably, quickly and efficiently.” The fact is, it was clear it could be done because people like Justin and us over at LimeWire had already done it!

The Wired article on Cohen from January 2005 takes the cake, though. The article says “Cohen realized that chopping up a file and handing out the pieces to several uploaders would really speed things up.” Again, he “realized” it because he saw that others were already doing it. They go on to describe how traditional file sharing networks are “slow because they suffer from supply bottlenecks. Even if many users on the network have the same file, swapping is restricted to one uploader and downloader at a time.” It’s all just blatantly wrong.

Now, don’t get me wrong. I love BitTorrent. I think BitTorrent is amazing and a perfect example of the kind of enabling technology that makes us all more free. It offers the clearest hope for a future of media distribution beyond the inadequate cable and network broadcast model we see today. It’s just that BitTorrent’s innovation was far less sexy. BitTorrent worked because it did less, not because it had any amazing new ideas. BitTorrent took what many p2p applications were already doing and scrapped most of it. BitTorrent scrapped search. It didn’t bother with a fully connected network. It didn’t worry about file management. It just took the downloading component and packaged it up nicely. Cohen realized that the downloading technology alone was all people wanted or needed in many cases, and that the tricky distributed search part was often unnecessary. Hats off — it has really changed the way many of us think about technology.

That said, BitTorrent was old technology in a new package. The innovation was in the packaging.


Cringely, Skype, Open Infrastructure

August 16, 2006

I have seemingly plunged myself into a running debate with Robert X. Cringely about the finer points of p2p telephony and NAT traversal. I would first like to acknowledge the remarkable breadth of Cringely’s technical knowledge. An early Apple employee and a longtime savvy observer of technology, Cringely somehow has a strong grasp of the highly specialized technology underlying VoIP. His range is startling.

That said, Cringely continues to stumble over the finer technical details. I only bring this up because the fundamental problems with Skype lie in those details, as I’ll explain. They are what make Skype a “closed garden” and a detriment to the “open infrastructure” I’ve advocated. Cringely puts forth an explanation of Skype’s NAT traversal, asserting that Skype uses STUN, TURN, and ICE servers to do the heavy lifting. There are several problems with this assertion. First, there’s no such thing as an “ICE server”. ICE is a client-side protocol that makes use of STUN and TURN “candidate” endpoints to establish the best possible connection between two peers. Second, and most importantly, Skype doesn’t actually implement any of these protocols in the first place. While Cringely likely understands this, his post makes no reference to this key distinction.

For the uninitiated, STUN, TURN, and ICE allow clients to traverse NATs. They are all IETF specifications that continue to change frequently, and they are typically used alongside SIP to power VoIP. This is true for almost every VoIP provider, except for Skype. Skype unfortunately chose to implement a proprietary version of each one, breaking interoperability with other VoIP providers. This makes Skype much like your cell phone in the U.S., where you typically cannot switch cell phone providers while keeping the same phone. If you switch from Sprint to Verizon, for example, you have the joy of putting your $200 phone in a box in your closet or going through the hassle of selling it on eBay. Skype has given us a similar gift. You could never use Skype software with Vonage or Gizmo, for example. If Skype used SIP, TURN, STUN, and ICE, you theoretically could.
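To demystify these protocols a bit, here is a rough sketch of a classic, RFC 3489-style STUN binding request: the client sends a tiny UDP message to a STUN server, and the server replies with the public IP and port it saw, which is the address the client can then hand to a peer. The server hostname is a placeholder, and real clients (and the newer revision of STUN) handle retransmission, more attributes, and error responses that this sketch skips.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.security.SecureRandom;

/** Minimal classic-STUN binding request: ask a server what our public address looks like. */
public class StunSketch {

    public static void main(String[] args) throws Exception {
        byte[] request = new byte[20];
        request[0] = 0x00; request[1] = 0x01;   // message type: Binding Request
        request[2] = 0x00; request[3] = 0x00;   // message length: no attributes
        byte[] txId = new byte[16];
        new SecureRandom().nextBytes(txId);      // 128-bit transaction ID
        System.arraycopy(txId, 0, request, 4, 16);

        DatagramSocket socket = new DatagramSocket();
        socket.setSoTimeout(3000);
        InetAddress server = InetAddress.getByName("stun.example.com"); // placeholder
        socket.send(new DatagramPacket(request, request.length, server, 3478));

        byte[] buf = new byte[512];
        DatagramPacket response = new DatagramPacket(buf, buf.length);
        socket.receive(response);
        socket.close();

        // Walk the attributes looking for MAPPED-ADDRESS (type 0x0001).
        int pos = 20;
        while (pos + 4 <= response.getLength()) {
            int type = ((buf[pos] & 0xff) << 8) | (buf[pos + 1] & 0xff);
            int len  = ((buf[pos + 2] & 0xff) << 8) | (buf[pos + 3] & 0xff);
            if (type == 0x0001 && len >= 8) {
                int port = ((buf[pos + 6] & 0xff) << 8) | (buf[pos + 7] & 0xff);
                String ip = (buf[pos + 8] & 0xff) + "." + (buf[pos + 9] & 0xff) + "."
                          + (buf[pos + 10] & 0xff) + "." + (buf[pos + 11] & 0xff);
                System.out.println("Public address: " + ip + ":" + port);
                return;
            }
            pos += 4 + len;
        }
        System.out.println("No MAPPED-ADDRESS in response");
    }
}
```

That is essentially the whole job of a STUN server: a few bytes of “here is what your address looks like from the outside.”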

Skype’s proprietary protocol is also an issue on web pages. On eBay, you can now press a hyperlink to call users over Skype. This link starts with “skype:” much as typical links start with “http:”. Links on web pages to initiate a phone call will become increasingly common. You could easily imagine links on MySpace pages for calling other users, for example, and some savvy MySpace users likely already have them. The problem is that every other VoIP provider uses SIP, which long ago standardized its own interoperable URIs that start with “sip:”. So, because Skype chose to implement proprietary versions of everything, you will likely have to choose between two links when making a call, one of the form “skype:afisk” and another of the form “sip:afisk@lastbamboo.org”.
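To see what that duplication means for software, here is a toy sketch of the branching every click-to-call handler would have to carry. The addresses are just the examples above, and the dispatch logic is hypothetical, not any real client’s code.

```java
import java.net.URI;

/**
 * Sketch of the duplication a click-to-call client would face: every
 * handler has to special-case both URI schemes. Addresses are hypothetical.
 */
public class CallLinkDispatcher {

    public static void dial(String link) {
        URI uri = URI.create(link);
        switch (uri.getScheme()) {
            case "sip":
                System.out.println("Placing a standard SIP call to " + uri.getSchemeSpecificPart());
                break;
            case "skype":
                System.out.println("Handing off to the proprietary Skype client for " + uri.getSchemeSpecificPart());
                break;
            default:
                throw new IllegalArgumentException("Unknown call scheme: " + uri.getScheme());
        }
    }

    public static void main(String[] args) {
        dial("sip:afisk@lastbamboo.org");
        dial("skype:afisk");
    }
}
```

Multiply that special case across every browser plugin, softphone, and web site, and you start to see the cost of a second, proprietary scheme.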

Imagine if web servers made a similar choice circa 1994. In this world, instead of every link starting with “http:”, some would start with “http:” and others would start with “mywretchedprotocol:”. All browsers would have to support both. What a nightmare! We have Skype to thank for starting us along that path with VoIP.

The implications of this issue go further. While SIP certainly has its problems (why did they ever include UDP?), the carefully designed interworking of the SIP family of protocols is a thing of beauty. SIP does not depend on STUN or TURN or ICE, for example, just as STUN does not depend on TURN or SIP or ICE, etc. This allows each protocol to evolve independently and allows different developers to implement different parts of the protocol family. One open source project can simply write a STUN server, for example, while another could write a SIP server, and another a TURN client. In the end, the user gets better software because engineers can break apart the problem and focus on implementing one piece well. And they’re all documented in Internet drafts that anyone can read. Skype’s use of proprietary protocols butchers this system.

Because of its careful engineering, SIP can also carry any type of traffic over any type of transport. You can use SIP to transfer RSS feeds using HTTP over TCP as LittleShoot does, for example. Or you can just make a phone call. SIP is simply the signaling protocol used to establish the connection. Skype doesn’t have anywhere near this flexibility.

Skype’s decision to use a closed protocol has security implications as well. When calls are routed through supernodes, for example, there’s a built-in “man-in-the-middle” that can monitor all traffic. Skype encrypts calls, but do they use both server and client authentication to prevent the man-in-the-middle from launching a replay attack? If they don’t, then it’s theoretically possible for an attacker to become a supernode to listen to all of your calls. As a closed protocol, Skype isn’t open to public scrutiny in the security community that could otherwise identify and fix such vulnerabilities. There could be people implementing this exploit to monitor and decrypt your Skype calls right now. While one independent security audit claims Skype does implement both client and server authentication, this is one person evaluating their architecture as opposed to the throngs of security experts who would be eager to identify holes if the system were open. We just don’t know.
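For readers who haven’t run into “client and server authentication” before, here is what the idea looks like using plain TLS in Java. This is purely illustrative: Skype’s protocol is closed, so we have no idea whether it does anything resembling this, and the keystore setup is assumed to exist already.

```java
import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.SSLServerSocketFactory;
import javax.net.ssl.SSLSocket;

/**
 * Illustration only: what "both server and client authentication" means,
 * using standard TLS rather than Skype's closed protocol. The server proves
 * its identity with its certificate as usual, and setNeedClientAuth(true)
 * forces the client to present a certificate too.
 */
public class MutualAuthSketch {

    public static void main(String[] args) throws Exception {
        // Assumes keystore/truststore system properties are already configured.
        SSLServerSocketFactory factory =
            (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
        try (SSLServerSocket server = (SSLServerSocket) factory.createServerSocket(5061)) {
            server.setNeedClientAuth(true);   // reject peers without a valid certificate
            try (SSLSocket peer = (SSLSocket) server.accept()) {
                System.out.println("Authenticated peer: "
                    + peer.getSession().getPeerPrincipal());
            }
        }
    }
}
```

The point is simply that when both ends must prove their identity, a node in the middle cannot quietly substitute itself for either one.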

These issues all point to the importance of an open infrastructure and to the power of SIP as a bedrock of the next generation of Internet applications. As people like Vint Cerf have noted, SIP may be to the next ten years what HTTP was to the last ten, unless Skype gets in the way and everything degenerates into a battle of ugly proprietary implementations of the same thing.

I choose to believe that good engineering wins in the end. Protocols like SIP, HTTP, and XMPP will enable a new generation of far more powerful applications capable of seamlessly circumventing NATs and pooling resources to put the maximum possible power in the hands of the user.


LimeWire and the Napster Curse

August 11, 2006

Shawn Fanning’s 1999 release of Napster forever associated peer-to-peer technology with music piracy, and we all bear the burden of that curse today.  Why do I call it a curse?  Because p2p can and will be used in far more powerful ways than for distributing copyrighted works, and its entanglement in the incredibly boring copyright squabbles is downright disrespectful to the technology’s potential.

I will continue to hammer home the point that peer-to-peer is about efficiently pooling the resources of all internetworked computers around the world.  Folding@home is a fantastic example of this potential.  So is Skype, and so is Vonage.  Vonage, you say?  That’s a new one.  Why would I include Vonage in a list of p2p applications?  Because Vonage is a service for connecting your computer directly with the person you’re calling, just as Napster connected your computer directly to the person you were downloading from.  The RIAA does not want you to think about Vonage when you think about p2p.  They want you to think about Napster, to “tar [all p2p applications] with the ‘Napster’ brush”, as Joshua Wattles put it in the MGM v. Grokster amicus brief in the district court.  As critical thinkers, we need to break out of this cage.

Now, Robert Cringely might disagree with my use of the term “peer-to-peer” in this context.  The wide variety of applications in the wild smudges the lines delineating peer-to-peer, grid computing, and distributed computing, however.  My purpose is not to say I am correct versus Cringely in my definition of peer-to-peer.  Peer-to-peer on some level means whatever I or Cringely or anyone else says it means.  I am talking about p2p as any system that enables a direct network connection between peers that were exclusively clients in the traditional client/server architecture.  I think the definition has to be this general given the breathtaking diversity of applications in the field.

Nevertheless, Napster shined a spotlight on what to me is one of the most boring aspects of p2p — using it to distribute the overwhelmingly putrid crap coming out of Hollywood and the music studios.  If the first p2p application to explode in the mainstream had been a distributed computing project that found a cure for breast cancer or if SETI@home had actually discovered extra-terrestrial life, the rhetoric surrounding the technology would be far different today.  Instead we started in the gutter with Napster.

The RIAA’s suit against LimeWire marks the latest chapter in this saga.  Their case is built on “inducement”, the idea that LimeWire as a company has historically encouraged users to use the software for copyright infringement.  Now, the MGM v. Grokster decision is more nuanced than that, but I’ll keep it simple for now.  In the RIAA’s version of reality, LimeWire sought to inherit the user base of Napster and to capitalize on the ability of the software to be used for infringement.

This could not be further from the truth.  When my colleagues and I began work on LimeWire in the summer of 2000, we set out to break Napster’s curse.  We saw far more potential in the technology.  Far from building a new tool for music piracy, we instead generalized the Gnutella protocol for commerce.  We imagined users searching for everything from apartment listings to new recipes on Gnutella, much the way many now use SPARQL, REST, or Amazon’s OpenSearch for searching a variety of web services simultaneously.  We added XML schemas to Gnutella requests and responses to enable this shift, and we created the “Lime Peer Server” that businesses could use to serve XML search results over Gnutella.  One real estate company even used the system for a short time.  Despite our best efforts, it never took off.
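To give a flavor of what those rich queries looked like, here is a hypothetical sketch of a structured, schema-based search of the kind we layered onto plain keyword queries. The schema URI and field names are made up for this post and are not LimeWire’s actual wire format, but the idea is the same: a query becomes a small XML document that a peer server can answer from a database.

```java
/**
 * Hypothetical illustration of a structured, schema-based search layered on
 * top of plain keyword queries. The schema URI and field names are invented
 * for this example; they are not LimeWire's actual wire format.
 */
public class RichQuerySketch {

    static String buildApartmentQuery(String city, int maxRent) {
        return "<apartments xsi:noNamespaceSchemaLocation="
             + "\"http://example.org/schemas/apartment.xsd\">"
             + "<apartment city=\"" + city + "\" maxRent=\"" + maxRent + "\"/>"
             + "</apartments>";
    }

    public static void main(String[] args) {
        // A keyword query says "apartment new york"; a rich query can say exactly
        // what you mean, and a peer server can answer it from a database.
        System.out.println(buildApartmentQuery("New York", 2000));
    }
}
```

An apartment-hunting service, a recipe index, or a record label could all answer the same kind of structured request over the same network.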

Why didn’t the peer server work?  What about the other types of searches our XML schemas enabled?  Well, LimeWire was always a generalized tool, agnostic to file types.  Much as you can attach any type of file to an e-mail, users could share any type of file using Gnutella.  P2p technology is particularly effective at sharing large files because you can remove the burden from a single server and instead download files using many different peers.  It just so happens that from the year 2000 to 2006 the vast majority of large files on the Internet were copyrighted media files.  As a result, that’s what users primarily shared on LimeWire.

Now, compare this for a moment to the RIAA’s inducement claims.  Our entire business model for LimeWire was based on selling servers and on getting paid to route traffic to business clients.  Making money off of LimeWire the program was never on our radar.  How, then, can the RIAA claim that we induced users to use the software for infringing purposes?  The fact is that LimeWire users used it to infringe even though our entire business model had nothing to do with infringement.

Stepping back a bit further, there has always been a disconnect between the p2p developer community and users of file sharing software.  The p2p developer community has always been interested in things like how to search millions of computers in real time or how to ensure trust between peers.  “Oops, I Did it Again” just never seemed quite as interesting.

The p2p community is also far larger than the developers at file sharing companies.  Putting aside the engineers at Vonage for a moment, the p2p community includes the folks at Microsoft Research working on Pastry, the fantastic work coming out of the Stanford Peers Group, as well as researchers at MIT, Rice, Berkeley, and almost every major research university in the world with a computer science department.  Why is this community so vast?  Because of the potential for p2p to harness the world’s collective computing capacity more efficiently to solve the world’s most pressing problems.  Using this technology to make “Pirates of the Caribbean: Dead Man’s Chest” more universally available is not high on the priority list for the researchers at Microsoft just as it wasn’t at the forefront of my mind when I worked on LimeWire.  I’d greatly prefer it if users were more interested in contributing their computing cycles to the understanding of protein folding or to distributing their own creative works than they were in downloading Jenna Jameson’s latest.

The fact is, we are all still living with Napster’s legacy, with the RIAA attempting to draw a straight line from Napster through Aimster, Grokster, Kazaa, and now LimeWire.  The world is more complicated than that, and we are too sophisticated to believe that storyline despite its attractive simplicity.


Cringely Doesn’t Understand P2P

August 2, 2006

Mark Stephens, a.k.a. Robert X. Cringely, the guy who seems to get everything, just doesn’t seem to understand peer-to-peer. His recent post “The Skype is Falling” misses the mark several times. As Yochai Benkler has noted so clearly, p2p is about computers collaborating to share resources, much as Wikipedia is about humans collaborating to share knowledge. On Wikipedia, different people have different levels of knowledge on different topics. That’s what makes it work so well! Specialists in different areas on Wikipedia don’t have to worry about the things they don’t know about — they just contribute what they can. Together, they combine the best of human knowledge for everyone’s maximum benefit.

Peer-to-peer is at its most powerful when it does precisely the same thing — when each peer contributes as much as it’s capable of contributing. Just like humans, computers come in many shapes and sizes. Some have 200 GB hard drives and a dial-up modem. Others are cell phones with no hard drive and little memory, but with bandwidth to spare. Well-architected p2p networks take this into account, harvesting whatever resources are available at any time for the maximum benefit of the network as a whole.

Computers that are neither firewalled nor behind NATs are among the most precious resources on p2p networks because they supply services others can’t. Because anyone can connect to them, they serve as the glue holding any p2p network together. Without these nodes, most p2p networks simply would not function. Perhaps most importantly, they facilitate connections between NATted/firewalled nodes, allowing any node on a network to theoretically connect to any other.

This is where Cringely just doesn’t get it. He goes into detail describing how Skype uses “servers” to facilitate NAT traversal between peers. He points to the use of servers as somehow making Skype not p2p. The fact is, Skype uses distributed non-firewalled peers as “servers” to allow other peers to connect. This is ahh, well, precisely like every other p2p network on the planet. This architecture could not be more peer-to-peer. In fact, this is p2p at its best – the network is harvesting all available peer resources dynamically.

Cringely claims that “a lot of Skype connections aren’t p2p at all” because of this server interaction and that these servers need to have a “surplus of bandwidth to handle the conversation relay.” This is where his thesis really starts to unravel. He apparently does not understand that the servers are simply used for signaling, relaying contact information between the two peers. The call itself typically happens directly between the two computers using UDP NAT hole punching, a practice that VoIP really brought into mainstream use but that is also used in various games and in most p2p networks. There are certainly cases where the hole punching fails, such as when trying to connect two peers that are both behind symmetric NATs, but the call is typically direct. If it weren’t, call quality would often be horrible.
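Here is a rough sketch of the hole punching step itself, with the signaling left out: assume each side has already learned the other’s public IP and port through the supernode (or any rendezvous channel), and then both sides simply start firing UDP packets at each other until one gets through. The address and port are placeholders, and real implementations add keep-alives, retries, and a relay fallback for the symmetric-NAT case.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

/**
 * Toy UDP hole punch. Both peers run this at roughly the same time, each
 * pointed at the other's public address (learned through some signaling
 * channel beforehand). The outbound packets open a mapping in the local NAT,
 * which is what lets the other side's packets in.
 */
public class HolePunchSketch {

    public static void main(String[] args) throws Exception {
        InetAddress peerAddr = InetAddress.getByName("203.0.113.7"); // placeholder
        int peerPort = 40000;                                        // placeholder
        byte[] hello = "hello".getBytes("US-ASCII");

        try (DatagramSocket socket = new DatagramSocket(40000)) {
            socket.setSoTimeout(500);
            for (int attempt = 0; attempt < 20; attempt++) {
                // Each send creates (and refreshes) the NAT mapping toward the peer.
                socket.send(new DatagramPacket(hello, hello.length, peerAddr, peerPort));
                try {
                    byte[] buf = new byte[1500];
                    DatagramPacket in = new DatagramPacket(buf, buf.length);
                    socket.receive(in);
                    System.out.println("Punched through! Heard from " + in.getSocketAddress());
                    return; // from here on, voice can flow directly peer to peer
                } catch (SocketTimeoutException e) {
                    // Nothing yet; keep punching.
                }
            }
            System.out.println("Hole punching failed; fall back to a relay (TURN-style).");
        }
    }
}
```

The outbound packets are what open the mapping in your own NAT; once both mappings exist, voice flows directly between the two machines and the supernode never sees another packet.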

This is a key difference because the vast majority of the bandwidth requirements for calls are for the call itself. The signaling messages exchanged to establish the call are negligible. I would hazard a guess that 99% of the packets transferred in VoIP calls around the globe are voice packets, not signaling. These packets never touch the server unless UDP hole punching fails. It’s an open question as to what Skype does when UDP hole punching fails, but I’d actually be surprised if they were even using “supernodes” in this case, likely instead routing their calls through their own servers. I just don’t think supernodes would be able to provide enough bandwidth for high enough call quality.
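That guess is easy to sanity-check with some rough, assumed numbers: a codec sending a packet every 20 ms, a five-minute call, and a generous couple dozen signaling messages for setup and teardown.

```java
/** Back-of-envelope check on how little of a VoIP call is signaling. */
public class CallPacketMath {
    public static void main(String[] args) {
        int packetsPerSecond = 50;              // e.g. 20 ms packetization, typical for G.711
        int callSeconds = 5 * 60;               // a five-minute call
        long mediaPackets = 2L * packetsPerSecond * callSeconds; // both directions
        long signalingMessages = 20;            // generous guess for call setup/teardown
        double signalingShare = 100.0 * signalingMessages / (mediaPackets + signalingMessages);
        System.out.printf("Media packets: %d, signaling: %d (%.3f%% of the total)%n",
            mediaPackets, signalingMessages, signalingShare);
    }
}
```

Even with generous assumptions, signaling is a rounding error, which is why relaying it through supernodes costs those supernodes almost nothing, while relaying the voice itself would not.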

Far from the sort of faux p2p Cringely describes, Skype’s use of non-firewalled nodes to negotiate voice sessions between two or more peers is p2p at its finest. Now, don’t get me wrong, I have plenty of problems with Skype’s use of a proprietary protocol instead of SIP, and I’d prefer them to be open source, but this part of Cringely’s analysis just doesn’t make any sense.


Rivers 2.0

August 2, 2006

As many economic historians have observed, the rise of the British Empire depended largely on the abundance of rivers connecting inland manufacturing zones in England to markets around the world. Jeffrey Sachs and others have observed the inverse today, namely that many of the world’s poorest countries are landlocked, dramatically reducing their ability to generate wealth through participation in the global economy. Countries such as Bolivia and much of sub-Saharan Africa have struggled to emerge from extreme poverty in part as a result of this phenomenon.

The Internet functions like a river, connecting people to the global economy. High-speed Internet connections allow services to flow in and out of a country, navigable waterways or no. That’s not to say that geographic problems disappear, but the networked information economy can level the playing field, particularly for many of the world’s most impoverished regions where engaging in global trade is otherwise nearly impossible simply because there is little or no way to get to the ocean.

Outsourcing to India has been the most salient example of this phenomenon, resulting in a dramatic, although asymmetric, rise in GNP. The same trend has yet to emerge in many of the world’s most impoverished countries, however, largely because they often lack the initial capital for the infrastructure investments that high-speed connections require.

In this light, low-cost connectivity projects such as Berkeley’s TIER project, the Akshaya Network, and Green WiFi become all the more vital. Akshaya is one of the most interesting in that it’s using low-cost, primarily wireless technologies to connect India’s poorest regions, attempting to bring them along as India emerges out of poverty. These projects, combined with the continued efforts of the United Nations’ Millennium Project, hold tremendous promise for overcoming these geographic barriers.

None of this is meant to prioritize connectivity in developing regions over providing for basic human needs such as health care, food, housing, and education. Connectivity does bring many unseen benefits, however, such as access to health care resources and techniques to improve crop yields. It also can allow communities to engage directly in Internet-based trade, such as weavers or coffee cooperatives in Guatemala establishing web storefronts to sell their goods directly. Connectivity, English language proficiency, access to computers, and technological literacy remain the primary barriers to entry, but the potential is there to pull millions of people out of extreme poverty. Let’s get on it!

