I have seemingly plunged myself into a running debate with Robert X. Cringely about the finer points of p2p telephony and NAT traversal. I would first like to acknowledge the remarkable breadth of Cringely's technical knowledge. An early Apple employee and a longtime savvy observer of technology, Cringely somehow has a strong grasp of the highly specialized technology underlying VoIP. His range is startling.
That said, Cringely continues to stumble over the finer technical details. I only bring this up because the fundamental problems with Skype lie in those details, as I'll explain. They are what make Skype a “closed garden” and a detriment to the “open infrastructure” I've advocated. Cringely puts forth an explanation of Skype's NAT traversal, asserting that Skype uses STUN, TURN, and ICE servers to do the heavy lifting. There are several problems with this assertion. First, there's no such thing as an “ICE server”. ICE is a client-side protocol that makes use of STUN and TURN “candidate” endpoints to establish the best possible connection between two peers. Second, and more importantly, Skype doesn't implement any of these protocols in the first place. While Cringely likely understands this, his post makes no reference to this key distinction.
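To make the client-side nature of ICE concrete, here is a minimal Python sketch of how a client might rank its own candidates. It assumes the priority formula and recommended type preferences from the ICE specification as it was eventually published (RFC 8445); the drafts current at the time of this post may differ in detail. The addresses and the function name are made up for illustration, and real ICE goes on to rank candidate pairs, which this sketch leaves out.

```python
# Recommended type preferences from the published ICE specification (RFC 8445).
# Higher means "try this kind of candidate first".
TYPE_PREFERENCE = {
    "host": 126,   # address on the client's own interface
    "prflx": 110,  # peer-reflexive, learned during connectivity checks
    "srflx": 100,  # server-reflexive, learned from a STUN server
    "relay": 0,    # relayed, allocated on a TURN server
}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """Priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


# Hypothetical candidates gathered by one client (documentation-range addresses).
candidates = [
    ("relay", "203.0.113.7:49152"),
    ("srflx", "198.51.100.4:62001"),
    ("host", "192.168.1.20:5004"),
]

# The client, not any server, decides the order in which to try them.
ranked = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)

for cand_type, address in ranked:
    print(cand_type, address, candidate_priority(cand_type))
```

The STUN and TURN servers only supply addresses. The ranking and the connectivity checks happen in the client, which is why an “ICE server” doesn't exist.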
For the uninitiated, STUN, TURN, and ICE allow clients to traverse NATs. They are all IETF drafts that continue to change frequently, and they are typically used alongside SIP to power VoIP. This is true for almost every VoIP provider, except for Skype. Skype unfortunately chose to implement a proprietary version of each one, breaking interoperability with other VoIP providers. This makes Skype much like your cell phone in the U.S., where you typically cannot switch cell phone providers while keeping the same phone. If you switch from Sprint to Verizon, for example, you have the joy of putting your $200 phone in a box in your closet or going through the hassle of selling it on eBay. Skype has given us a similar gift. You could never use Skype software with Vonage or Gizmo, for example. If Skype used SIP, TURN, STUN, and ICE, you theoretically could.
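For a sense of how small the open building blocks are, here is a rough Python sketch that builds the 20-byte header of a STUN Binding Request. It assumes the message layout from the later published STUN specification (RFC 5389); the drafts in circulation as I write may differ, and the original RFC 3489 used a 16-byte transaction ID with no magic cookie. The function names are mine, and nothing here sends a packet.

```python
import os
import struct

BINDING_REQUEST = 0x0001    # STUN message type for a Binding Request
MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request header with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Network byte order: 16-bit type, 16-bit attribute length, 32-bit cookie.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_header(data):
    """Split a STUN header into (type, length, cookie, transaction ID)."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    return msg_type, length, cookie, data[8:20]


request = build_binding_request()
print(len(request), parse_header(request)[:3])
```

Because the format is published, anyone can write a client or server that speaks it, which is exactly what a proprietary equivalent rules out.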
Skype's proprietary protocol is also an issue on web pages. On eBay, you can now press a hyperlink to call users over Skype. This link starts with “skype:” much as typical links start with “http:”. Links on web pages to initiate a phone call will become increasingly common. You could easily imagine links on MySpace pages for calling other users, for example, and some savvy MySpace users likely already have them. The problem is that every other VoIP provider uses SIP, which long ago standardized its own interoperable URIs that start with “sip:”. So, because Skype chose to implement proprietary versions of everything, you will likely have to choose between two links when making a call, one of the form “skype:afisk” and another of the form “sip:firstname.lastname@example.org”.
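The two kinds of link can be told apart by scheme alone, as this small Python sketch shows. The handler table and function names are hypothetical, chosen only to illustrate the point: a “sip:” link can go to any compliant client, while a “skype:” link has exactly one possible destination.

```python
# Hypothetical mapping from URI scheme to the software able to place the call.
HANDLERS = {
    "sip": "any SIP-compliant client",
    "skype": "the Skype client only",
}


def call_scheme(uri):
    """Split a calling URI into (scheme, target), e.g. 'skype:afisk'."""
    scheme, sep, target = uri.partition(":")
    if not sep or not target:
        raise ValueError(f"not a calling URI: {uri!r}")
    return scheme.lower(), target


def describe(uri):
    """Say which software would have to handle the given link."""
    scheme, target = call_scheme(uri)
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no handler registered for scheme {scheme!r}")
    return f"{target} -> {handler}"


for link in ("skype:afisk", "sip:firstname.lastname@example.org"):
    print(describe(link))
```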
Imagine if web servers made a similar choice circa 1994. In this world, instead of every link starting with “http:”, some would start with “http:” and others would start with “mywretchedprotocol:”. All browsers would have to support both. What a nightmare! We have Skype to thank for starting us along that path with VoIP.
The implications of this issue go further. While SIP certainly has its problems (why did they ever include UDP?), the carefully designed interworking of the SIP family of protocols is a thing of beauty. SIP does not depend on STUN or TURN or ICE, for example, just as STUN does not depend on TURN or SIP or ICE, etc. This allows each protocol to evolve independently and allows different developers to implement different parts of the protocol family. One open source project can simply write a STUN server, for example, while another could write a SIP server, and another a TURN client. In the end, the user gets better software because engineers can break apart the problem and focus on implementing one piece well. And the protocols are all documented in Internet drafts that anyone can read. Skype's use of proprietary protocols butchers this system.
Because of its careful engineering, SIP can also carry any type of traffic over any type of transport. You can use SIP to transfer RSS feeds using HTTP over TCP as LittleShoot does, for example. Or you can just make a phone call. SIP is simply the signaling protocol used to establish the connection. Skype doesn't have anywhere near this flexibility.
Skype's decision to use a closed protocol has security implications as well. When calls are routed through supernodes, for example, there's a built-in “man-in-the-middle” that can monitor all traffic. Skype encrypts calls, but does it use both server and client authentication to prevent the man-in-the-middle from launching a replay attack? If it doesn't, then it's theoretically possible for an attacker to become a supernode and listen to all of your calls. As a closed protocol, Skype isn't open to scrutiny by the public security community, which could otherwise identify and fix such vulnerabilities. There could be people implementing this exploit to monitor and decrypt your Skype calls right now. While one independent security audit claims Skype does implement both client and server authentication, this is one person evaluating its architecture as opposed to the throngs of security experts who would be eager to identify holes if the system were open. We just don't know.
These issues all point to the importance of an open infrastructure and to the power of SIP as a bedrock of the next generation of Internet applications. As people like Vint Cerf have noted, SIP may be to the next ten years what HTTP was to the last ten, unless Skype gets in the way and everything degenerates into a battle of ugly proprietary implementations of the same thing.
I choose to believe that good engineering wins in the end. Protocols like SIP, HTTP, and XMPP will enable a new generation of far more powerful applications capable of seamlessly circumventing NATs and pooling resources to put the maximum possible power in the hands of the user.