Where Google App Engine Spanks Amazon’s Web Services: S3, EC2, Simple DB, SQS

May 28, 2008

First off, I loooove Amazon Web Services (AWS), and we make heavy use of S3, EC2, Simple DB, and Elastic IPs for LittleShoot. We run everything on Amazon, but that’s about to change.  I’ve been in the App Engine beta for about a month, and despite all of the astonishing ways AWS makes building web sites easy, Google App Engine makes it far easier. Here’s why:

1) Google App Engine Has Better Scalability  

Google migrates your application across its infrastructure automatically as needed. With EC2, you have to manually detect machine load and bring instances up or down accordingly. You need to set up load balancing and clustering. While Amazon gives you far more control, it’s also far more work.  With Google App Engine, it’s all done for you.
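The difference is easy to sketch. Here's a rough picture of the kind of load-watching logic EC2 leaves to you (the thresholds and the idea of returning an instance delta are invented for illustration, not any AWS API; App Engine does all of this for you):

```python
# A sketch of the scale-up/scale-down decision you have to make yourself
# on EC2. The 75%/25% thresholds are arbitrary placeholders.
def scaling_decision(cpu_load, instance_count, high=0.75, low=0.25):
    """Return how many instances to add (+1), remove (-1), or leave (0)."""
    if cpu_load > high:
        return +1  # spin up another instance to absorb load
    if cpu_load < low and instance_count > 1:
        return -1  # shut one down to stop paying for idle capacity
    return 0

# You'd run this in a loop against real load metrics, then call the EC2
# APIs to actually launch or terminate instances -- all manual plumbing.
```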

2) Google App Engine Has a Better Database

Google App Engine’s Big Table blows Amazon’s Simple DB out of the water, and that’s coming from a big fan of Simple DB and Amazon CTO Werner Vogels. Simple DB thankfully yanks developers out of the relational database mindset and automatically replicates data across machines for scalability. You do have to learn a completely new query syntax, however, and, as this blog has noted, sorting is not officially supported. Simple DB is also still in beta.  

With App Engine, you’re using the same database, Big Table, that Google engineers use to power some of the busiest sites on the Internet. Billions of queries have been hammering out kinks in Big Table for years. You know it will scale.  What’s more, App Engine’s “GQL” gives developers a familiar SQL-like syntax, lowering the learning curve compared to Simple DB.  Big Table also supports sorting.  Perhaps most significantly, Simple DB costs far more. While Google’s final pricing, due later this year, may change things, today’s announcement didn’t mention any difference in price between data stored in the database and data stored anywhere else. On Simple DB, that data costs $1.50 per GB-month. On App Engine, it appears to cost $0.15 – $0.18 per GB-month. Wow.
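To see the learning-curve difference, compare a hypothetical "posts by author, newest first" lookup in the two query languages (the Post/author/date names are made up for illustration):

```python
# Simple DB's bracketed query syntax -- and remember, sorting isn't
# officially supported, so there's no "newest first" here at all:
simpledb_query = "['author' = 'vern']"

# App Engine's GQL: plain SQL-like syntax, with sorting built in.
# Any developer who has touched SQL can read this on sight:
gql_query = "SELECT * FROM Post WHERE author = 'vern' ORDER BY date DESC"
```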

3) Google App Engine is Cheaper 

Beyond the database, App Engine gives you the first 5 million or so page views per month for free.  That’s a lot of page views. It doesn’t put you up with the Internet’s top dogs, of course, but at 5 million page views you should be making cash. App Engine is free precisely when you’re building your company and keeping costs low matters most. If you go beyond that 5 million, the prices Google announced at today’s I/O event are remarkably similar to Amazon’s current offerings. They both price everything per GB or CPU-hour, and the numbers are barely distinguishable. That first 5 million page views and the apparent huge disparity in database storage pricing are by far the biggest differentiators, both dramatically tipping the scales in favor of Google.
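A quick back-of-the-envelope calculation on the storage rates quoted above makes the disparity concrete (Python used just as a calculator; the App Engine figure is the low end of the apparent range, and final pricing could still change):

```python
SIMPLEDB_RATE = 1.50    # $/GB-month for data in Simple DB
APP_ENGINE_RATE = 0.15  # $/GB-month on App Engine (apparent low end)

def monthly_storage_cost(gigabytes, rate_per_gb_month):
    """Dollars per month to keep this much data in the store."""
    return gigabytes * rate_per_gb_month

# For 50 GB of application data:
simpledb_cost = monthly_storage_cost(50, SIMPLEDB_RATE)      # $75.00/month
app_engine_cost = monthly_storage_cost(50, APP_ENGINE_RATE)  # $7.50/month
```

Roughly a 10x difference on every gigabyte you store.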

Conclusion

For building scalable web applications quickly, App Engine beats AWS by a surprisingly wide margin.  Note, however, that this refers specifically to web applications. For anything custom, you need Amazon. Because App Engine only supports Python, you also need Amazon for running any non-Python code. While this is a significant difference, many good developers are facile with multiple languages and can move rapidly between them.  Amazon’s flexibility makes it win out for many applications, but not for the most common application there is: web sites.  App Engine is more of a “domain-specific cloud” for web applications, but it’s shockingly good at what it does.  

Oh yeah, and it’s cheap.


P2P in Flash 10 Beta — the Questions Facing a YouTube, Skype, and BitTorrent Killer

May 21, 2008

As I’ve reported, the inclusion of P2P in Flash 10 Beta represents a fundamental disruption of the Internet platform. As with all disruptions, however, this one will progress in fits and starts. Flash 10’s details limit the full power of its P2P features. While features like VoIP will be fully enabled, it will take some ingenuity to turn Flash 10 into a more generalized P2P platform. Here are the issues:

1) Flash Media Server (FMS)

You’ll need Flash Media Server (FMS) to take advantage of Flash P2P. At $995 for the “Streaming Server” and $4,500 for the “Interactive Server”, FMS is beyond the reach of most developers working on their own projects, severely limiting Flash P2P’s disruptive potential. In an ideal world, the new P2P protocols would be openly specified, allowing open source developers to write their own implementations. As it stands now, a single company controls a potentially vital part of the Internet infrastructure, and encryption will likely thwart the initial reverse engineering efforts of open source groups like Red5.

2) No Flash Player in the Background

As David Barrett (formerly of Akamai/Red Swoosh) has emphasized on the Pho List, Flash Player only runs when it’s loaded in your browser. As soon as you navigate to another page, Flash can no longer act as a P2P server. P2P programs like Red Swoosh, BitTorrent, and LittleShoot don’t have this limitation, and it means Flash can’t save web sites as much bandwidth as those full-blown applications can. This limits but does not eliminate Flash’s threat to CDNs. Sure, you could get around this using AIR, but that creates another major barrier to adoption.

3) Usability

While Flash 10 has the ability to save files to your computer and to load them from your computer (essential for P2P), it pops up a dialog box each time that happens. While this is an important security measure, it cripples Flash 10’s ability to mimic BitTorrent, because dialogs would pop up constantly to confirm that you, the user, had authorized the upload of each part of each file.

4) Limited APIs

While all the required technology is there in the Real Time Media Flow Protocol (RTMFP), ActionScript’s API limits some of the P2P potential of Flash 10. P2P downloading breaks up files into smaller chunks so you can get them from multiple other computers. Flash 10 can only save complete files to your computer — you can’t save in small chunks. As a result, you’d have to use ActionScript very creatively to achieve BitTorrent or LittleShoot-like distribution or to significantly lower bandwidth bills for sites serving videos. It might be possible, but you’d have to work some magic.
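To make the limitation concrete, here's a minimal sketch of the chunked transfer scheme BitTorrent-style clients rely on, which is exactly the piece-by-piece saving that Flash 10's whole-file API rules out. The peers and chunk size are invented for illustration:

```python
CHUNK_SIZE = 4  # bytes; tiny for the example (real clients use ~256 KB+)

def split_into_chunks(data, size=CHUNK_SIZE):
    """Break a file into fixed-size pieces peers can serve independently."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def download_from_peers(chunks_by_peer, num_chunks):
    """Fetch each chunk from whichever peer has it, then reassemble."""
    assembled = []
    for index in range(num_chunks):
        for peer_chunks in chunks_by_peer.values():
            if index in peer_chunks:
                assembled.append(peer_chunks[index])
                break
    return b"".join(assembled)

file_data = b"hello p2p world!"
chunks = split_into_chunks(file_data)
# Two imaginary peers, each holding only part of the file:
peers = {
    "peer_a": {0: chunks[0], 2: chunks[2]},
    "peer_b": {1: chunks[1], 3: chunks[3]},
}
restored = download_from_peers(peers, len(chunks))  # reassembles file_data
```

Since ActionScript can only hand you complete files, each of those partial saves would need a workaround, which is the "magic" referred to above.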

So, that’s the deal. There’s still a lot more documentation coming our way from Adobe, so there are undoubtedly useful nuggets yet to be discovered.

Even given all these limitations, however, the key point to remember is the Internet has a new, immensely powerful protocol in its arsenal: Matthew Kaufman and Michael Thornburgh’s Real Time Media Flow Protocol (RTMFP). While Flash might use it primarily for direct streaming between two computers now (think VoIP), it introduces the potential for so much more.

Keep your helmet on.


P2P in Flash 10 Beta – a YouTube, Skype, and BitTorrent Killer

May 16, 2008

The inclusion of p2p in the Flash 10 beta threatens to bring down everyone from YouTube to Skype. Using P2P, Flash sites will be able to serve higher quality video than YouTube at a fraction of the cost. Meanwhile, the combination of the Speex audio codec and the Real Time Media Flow Protocol (RTMFP) will enable sites to seamlessly integrate VoIP without requiring a Skype install. The impact of this change is hard to fathom. We’re talking about a fundamental shift in what is possible on the Internet, with Flash demolishing almost all barriers to integrating P2P on any site.

Hank Williams and Om Malik have discussed the potential for Flash 10 to be used for P2P CDNs, and they’re largely right on. Oddly enough, though, the biggest problem I see with P2P CDNs is latency. While P2P theoretically lets you choose copies of content closer to you on the network, you still have to negotiate with a server somewhere to establish the connection (for traversing NATs), nullifying the P2P advantage unless you’re talking about really big files. As Hank identifies, the sites serving large files are the CDNs’ best customers, so we are talking about a significant chunk of the CDN business up for grabs. That said, CDNs could easily start running Flash Media Servers themselves with integrated RTMFP. They’ve already addressed the server locality problem, and taking advantage of Flash deployments would simply be an optimization. Whether the CDNs will realize this shift has taken place before it’s too late is another question.

To me, the really vulnerable players are the video sites themselves and anyone in the client-side VoIP space. Writing a VoIP app is now equivalent to writing your own Flash video player. All the hard stuff is already done. Same with serving videos. You no longer have to worry about setting up an infinitely scalable server cluster — you just offload everything to Flash. No more heavy lifting and no more huge bandwidth bills. In the BitTorrent case, it’s mostly a matter of usability. As with Skype, you no longer need a separate install. Depending on what’s built in to the Flash Media Server, you also no longer need to worry about complicated changes on the server side, and downloads will happen right in the browser.

The stunning engineering behind all of this deserves note. The Real Time Media Flow Protocol (RTMFP) underlies all of these changes. On closer inspection, RTMFP appears to be the latest iteration of Matthew Kaufman and Michael Thornburgh’s Secure Media Flow Protocol (SMP), which came to Adobe with its 2006 acquisition of Amicima. Adobe appears to have acquired Amicima specifically to integrate SMP into Flash, now in the improved form of RTMFP. It’s a very fast media transfer protocol built on UDP, with IPSec-like security and congestion control built in. The strength of the protocol was clear to me when Matthew first posted his “preannouncement” on the p2p hackers list. Very shrewd move on Adobe’s part.

Are there any downsides? Well, RTMFP is, for now, a closed if breathtakingly cool protocol, and it’s tied to Flash Media Server. That means Adobe holds all the cards, and this isn’t quite the open media platform to end all platforms. If Adobe opens up the protocol and open source implementations start emerging, however, the game’s over.

Not that I have much sympathy, but this will also shift a huge amount of traffic to ISPs, as ISPs effectively take the place of CDNs without getting paid for it. While Flash could implement the emerging P4P standards to limit the bleeding at the ISPs and to further improve performance, the shift will likely mean higher bandwidth bills for consumers over the long term. No matter: I’d rather have us all pay a little more in exchange for dramatically increasing the number of people who can set up high bandwidth sites on the Internet. The free speech implications are too good to pass up.

Just to clear up some earlier confusion, the Flash 10 Beta is not based on SIP or P2P-SIP in any way. Adobe’s SIP work has so far only seen the light of day in Adobe Pacifica, not in the Flash Player.


Ian Clarke’s Freenet 0.7 Released

May 9, 2008

After 3 years of development, the latest version of Freenet is here. This version protects users from persecution for even using Freenet, let alone for the content they’re distributing. Freenet is a vital tool against censorship, particularly in countries like China where freedom of speech is often severely curtailed. For the unfamiliar, here’s the quick description of Freenet from their site:

Freenet is free software which lets you publish and obtain information on the Internet without fear of censorship. To achieve this freedom, the network is entirely decentralized and publishers and consumers of information are anonymous. Without anonymity there can never be true freedom of speech, and without decentralization the network will be vulnerable to attack.

Congratulations to Ian Clarke, Matthew Toseland, and the other Freenet developers. The quote on the Freenet site epitomizes the importance of the project:

“I worry about my child and the Internet all the time, even though she’s too young to have logged on yet. Here’s what I worry about. I worry that 10 or 15 years from now, she will come to me and say ‘Daddy, where were you when they took freedom of the press away from the Internet?'”
–Mike Godwin, Electronic Frontier Foundation

Freenet is a vital weapon in that war.

I’m also excited to have Ian as a new addition to the LittleShoot advisory board, one of many things we’ll be making more announcements about soon.  I’ve always had a great respect for Ian’s emphasis on p2p’s importance as a politically disruptive tool for free speech.  We all got caught up in the copyright wars and missed the big picture, but not Ian.


Decentralized Twitter a Bad Idea

May 5, 2008

Michael Arrington’s post on creating a decentralized Twitter is theoretically interesting but practically naive.  Martin Fowler’s First Law of Distributed Object Design continually rings true: don’t distribute your objects.  Why?  Because it’s hard.  In every case I’ve ever seen, it’s orders of magnitude harder to distribute a task than it is to centralize it.

Search is the quintessential example.  When I wrote the search algorithms and protocols for the second-generation Gnutella network at LimeWire, the degree of coordination between millions of network nodes was staggering.  It worked amazingly well, all things considered, but it still could never compete with the speed or collaborative filtering of centralized search.  Most importantly, it could never compete with the simplicity of centralization.  What took us 6 months to distribute would have taken a couple of days to centralize.  Distributed networks also make updating much harder, as you’re now forced to update every instance of your software running on any machine in the world.  It’s a pain in the a$$.  If you have a choice, centralized search wins every time.  That’s one of the reasons LittleShoot follows Fowler’s law whenever possible.  Something about working better and taking less time makes it appealing.
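For a sense of why the centralized version takes days instead of months, here's a toy centralized keyword index. It's a sketch, with file names and tokenization invented for illustration, but the core of centralized search really is this small; the distributed version has to coordinate the same lookup across millions of unreliable nodes:

```python
from collections import defaultdict

class SearchIndex:
    """A minimal centralized inverted index: keyword -> file names."""

    def __init__(self):
        self.index = defaultdict(set)

    def add_file(self, name):
        # Crude tokenization: treat dots as spaces, lowercase everything.
        for word in name.lower().replace(".", " ").split():
            self.index[word].add(name)

    def search(self, query):
        """Return the files matching every keyword in the query."""
        results = [self.index[w] for w in query.lower().split()]
        return set.intersection(*results) if results else set()

index = SearchIndex()
index.add_file("ubuntu install guide.pdf")
index.add_file("freenet install notes.txt")
matches = index.search("install guide")  # only the guide matches both words
```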

Distributed computing has shown itself to be particularly useful for moving around large files.  In Twitter’s case, you’re working from the opposite extreme: processing a high volume of tiny messages.  This screams centralization.

Centralization is not the reason Twitter can’t scale.  They can’t scale because, well, they just haven’t written an architecture that scales.  Granted, the mobile space is still largely DIY, so they have to roll much of their own code.  That’s really a pretty lame excuse, though, especially given their resources and the time they’ve had to figure it out.  My buddies over at Mobile Commons process similarly huge volumes of mobile messages, and they don’t have these problems.  I’m convinced Twitter would be flying if Ben Stein and Chris Muscarella at Mobile Commons were in charge, and I honestly think it’s because they’re just better programmers.

I’m a big Evan Williams fan and was thrilled to meet him for the first time down at SXSW, but that doesn’t mean I have faith in his or his team’s ability to learn how to scale a complex architecture overnight.  A distributed architecture would make their task orders of magnitude harder.  Don’t do it fellas.

