Dojo 1.2 and Django 1.0 on Google App Engine 1.1.3

September 17, 2008

Important Update: Use Google App Engine Patch for Django integration instead. It integrates Django seamlessly and includes lots of other goodies. For Dojo, also consider using xdomain and loading Dojo from Google’s AJAX Libraries API. For Google App Engine Patch, see my more recent post. This area changes quickly, so check the dates on any blog posts you’re looking at!

With the release of Google App Engine 1.1.3 (GAE), it’s now much more realistic to serve up Dojo on GAE. LittleShoot uses a lot of Dojo (mostly because it rocks), so getting Dojo up and running on our new GAE LittleShoot port has been a high priority.  

The problem is that vanilla Dojo quickly runs up against GAE’s 1000 file limit per application. Django has the same problem. Our initial solution was to run a custom build that would delete a lot of unnecessary files, with thanks to Peter Higgins from SitePen for lighting our path. This got our build down to about 700 files, solving the problem for the basic LittleShoot site.

We’re getting ready to release the LittleShoot platform, however, which allows any site to detect whether LittleShoot is installed and to call the LittleShoot “P2P 2.0” API if it’s available. That means external sites need to be able to load our JavaScript, i.e., we need to support Dojo’s cross-domain builds. The cross-domain build adds about another 500 files, pushing us back over the 1000 file limit.

Thankfully, GAE 1.1.3 introduces “zipserve,” which lets you serve static files directly from a zip file. Guido van Rossum deployed a similar technique for serving Django in his fantastic Rietveld code review tool and GAE demo app, but that approach only works for loading Python code. So we copied Guido for loading Django 1.0, and zipserve now lets us do more or less the same thing for Dojo’s static files.
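If you’re curious what the Python side of that trick looks like, the core of it is just putting the zip on sys.path before anything imports Django. Here’s a minimal sketch, assuming the archive is named django.zip and sits next to your main script (the name and layout are my assumptions, not Rietveld’s exact code):

import os
import sys

# Put the zipped Django on the import path before anything imports it.
# Python's built-in zipimport machinery then loads modules straight
# from the archive, so the unzipped tree never has to be deployed and
# never counts against GAE's 1000 file limit.
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'django.zip'))

import django  # now resolved from django.zip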

To serve Dojo using zipserve, you basically modify your app.yaml to include something like the following:

- url: /dojo/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

- url: /dijit/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

- url: /dojox/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

This will look for dojo.zip, dijit.zip, and dojox.zip files in your *top-level* directory, the same directory containing app.yaml. You need to separate the zip files to avoid running up against the GAE limit on file sizes (1MB, I believe). The zip file name basically acts like a directory in App Engine’s URL resolution: when a request goes to http://beta.littleshoot.org/dojo/dojo.js, for example, zipserve loads dojo.js from dojo.zip (not from a dojo directory within dojo.zip).
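In other words, zipserve treats the first path component as the zip file name and looks the rest of the path up inside that archive. Conceptually it’s something like this (a simplified sketch, not zipserve’s actual code):

import zipfile

def serve(path):
    # '/dojo/dojo.js' -> archive 'dojo.zip', member 'dojo.js'.
    prefix, _, member = path.lstrip('/').partition('/')
    archive = zipfile.ZipFile(prefix + '.zip')
    return archive.read(member)

Here’s our script for generating the zips: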

#!/usr/bin/env bash

function die()
{
  echo $*
  exit 1
}

cd static/js
dirs="dojo dijit dojox littleshoot"

for x in $dirs
do
  cd $x || die "Could not cd to $x"
  echo "Building zip for $x"
  zf=../../../$x.zip
  rm -f $zf
  zip -rv $zf * || die "Could not create zip"
  cd ..
  # We actually delete all the contents because we don't want to
  # include them in GAE.
  rm -rf $x
done

exit 0

Even after splitting the zips, dijit and dojox still come close to the 1MB file size limit. We use rsync excludes to get rid of the files we don’t need. Here’s our bash rsync excludes function, which you can customize to your liking (careful not to exclude anything you need!):

function excludeRsync
{
  rsync --exclude .svn/ \
        --exclude test/ \
        --exclude tests/ \
        --exclude demo/ \
        --exclude demos/ \
        --exclude soria/ \
        --exclude nihilo/ \
        --exclude grid/ \
        --exclude charting/ \
        --exclude util/ \
        --exclude analytics/ \
        --exclude collections/ \
        --exclude 'README*' \
        --exclude '*.psd' \
        --exclude '*.uncompressed.js' \
        --exclude '*.commented.css' \
        --exclude dijit/templates \
        --exclude dijit/form/templates \
        --exclude dijit/layout/templates \
        --exclude '*silverlight*' \
        --exclude gfx3d/ \
        --exclude dojo/_base/ \
        --exclude dojo/_base.js \
        --exclude dojo/build.txt \
        --exclude functional/ \
        --exclude off/ \
        --exclude presentation/ \
        --exclude sketch/ \
        --exclude storage/ \
        --exclude wire/ \
        --exclude data/ \
        --exclude dtl/ \
        -avz $tmpDir/js $releaseDir || die "Could not sync"
}

We call the rsync script after running our Dojo build. The Dojo build puts everything in the $tmpDir/js directory referenced on the last line of the rsync call.

OK, so there are still some hoops to jump through, but it works! On the Django 1.0 side, I’d highly recommend copying Guido’s Rietveld settings. The Google App Engine Django Helper is also useful, but Django 1.0 support hasn’t been released as of this writing. We’re using Guido’s settings for now: basically the Makefile, make_release.sh, settings.py, and main.py. I also tweaked his rietveld.py script, turning it into littleshoot.py. You can find all of these files in the Rietveld svn. Most of the tweaks are pretty obvious when you glance at the scripts; nothing super complicated is going on. It’s not a breeze, but each file is individually fairly straightforward. The scripts also create a zipped version of Django from the django directory, so using svn:externals (described below) is handy.
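For reference, the heart of a Rietveld-style main.py is small. Here’s a rough sketch of the pattern (illustrative, not Guido’s exact code; it assumes your settings module is named settings and that Django ships as django.zip):

import os
import sys

# If Django ships as django.zip, it has to hit sys.path before the
# django import below (same trick as the snippet earlier in this post).
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'django.zip'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

from google.appengine.ext.webapp import util
import django.core.handlers.wsgi

def main():
    # Hand every request to Django's standard WSGI handler.
    application = django.core.handlers.wsgi.WSGIHandler()
    util.run_wsgi_app(application)

if __name__ == '__main__':
    main()

Wrapping the dispatch in main() matters on App Engine, if memory serves: the runtime caches the module and just calls main() on subsequent requests rather than re-executing the whole script.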

We load both Django and Dojo with svn:externals. You might be on information overload at this point, but here’s another little nugget if you’re still with me. You can run the following in the respective directories where you want Django and Dojo to reside.

svn --editor-cmd=vim pe svn:externals .

For Dojo, enter the following:

dojo http://svn.dojotoolkit.org/src/tags/release-1.2.0b2/

For Django, try:

django http://code.djangoproject.com/svn/django/trunk/django

That will put Dojo 1.2 beta 2 in the “dojo” directory and the Django trunk in the “django” directory when you run “svn up”.  

If you successfully navigate through all of that, the end result is the latest Dojo and the latest Django running on the spanking new Google App Engine 1.1.3. Web app setups just don’t get any sweeter. If you’re a total geek like me, that’s living the good life! Sad, I know…


Vote for ‘P2P 2.0’ Panel at SXSW 2009!

August 22, 2008

I submitted “P2P 2.0 and the Future of Digital Media” as a panel for SXSW 2009.  Please, please, please vote for my panel.  It’s going to rock.

The panel will include a group of leaders in the P2P world who are together ushering in a new generation of P2P applications that will turn media distribution on its head, enabling any web site to seamlessly integrate P2P.  These applications enable sites to include high resolution videos, for example, streamed directly through the browser but using P2P CDNs behind the scenes.

Users on these new sites won’t even know they’re using P2P.  They’ll just know they’re seeing some amazing content. 

Web site owners will be able to give their users higher resolution files than ever before while lowering their bandwidth bills.

Vote early, vote often.


P2P in Flash 10 Beta — the Questions Facing a YouTube, Skype, and BitTorrent Killer

May 21, 2008

As I’ve reported, the inclusion of P2P in Flash 10 Beta represents a fundamental disruption of the Internet platform. As with all disruptions, however, this one will progress in fits and starts. The details of Flash 10’s implementation limit the full power of its P2P features. While features like VoIP will be fully enabled, it will take some ingenuity to turn Flash 10 into a more generalized P2P platform. Here are the issues:

1) Flash Media Server (FMS)

You’ll need Flash Media Server (FMS) to take advantage of Flash P2P. At $995 for the “Streaming Server” and $4,500 for the “Interactive Server”, FMS is beyond the reach of most developers working on their own projects, severely limiting Flash P2P’s disruptive potential. In an ideal world, the new P2P protocols would be openly specified, allowing open source developers to write their own implementations. As it stands now, a single company controls a potentially vital part of the Internet infrastructure, and encryption will likely thwart the initial reverse engineering efforts of open source groups like Red5.

2) No Flash Player in the Background

As David Barrett (formerly of Akamai/Red Swoosh) has emphasized on the Pho List, Flash Player only runs when it’s loaded in your browser. As soon as you navigate to another page, Flash can no longer act as a P2P server. P2P programs like Red Swoosh, BitTorrent, and LittleShoot don’t have this limitation, and it means Flash can’t save web sites as much bandwidth as those full-blown applications can. This limits but does not eliminate Flash’s threat to CDNs. Sure, you could get around this using AIR, but that creates another major barrier to adoption.

3) Usability

While Flash 10 can save files to your computer and load them from your computer (essential for P2P), it pops up a dialog box each time that happens. That’s an important security measure, but it cripples Flash 10’s ability to mimic BitTorrent: dialogs would pop up constantly to confirm that the user had authorized each upload of each piece of a file.

4) Limited APIs

While all the required technology is there in the Real Time Media Flow Protocol (RTMFP), ActionScript’s API limits some of the P2P potential of Flash 10. P2P downloading breaks up files into smaller chunks so you can get them from multiple other computers. Flash 10 can only save complete files to your computer — you can’t save in small chunks. As a result, you’d have to use ActionScript very creatively to achieve BitTorrent or LittleShoot-like distribution or to significantly lower bandwidth bills for sites serving videos. It might be possible, but you’d have to work some magic.

So, that’s the deal. There’s still a lot more documentation coming our way from Adobe, so there are undoubtedly useful nuggets yet to be discovered.

Even given all these limitations, however, the key point to remember is the Internet has a new, immensely powerful protocol in its arsenal: Matthew Kaufman and Michael Thornburgh’s Real Time Media Flow Protocol (RTMFP). While Flash might use it primarily for direct streaming between two computers now (think VoIP), it introduces the potential for so much more.

Keep your helmet on.


P2P in Flash 10 Beta – a YouTube, Skype, and BitTorrent Killer

May 16, 2008

The inclusion of p2p in the Flash 10 beta threatens to bring down everyone from YouTube to Skype. Using P2P, Flash sites will be able to serve higher quality video than YouTube at a fraction of the cost. Meanwhile, the combination of the Speex audio codec and the Real Time Media Flow Protocol (RTMFP) will enable sites to seamlessly integrate VoIP without requiring a Skype install. The impact of this change is hard to fathom. We’re talking about a fundamental shift in what is possible on the Internet, with Flash demolishing almost all barriers to integrating P2P on any site.

Hank Williams and Om Malik have discussed the potential for Flash 10 to be used for P2P CDNs, and they’re largely right on. Oddly, though, the biggest problem I see with P2P CDNs is latency. While P2P theoretically lets you choose copies of content closer to you on the network, you still have to negotiate with a server somewhere to establish the connection (for traversing NATs), nullifying the P2P advantage unless you’re talking about really big files. As Hank identifies, the sites serving large files are the CDNs’ best customers, so we are talking about a significant chunk of the CDN business up for grabs. That said, CDNs could easily start running Flash Media Servers themselves with integrated RTMFP. They’ve already addressed the server locality problem, and taking advantage of Flash deployments would simply be an optimization. Whether the CDNs will realize this shift has taken place before it’s too late is another question.

To me, the really vulnerable players are the video sites themselves and anyone in the client-side VoIP space. Writing a VoIP app is now equivalent to writing your own Flash video player. All the hard stuff is already done. Same with serving videos. You no longer have to worry about setting up an infinitely scalable server cluster — you just offload everything to Flash. No more heavy lifting and no more huge bandwidth bills. In the BitTorrent case, it’s mostly a matter of usability. As with Skype, you no longer need a separate install. Depending on what’s built in to the Flash Media Server, you also no longer need to worry about complicated changes on the server side, and downloads will happen right in the browser.

The stunning engineering behind all of this deserves note. The Real Time Media Flow Protocol (RTMFP) underlies all of these changes. On closer inspection, RTMFP appears to be the latest iteration of Matthew Kaufman and Michael Thornburgh’s Secure Media Flow Protocol (SMP), which Adobe picked up in its 2006 acquisition of Amicima. Adobe appears to have acquired Amicima specifically to integrate SMP into Flash, now in the improved form of RTMFP. It’s a very fast media transfer protocol built on UDP, with IPSec-like security and congestion control built in. The strength of the protocol was clear to me when Matthew first posted his “preannouncement” on the p2p-hackers list. Very shrewd move on Adobe’s part.

Are there any downsides? Well, RTMFP is for now a closed, if breathtakingly cool, protocol, and it’s tied to Flash Media Server. That means Adobe holds all the cards, and this isn’t quite the open media platform to end all platforms. If they open up the protocol and open source implementations start emerging, however, the game’s over.

Not that I have much sympathy, but this will also shift a huge amount of traffic to ISPs, which effectively take the place of CDNs without getting paid for it. While Flash could implement the emerging P4P standards to limit the bleeding at the ISPs and to further improve performance, this will otherwise eventually mean higher bandwidth bills for consumers. No matter: I’d rather have us all pay a little more in exchange for dramatically increasing the number of people who can set up high bandwidth sites on the Internet. The free speech implications are too good to pass up.

Just to clear up some earlier confusion, the Flash 10 Beta is not based on SIP or P2P-SIP in any way. Adobe’s SIP work has so far only seen the light of day in Adobe Pacifica, not in the Flash Player.


Decentralized Twitter a Bad Idea

May 5, 2008

Michael Arrington’s post on creating a decentralized Twitter is theoretically interesting but practically naive.  Martin Fowler’s First Law of Distributed Computing continually rings true: don’t distribute your objects.  Why?  Because it’s hard.  In every case I’ve ever seen, it’s orders of magnitude harder to distribute a task than it is to centralize it.

Search is the quintessential example. When I wrote the search algorithms and protocols for the second-generation Gnutella network at LimeWire, the degree of coordination between millions of network nodes was staggering. It worked amazingly well all things considered, but it still could never compete with the speed or collaborative filtering of centralized search. Most importantly, it could never compete with the simplicity of centralization. What took us 6 months to distribute would have taken a couple of days to centralize. Distributed networks also make updating much harder, as you’re forced to update every instance of your software running on any machine in the world. It’s a pain in the a$$. If you have a choice, centralized search wins every time. That’s one of the reasons LittleShoot follows Fowler’s law whenever possible. Something about working better and taking less time makes it appealing.

Distributed computing has shown itself to be particularly useful for moving around large files.  In Twitter’s case, you’re working from the opposite extreme: processing a high volume of tiny messages.  This screams centralization.

Centralization is not the reason Twitter can’t scale. They can’t scale because, well, they just haven’t written an architecture that scales. Granted, the mobile space is still largely DIY, so they have to roll much of their own code. That’s a pretty lame excuse, though, especially given their resources and the time they’ve had to figure it out. My buddies over at Mobile Commons face similar challenges processing huge volumes of mobile messages, and they don’t have these problems. I’m convinced Twitter would be flying if Ben Stein and Chris Muscarella at Mobile Commons were in charge, and I honestly think it’s because they’re just better programmers.

I’m a big Evan Williams fan and was thrilled to meet him for the first time down at SXSW, but that doesn’t mean I have faith in his or his team’s ability to learn how to scale a complex architecture overnight.  A distributed architecture would make their task orders of magnitude harder.  Don’t do it fellas.


Have A Shiner Bock with Me at SXSW

March 8, 2008

Just a quick note to all you SXSW folks out there: I’m down in my home town of Austin for the festivities, and you should track me down for a drink if you like. The energy and breadth of the conference on day 1 was inspiring. Douglas Price, Tim Ferriss, Mike Cassidy, Eve Phillips, and I all went out for a lengthy meal and great conversation at Austin’s finest, the Driskill, and I was ecstatic this morning to learn Tim’s healthy drinking tips actually work.

I’ll be firing up Eve’s Chirp via Boot Camp later today. Their first app is a social media filter that includes a screen saver displaying photos from your various social media outlets. All you Windows users out there should give it a try. We also had fun listening to Doug’s latest album, Mass Processor.

If you want to survive SXSW or life with the bottle generally, I second Tim’s recommendation of “Chaser”, carbon-based pills that help ward off hangovers. They work. We picked them up in a corner store on 6th St., but you should be able to find them at GNC or pharmacies.

I’ll be down here through next Wed, so come grab me or shoot me an e-mail at ‘a-at-littleshoot dot org’. Oh, and if you’re at the conference, check out the Tuesday panel on Visualizing Sustainability from Pliny Fisk (aka my dad). He always has something interesting to say. Mike Cassidy and Tim Ferriss’s panel later today, on efficiently and quickly achieving goals (from writing a #1 New York Times bestseller to building and selling a company for $500 million two years after founding), should also be a hit. Evan Williams (Blogger, Twitter) and Cali Lewis are on the panel as well.

If you’re having trouble finding me, I’ll probably be off in the corner somewhere on my laptop working on porting LittleShoot to Facebook.