Google App Engine Full Text Search From App Engine Patch Team

July 1, 2009

Waldemar Kornewald and the rest of the Google App Engine Patch team have just released the first full text search implementation for Google App Engine (GAE).  LittleShoot has been using App Engine Patch since the early days of App Engine, and we can’t recommend it highly enough.  Its seamless Django integration has saved us countless hours, and features like its tool for combining and compressing all of your JavaScript and CSS using the YUI compressor are just stellar.

While we don’t have an immediate need for full text search over at LittleShoot, we can tell you the team behind GAE full text search is rock solid and battle tested.  Given the various limitations of the GAE datastore API, it’s also quite a technical feat.

If you need full text search and you’re running on App Engine using Python, GAE search will save you a great deal of pain.  Go get it.


Django and More on Google App Engine with App Engine Patch

November 16, 2008

I’ve recently had the chance to play with Waldemar Kornewald’s “Google App Engine Patch” and have come away very impressed. All LittleShoot Google App Engine (GAE) projects now run on it for a couple of simple reasons:

  1. Seamless Django integration (including 1.0.1)
  2. Thorough documentation
  3. Healthy open source development community with an excellent steward in Kornewald and frequent new releases

The Django integration got me first. Almost everything works, such as manage.py, Django authentication, Django testing (the original reason I switched), etc. There are also lots of other goodies in there, like support for boto’s SQS module. If you’re unfamiliar with it, boto SQS allows you to call Amazon’s Simple Queue Service (SQS) from Python. That’s a huge step in getting around GAE’s limitation on longer lived, CPU-intensive tasks. Just queue it up in SQS, and your GAE app will keep humming along fine — cloud integration at its finest.

To get started with App Engine Patch, download the zip file from the home page and work off the sample project. The download includes App Engine Patch itself as well as the sample project.  

With Django App Engine Helper not supporting Django 1.0 and seemingly inactive, App Engine Patch is a godsend and gets a huge thumbs up.

Great work Waldemar, and don’t forget to donate.


Dojo 1.2 and Django 1.0 on Google App Engine 1.1.3

September 17, 2008

Important Update: Use Google App Engine Patch for Django integration instead. It integrates Django seamlessly and includes lots of other goodies. For Dojo, also consider using xdomain and loading Dojo from Google’s AJAX Libraries API. For Google App Engine Patch, see my more recent post. This area changes quickly, so check the dates on any blog posts you’re looking at!

With the release of Google App Engine 1.1.3 (GAE), it’s now much more realistic to serve up Dojo on GAE. LittleShoot uses a lot of Dojo (mostly because it rocks), so getting Dojo up and running on our new GAE LittleShoot port has been a high priority.  

The problem is that vanilla Dojo quickly runs up against GAE’s 1000 file limit per application. Django has the same problem. Our initial solution was to run a custom build that would delete a lot of unnecessary files, with thanks to Peter Higgins from SitePen for lighting our path. This got our build down to about 700 files, solving the problem for the basic LittleShoot site.
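
As a rough way to see how close a project sits to that limit, a sketch like the following counts the files under a directory. The `count_files` and `files_remaining` helpers and the `FILE_LIMIT` constant are names made up for this illustration, and the limit is simply the 1000 figure mentioned above, not something read from App Engine itself.

```python
import os
import tempfile

# Assumption: the per-application file limit quoted in the post.
FILE_LIMIT = 1000


def count_files(root):
    """Count non-directory entries under root, recursively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def files_remaining(root, limit=FILE_LIMIT):
    """Return how many more files fit under the limit (negative if over)."""
    return limit - count_files(root)


if __name__ == "__main__":
    # Build a tiny throwaway tree so the demo does not depend on a real project.
    demo_root = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_root, "dojo", "_base"))
    for rel in ("dojo/dojo.js", "dojo/_base/lang.js", "app.yaml"):
        with open(os.path.join(demo_root, *rel.split("/")), "w") as handle:
            handle.write("placeholder\n")
    print(count_files(demo_root), "files,", files_remaining(demo_root), "to spare")
```

Running it against your application directory before and after a custom build gives a quick check on whether the pruning actually bought you anything.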

We’re getting ready to release the LittleShoot platform, however, which lets any site detect whether LittleShoot is installed and call the LittleShoot “P2P 2.0” API if it’s available. That means we need external sites to be able to load our JavaScript, i.e. we need to support Dojo’s cross-domain builds. The cross-domain build adds about another 500 files, pushing us over the 1000 file limit again.

Thankfully, GAE 1.1.3 introduces “zipserve,” allowing you to serve static files directly from a zip file. Guido van Rossum deployed a similar technique for serving Django for his fantastic Rietveld code review tool and GAE demo app, but that only works for loading Python code. So we copied Guido for loading Django 1.0, and now zipserve allows us to do more or less the same thing for Dojo.  

To serve Dojo using zipserve, you basically modify your app.yaml to include something like the following:

- url: /dojo/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

- url: /dijit/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

- url: /dojox/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

 

This will look for dojo.zip, dijit.zip, and dojox.zip files in your *top-level* directory, the same directory containing app.yaml. You need to separate the zip files to avoid running up against the GAE limit on file sizes (1MB I believe). The zip file name basically acts like a directory in App Engine’s URL resolution. So, when a request goes to http://beta.littleshoot.org/dojo/dojo.js, for example, it loads dojo.js from dojo.zip (not from the dojo directory within dojo.zip). Here’s our script for generating the zips:

#!/usr/bin/env bash

# Print an error message and abort.
function die()
{
  echo "$*"
  exit 1
}

cd static/js || die "Could not cd to static/js"
dirs="dojo dijit dojox littleshoot"

for x in $dirs
do
  cd "$x" || die "Could not cd to $x"
  echo "Building zip for $x"
  zf=../../../$x.zip
  # -f so a missing zip from a previous run is not an error.
  rm -f "$zf"
  zip -rv "$zf" * || die "Could not create zip"
  cd ..
  # We actually delete all the contents because we don't want to
  # include them in GAE.
  rm -rf "$x"
done

exit 0
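
To make the URL resolution described before the script concrete, here is a small Python sketch of the mapping: the first path segment picks the zip file, and the rest of the path is the member name inside it. This only illustrates the behavior as described in this post; it is not zipserve’s actual code, and `resolve_zip_request` is a hypothetical name.

```python
def resolve_zip_request(url_path):
    """Map a URL path such as '/dojo/dojo.js' to (zip file name, member name).

    Illustration only: the first path segment names the zip, and the
    remainder is looked up at the top level of that zip.
    """
    parts = url_path.lstrip("/").split("/", 1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError("expected a path like /<zip-name>/<member>: %r" % url_path)
    prefix, member = parts
    return prefix + ".zip", member


if __name__ == "__main__":
    print(resolve_zip_request("/dojo/dojo.js"))
    print(resolve_zip_request("/dijit/form/Button.js"))
```

This is why the script above runs zip from inside each directory: the archive members need to be dojo.js and friends at the top level, not dojo/dojo.js.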

Even with separating the zip files, dijit and dojox will still come close to the 1MB limit on file sizes.  We use rsync excludes to get rid of the files we don’t need.  Here’s our bash rsync excludes function, which you can customize to your liking (careful not to exclude anything you need!!):

function excludeRsync
{
  # Wildcard patterns are quoted so the shell passes them to rsync
  # instead of expanding them against the current directory.
  rsync --exclude .svn/ \
        --exclude test/ \
        --exclude tests/ \
        --exclude demo/ \
        --exclude demos/ \
        --exclude soria/ \
        --exclude nihilo/ \
        --exclude grid/ \
        --exclude charting/ \
        --exclude util/ \
        --exclude analytics/ \
        --exclude collections/ \
        --exclude 'README*' \
        --exclude '*.psd' \
        --exclude '*.uncompressed.js' \
        --exclude '*.commented.css' \
        --exclude dijit/templates \
        --exclude dijit/form/templates \
        --exclude dijit/layout/templates \
        --exclude '*silverlight*' \
        --exclude gfx3d/ \
        --exclude dojo/_base/ \
        --exclude dojo/_base.js \
        --exclude dojo/build.txt \
        --exclude functional/ \
        --exclude off/ \
        --exclude presentation/ \
        --exclude sketch/ \
        --exclude storage/ \
        --exclude wire/ \
        --exclude data/ \
        --exclude dtl/ \
        -avz "$tmpDir/js" "$releaseDir" || die "Could not sync"
}

We call the rsync script after running our dojo build.  The dojo build puts everything in the “$tmpDir/js” directory listed on the last line of the rsync.
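
Since dijit and dojox end up close to the size limit, it can help to check the generated zips before deploying. The following is a minimal sketch, assuming the limit is 1MB as guessed above (taken here as 1024 * 1024 bytes); `SIZE_LIMIT_BYTES` and `oversized` are hypothetical names for this example.

```python
import os
import tempfile

# Assumption: the "1MB I believe" figure from the post, taken as 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 1024 * 1024


def oversized(paths, limit=SIZE_LIMIT_BYTES):
    """Return (path, size) pairs for every file larger than limit bytes."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            result.append((path, size))
    return result


if __name__ == "__main__":
    # Demo with a throwaway file and a tiny limit, so no real zips are needed.
    demo_dir = tempfile.mkdtemp()
    demo_zip = os.path.join(demo_dir, "dojox.zip")
    with open(demo_zip, "wb") as handle:
        handle.write(b"x" * 100)
    print(oversized([demo_zip], limit=50))
```

In practice you would pass it the dojo.zip, dijit.zip, and dojox.zip paths and add more excludes whenever it reports anything.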

OK, so there are still some hoops to jump through, but it works!  On the Django 1.0 side, I’d highly recommend copying Guido’s Rietveld settings.  The Google App Engine Django Helper is also useful, but Django 1.0 support hasn’t been released as of this writing.  We’re using Guido’s settings for now: basically the Makefile, make_release.sh, settings.py, and main.py.  Oh, I also tweaked his rietveld.py script, turning it into littleshoot.py.  You can find all of these files in the Rietveld svn.  Most of the tweaks to these files are pretty obvious when you glance at the scripts; nothing super complicated is going on.  Not that it’s a breeze, but each file is individually fairly straightforward.  The scripts will also create a zipped version of Django from the django directory, so using svn:externals (described below) is handy.

We load both Django and Dojo with svn:externals. You might be on information overload at this point, but here’s another little nugget if you’re still with me. You can run the following in the respective directories where you want Django and Dojo to reside.

svn --editor-cmd=vim pe svn:externals .

For Dojo, enter the following:

dojo http://svn.dojotoolkit.org/src/tags/release-1.2.0b2/

For Django, try:

django http://code.djangoproject.com/svn/django/trunk/django

That will put Dojo 1.2 beta 2 in the “dojo” directory and the Django trunk in the “django” directory when you run “svn up”.  

If you successfully navigate through all of that, the end result is the latest Dojo and the latest Django running on the spanking new Google App Engine 1.1.3. Web app setups just don’t get any sweeter. If you’re a total geek like me, that’s living the good life! Sad, I know…


Amazon Web Services vs. Google App Engine: The Race to the One-Click Cloud

August 27, 2008

Can Amazon Build the One-Click Cloud?

It’s a great time to program for the cloud, no matter what Ted Dziuba’s entertaining but barely coherent rants have to say (will someone get that guy some experience?). Amazon and Google are going toe-to-toe, with Amazon’s addition of sorting in Simple DB bringing it up to par with Google App Engine’s Datastore API. Sorting was the biggest missing piece in Simple DB and the most compelling reason to choose the Datastore API instead. No longer.  

But Google App Engine (GAE) and the Datastore API still win. Here’s why:

  1. The Datastore API is projected to be 10x cheaper. $0.15-$0.18 per GB-month sounds a lot better than Simple DB’s $1.50 per GB-month.
  2. GQL. GAE’s SQL subset is just brain dead simple. As adept as programmers are at learning new frameworks, it’s nice to have something brain dead every once in a while. Simple DB takes a few more cycles to learn (brain cycles, that is: more coffee and such. Modafinil perhaps? Anyone tried it? I’m curious).
  3. GAE has better Object Relational Mapping (ORM). GAE basically uses Django’s sweet ORM system. You’ve got to jump through a lot more hoops to get something as nice with Simple DB. 
  4. GAE automatically scales the web application, not just the database. With Amazon, you have to add load balancing and bring machines up and down yourself, even if you’re using Simple DB. While there are third-party tools to help, they’re not built-in. Again, GAE is brain dead here.  
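
To put the storage pricing from the first point in concrete terms, here is a quick back-of-the-envelope calculation. The per-GB-month prices are the ones quoted above; the 100 GB data size and the function names are made up for illustration. Prices are kept in whole cents to avoid floating-point noise.

```python
# Prices quoted in the post, in cents per GB-month.
SIMPLE_DB_CENTS = 150      # $1.50
DATASTORE_LOW_CENTS = 15   # $0.15 (projected)
DATASTORE_HIGH_CENTS = 18  # $0.18 (projected)


def monthly_cost_dollars(gigabytes, cents_per_gb_month):
    """Storage cost in dollars for one month at the given price."""
    return gigabytes * cents_per_gb_month / 100.0


def price_ratio(expensive_cents, cheap_cents):
    """How many times more expensive the first price is than the second."""
    return expensive_cents / float(cheap_cents)


if __name__ == "__main__":
    stored_gb = 100  # hypothetical data size
    print("Simple DB:", monthly_cost_dollars(stored_gb, SIMPLE_DB_CENTS))
    print("Datastore (low):", monthly_cost_dollars(stored_gb, DATASTORE_LOW_CENTS))
    print("Datastore (high):", monthly_cost_dollars(stored_gb, DATASTORE_HIGH_CENTS))
    print("Ratio (low):", price_ratio(SIMPLE_DB_CENTS, DATASTORE_LOW_CENTS))
    print("Ratio (high):", round(price_ratio(SIMPLE_DB_CENTS, DATASTORE_HIGH_CENTS), 2))
```

At the low end of Google’s projected range the gap is the full 10x; at the high end it is closer to 8.3x, which is still a big difference.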

Sure, App Engine only supports Python. The ultimate question, though, is what functionality can you get in the end? For web apps, App Engine gives you more, particularly for scaling (which is kind of the whole point). Don’t know Python? Learn it. It will save you time in the end. Instead of endlessly fiddling with your load balancer and custom scripts for bringing instances up and down, you’ll spend your time adding the next killer feature your users will love.

In the end, the Amazon/Google “main event” is a huge win for you, me, and our users. The sorting announcement from Amazon comes on the heels of a flurry of other new features from both companies, including Amazon’s impressive persistent storage addition for EC2 called the Elastic Block Store, querying by attributes on Simple DB, GAE’s support for 10 applications per user instead of 3, GAE’s batch writes, etc. Neither one is pulling any punches, and the tools at our disposal as developers are progressing at a breathtaking pace as a result.

Amazon’s is clearly the more complete offering (you can do anything on it, in any language), but it needs to learn from Google’s focus on the dominant deployment scenarios.  Amazon could easily win if it does the following:

  1. Makes Simple DB pricing competitive with Google’s projected prices.
  2. Adds a query language for Simple DB along the lines of GQL.
  3. Adds automatic scaling for web applications, not just the database.
  4. Offers complete deployment solutions for the dominant web application frameworks, from Tomcat/Spring/Hibernate to Django and Zend, with ORM models already adapted to Simple DB, instances automatically replicated with traffic, etc. Basically the same thing as App Engine for more web app frameworks than App Engine supports and adapted to the Amazon platform. Sure, there are third-party solutions for some of this stuff, but those will never be trusted as much as something offered directly from Amazon.

I’m a big fan of Amazon and Werner Vogels (one of the most innovative people in the industry, and also apparently a pretty nice guy), but Amazon desperately needs to learn from what Google has done. It’s ultimately a question of “usability” for developers. The originators of “one-click shopping” are losing in the game they practically invented. 

Amazon needs to turn on the one-click cloud.

