As I mentioned in a prior post, the lack of sorting on Amazon’s Simple DB is the one thing holding me back from database nirvana. For LittleShoot, I’m running Hibernate on top of replicated MySQL. The most annoying part about my very typical setup is data migration. Whenever I want to change a table, I have to think hard about compatibility with existing data and about what’s happening with Hibernate underneath. The whole time I’m lamenting the fact I have to think about it at all and about how I’d rather be honing my NAT traversal code.
That’s where Simple DB comes in. With Simple DB, you never have to worry about changes to tables, backups, or performance. All of these make me yearn to switch wholesale to Simple DB and to devote my time to more important pursuits.
There’s just that one snag. Sorting. The lack of it makes me want to swear. Not out of frustration with Amazon — they do an astounding job all the way around, and I couldn’t be happier with EC2 and S3. I want to swear because I have to continue spending precious brain cycles thinking about my database.
That’s why tonight’s latest wandering onto the Simple DB forum was so exciting. A user going by the name “mrtwig” has outlined steps to get sorting working. While it’s not officially supported and could change at any time because Amazon doesn’t guarantee this behavior, it still offers a glimmer of hope. I’ve reprinted the forum post below, and you can find the full thread here.
I was playing around with simpleDB trying to figure out how to do sorting. Here are my findings based on the few test cases I had:
1. The values of a multivalued attribute are already sorted in ascending order.
2. If an attribute say “views” contains the no of views, then doing a query ['views' > '0']
returns the item numbers in ascending order of views.
3. If you want to run a query and sort it by “views” column just add “intersection ['views' > '0'] ” to the end of the query. This seems to return the result in the ascending order of the “views” column.
This seems to solve ascending part of ordering. Now for Descending:
A.) for number types have another column say “viewsDesc” and the value should be whatever is the highest positive number of the “views” column is MINUS the value in the “views” column. Amazon forces you to know the highest positive integer beforehand and prefix smaller numbers with zero’s. This actually helps in keeping the sort order. Then you can use this column to sort in the descending order.
B.) for Date types I used This format “20071024″. this format seems to be stored in the ascending order. For descending use another column and use a larger date or number say “20200000″ and MINUS the date with this number. Then you have descending date order for your queries.
C.) For alphabets they are also already sorted in ascending order. For descending order you might need to know how long the values are because sorting for whole sentences is not good. I have’nt exactly figured this out yet may u can use the same principle as above. Like if u have a word “zzzzzzzzzz” find some way to invert it to “aaaaaaaaaa” or “0000000000″. Then that should give descending order.
The queries ran at the same speed with or without sorting. May be we will notice a difference when there 1000′s of rows.
Please correct me if I am wrong and these are based on few test cases with limited data.
Several other users have verified the basic technique, although again there are far from any guarantees here. I’ll be playing with this tomorrow, and it could save me a great deal of time.
Thanks mrtwig. You’re “the man” or “the woman” depending on your gender.