Simple DB Sorting Solution from Forums

February 27, 2008
 

As I mentioned in a prior post, the lack of sorting on Amazon’s Simple DB is the one thing holding me back from database nirvana. For LittleShoot, I’m running Hibernate on top of replicated MySQL. The most annoying part about my very typical setup is data migration. Whenever I want to change a table, I have to think hard about compatibility with existing data and about what’s happening with Hibernate underneath. The whole time I’m lamenting the fact I have to think about it at all and about how I’d rather be honing my NAT traversal code.

That’s where Simple DB comes in. With Simple DB, you never have to worry about changes to tables, backups, or performance. All of these make me yearn to switch wholesale to Simple DB and to devote my time to more important pursuits.

There’s just that one snag. Sorting. The lack of it makes me want to swear. Not out of frustration with Amazon — they do an astounding job all the way around, and I couldn’t be happier with EC2 and S3. I want to swear because I have to continue spending precious brain cycles thinking about my database.

That’s why tonight’s latest wandering onto the Simple DB forum was so exciting. A user going by the name “mrtwig” has outlined steps to get sorting working. While it’s not officially supported and could change at any time because Amazon doesn’t guarantee this behavior, it still offers a glimmer of hope. I’ve reprinted the forum post below, and you can find the full thread here.

I was playing around with simpleDB trying to figure out how to do sorting. Here are my findings based on the few test cases I had:

1. The values of a multivalued attribute are already sorted in ascending order.

2. If an attribute say “views” contains the no of views, then doing a query [‘views’ > ‘0’]
returns the item numbers in ascending order of views.

3. If you want to run a query and sort it by “views” column just add “intersection [‘views’ > ‘0’] ” to the end of the query. This seems to return the result in the ascending order of the “views” column.

This seems to solve ascending part of ordering. Now for Descending:

A.) for number types have another column say “viewsDesc” and the value should be whatever is the highest positive number of the “views” column is MINUS the value in the “views” column. Amazon forces you to know the highest positive integer beforehand and prefix smaller numbers with zero’s. This actually helps in keeping the sort order. Then you can use this column to sort in the descending order.

B.) for Date types I used This format “20071024”. this format seems to be stored in the ascending order. For descending use another column and use a larger date or number say “20200000” and MINUS the date with this number. Then you have descending date order for your queries.

C.) For alphabets they are also already sorted in ascending order. For descending order you might need to know how long the values are because sorting for whole sentences is not good. I have’nt exactly figured this out yet may u can use the same principle as above. Like if u have a word “zzzzzzzzzz” find some way to invert it to “aaaaaaaaaa” or “0000000000”. Then that should give descending order.

The queries ran at the same speed with or without sorting. May be we will notice a difference when there 1000’s of rows.

Please correct me if I am wrong and these are based on few test cases with limited data.

Several other users have verified the basic technique, although again there are far from any guarantees here. I’ll be playing with this tomorrow, and it could save me a great deal of time.

Thanks mrtwig.  You’re “the man” or “the woman” depending on your gender.


LimeWire Arista RIAA Deposition Recap

February 16, 2008

So I finished my grueling 6 hour deposition an hour or so ago. Present at the deposition were Greg Bildson of LimeWire, Charles Baker (my and LimeWire’s counsel), the RIAA counsel, RIAA special advisor, Kelly Truelove, the counsel for Arista et al from Cravath, the stenographer, and the videographer. I would have liked to have released and distributed the video of the deposition on LittleShoot, Gnutella, and my web servers as a clear demonstration of non-infringing uses, but it looks like it will not be publicly released for the time being.

I fear my testimony damaged LimeWire’s case in large part due to various discussions with Mark Cuban, Jim Griffin, Serguei Osokine and others on the Pho list. Here’s a little excerpt I wrote on 10/27/06:

I believe passionately in p2p and believe it has a bright future, but I do not support the vast majority of p2p companies out there because they’re almost entirely devoted to infringement.

The Cravath lawyer highlighted this and several similar comments as indicating I think LimeWire is completely devoted to distributing infringing content. They successfully pinned me down on this point with precise “yes” and “no” questions, as in “do you have any reason to think you did not write that statement.” I don’t think LimeWire actively sought to make money from infringing content. I think LimeWire was in large part a victim of its historical time, a time when the Internet was still a baby and when users were not savvy about producing and distributing their own works. As a result, the vast majority of digital content available at the time was copyrighted, but only because that’s what the users had. YouTube was not possible then because you didn’t have a threshold of the population who would be comfortable uploading videos to servers and because bandwidth wasn’t cheap enough.

That said, LimeWire is primarily used for distributing infringing material, but it’s clearly the users distributing that material outside of the intents of the LimeWire creators, myself included. When I started working at LimeWire, we were building the Lime Peer Server and planning how Gnutella would be used to search for everything from apartment listings to cars. Despite our best efforts, those plans never came to fruition. My primary critique of LimeWire and of other p2p applications is that they didn’t think as creatively as they could have about other uses of the technology, with the exception of Skype. The conversation on Pho was in the aftermath of the YouTube sale when the potential for distributing non-infringing content was obvious. I think we could have seen that sooner at LimeWire and could have more actively pursued a p2p-enabled YouTube using DMCA protections, but that’s easy to say in retrospect.

My comments on pho were somewhat taken out of context. The Cravath lawyer succeeded in what apparently is the oldest trick in the book: put you to sleep with hours of mind-numbing questioning about the details of query routing hashes and long-forgotten forum posts before slipping in the key potentially incriminating questions just when they think your brain has turned to complete mush. By the time they got to the questions on Pho, I couldn’t remember my name let alone articulately clarify my thoughts on a forum thread from over a year ago. This prevented me from continually pointing out that the Pho forum threads were focused on the details of YouTube’s protections under the DMCA safe harbors and how they could apply to p2p.

Here’s another snippet from Pho they highlighted. I believe I wrote this in response to one of Jim Griffin’s comments:

I agree the underlying technology for LimeWire and Skype are similar. The point is that one makes all of its money off of infringing content while the other does not. You think that’s all great in the spirit of innovation. I think they should be as innovative with their businesses as they are with their technology, like Skype. You say they make money from the same source, I guess the technology. I think that’s ridiculous. There’s so much room to innovate with p2p outside of infringement that it’s mind boggling there hasn’t been more.

The key issue is that, while LimeWire clearly makes money from users’ infringement, they never intended that to be the case. It’s the content that’s infringing, not LimeWire. I simply wished we thought bigger — thought beyond the existing uses of the technology, along the lines of what Skype was able to do. That’s not to say it would have been easy, however, and that’s not to say LimeWire’s liable because they did not more vigorously pursue more creative paths.

As I emphasized continually in the deposition, we were always creating a generalized tool for media distribution. It was a tool for dynamically searching millions of computers for any type of content. We worked with universities around the world, particularly the Stanford Peers Group, on creating the most efficient algorithms for distributed search. Our competitors included Google and Yahoo as much as they did Kazaa, a point the Cravath lawyer failed to fully appreciate or take seriously, even though I could not have been more serious.
If you’re giving a deposition any time soon, my advice is to continually stay on your toes and to watch out for the ol’ put you to sleep with the most boring questions you can possibly imagine trick. It’s a trap.

Hopefully in the long run the First Amendment will matter more than making sure the record industry has plenty of cash to pay the most expensive lawyers in the business to help line their pockets.


LimeWire Arista RIAA Deposition Prep Today

February 14, 2008

I’m heading up to Midtown in about 5 minutes to meet with the LimeWire legal team to prepare for tomorrow’s deposition in the LimeWire case (Arista v. LimeWire). The discovery phase has taken forever, so oral arguments have yet to be heard. I have no idea what to expect, as I’ve never given a deposition before. As I understand it, though, the intent of the deposition will be for the prosecuting attorneys to ask as thorough questions as possible to dig up any potential new angle against my former employer, LimeWire. I’m of course bound to tell the truth, so the more thorough their questioning, the more advantageous it is for them.

That said, there’s not much to hide. As I’ve always maintained, we intended LimeWire as a general tool for distributing information. We always wished users did not download and share infringing, but that soon became the dominant use.

The more tragic story is this: p2p is the most efficient and cost effective way to distribute large files around the Internet. If you have no money and you want to be heard, p2p is your best option. Sure, companies like Joost and Move do a great job distributing video, but they do it for the Viacoms and ABCs of the world. If you have no money, you’re out of luck, accelerating the concentration of power in the hands of the wealthy few. Instead of talking about the democratizing potential of p2p, we’re in court discussing how much one rich guy should pay another rich guy.

That said, we cannot truthfully claim that LimeWire in practice has been a tool for massive democratization of information exchange. I think that’s partially due to the lack of emphasis on publishing original content. There’s more focus on media consumption than on media production. That’s one of the many shortcomings in LimeWire and p2p generally we’re addressing with LittleShoot, in addition to a completely different legal approach using the DMCA safe harbors.

To learn more about the LimeWire case, this is a thorough if somewhat legally technical description.


Atlassian JIRA — Automating the Standalone Install on MySQL

February 2, 2008

I’ve started thinking of my coding wanderings as akin to Alice’s rabbit holes — magical new places I play around in for probably a little too long. Automating sysadmin-type work with shell scripts has become my latest rabbit hole. Quickly running new services on Amazon’s EC2 is my inspiration.

So, this is the first little snippet, a simple initial building block that will become a part of larger scripts down the road. For those who don’t know, Atlassian has started giving away free licenses to all of their products to open source projects, so this gives you access to JIRA, Bamboo, Confluence, FishEye, Clover, Crowd, etc. These tools are amazingly useful and are all the best or amongst the best at what they do. Check out the Atlassian web site for more info.

This script automates the two trickiest parts of installing JIRA:

  1. Connecting to your database. In this case we connect to MySQL.
  2. Customizing the port to run JIRA on.

In the first instance, the script automatically downloads the MySQL JDBC driver, creates the JIRA database, and configures the JIRA user name and password for MySQL. The port customization is something you frequently want because so much runs on 8080 by default. These tasks are more annoying than tricky, but this script makes them a breeze.

Prerequisites:

  1. MySQL already running on the default port.
  2. You need to know your MySQL root password. The script will use it to create the JIRA database and to set permissions for the JIRA MySQL user.
  3. A downloaded version of JIRA standalone from the Atlassian web site. This will be a file called atlassian-jira-VERSION-standalone.tar.gz. The script just looks for a file starting with “atlassian-jira” and ending in “tar.gz” in the current directory.
  4. Java installed with JAVA_HOME set.

Future scripts will also include automated installing and configuring of MySQL as well as Java, but for now you need them configured ahead of time. I chose to run JIRA standalone because in my experience getting the separate wars to play nicely with my existing wars was tricky. In particular, some of the Atlassian war files take awhile to start up and don’t shut down as cleanly as they should. Using the standalone versions insures they won’t interfere with your other webapps.

When you have JIRA downloaded, MySQL running, and Java configured, go ahead and download the script from the LittleShoot web site.

Here’s all you need to run:

./jira.bash

The script will guide you through the process of configuring and running JIRA, and it should be really self-explanatory. When the script is done, you’ll still need to run through JIRA’s configuration procedure within the browser, but the script has taken care of the hard part.

If you need to install JIRA from another script, you can also run something like the following, modifying it for your values of course.

./jira.bash jirauser jirapwd yourMySql_root_password adamfisk

The last argument is the user name of the user on the system who should own the jira directory.

Below is the full script.

#!/usr/bin/env bash
#
# This script performs all the JIRA configuration and setup for running
# JIRA on MYSQL.  This includes creating the JIRA database and creating
# a user on the database.
#
# If no arguments are passed to the script, it prompts you for the
# data it needs.  Otherwise, you must pass all the required data on the
# command line.  This makes it easier to incorporate this script into
# other scripts if desired.
#
# If you decide to pass in arguments, they are (in order):
#
# 1) The name of the new jira user in the database.
# 2) The password of the new jira user in the database.
# 3) Your MYSQL root password to create the JIRA database.
# 4) The user account to install JIRA under.  This account should
#    already exist on the system.
#
# To run this script:
#
# YOU MUST HAVE DOWNLOADED JIRA STANDALONE INTO YOUR CURRENT DIRECTORY
#
# That file should be the downloaded copy of JIRA standalone.
#
# If you have any problems, please see the excellent guide at:
# http://confluence.atlassian.com/display/JIRA/Setting+up+JIRA+Standalone+and+MySQL+on+Linux
#

function die
{
echo $1
exit 1
}

ls ./atlassian-jira-*.tar.gz > /dev/null || die "The Atlassian JIRA tar.gz file must be in the current directory.  Have you successfully downloaded JIRA standalone?"

netstat -na | grep 3306 > /dev/null || die "MySQL does not appear to be running on port 3306.  JIRA cannot be installed without MySQL running"

function askUser
{
echo "Please enter your JIRA database user name:"
read JIRA_USER_NAME

echo "Please enter your JIRA database password:"
read JIRA_PWD

echo "Please enter your MySQL root password:"
read MYSQL_ROOT_PWD

echo "What's the name of the user account on this machine you'd like to install JIRA under?"
read USER_ACCOUNT
}

ARGS=4
if [ $# -ne "$ARGS" ]
then
    if [ $# -ne "0" ]
    then
        echo "Usage: jira.bash jira_mysql_user_name jira_mysql_password mysql_root_password user_account"
        echo "You can also just run ./jira.bash to have the script guide you through the setup process."
        die
    else
        askUser
    fi
else
    JIRA_USER_NAME=$1
    JIRA_PWD=$2
    MYSQL_ROOT_PWD=$3
    USER_ACCOUNT=$4
fi

echo "............................................................"
echo "  Hello $USER, let's start setting up JIRA standalone."
echo "............................................................"

function modifyPort
{
  echo "What port would you like to use for JIRA?  The default is 8080."
  read CUSTOM_PORT
  echo "What shutdown port would you like to use for JIRA?  The default is 8005."
  read CUSTOM_SHUTDOWN_PORT
  echo "OK, got it.  Proceeding with install."
}

echo "Would you like to change the port JIRA runs on from the default of 8080? [y/n]"
read CHANGE_PORT
case $CHANGE_PORT in
y)
  modifyPort || die "Could not modify port"
  ;;
Y)
  modifyPort || die "Could not modify port"
  ;;
*)
  echo "OK, using default port of 8080.  Proceeding with install."
  CUSTOM_PORT=8080
  CUSTOM_SHUTDOWN_PORT=8005
  ;;
esac

function installJira
{
echo "Expanding `ls ./atlassian-jira-*.tar.gz`..."
tar xzf `ls ./atlassian-jira-*.tar.gz` || die "Could not open jira tgz file.  Aborting."

# Add a symbolic link to whichever version of JIRA we're running.
ln -s `ls | grep atlassian-jira-` jira

echo "Downloading MYSQL JDBC connector..."

# Somewhat bad to hard code this, but I don't think JIRA users alone will have much of an impact on this server.
curl -o mysqlj.tgz http://mirrors.24-7-solutions.net/pub/mysql/Downloads/Connector-J/mysql-connector-java-5.1.5.tar.gz
tar xzf mysqlj.tgz
mv mysql-connector-java-5.1.5/mysql-connector-java-5.1.5-bin.jar jira/common/lib || die "Could not move myql jdbc jar"

echo "Customizing server.xml..."
cp jira/conf/server.xml jira/server.xml.copy
perl -pi -e s/Server\ port=\"8005\"/Server\ port=\"$CUSTOM_SHUTDOWN_PORT\"/g jira/conf/server.xml || die "Could not set shutdown port"
perl -pi -e s/Connector\ port=\"8080\"/Connector\ port=\"$CUSTOM_PORT\"/g jira/conf/server.xml || die "Could not set JIRA port"
perl -pi -e s/username=\"sa\"/username=\"$JIRA_USER_NAME\"/g jira/conf/server.xml || die "Could not modify jira user name"
perl -pi -e s/password=\"\"/password=\"$JIRA_PWD\"/g jira/conf/server.xml || die "Could not modify jira password"
perl -pi -e s/driverClassName=\"org.hsqldb.jdbcDriver/driverClassName=\"com.mysql.jdbc.Driver/g jira/conf/server.xml
perl -pi -e s/jdbc:hsqldb:\\$\{catalina.home\}\\/database\\/jiradb\"/jdbc:mysql:\\/\\/localhost\\/jiradb?autoReconnect\=true\&\;useUnicode\=true\&\;characterEncoding\=UTF8\"\\/\>/g jira/conf/server.xml || die "Could not set jdbc"
perl -pi -e s/minEvictableIdleTimeMillis\=/\/\"20\"\ \\/\>--\>/g jira/conf/server.xml || die "Could not finish comment"

echo "Customizing entityengine.xml..."
cp jira/atlassian-jira/WEB-INF/classes/entityengine.xml jira/entityengine.xml.copy || die "Could not make entityengine backup"
cp jira/atlassian-jira/WEB-INF/classes/entityengine.xml . || die "Could not copy entityengine to current directory"

perl -pi -e s/name=\"defaultDS\"\ field-type-name=\"hsql\"/name=\"defaultDS\"\ field-type-name=\"mysql\"/g entityengine.xml || die "Could not set entityengine database to MYSQL"
perl -pi -e s/schema-name=\"PUBLIC\"//g entityengine.xml || die "Could not remove public schema from entiry engine"

mv entityengine.xml jira/atlassian-jira/WEB-INF/classes/ || die "Could not move entity engine"

chown -R $USER_ACCOUNT jira || die "Could not set permissions to specified user: $USER_ACCOUNT"

cat < jira.sql
create database if not exists jiradb character set utf8;
GRANT ALL PRIVILEGES ON jiradb.* TO '$JIRA_USER_NAME'@'localhost'
IDENTIFIED BY '$JIRA_PWD' WITH GRANT OPTION;
flush privileges;
EOL
mysql -uroot -p$MYSQL_ROOT_PWD < jira.sql || die "Could not set up database for JIRA.  Is your root password correct?"
echo "Starting JIRA on port $CUSTOM_PORT..."
./jira/bin/startup.sh || die "Could not start JIRA"

echo ""
echo "-----------------------------------------------------------------------------------------------------------------"
echo "  Great, JIRA's starting up.  You should be able to access it momentarily on port $CUSTOM_PORT on this machine."
echo "-----------------------------------------------------------------------------------------------------------------"
}

installJira

exit 0