Hadoop on EC2 with OpenSolaris

October 2, 2008

The OpenSolaris crew just announced you can run Hadoop AMIs on EC2 running on top of OpenSolaris. That’s just cool. I’m still not ready to abandon my Django code running on App Engine (I’ve got a post coming up on the stellar new update to Google App Engine Patch, by the way), but I’d love to play with it. Anyone else given it a go?

You can run it with:

ec2-run-instances --url https://ec2.amazonaws.com ami-2bdd3942 -k <your-keypair-name>

You can get more info on running OpenSolaris on EC2 here.


Where Google App Engine Spanks Amazon’s Web Services: S3, EC2, Simple DB, SQS

May 28, 2008

First off, I loooove Amazon Web Services (AWS), and we make heavy use of S3, EC2, Simple DB, and Elastic IPs for LittleShoot. We run everything on Amazon, but that’s about to change.  I’ve been in the App Engine beta for about a month, and despite all of the astonishing ways AWS makes building web sites easy, Google App Engine makes it far easier. Here’s why:

1) Google App Engine Has Better Scalability  

Google migrates your application across its infrastructure automatically as needed. With EC2, you have to manually detect machine load and bring instances up or down accordingly. You need to set up load balancing and clustering. While Amazon gives you far more control, it’s also far more work.  With Google App Engine, it’s all done for you.

2) Google App Engine Has a Better Database

Google App Engine’s Big Table blows Amazon’s Simple DB out of the water, and that’s coming from a big fan of Simple DB and Amazon CTO Werner Vogels. Simple DB thankfully yanks developers out of the relational database mindset and automatically replicates data across machines for scalability. You do have to learn a completely new query syntax, however, and, as this blog has noted, sorting is not officially supported. Simple DB is also still in beta.  

With App Engine, you’re using the same database, Big Table, Google engineers use to power some of the busiest sites on the Internet. Billions of queries have been hammering out kinks in Big Table for years. You know it will scale.  What’s more, App Engine’s “GQL” gives developers a familiar SQL-like syntax, lowering the learning curve compared to Simple DB.  Big Table also supports sorting.  Perhaps most significantly, Simple DB costs far more. While Google’s final pricing announcement later this year may change, today’s announcement didn’t mention any difference in price for data stored in the database versus anywhere else. On Simple DB, that data costs $1.50 per GB-month. On App Engine, it appears to cost $0.15 – $0.18 per GB-month. Wow.
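
To make the storage price gap concrete, here is a rough sketch of the arithmetic in bash. The per-GB-month rates are the ones quoted above; the 100 GB data size and the storage_cost helper name are assumptions picked only for illustration, and the App Engine numbers could change with Google's final pricing.

```bash
#!/usr/bin/env bash
# Monthly storage cost = gigabytes stored * price per GB-month.
# storage_cost is a hypothetical helper; awk handles the decimal math.
storage_cost() {
  local gb=$1 rate=$2
  awk -v gb="$gb" -v rate="$rate" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# 100 GB is an assumed data size, not a figure from either provider.
echo "Simple DB:         \$$(storage_cost 100 1.50) per month"
echo "App Engine (low):  \$$(storage_cost 100 0.15) per month"
echo "App Engine (high): \$$(storage_cost 100 0.18) per month"
```

At that size the difference is $150.00 versus $15.00 to $18.00 a month, roughly a factor of ten.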

3) Google App Engine is Cheaper 

Beyond the database, App Engine gives you the first 5 million or so page views per month for free.  That’s a lot of page views. It doesn’t put you up with the Internet’s top dogs, of course, but at 5 million page views you should be making cash. App Engine is free precisely when you’re building your company and keeping costs low matters most. If you go beyond that 5 million, the prices announced at Google’s I/O event today are remarkably similar to Amazon’s current offerings. They both price everything per GB or CPU-hour, and the numbers are barely distinguishable. That first 5 million page views and the apparent huge disparity in database storage pricing are by far the biggest differentiators, both dramatically tipping the scales in favor of Google.

Conclusion

For building scalable web applications quickly, App Engine beats AWS by a surprisingly wide margin.  Note, however, this refers specifically to web applications. For anything custom, you need Amazon. Because App Engine only supports Python, you also need Amazon for running any non-Python code. While this is a significant difference, many good developers are facile with multiple languages and can move rapidly between them.  Amazon’s flexibility makes it win out for many applications, but not for the most common application there is: web sites.  App Engine is more of a “domain-specific cloud” for web applications, but it’s shockingly good at what it does.  

Oh yeah, and it’s cheap.


Atlassian JIRA — Automating the Standalone Install on MySQL

February 2, 2008

I’ve started thinking of my coding wanderings as akin to Alice’s rabbit holes — magical new places I play around in for probably a little too long. Automating sysadmin-type work with shell scripts has become my latest rabbit hole. Quickly running new services on Amazon’s EC2 is my inspiration.

So, this is the first little snippet, a simple initial building block that will become a part of larger scripts down the road. For those who don’t know, Atlassian has started giving away free licenses to all of their products to open source projects, so this gives you access to JIRA, Bamboo, Confluence, FishEye, Clover, Crowd, etc. These tools are amazingly useful and are all the best or amongst the best at what they do. Check out the Atlassian web site for more info.

This script automates the two trickiest parts of installing JIRA:

  1. Connecting to your database. In this case we connect to MySQL.
  2. Customizing the port to run JIRA on.

In the first instance, the script automatically downloads the MySQL JDBC driver, creates the JIRA database, and configures the JIRA user name and password for MySQL. The port customization is something you frequently want because so much runs on 8080 by default. These tasks are more annoying than tricky, but this script makes them a breeze.

Prerequisites:

  1. MySQL already running on the default port.
  2. You need to know your MySQL root password. The script will use it to create the JIRA database and to set permissions for the JIRA MySQL user.
  3. A downloaded version of JIRA standalone from the Atlassian web site. This will be a file called atlassian-jira-VERSION-standalone.tar.gz. The script just looks for a file starting with “atlassian-jira” and ending in “tar.gz” in the current directory.
  4. Java installed with JAVA_HOME set.
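
If you want to check the file and Java prerequisites before kicking things off, something like the following sketch could work. The check_prereqs name is hypothetical and not part of jira.bash; the script itself already checks for the tarball and for MySQL on port 3306, so the only new piece here is the JAVA_HOME test.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of jira.bash).
# Returns 0 when the given directory holds a JIRA standalone tarball and
# JAVA_HOME is set; otherwise prints what is missing and returns 1.
check_prereqs() {
  local dir=${1:-.}
  local status=0

  # Same file pattern the install script looks for.
  if ! ls "$dir"/atlassian-jira-*.tar.gz > /dev/null 2>&1; then
    echo "Missing: atlassian-jira-*.tar.gz in $dir"
    status=1
  fi

  if [ -z "$JAVA_HOME" ]; then
    echo "Missing: JAVA_HOME is not set"
    status=1
  fi

  return $status
}
```

You could then run `check_prereqs . && ./jira.bash` so the install only starts when both checks pass.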

Future scripts will also include automated installing and configuring of MySQL as well as Java, but for now you need them configured ahead of time. I chose to run JIRA standalone because in my experience getting the separate wars to play nicely with my existing wars was tricky. In particular, some of the Atlassian war files take a while to start up and don’t shut down as cleanly as they should. Using the standalone versions ensures they won’t interfere with your other webapps.

When you have JIRA downloaded, MySQL running, and Java configured, go ahead and download the script from the LittleShoot web site.

Here’s all you need to run:

./jira.bash

The script will guide you through the process of configuring and running JIRA, and it should be really self-explanatory. When the script is done, you’ll still need to run through JIRA’s configuration procedure within the browser, but the script has taken care of the hard part.

If you need to install JIRA from another script, you can also run something like the following, modifying it for your values of course.

./jira.bash jirauser jirapwd yourMySql_root_password adamfisk

The last argument is the user name of the user on the system who should own the jira directory.
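
One thing to watch when calling it from another script: even with all four arguments, the script as written below still asks the port question on standard input. A minimal sketch of a wrapper that answers "n" to keep the default ports might look like this. The install_jira function name and its first parameter (the path to the installer) are assumptions for illustration; only the order of the four arguments comes from the script.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for driving jira.bash from another script.
# Arguments: installer path, JIRA db user, JIRA db password,
# MySQL root password, system account that should own the jira directory.
install_jira() {
  local installer=$1 db_user=$2 db_pwd=$3 root_pwd=$4 account=$5

  # Pipe "n" in so the "change the port?" prompt takes the defaults.
  # The function's exit status is the installer's exit status.
  echo "n" | bash "$installer" "$db_user" "$db_pwd" "$root_pwd" "$account"
}
```

A call such as `install_jira ./jira.bash jirauser jirapwd yourMySql_root_password adamfisk || exit 1` would then stop the calling script if the install fails.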

Below is the full script.

#!/usr/bin/env bash
#
# This script performs all the JIRA configuration and setup for running
# JIRA on MYSQL.  This includes creating the JIRA database and creating
# a user on the database.
#
# If no arguments are passed to the script, it prompts you for the
# data it needs.  Otherwise, you must pass all the required data on the
# command line.  This makes it easier to incorporate this script into
# other scripts if desired.
#
# If you decide to pass in arguments, they are (in order):
#
# 1) The name of the new jira user in the database.
# 2) The password of the new jira user in the database.
# 3) Your MYSQL root password to create the JIRA database.
# 4) The user account to install JIRA under.  This account should
#    already exist on the system.
#
# To run this script:
#
# YOU MUST HAVE DOWNLOADED JIRA STANDALONE INTO YOUR CURRENT DIRECTORY
#
# That file should be the downloaded copy of JIRA standalone.
#
# If you have any problems, please see the excellent guide at:
# http://confluence.atlassian.com/display/JIRA/Setting+up+JIRA+Standalone+and+MySQL+on+Linux
#

function die
{
echo "$1"
exit 1
}

ls ./atlassian-jira-*.tar.gz > /dev/null || die "The Atlassian JIRA tar.gz file must be in the current directory.  Have you successfully downloaded JIRA standalone?"

netstat -na | grep 3306 > /dev/null || die "MySQL does not appear to be running on port 3306.  JIRA cannot be installed without MySQL running"

function askUser
{
echo "Please enter your JIRA database user name:"
read JIRA_USER_NAME

echo "Please enter your JIRA database password:"
read JIRA_PWD

echo "Please enter your MySQL root password:"
read MYSQL_ROOT_PWD

echo "What's the name of the user account on this machine you'd like to install JIRA under?"
read USER_ACCOUNT
}

ARGS=4
if [ $# -ne "$ARGS" ]
then
    if [ $# -ne "0" ]
    then
        echo "Usage: jira.bash jira_mysql_user_name jira_mysql_password mysql_root_password user_account"
        echo "You can also just run ./jira.bash to have the script guide you through the setup process."
        die
    else
        askUser
    fi
else
    JIRA_USER_NAME=$1
    JIRA_PWD=$2
    MYSQL_ROOT_PWD=$3
    USER_ACCOUNT=$4
fi

echo "............................................................"
echo "  Hello $USER, let's start setting up JIRA standalone."
echo "............................................................"

function modifyPort
{
  echo "What port would you like to use for JIRA?  The default is 8080."
  read CUSTOM_PORT
  echo "What shutdown port would you like to use for JIRA?  The default is 8005."
  read CUSTOM_SHUTDOWN_PORT
  echo "OK, got it.  Proceeding with install."
}

echo "Would you like to change the port JIRA runs on from the default of 8080? [y/n]"
read CHANGE_PORT
case $CHANGE_PORT in
y|Y)
  modifyPort || die "Could not modify port"
  ;;
*)
  echo "OK, using default port of 8080.  Proceeding with install."
  CUSTOM_PORT=8080
  CUSTOM_SHUTDOWN_PORT=8005
  ;;
esac

function installJira
{
echo "Expanding `ls ./atlassian-jira-*.tar.gz`..."
tar xzf `ls ./atlassian-jira-*.tar.gz` || die "Could not open jira tgz file.  Aborting."

# Add a symbolic link to whichever version of JIRA we're running.
# Skip the tar.gz itself so only the expanded directory is linked.
ln -s `ls | grep atlassian-jira- | grep -v tar.gz` jira

echo "Downloading MYSQL JDBC connector..."

# Somewhat bad to hard code this, but I don't think JIRA users alone will have much of an impact on this server.
curl -o mysqlj.tgz http://mirrors.24-7-solutions.net/pub/mysql/Downloads/Connector-J/mysql-connector-java-5.1.5.tar.gz
tar xzf mysqlj.tgz
mv mysql-connector-java-5.1.5/mysql-connector-java-5.1.5-bin.jar jira/common/lib || die "Could not move mysql jdbc jar"

echo "Customizing server.xml..."
cp jira/conf/server.xml jira/server.xml.copy
perl -pi -e s/Server\ port=\"8005\"/Server\ port=\"$CUSTOM_SHUTDOWN_PORT\"/g jira/conf/server.xml || die "Could not set shutdown port"
perl -pi -e s/Connector\ port=\"8080\"/Connector\ port=\"$CUSTOM_PORT\"/g jira/conf/server.xml || die "Could not set JIRA port"
perl -pi -e s/username=\"sa\"/username=\"$JIRA_USER_NAME\"/g jira/conf/server.xml || die "Could not modify jira user name"
perl -pi -e s/password=\"\"/password=\"$JIRA_PWD\"/g jira/conf/server.xml || die "Could not modify jira password"
perl -pi -e s/driverClassName=\"org.hsqldb.jdbcDriver/driverClassName=\"com.mysql.jdbc.Driver/g jira/conf/server.xml
perl -pi -e s/jdbc:hsqldb:\\$\{catalina.home\}\\/database\\/jiradb\"/jdbc:mysql:\\/\\/localhost\\/jiradb?autoReconnect\=true\&amp\;useUnicode\=true\&amp\;characterEncoding\=UTF8\"\\/\>/g jira/conf/server.xml || die "Could not set jdbc"
perl -pi -e s/minEvictableIdleTimeMillis\=/\/\"20\"\ \\/\>--\>/g jira/conf/server.xml || die "Could not finish comment"

echo "Customizing entityengine.xml..."
cp jira/atlassian-jira/WEB-INF/classes/entityengine.xml jira/entityengine.xml.copy || die "Could not make entityengine backup"
cp jira/atlassian-jira/WEB-INF/classes/entityengine.xml . || die "Could not copy entityengine to current directory"

perl -pi -e s/name=\"defaultDS\"\ field-type-name=\"hsql\"/name=\"defaultDS\"\ field-type-name=\"mysql\"/g entityengine.xml || die "Could not set entityengine database to MYSQL"
perl -pi -e s/schema-name=\"PUBLIC\"//g entityengine.xml || die "Could not remove public schema from entity engine"

mv entityengine.xml jira/atlassian-jira/WEB-INF/classes/ || die "Could not move entity engine"

chown -R $USER_ACCOUNT jira || die "Could not set permissions to specified user: $USER_ACCOUNT"

cat <<EOL > jira.sql
create database if not exists jiradb character set utf8;
GRANT ALL PRIVILEGES ON jiradb.* TO '$JIRA_USER_NAME'@'localhost'
IDENTIFIED BY '$JIRA_PWD' WITH GRANT OPTION;
flush privileges;
EOL
mysql -uroot -p$MYSQL_ROOT_PWD < jira.sql || die "Could not set up database for JIRA.  Is your root password correct?"
echo "Starting JIRA on port $CUSTOM_PORT..."
./jira/bin/startup.sh || die "Could not start JIRA"

echo ""
echo "-----------------------------------------------------------------------------------------------------------------"
echo "  Great, JIRA's starting up.  You should be able to access it momentarily on port $CUSTOM_PORT on this machine."
echo "-----------------------------------------------------------------------------------------------------------------"
}

installJira

exit 0
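
Once the script has finished, a quick way to confirm the port changes landed is to read them back out of server.xml. This is only a sketch: the show_jira_ports name is hypothetical, and it assumes the Connector and Server elements keep the port="..." form that the substitutions in the script target.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of jira.bash): print the HTTP connector
# port and the shutdown port currently set in a Tomcat server.xml.
show_jira_ports() {
  local server_xml=${1:-jira/conf/server.xml}
  grep -o 'Connector port="[0-9]*"' "$server_xml"
  grep -o 'Server port="[0-9]*"' "$server_xml"
}
```

Running `show_jira_ports` from the directory where you installed JIRA should list the same ports you gave the script.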
