Building a high performance cluster with Ubuntu 9.10 and Eucalyptus

I’ve spent most of the last day installing Eucalyptus on Ubuntu 9.10 to create a mini ‘high performance computing’ environment.  We’re testing the concept and could build 100+ machines if the prototype works as expected.

This is a running log of my notes; so far I only have a partial setup.

Note: I have written a Eucalyptus follow-up post which gets further than this one but ultimately fails.

To start you need Ubuntu 9.10 Server edition, which includes the open-source Eucalyptus software.  Eucalyptus is cloud computing software whose API is compatible with Amazon’s EC2.  This means you can build an in-house network for testing and private computation and later switch to EC2 if you want to scale up.  This is great if some clients need privacy and others want true utility computing.

Note that installing Eucalyptus requires at least one CD download, or two if you need both the 64-bit edition for the Node and the 32-bit edition for a Cluster Controller whose hardware is too old for 64-bit.  The hardware requirements are a bit steep (Node machines need 1GB+ RAM, a 40GB+ hard disk, etc.).  Once installed, you’ll also have to download at least one instance image to run on the Nodes; each is about 180MB.  This is a lot to download if you have a tiny VPN pipe to the outside world.
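If a candidate machine already runs Linux, a short shell function can sanity-check it against those minimums. This is a minimal sketch, not part of Eucalyptus: the name check_node_hw and its default thresholds are my own assumptions (set slightly below 1GB and 40GB to allow for kernel-reserved memory, swap and filesystem overhead), and it measures the filesystem holding a given path rather than the raw disk.

```shell
# Hypothetical pre-flight check against the quoted Node minimums
# (1GB+ RAM, 40GB+ disk). Defaults are assumed, deliberately loose values.
# Arguments: min RAM in kB, min filesystem size in kB, path on that filesystem.
check_node_hw() {
  min_ram_kb=${1:-950000}
  min_disk_kb=${2:-36000000}
  target=${3:-/}

  # MemTotal in /proc/meminfo is reported in kB.
  ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
  # Column 2 of POSIX (-P) df output is the filesystem's total size in 1K blocks.
  disk_kb=$(df -Pk "$target" 2>/dev/null | awk 'NR==2 {print $2}')
  if [ -z "$ram_kb" ] || [ -z "$disk_kb" ]; then
    echo "could not read memory or disk size" >&2
    return 2
  fi

  result=0
  if [ "$ram_kb" -lt "$min_ram_kb" ]; then
    echo "RAM: ${ram_kb} kB is below ${min_ram_kb} kB"
    result=1
  fi
  if [ "$disk_kb" -lt "$min_disk_kb" ]; then
    echo "Disk: ${disk_kb} kB on $target is below ${min_disk_kb} kB"
    result=1
  fi
  return $result
}

# Usage: check_node_hw && echo "meets the quoted minimums"
```

The function returns non-zero and prints the shortfall rather than exiting, so it can be pasted into an interactive shell without closing it.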

Two good background papers from open.eucalyptus.com are:

Installing UEC (Ubuntu Enterprise Cloud) via the CD (see also the UEC main page) is fairly easy; I actually followed these notes (the first of three parts) before finding the official docs.

Installing the server took about 30 minutes, most of which was spent reading from the CD.  The questions were pretty easy.  Some notes:

  • For the hard disk setup I used a fresh 40GB disk and chose ‘Guided – Use entire disk’ (not the LVM option)
  • I chose no email configuration (I don’t know the local SMTP details here in the client’s office)
  • For apt-get I had to configure the proxy so it could reach the outside world through the corporate firewall
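For reference, apt takes its HTTP proxy from an Acquire::http::Proxy line in its configuration, and it reads every file under /etc/apt/apt.conf.d/. A minimal sketch follows; the helper name apt_proxy_line, the proxy URL and the file name 95proxy are all placeholders, not the real corporate settings.

```shell
# Hypothetical helper: print the apt configuration line that routes
# HTTP downloads through a proxy.
apt_proxy_line() {
  printf 'Acquire::http::Proxy "%s";\n' "$1"
}

# Usage on each machine that needs it (proxy URL and file name are placeholders):
#   apt_proxy_line http://proxy.example.com:8080/ | sudo tee /etc/apt/apt.conf.d/95proxy
#   sudo apt-get update
```

Keeping the setting in its own file under apt.conf.d makes it easy to delete later, for example once a connection that bypasses the firewall is available.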

To install the Node (1 client) I needed to dual-boot an existing Windows XP machine.  For this I used PartitionMagic to resize the 500GB Windows partition down to 100GB.  At first this didn’t work: we kept getting ‘error 983 while executing batch’ and the resize would abort.  The solution (as noted many times on the web) is to run ‘chkdsk /f’ at the command prompt; the machine reboots and runs the check (in our case it didn’t report any changes), and afterwards PartitionMagic worked.

The candidate Node machine recognises that the Cluster Controller is running on another machine, so it nominates itself as a Node.  Only a few questions are asked (e.g. the keyboard layout) and then everything is installed.  For the hard disk installation I chose ‘Use the largest contiguous free space’, having freed 360GB with PartitionMagic earlier.

For reasons that aren’t clear, after installation it had trouble finding the network.  I had to run ‘sudo /etc/init.d/networking restart’ before it could ‘ping slashdot.org’.  It still won’t complete a full ‘sudo apt-get update’ (which runs fine on the Cluster Controller), but I’ll assume for now that this isn’t a problem.

Now that the network is working, if I run ‘sudo euca_conf --no-rsync --discover-nodes’ on my Cloud Controller it reports finding 1 Node.  I can accept the Node, but after that I get some sort of authentication failure.  This might be due to the corporate network firewall.

If I skip ahead a step, I can run ‘sudo euca_conf --get-credentials mycreds.zip’, ‘unzip mycreds.zip’ and ‘./eucarc’, but when I then run ‘euca-describe-availability-zones verbose’ I get an XML parse error much like this bug.
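One detail worth checking in that sequence: eucarc sets and exports environment variables (EC2_URL, EC2_ACCESS_KEY and so on), so it is meant to be sourced into the current shell with ‘.’ rather than executed as ‘./eucarc’. Executed, its exports live only in a child process that exits immediately. Whether this has anything to do with the XML parse error is unknown. Below is a minimal sketch; the function name load_eucarc is hypothetical, and running it from bash is the safer assumption in case a generated eucarc relies on bash-only features.

```shell
# Hypothetical helper: source a eucarc into the *current* shell and
# confirm that EC2_URL ended up set.
load_eucarc() {
  rc=${1:-./eucarc}
  # '.' searches PATH for names without a slash, so force a relative path.
  case $rc in
    */*) ;;
    *) rc=./$rc ;;
  esac
  if [ ! -r "$rc" ]; then
    echo "cannot read $rc" >&2
    return 1
  fi
  . "$rc"
  if [ -z "$EC2_URL" ]; then
    echo "EC2_URL is still unset after sourcing $rc" >&2
    return 1
  fi
  echo "Using EC2_URL=$EC2_URL"
}

# Usage, in the directory where mycreds.zip was unzipped:
#   load_eucarc eucarc && euca-describe-availability-zones verbose
```

Because the function is called directly (not in a subshell), the variables it sources remain set for the euca-* commands run afterwards in the same shell.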

There are enough network errors here to suggest that the corporate firewall isn’t playing ball (it wouldn’t be the first time).  I’ll restart the installation on my two test machines once we have a public internet connection that bypasses the corporate firewall.  I’ll post another entry when I run the second experiment (in December, all going well).

Update: I followed the NodeInstallation notes to copy the Cloud Controller’s eucalyptus user’s public key into the authorized_keys file of the eucalyptus user on the Node Controller.  That hasn’t fixed either of the two errors above.
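When repeating that key copy across many Nodes, it helps to make the step idempotent and to set the permissions sshd expects (700 on the .ssh directory, 600 on authorized_keys), since sshd can refuse keys whose files are too permissive. A minimal sketch; the function name add_authorized_key is hypothetical, it assumes a single-line public key, and the ~eucalyptus/.ssh/id_rsa.pub path in the usage note is an assumption about a default setup.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
add_authorized_key() {
  key_file=$1    # e.g. a copy of the Cloud Controller's eucalyptus public key
  auth_file=$2   # e.g. ~eucalyptus/.ssh/authorized_keys on the Node
  ssh_dir=$(dirname "$auth_file")

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1

  key=$(cat "$key_file") || return 1
  # -x: whole-line match, -F: fixed string, so the key is compared literally.
  grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}

# Usage sketch (paths assume the default eucalyptus home directory):
#   on the Cloud Controller:
#     sudo -u eucalyptus cat ~eucalyptus/.ssh/id_rsa.pub > cc_key.pub
#   copy cc_key.pub to the Node, then as the eucalyptus user there:
#     add_authorized_key cc_key.pub ~eucalyptus/.ssh/authorized_keys
```

Running it twice leaves a single copy of the key, so it is safe to re-run when a Node is reinstalled.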

Books:

The following books will help you move forward: the Eucalyptus one will make the above configuration easier, and the second, on EC2, will help you see how Eucalyptus and EC2 compare.


Ian applies Data Science as an AI/Data Scientist for companies through ModelInsight and his Mor Consulting; you can sign up for his Data Science tutorials in London. He also founded the image and text annotation API Annotate.io, lives in London and is a consumer of fine coffees.
