A month back I tried to build an Ubuntu-based Eucalyptus cloud/cluster environment for a client for a parallel processing research project. The project was thwarted by an overly aggressive corporate firewall and my lack of understanding of low-level network config-fu.
I’ve revisited the project using the same machines but with an external public internet connection (no firewall – yay!).
On the node machine I still needed to dual-boot to Windows. Unfortunately whilst reboots to Linux are fine, if Windows is booted it ‘does something’ to the MBR and the machine is unbootable. I delved into the boot-loader and had to learn some Grub2-fu.
Grub2 was introduced in Ubuntu 9.10; it replaces Grub, which in turn replaced boot managers like lilo. The wiki page is pretty good for recovering a boot-loader using an Ubuntu LiveCD but it didn’t work quite to plan.
The step for ‘sudo chroot /mnt’ fails as bash or sh can’t be run from within /mnt (which at this point is looking at the originally installed hd). There is something odd going on with the LiveCD, and much googling didn’t reveal the answer.
To run grub-install on the hd rather than via the CD (because chroot fails), I used ‘sudo grub-install --root-directory=/mnt /dev/sda’. It reports that ‘(hd0) /dev/sda’ is installed.
Sidenote – on later attempts somehow a reference to (fd0) got involved and this broke the boot process. I edited /mnt/boot/grub/device.map to remove the fd0 reference, leaving the hd0 reference. I ran grub-install again and all was fine. Now the machine can boot again.
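The device.map edit in the sidenote can be sketched in a few lines of Python. This is only an illustration of the idea (drop any floppy entries, keep the hard disk ones). I made the change by hand in an editor, and the function name and the sample file content below are made up for the example.

```python
def drop_floppy_entries(device_map_text):
    """Return device.map content with any (fdN) lines removed."""
    kept = [
        line
        for line in device_map_text.splitlines()
        # Grub device names look like (fd0), (hd0), (hd1), ...
        if not line.strip().startswith("(fd")
    ]
    return "".join(line + "\n" for line in kept)


# Hypothetical device.map content, for illustration only.
sample = "(fd0)\t/dev/fd0\n(hd0)\t/dev/sda\n"
cleaned = drop_floppy_entries(sample)
print(cleaned, end="")
```

On the real machine the cleaned text would be written back to /mnt/boot/grub/device.map (as root) before running grub-install again.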
Mounting a USB memory stick
Whilst an 8GB memory stick was recognised, it didn’t get mounted. I had to edit /etc/fstab and add:
/dev/sdf1 /mnt/stick auto umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0
After this I used ‘sudo mkdir /mnt/stick’, ‘sudo mount /dev/sdf1’ and it mounted just fine.
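For anyone unsure what the six columns in that fstab line mean, here is a small Python sketch that splits the entry into its fields. The class and function names are my own invention, and the parser assumes all six fields are present (as they are in the line above). The ‘noauto’ option is why the stick still has to be mounted by hand, and ‘user’/‘users’ is what lets a non-root user do the mounting.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str        # what to mount, e.g. a partition
    mount_point: str   # where to mount it
    fs_type: str       # 'auto' lets mount guess the filesystem
    options: List[str]  # comma-separated mount options
    dump: int          # 0 = ignored by the dump backup tool
    fsck_pass: int     # 0 = never checked by fsck at boot


def parse_fstab_line(line):
    """Split one whitespace-separated fstab entry into named fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    device, mount_point, fs_type, options, dump, fsck_pass = fields
    return FstabEntry(
        device, mount_point, fs_type, options.split(","), int(dump), int(fsck_pass)
    )


entry = parse_fstab_line(
    "/dev/sdf1 /mnt/stick auto "
    "umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0"
)
print(entry.device, "->", entry.mount_point)
print("mounted automatically at boot:", "noauto" not in entry.options)
```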
The install process this time around was much the same as before, except this time without the firewall it all ‘just worked’. Seeing the fnords part 1 took me through the basic install.
I got the feeling from later steps that the cloud controller needs a static IP so I switched the cluster controller from DHCP to a static IP and rebooted.
The discover nodes process for euca_conf (‘sudo euca_conf --no-rsync --discover-nodes’) also required that I’d set up ssh keys on the Node; step 6 in the NodeInstall doc has the instruction. Typo note – if you spell ‘eucalyptus’ wrong you’ll go round in circles trying to figure out why the password won’t work!
Sometimes I couldn’t get ‘euca-describe-availability-zones verbose’ to work; it’d just report ‘No route to host’. It seems that a reboot of both the CC and the Node is required, plus a minute or so of patience after boot for Apache to sort itself out, before this problem just goes away.
Using the Ubuntu Store
Having installed the CC and registered a Node, next I ran the web interface via ‘https://10.0.0.4:8443’. Note ‘https’. If you visit the website too soon after a reboot (i.e. within a minute or so) then the webapp won’t respond or maybe it won’t recognise the admin user. The first login forces a password change.
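Rather than guessing how long to wait after a reboot, one option is to poll the port until something answers. The sketch below is not part of Eucalyptus; the function name and the timeout values are arbitrary choices of mine. It only checks that a TCP connection to the port succeeds, so the webapp may still need a few more seconds before it recognises the admin user.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or timeout expires.

    Returns True if the port accepted a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself limited to `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers 'connection refused', 'no route to host' and timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call, using the address from this post:
#   if wait_for_port("10.0.0.4", 8443):
#       print("web interface port is open, try https://10.0.0.4:8443")
```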
Next check the ‘Configuration’ tab and verify the IP addresses. For reasons beyond my understanding our switch rebooted during my first attempt to set up the cluster and it switched from the ‘192.168.x.x’ address range to ‘10.x.x.x’ – this royally barfed my configuration. I chose to re-install the CC from scratch (I was plagued by ‘no route to host’ problems no matter how much tweaking I tried).
Next visit the ‘Store’ tab and download an image; I’m using ‘Ubuntu 9.10 Karmic Koala (i386)’. Today this works – I’ve spent 2.5 days building and re-building the cluster to get it to this point. Often the Store would download an image and then report ‘no route to host’. This process is pretty darned frustrating and seems to lack useful error messages.
But ultimately – no cigar
Rather frustratingly I can’t get my Node to run an image. ‘euca-describe-availability-zones verbose’ shows that a Node exists but doesn’t list its IP address, which is odd as the online docs say it should be shown.
If I run an image then it enters the ‘pending’ state and then the ‘terminating’ state. Digging around in Google shows that other people currently have the same problem; it might be related to the lack of Hypervisor instructions on my Node machine (though they’re not supposed to be required…). Possibly the current build is also unstable; there’s a lot of bug-fixing going on.
Eucalyptus has a trouble-shooting guide, and this blog series is very useful.
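One common way to check whether a Linux machine’s CPU advertises hardware virtualisation is to look for the ‘vmx’ (Intel VT-x) or ‘svm’ (AMD-V) flags in /proc/cpuinfo. The Python sketch below does that; the function names and the sample text are mine, for illustration only. A missing flag can also mean the feature is switched off in the BIOS, so treat the result as a hint rather than proof.

```python
VIRT_FLAGS = ("vmx", "svm")  # Intel VT-x and AMD-V respectively


def virtualization_flags(cpuinfo_text):
    """Return the set of hardware virtualisation flags found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in VIRT_FLAGS)
    return found


def local_virtualization_flags(path="/proc/cpuinfo"):
    """Read this machine's cpuinfo (Linux only) and report the flags."""
    with open(path) as handle:
        return virtualization_flags(handle.read())


# Made-up cpuinfo fragment for illustration.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse tsc msr pae vmx sse sse2\n"
print(sorted(virtualization_flags(sample)))
```

On the Node itself, calling local_virtualization_flags() and getting back an empty set would suggest there is no hardware support available to the hypervisor.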
Eucalyptus should give you an EC2-like cloud that runs on your own machines, using an EC2-compatible API so you could move to the cloud when you want to scale up or are less concerned about the privacy of your data. Currently I can’t get it to work but others do have it working – it seems to depend upon your hardware. It also lacks clear error messages so debugging is hard – I resorted to clean installs on three occasions.
Ian is a Chief Interim Data Scientist via his Mor Consulting. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.