Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

Using ZeroFree to shrink a VirtualBox Linux Image

My development Ubuntu image inside VirtualBox was using too much space to store empty but non-zero disk blocks on its virtual drive. This sucked space from my laptop’s SSD (which is already not big enough!). Shrinking it by zeroing the blocks took a little bit of effort.

Inside VirtualBox, when I boot my Ubuntu 11.04 instance ‘df -h’ reports 14GB used of a 27GB virtual drive. Back on the host laptop the .vdi file is 22GB, which suggests that 8GB could be freed up. The problem is that the unused blocks were written to and later deleted but never reset to 0, so VirtualBox keeps them allocated.

The solution is simple using these instructions:

  • Delete any snapshots in VirtualBox so you’re just dealing with 1 .vdi file (not sure if this helps but it seemed sensible)
  • Get the PartImage ISO (about 300MB), it includes zerofree (unlike my usual favourite SysRescueCD)
  • Mount the PartImage ISO as a CD drive in VirtualBox, boot the virtual machine, it’ll start PartImage
  • Run “$ fsck /dev/sda1” to confirm that the Ubuntu hd (/dev/sda1) is clean
  • Make a temporary mount point “$ mkdir /mnt/ss”
  • Mount the drive read-only and without writing to /etc (the -n flag skips the /etc/mtab update) as our host image is read-only “$ mount -n -o ro -t ext4 /dev/sda1 /mnt/ss”
  • Check the space that’s used “$ du -h -s /mnt/ss” – for me this reported 14GB of a 27GB drive
  • Let zerofree reset all the unused blocks “$ zerofree /dev/sda1” – took about 5 minutes for a 22GB .vdi on my SSD
  • Close the virtual machine
  • Back on the host laptop ask vboxmanage to compact all the zero’d blocks “$ vboxmanage modifyhd --compact /home/ian/<path>…/myimage.vdi” and remember to use a fully-qualified path (not ./myimage.vdi) else it reports an error
  • Wait 2 minutes
  • Check the .vdi image size – mine shrank from 22GB to 17GB
  • Reboot the virtual image to confirm that the disk is unaffected (in the virtual machine it still reports 14GB used on a 27GB partition)
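
The host-side compact step above could be wrapped in a small Python helper so the fully-qualified path is never forgotten. This is only a rough sketch: the function names and the "myimage.vdi" placeholder are my own inventions for illustration, and it assumes vboxmanage is on your PATH, the virtual machine is shut down and zerofree has already been run inside it.

```python
import os
import subprocess


def build_compact_command(vdi_path):
    """Build the vboxmanage argument list, forcing a fully-qualified path."""
    # vboxmanage reported an error for relative paths such as ./myimage.vdi,
    # so expand "~" and resolve the path against the current directory first.
    absolute_path = os.path.abspath(os.path.expanduser(vdi_path))
    return ["vboxmanage", "modifyhd", "--compact", absolute_path]


def compact_vdi(vdi_path):
    """Compact the image and return (size_before, size_after) in bytes.

    Hypothetical helper: assumes the VM is powered off and the free
    blocks have already been zeroed with zerofree.
    """
    command = build_compact_command(vdi_path)
    size_before = os.path.getsize(command[-1])
    subprocess.check_call(command)  # raises CalledProcessError on failure
    size_after = os.path.getsize(command[-1])
    return size_before, size_after


if __name__ == "__main__":
    # Only print the command here; call compact_vdi() yourself once you are
    # sure the image is ready. "myimage.vdi" is a placeholder name.
    print(" ".join(build_compact_command("myimage.vdi")))
```

Printing the command first is deliberate, it lets you check the resolved path before anything touches the image.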

This solution looks easier but requires the dynamically allocated virtual image to expand to its fullest size which won’t work for me – right now my laptop has less free space than the virtual image is allowed to use!


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

Encouraging Online Privacy

I’ve begun to think about the increasing need for improved online privacy. Once upon a time my mum’s communications didn’t occur on the Internet, now they routinely do. Once upon a time I had a reasonable idea about how secure my communications were, now it seems that governments are doing things like intercepting our SSL traffic in the UK ‘to protect my freedom’. This is rather worrying, though there are ways to stay ahead of these invasions of privacy. Looking around the world we’re leading the pack in some ways whilst lagging (massively – China and Iran for example) in others.

I’m toying with writing a book on the subject, looking at all the ways you can protect your privacy online (including browsing securely, easily configuring and using VPNs, avoiding leaving digital fingerprints that can be exploited etc). More at the end. In the UK we already have one of the highest densities of security cameras in the world (guessed to be 1 camera for every 32 people or You Get Photographed Hundreds Of Times A Day In The UK). Now the Government wants to read and record all of our online communications.

Channel 4 carries the story about the UK Government’s increasingly invasive attitudes – “‘Black boxes’ to monitor all internet and phone data”. Sadly this is not a crazily spun story. It does indeed look like the Government is preparing to issue Black Boxes which strip ‘header data’ from all communications, which is then stored for a year by our ISPs. Other companies have been involved in inserting themselves into the SSL Certificate Chain so that they can seamlessly decrypt the data they intercept. The Government needs a colluding Certificate Provider and a bunch of ISPs (they carry the traffic from your home) who are legally required to comply, then they’re in business. All in the name of anti-terrorist and anti-paedophilia safety.

What’s really interesting is the increasing realisation that the SSL security layer is often poorly implemented (e.g. according to this ongoing analysis only 12.9% of sites using SSL do so securely). SSL rests on the idea that all members of the certificate chain are trustworthy, haven’t made mistakes, haven’t leaked data and aren’t colluding with third parties who are working against our interests. There have been enough breaches to show that the basic idea of ‘fully trusting in SSL’ is a poor idea.

On top of this it seems that we are required by law to hand over encryption keys or face imprisonment if the police believe we have an encrypted file which they wish to read. They do not have to prove that the document is encrypted (the example given is – what happens if I have a file of random numbers used as a seed in experiments?). It rather feels that our liberty loving Government is happy for us to do whatever we like, as long as they can read everything they want, particularly without requiring us to know (via ISPs) that they’re reading our communications. Feels a touch like a Cold War novel, no?

So, I’m toying with writing a book on the subject of protecting and encouraging Online Privacy; currently I just have a Mailchimp mailing list (sign-up here). For the two previous books I’ve written I first gathered emails, then queried to figure out what needs writing, then covered those topics. I’ll do the same here. Topics that I could cover include technical means to protect one’s communication (e.g. VPNs and Proxies and using them on mobile devices), ways to encourage free speech (e.g. Freenet and Tor), browser plug-ins that improve privacy and a look at all the ways we leave digital fingerprints that may share more knowledge about us than we’d expect (e.g. Facebook likes and photos and contacts, our locations & preferences via Foursquare).

Sign up if you’re curious and tell me what you’d like covered. There won’t be any spam or foolishness, this is a research mailing list to figure out what ought to be covered by a book on Online Privacy. I’ll be collaborating on the book with Balthazar Rouberol.

If you’re not already a member I’d sincerely recommend you check out the UK’s Open Rights Group (I’m Founder #282) and the USA’s Electronic Frontier Foundation.



Some international flight tips

Having flown a lot recently I’ve discovered that whilst it isn’t super-fun, it is no longer terribly uncomfortable. Maybe the crutches I use are useful to someone else.

Water – it turns out that if you buy water in Duty Free they will bag it and carry it onto the plane for you. This avoids having your water confiscated at security. Bizarre but true. Useful for budget flights if you’re not sure what the service will be like.

Neck pillow – I carry a simple blow-up pillow, it offers enough neck support that I can sleep on a plane and it packs down to a tiny size. I don’t have a specific recommendation, just make sure you try it before buying. Some I’ve tried offer very little neck support which (for me) makes them useless. Most do seem to support the neck well (normally you can try them at the airport). I see this one on Amazon which has one of the highest ratings – I’ve never used this (it might be overkill!) but the reviews are probably a good starting point.

Hearos Xtreme Protection Ear Plugs with plastic carry case – the most effective noise-blocking ear plugs I’ve found, they’re soft and can be cut to size (so they don’t fall out of your ear). I find sleeping with these to be super comfortable (they only irritate if used several nights, all night, in a row). They’re way more comfortable than wax earplugs and they have the highest noise-blocking rating that I’ve found (so they cut out more noise than the other types). If you know of even more effective ear plugs I’d certainly be open to suggestions. The plastic case is great on a plane so you don’t lose them (it also stops them being squashed during storage, which can stop them expanding properly).

They’re recommended for use in construction (!) and cut out most of the noise of people talking, babies crying etc. They block some of the low frequency engine drone (enough so that I can sleep) but don’t entirely remove it. They’re cheap and seem to work for anyone who has tried them. If you’ve only tried the free earplugs on a plane then you should know that these are significantly better at noise blocking.

Jasmine Silk Filled Eye Mask – the most comfortable eye blinds I’ve ever used (significantly more comfy and light-blocking than the freebie ones from an airline). They’re really comfy (Emily uses them now too), block almost all the light even on a bright day and I’m about to buy a second pair (after a year the elastic has stretched and they’re now a bit loose). I use these in hotels that have rubbish curtains – great for jet lag when you need the dark.

Diphenhydramine based sleeping tablets (like these Nytol One a Night). Personally I use sleeping tablets to help with jet lag and to sleep on planes. I’m not recommending you do the same (specifically I’m not making a medication recommendation), I’m just saying what works for me. Diphenhydramine is a first generation antihistamine which just happens to make you sleepy. It takes about 15 minutes for me to feel groggy and typically I can sleep on a plane for 4+ hours with one tablet. Without a sleeping tablet I never sleep on planes (and getting half a night+ of sleep is a great way to kill time and feel fairly-ok at the other end). The downside is that you’re not alert on the plane which isn’t ideal if there’s an emergency.

Kindle wifi – Emily and I carry our Kindles everywhere. Great for fiction, reasonable for PDFs (e.g. some science papers – but not brilliant as the reformatting isn’t very strong), it is super easy to buy new eBooks via the Amazon site or through the device. We just use the wifi version (we rarely need to buy ‘on the go’ with 3G). The click-to-select keyboard (you use a cursor to ‘type’ on a virtual keyboard) is fairly rubbish but since you rarely use an ebook reader to type, that’s not a problem. It runs for weeks, is visible in most lighting conditions, is small and ‘just works’. We really like ours.

I also use Foursquare at airports to figure out where the good food is to be found (along with wall sockets and free wifi). Staff are often quite helpful if you need a recommendation, particularly in explaining how to easily move between terminals if they have a favourite cafe. WikiTravel and TripAdvisor are also useful to learn about travel+safety and recommended locations.

Update – @natbat recommends TripIt for travel organisation (along with @seb_ly and @plo). Nat also suggests Melatonin – I used to use it as it helped get me to sleep faster (as in – within 5 minutes) but it didn’t generate a deep sleep (so I prefer an over the counter sleeping tablet for a flight now). The comments at Amazon on Melatonin products seem quite varied – do your reading.

Update – flightfox runs competitions to find the cheapest flights e.g. this one is for <$2000 round the world travel. This seems to be an interesting site to try to shave money off of complex flight requirements.

Update – this NYTimes article (“How the Tough Get Going: Silicon Valley Travel Tips”) has a bunch of tips including lightweight clothing and ways to be more time efficient with US travel.



Parallel Computing with Python Tutorial at EuroSciPy (end of August)

I’ll be teaching Parallel Computing with Python (abstract to follow) at EuroSciPy on 23rd August 2012 in Brussels. Early bird tickets for the conference are available until July 22nd (even without the 50% discount the cost is still super-low).

In my tutorial we’ll work through parallel processing examples on 1 machine with multiple cores and across several machines. By the end of the tutorial you should know which choices to make to parallelize your task:

  • multiprocessing (single machine with multiple cores whilst avoiding the GIL)
  • parallelpython (single or multiple machines with multiple cores)
  • IPython cluster (IPython’s cluster-compute support)
  • Gearman (job processing library with Python bindings)
  • PiCloud (cloud-based parallel Python service)
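
As a rough illustration of the first option in the list above (this is not the tutorial’s actual material), here is a minimal multiprocessing sketch. The prime-counting task and both function names are assumptions of mine, chosen only because the work is CPU-bound, which is where separate processes help you get around the GIL.

```python
import multiprocessing


def count_primes_below(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..floor(sqrt(n)) divides it exactly
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_prime_counts(limits, processes=2):
    """Run count_primes_below for each limit in a pool of worker processes."""
    # Each worker is a separate Python process with its own interpreter,
    # so the tasks are not serialised by a shared GIL.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_primes_below, limits)


if __name__ == "__main__":
    # The __main__ guard matters: on platforms that start workers by
    # re-importing this module, it stops them spawning pools of their own.
    print(parallel_prime_counts([10, 100, 1000]))
```

pool.map returns results in the same order as the inputs, so swapping the built-in map for the pool version is often the smallest change that gets a single machine using all of its cores.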

I had a lot of fun at EuroSciPy last year, it was great to meet lots of other Python science geeks. The talk list (‘talk list’ tab) this year is pretty amazing. The first two parts of my tutorial are based on my High Performance Python 1 tutorial from PyCon, the other three parts are new.

I’m thinking of writing an updated High Performance+Parallel Python guide (probably as a self-published book), if you’re interested in hearing about it please join the High Performance Python Mailing List (I’ve only got a list right now). I’ll make an announcement once I know more.



Jailbreaking iOS 5.1.1

I thought it’d be hard but it turns out to be super-easy. In the last hour I’ve jailbroken my iOS 5.1 iPhone 4S by upgrading to iOS 5.1.1 and then running the Absinthe 2.0.4 jailbreak (via here). It seems that iOS 5.1 doesn’t have an untethered jailbreak (only a tethered one – you have to boot with your phone tethered to maintain the jailbreak). This opens the door to playing with the jailbroken app store Cydia.

I see that my Galaxy S now has an Ice Cream Sandwich open source OS too. More toys 🙂

 

