This is Ian Ozsvald's blog, I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

28 May 2013 - 11:41 Thoughts from a month’s backpacking honeymoon

I’m publishing this on the hoof; right now we’re in Istanbul, near the end of our honeymoon, before we head back home. Here are some notes on travelling with apps (for our Nexus 4 Androids).

Google Translate offers offline dictionaries for all the European languages; each is 150 MB. We downloaded new ones before each country hop. Generally they were very useful, though some phrases were wrong or not colloquial (often for things like “the bill please”). Some languages had pronunciation guides; they were OK but a phrase book would be better. It worked well as a glorified language dictionary.

Google Maps offline maps were great, except in Hungary where offline use wasn’t allowed (it didn’t explain why).

The lack of phrase or dictionary apps was a pain, there’s a real dearth on Android. Someone should fill this gap!

WiFi was fairly common throughout our travels so we rarely used our paper Guides. WiFi was free in all hotels, sometimes in train stations, often in cafes and bars even in Romania.

WikiSherpa caches recent search results pulled from Wikipedia and Wikivoyage, so it works like a poor man’s Rough Guide. It doesn’t link to any maps or cache images but if you search on a city, you can read up on it (e.g. landmarks, how to get a taxi etc) whilst you travel.

The official Wikipedia app has page saving, which is useful for background info on a city when reading offline.

AnyMemo is useful for learning phrases in new languages. It is chaotic as the learning files aren’t curated. You can edit the files to remove the phrases you don’t need and to add useful new ones in.

Emily notes that TripAdvisor on Android doesn’t work well (the iPhone version was better but still not great). Emily also notes that hotels.com, lastminute and booking.com were all useful for booking most of our travels and hotels.

We used Foursquare when we had WiFi; sadly there is no offline mode so I just starred locations using Google Maps. Foursquare needs a language independent reading system: trying to figure out if a series of Turkish reviews were positive or not based on the prevalence of smileys wasn’t easy (Google Translate integration would have helped). An offline Foursquare would have been useful (e.g. for cafes near to our spot).

We really should have bought a WiFi 3G dongle. The lack of data was a pain. We used Emily’s £5 travel data day plans on occasion (via Three). It works for most of Europe but not Switzerland or Turkey.

Given that we have Wikipedia and Wiktionary, how come we don’t have a “WikiPhrases” (“wikilingo”?) with multi-language forms of common phrases? It would be just like the travel phrase books that we can buy, but with good local phrases and idioms across any language that gets written up. This feels like it’d have a lot of value.

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight and Mor Consulting, founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

2 Comments | Tags: Life, Travel

5 May 2013 - 14:32 June project: Disambiguating “brands” in Social Media

Having returned from Chile last year, settled in to consulting in London, got married and now being on honeymoon, I’m planning on a change for June.

I’m taking the month off from clients to work on my own project, an open sourced brand disambiguator for social media. As an example this will detect that the following tweet mentions Apple-the-brand:
“I love my apple, though leopard can be a pain”
and that this tweet does not:
“Really enjoying this apple, very tasty”

I’ve used AlchemyAPI, OpenCalais, DBPedia Spotlight and others for client projects and it turns out that these APIs expect long-form text (e.g. Reuters articles) written in good English.

Tweets are short-form, messy, use colloquialisms, can be compressed (e.g. using contractions) and rely on local context (both local in time and social group). Linguistically a lot is expressed in 140 characters and it doesn’t look like “good English”.

A second problem with existing APIs is that they cannot be trained and often don’t know about European brands, products, people and places. I plan to build a classifier that learns whatever you need to classify.

Examples for disambiguation will include Apple vs apple (brand vs e.g. fruit/drink/pie), Seat vs seat (brand vs furniture), cold vs cold (illness vs temperature), ba (when used as an abbreviation for British Airways).
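To make the "trainable classifier" idea concrete, here is a minimal sketch, not the project's actual code. It is a hand-rolled bag-of-words Naive Bayes using only the standard library, as a stand-in for what scikit-learn would normally provide. The class name `TinyNaiveBayes`, the `tokenize` helper and every training tweet except the two quoted above are made up for illustration, and six examples are far too few for a real model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing.

    Hypothetical illustration only; a real project would more likely
    use a tested library implementation.
    """

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of examples
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Ignore words never seen during training.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: the first tweet in each group is from this post,
# the rest are invented for the example.
training = [
    ("I love my apple, though leopard can be a pain", "brand"),
    ("my new apple laptop runs leopard nicely", "brand"),
    ("apple released an update for the iphone", "brand"),
    ("Really enjoying this apple, very tasty", "fruit"),
    ("baked an apple pie, very tasty", "fruit"),
    ("this apple juice is sweet and tasty", "fruit"),
]
texts, labels = zip(*training)
model = TinyNaiveBayes().fit(texts, labels)

print(model.predict("upgrading my apple laptop to leopard"))
print(model.predict("this apple crumble was very tasty"))
```

The point of the sketch is that the surrounding words ("leopard", "laptop" vs "tasty", "pie") carry the signal, and that the same code learns Seat vs seat or cold vs cold if you hand it different labelled examples.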

The goal of the June project will be to out-perform existing Named Entity Recognition APIs for well-specified brands on Tweets, developed openly with a liberal licence. The aim will be to solve new client problems that can’t be solved with existing APIs.

I’ll be using Python, NLTK, scikit-learn and Tweet data. I’m speaking on progress at BrightonPy and DataScienceLondon in June.

Probably for now I should focus on having no computer on my honeymoon…

4 Comments | Tags: ArtificialIntelligence, Life, Python, SocialMediaBrandDisambiguator