A few weeks back I took over as maintainer of the twitter-text-python library (source on GitHub). This library lets you take a tweet like:
"@ianozsvald, you now support #IvoWertzel's tweet ... parser! https://github.com/ianozsvald/"
and extract the Twitter entities as defined in the Twitter conformance tests. The entities in the above tweet would be:
- reply: 'ianozsvald'
- users: ['ianozsvald']
- tags: ['IvoWertzel']
- urls: ['https://github.com/ianozsvald/']
- lists: [] # no lists in this tweet
- output html: u'<a href="http://twitter.com/ianozsvald">@ianozsvald</a>, ...
  you now support <a href="http://search.twitter.com/search?q=%23IvoWertzel">#IvoWertzel</a>\'s
  tweet parser! <a href="https://github.com/ianozsvald/">https://github.com/ianozsvald/</a>'
If you’re parsing tweets or similar status updates (e.g. from App.net) in Python then this library makes it easy to extract @people, URLs and #hashtags. You can also request the spans (character locations) for each entity, which is very useful if a tweet contains repeated phrases and you’re doing a search/replace.
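To show why spans help with repeated phrases, here is a minimal sketch using only Python's standard `re` module. It is not the library's code or its API: the patterns are deliberately simplified, and the names `MENTION`, `HASHTAG`, `find_with_spans` and `replace_spans` are hypothetical helpers made up for this illustration.

```python
import re

# Simplified patterns for illustration only (an assumption, not the
# library's regexes); the Twitter conformance rules cover far more cases.
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def find_with_spans(pattern, text):
    """Return (matched_text, (start, end)) for each match, left to right."""
    return [(m.group(0), m.span()) for m in pattern.finditer(text)]


def replace_spans(text, replacements):
    """Apply (start, end, new_text) replacements by character position.

    Working from the end of the string backwards keeps the earlier
    offsets valid while the text changes length.
    """
    for start, end, new in sorted(replacements, reverse=True):
        text = text[:start] + new + text[end:]
    return text


tweet = "@ian thanks @ian, see #python at https://github.com/ianozsvald/"

mentions = find_with_spans(MENTION, tweet)
hashtags = find_with_spans(HASHTAG, tweet)

# "@ian" appears twice, so a plain str.replace would change both copies.
# With the span we can link only the second occurrence.
_, (start, end) = mentions[1]
linked = replace_spans(
    tweet, [(start, end, '<a href="http://twitter.com/ian">@ian</a>')]
)
print(mentions)
print(linked)
```

The first `@ian` is left alone and only the second becomes a link, which a plain string replace cannot do without extra bookkeeping.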
The library (MIT license) is easily installed from the Python Package Index using “$ pip install twitter-text-python”; it is currently at version 1.0.0.2.
Credit: the library was developed by Ivo Wertzel (BonsaiDen on GitHub). After forking it I merged a few pull requests to fix some bugs, and I have now taken over official maintenance.
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.