18 July 2014 - 9:12
IPython Memory Usage interactive tool
I’ve written a tool (ipython_memory_usage) to help my colleague and me understand how RAM is allocated for large matrix work. It works for any large memory allocation (numpy, regular Python or whatever), and the allocations and deallocations are reported after every command. Here’s an example: we make a matrix of 10,000,000 elements costing 76 MiB and then delete it:
    IPython 2.1.0 -- An enhanced Interactive Python.
    In : %run -i ipython_memory_usage.py
    In : a=np.ones(1e7)
    'a=np.ones(1e7)' used 76.2305 MiB RAM in 0.32s, peaked 0.00 MiB above current, total RAM usage 125.61 MiB
    In : del a
    'del a' used -76.2031 MiB RAM in 0.10s, peaked 0.00 MiB above current, total RAM usage 49.40 MiB
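The reported figure can be sanity-checked by hand. This is a small sketch of the arithmetic, assuming the default float64 dtype (8 bytes per element); the variable names are only for illustration:

```python
# Expected size of a 10,000,000-element float64 array, in MiB.
n_elements = 10**7
bytes_per_float64 = 8  # assumption: np.ones defaults to float64
size_mib = n_elements * bytes_per_float64 / 2.0**20
print("%.2f MiB" % size_mib)
```

That works out to about 76.3 MiB, close to the 76.23 MiB the tool measured for the process.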
The more interesting behaviour is to check the intermediate RAM usage during an operation. In the following example we’ve got 3 arrays costing approx. 763 MiB each, and we assign the result of a*b+c to a fourth array. Overall the operation also adds the cost of a temporary fifth array (reported as the peak above current), which would be invisible to the end user if they’re not aware of the use of temporaries in the background:
    In : a=np.ones(1e8); b=np.ones(1e8); c=np.ones(1e8)
    'a=np.ones(1e8); b=np.ones(1e8); c=np.ones(1e8)' used 2288.8750 MiB RAM in 1.02s, peaked 0.00 MiB above current, total RAM usage 2338.06 MiB
    In : d=a*b+c
    'd=a*b+c' used 762.9453 MiB RAM in 0.91s, peaked 667.91 MiB above current, total RAM usage 3101.01 MiB
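One possible way to avoid the hidden temporary is to allocate the result once and then update it in place. The sketch below is not from the original session; `multiply_add` is a made-up helper name, and small arrays stand in for the 1e8-element ones so it runs anywhere:

```python
import numpy as np

def multiply_add(a, b, c):
    """Compute a*b + c while allocating only the result array."""
    d = np.multiply(a, b)  # allocates d and stores a*b in it
    d += c                 # in-place add: reuses d, no extra temporary
    return d

# Small stand-ins for the 1e8-element arrays used above.
a = np.ones(1000)
b = np.full(1000, 2.0)
c = np.full(1000, 3.0)
d = multiply_add(a, b, c)
```

Newer NumPy releases can sometimes elide such temporaries on their own, so it is worth measuring before rewriting an expression this way.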
If you’re running out of RAM when you work with large datasets in IPython, this tool should give you a clue as to where your RAM is being used.
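Outside IPython, the "used" and "peaked above current" numbers can be roughly approximated with the standard library's tracemalloc module. This is only a sketch and not how ipython_memory_usage itself works: the tool reports total process RAM, whereas tracemalloc only sees allocations made through Python's allocators. The names `measure` and `build_with_temporary` are hypothetical.

```python
import tracemalloc

def measure(func):
    """Run func() and return (result, used_bytes, peak_above_final_bytes)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = func()
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, after - before, peak - after

def build_with_temporary(n=1000000):
    scratch = bytearray(n)  # temporary buffer, released when the function returns
    kept = bytearray(n)     # the only buffer that outlives the call
    return kept

result, used, peak_above = measure(build_with_temporary)
print("used %.2f MiB, peaked %.2f MiB above current"
      % (used / 2.0**20, peak_above / 2.0**20))
```

Here the retained buffer shows up as "used", while the scratch buffer only appears in the peak, which is the same pattern as the temporary in d=a*b+c.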
UPDATE – this works in IPython for PyPy too and so we can show off their homogeneous memory optimisation:
    # CPython 2.7
    In : l=range(int(1e8))
    'l=range(int(1e8))' used 3107.5117 MiB RAM in 2.18s, peaked 0.00 MiB above current, total RAM usage 3157.91 MiB

And the same in PyPy:

    # IPython with PyPy 2.7
    In : l=[x for x in range(int(1e8))]
    'l=[x for x in range(int(1e8))]' used 763.2031 MiB RAM in 9.88s, peaked 0.00 MiB above current, total RAM usage 815.09 MiB
If we then add a non-homogeneous type (e.g. adding None to the ints), the list gets converted back to a list of regular (heavy-weight) Python objects:
    In : l.append(None)
    'l.append(None)' used 3850.1680 MiB RAM in 8.16s, peaked 0.00 MiB above current, total RAM usage 4667.53 MiB
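The gap between the CPython and the homogeneous PyPy figures is roughly what a back-of-envelope estimate predicts. The sketch below assumes a 64-bit build, with an 8-byte pointer per list slot and a 24-byte int object on CPython 2.7; `list_mib` is a made-up helper, and the estimate ignores allocator overhead:

```python
def list_mib(n_items, bytes_per_item):
    """Estimated size in MiB of n_items at bytes_per_item each."""
    return n_items * bytes_per_item / 2.0**20

n = 10**8
POINTER = 8    # assumed: one 64-bit pointer per list slot
PY2_INT = 24   # assumed: size of an int object on 64-bit CPython 2.7

boxed = list_mib(n, POINTER + PY2_INT)  # a pointer plus a full int object per item
unboxed = list_mib(n, 8)                # raw 8-byte machine integers
print("boxed ints:   %.1f MiB" % boxed)
print("unboxed ints: %.1f MiB" % unboxed)
```

This gives roughly 3052 MiB for the boxed list and 763 MiB for the unboxed one, in line with the 3107 MiB and 763 MiB measured above.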
The inspiration for this tool came from a chat with my colleague about the memory usage techniques I cover in my new High Performance Python book. I realised that what we needed was a lighter-weight tool that just ran in the background.
My colleague was fighting a scikit-learn feature matrix scaling problem where all the intermediate objects that lead to a binarised matrix took >6GB on his 6GB laptop. As a result I wrote this tool (no, it isn’t in the book, I only wrote this last Saturday!). During discussion (and later validated with the tool) we got his allocation to <4GB so it ran without a hitch on his laptop.
I’m probably going to demo this at a future PyDataLondon meetup.
Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight and Mor Consulting, founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.