In a previous question I asked about multiprocessing (using multiple cores to make a program run faster), and someone told me this:
More often than not, you can get a 100x+ optimization with better code compared to a 4x improvement and additional complexities with multiprocessing
They then recommended that I should:
Use a profiler to understand what is slow, then focus on optimizing that.
So I went to this question: How can you profile a script?
There I found cProfile and used it in some test code to see how it works.
This is my code:
import cProfile

def foo():
    for i in range(10000):
        a = i**i
        if i % 1000 == 0:
            print(i)

cProfile.run('foo()')
However, after running it, this was what I got:
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
         1018 function calls in 20.773 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   20.773   20.773 <string>:1(<module>)
      147    0.000    0.000    0.000    0.000 rpc.py:150(debug)
       21    0.000    0.000    0.050    0.002 rpc.py:213(remotecall)
       21    0.000    0.000    0.002    0.000 rpc.py:223(asynccall)
       21    0.000    0.000    0.048    0.002 rpc.py:243(asyncreturn)
       21    0.000    0.000    0.000    0.000 rpc.py:249(decoderesponse)
       21    0.000    0.000    0.048    0.002 rpc.py:287(getresponse)
       21    0.000    0.000    0.000    0.000 rpc.py:295(_proxify)
       21    0.001    0.000    0.048    0.002 rpc.py:303(_getresponse)
       21    0.000    0.000    0.000    0.000 rpc.py:325(newseq)
       21    0.000    0.000    0.002    0.000 rpc.py:329(putmessage)
       21    0.000    0.000    0.000    0.000 rpc.py:55(dumps)
       20    0.000    0.000    0.001    0.000 rpc.py:556(__getattr__)
        1    0.000    0.000    0.001    0.001 rpc.py:574(__getmethods)
       20    0.000    0.000    0.000    0.000 rpc.py:598(__init__)
       20    0.000    0.000    0.050    0.002 rpc.py:603(__call__)
       20    0.000    0.000    0.051    0.003 run.py:340(write)
        1   20.722   20.722   20.773   20.773 test.py:3(foo)
       42    0.000    0.000    0.000    0.000 threading.py:1226(current_thread)
       21    0.000    0.000    0.000    0.000 threading.py:215(__init__)
       21    0.000    0.000    0.047    0.002 threading.py:263(wait)
       21    0.000    0.000    0.000    0.000 threading.py:74(RLock)
       21    0.000    0.000    0.000    0.000 {built-in method _struct.pack}
       21    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
       42    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        1    0.000    0.000   20.773   20.773 {built-in method builtins.exec}
       42    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       63    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       10    0.000    0.000    0.051    0.005 {built-in method builtins.print}
       21    0.000    0.000    0.000    0.000 {built-in method select.select}
       21    0.000    0.000    0.000    0.000 {method '_acquire_restore' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method '_is_owned' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method '_release_save' of '_thread.RLock' objects}
       21    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.RLock' objects}
       42    0.047    0.001    0.047    0.001 {method 'acquire' of '_thread.lock' objects}
       21    0.000    0.000    0.000    0.000 {method 'append' of 'collections.deque' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       21    0.000    0.000    0.000    0.000 {method 'dump' of '_pickle.Pickler' objects}
       20    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       21    0.000    0.000    0.000    0.000 {method 'getvalue' of '_io.BytesIO' objects}
       21    0.000    0.000    0.000    0.000 {method 'release' of '_thread.RLock' objects}
       21    0.001    0.000    0.001    0.000 {method 'send' of '_socket.socket' objects}
I was expecting it to show me which parts of my code were taking the longest, for example that the line a = i**i was the most expensive to compute. However, all I can gather from this output is that the foo() function as a whole took the longest; what took the longest inside that function, I cannot tell from the data.
The same thing happens when I profile my actual code: everything is in functions, and it only tells me which functions take the longest, not what inside each function is slow.
So here are my main questions:
1. How do I see what inside a function is making the code take so long? (Should I even use cProfile?)
2. What is the best way to set about optimizing my code once I know what is using the most CPU?
Note: My RAM, disk, etc. are absolutely fine; it is just the CPU that is maxed out (12% overall CPU, because the program only runs on a single core).
How do I see what inside the function is making the code take so long (Should I even use cProfile?)
Yes, you can use cProfile, but the way you asked the question makes me wonder whether line_profiler (a third-party module; you need to install it) wouldn't be the better tool.
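For completeness: line_profiler can also be used outside IPython. A typical workflow (the script name test.py is just an example) is to install the package, mark the target function with the @profile decorator that kernprof injects, and run the script through kernprof:

```shell
# install the third-party package (assumes pip is available on your PATH)
pip install line_profiler

# in test.py, mark the function to profile:
#   @profile
#   def foo():
#       ...

# -l switches kernprof to line-by-line mode,
# -v prints the results as soon as the script finishes
kernprof -l -v test.py
```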
I'm using the IPython/Jupyter bindings of this package when I want to profile a function:
%load_ext line_profiler
To actually profile a function:
%lprun -f foo foo()
# ^^^^^---- this call will be profiled
# ^^^-----------function to profile
Which produces this output:
Timer unit: 5.58547e-07 s
Total time: 17.1189 s
File: <ipython-input-1-21b5a5f52f66>
Function: foo at line 1
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def foo():
     2     10001        31906      3.2      0.1      for i in range(10000):
     3     10000     30534065   3053.4     99.6          a = i**i
     4     10000        75998      7.6      0.2          if i % 1000 == 0:
     5        10         6953    695.3      0.0              print(i)
That output includes several interesting details. For example, 99.6% of the time is spent in the i**i line.
- What is the best way to set about optimizing my code once I know what is using the most CPU?
That depends. Sometimes you need to use different functions, data structures, or algorithms; sometimes you can't do anything about it. But at least you know where your bottleneck is, and you can estimate how much impact a change at the bottleneck, or somewhere else, would have.
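As a practical first step with the standard library, you can also make the cProfile output itself easier to read by sorting it with pstats, so the worst offenders appear at the top. A small sketch, using a toy workload in place of your real code:

```python
import cProfile
import io
import pstats

def square_sum(n):
    # toy workload standing in for the real bottleneck
    return sum(i * i for i in range(n))

def foo():
    return square_sum(10000)

# collect a profile of one call to foo()
pr = cProfile.Profile()
pr.enable()
foo()
pr.disable()

# sort by internal time per function and show the top entries
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('tottime').print_stats(10)
print(buf.getvalue())
```

Sorting by 'tottime' (time spent inside a function, excluding its callees) usually surfaces the bottleneck faster than the default alphabetical ordering.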