Profiling

The Python standard library includes code profiling functionality. When profiling is active, function invocations and the time spent in them are recorded.
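As a point of reference, the following is a minimal sketch of profiling with the standard library alone, using cProfile and pstats. The dot function and the list inputs are illustrative only and are not part of data_profiler.

```python
import cProfile
import io
import pstats


def dot(a, b):
    """Plain-Python dot product, used here only as a workload to profile."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


profile = cProfile.Profile()
profile.enable()    # start recording function calls
result = dot(list(range(16)), list(range(16)))
profile.disable()   # stop recording

# Render the collected statistics into a string, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profile, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

The report lists one row per recorded function, identified only by file name, line number and function name; nothing about the arguments is recorded.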

The data_profiler module extends this functionality by also recording the functions’ signatures. This is useful because the precise control flow, and thus function performance, often depends on the argument types. For NumPy arrays, the recorded signature includes not only the dtype attribute but also the array’s shape.
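To illustrate the idea of a call signature that captures argument types, here is a small sketch. It is not data_profiler's implementation: the helpers describe and call_signature, and the FakeArray stand-in (used so that the example does not depend on NumPy), are hypothetical names introduced only for this illustration.

```python
import inspect
from dataclasses import dataclass


@dataclass
class FakeArray:
    """Hypothetical stand-in for an array type exposing dtype and shape."""
    dtype: str
    shape: tuple


def describe(value):
    """Return the type name; array-like values also report dtype and shape."""
    name = type(value).__name__
    dtype = getattr(value, "dtype", None)
    shape = getattr(value, "shape", None)
    if dtype is not None and shape is not None:
        return f"{name}(dtype={dtype}, shape={shape})"
    return name


def call_signature(func, *args, **kwargs):
    """Format a call as 'func(name:type, ...)' from the bound arguments."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    parts = [f"{name}:{describe(value)}" for name, value in bound.arguments.items()]
    return f"{func.__name__}({', '.join(parts)})"


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


small = FakeArray(dtype="float32", shape=(16,))
large = FakeArray(dtype="float64", shape=(4, 4))

sig_small = call_signature(dot, small, small)
sig_mixed = call_signature(dot, small, large)
print(sig_small)
print(sig_mixed)
```

Because the two calls produce different signature strings, a profiler keyed on such strings could report them as separate entries, which is what makes it possible to see how cost varies with dtype and shape.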

Profiling example

The following code defines a simple dot function (in a file called dot.py) and profiles it without signatures, matching the behaviour of the Python standard library profile module:

import data_profiler as profiler
import numpy as np

def dot(a, b):
    total = 0  # avoid shadowing the built-in sum()
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

a = np.arange(16, dtype=np.float32)
b = np.arange(16, dtype=np.float32)

p = profiler.Profile(signatures=False)
p.enable()
dot(a, b)
p.disable()
p.print_stats()

This generates the following output:

      3 function calls in 0.000 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 builtins.len
     1    0.000    0.000    0.000    0.000 dot.py:7(dot)
     1    0.000    0.000    0.000    0.000 {method 'disable' of 'prof.Profiler' objects}

By default, the Profile constructor’s signatures flag is set to True. Leaving the flag at its default (or passing signatures=True) results in the following output:

      3 function calls (2 primitive calls) in 0.000 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 dot.py:1(disable())
   2/1    0.000    0.000    0.000    0.000 dot.py:7(dot(a:ndarray(dtype=float32, shape=(16,)), b:ndarray(dtype=float32, shape=(16,))))

In the case of more realistic code, the call graph (and consequently the table of function calls) is typically much bigger. This can make working with the data in tabular form inconvenient. To address this, the data_profiler module can visualize the data: call the data_profiler.plot() function in place of the print_stats() method.

NOTE: At this time, the data_profiler.plot() function may only be called from inside a notebook.

If the code above was executed inside a notebook, the profile can be visualized by running the following (with plot imported from data_profiler):

In [3]: plot(p)

This generates the following output:

[Image: _images/profiling.png]

data_profiler.profiler API