Profiling
=========

.. raw:: html

    <p>The Python standard library includes functionality to profile code: while a
    profiler is active, function invocations and the time spent in them are recorded.
    The <cite>accelerate.profiler</cite> module extends that functionality by also recording the
    functions&#8217; signatures. This is useful because the precise control flow, and thus a
    function&#8217;s performance, often depends on the argument types. For NumPy arrays, the
    recorded signature includes not only the <cite>dtype</cite> attribute but also the array&#8217;s shape.
    To demonstrate this, let us define a simple <cite>dot</cite> function and profile it without
    signatures, which matches the behaviour of the standard library&#8217;s profile module.</p>
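    <div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
    <span class="kn">from</span> <span class="nn">accelerate</span> <span class="kn">import</span> <span class="n">profiler</span>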

    <span class="k">def</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">a</span><span class="p">)):</span>
            <span class="n">total</span> <span class="o">+=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">total</span>

    <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>

    <span class="n">p</span> <span class="o">=</span> <span class="n">profiler</span><span class="o">.</span><span class="n">Profile</span><span class="p">(</span><span class="n">signatures</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
    <span class="n">p</span><span class="o">.</span><span class="n">enable</span><span class="p">()</span>
    <span class="n">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
    <span class="n">p</span><span class="o">.</span><span class="n">disable</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">print_stats</span><span class="p">()</span>
    </pre></div>
    </div>
    <p>This generates output like the following:</p>
    <div class="highlight-python"><div class="highlight"><pre><span></span>      3 function calls in 0.000 seconds

    Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    0.000    0.000 builtins.len
         1    0.000    0.000    0.000    0.000 dot.py:7(dot)
         1    0.000    0.000    0.000    0.000 {method &#39;disable&#39; of &#39;prof.Profiler&#39; objects}
    </pre></div>
    </div>
    <p>However, by default the <cite>Profile</cite> constructor&#8217;s <cite>signatures</cite> flag is set to <cite>True</cite>,
    resulting in this output instead:</p>
    <div class="highlight-python"><div class="highlight"><pre><span></span>      3 function calls (2 primitive calls) in 0.000 seconds

    Ordered by: standard name

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    0.000    0.000 dot.py:1(disable())
       2/1    0.000    0.000    0.000    0.000 dot.py:7(dot(a:ndarray(dtype=float32, shape=(16,)), b:ndarray(dtype=float32, shape=(16,))))
    </pre></div>
    </div>
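    <p>To make the signature format above concrete, here is a minimal sketch of how a call
    could be rendered into such a string. The helpers <cite>describe_arg</cite> and
    <cite>describe_call</cite> are hypothetical and are not part of <cite>accelerate.profiler</cite>.
    The sketch only assumes that array-like arguments expose <cite>dtype</cite> and <cite>shape</cite>
    attributes, and it uses a small stand-in class so that it runs without NumPy.</p>

```python
from collections import namedtuple

# Stand-in for an array type: anything exposing `dtype` and `shape` is treated
# like an ndarray here (assumption for illustration only).
FakeArray = namedtuple("FakeArray", ["dtype", "shape"])


def describe_arg(name, value):
    """Hypothetical helper: render one argument as name:type(details)."""
    type_name = type(value).__name__
    if hasattr(value, "dtype") and hasattr(value, "shape"):
        return f"{name}:{type_name}(dtype={value.dtype}, shape={value.shape})"
    return f"{name}:{type_name}"


def describe_call(func_name, **kwargs):
    """Hypothetical helper: render a whole call in the style shown above."""
    args = ", ".join(describe_arg(k, v) for k, v in kwargs.items())
    return f"{func_name}({args})"


a = FakeArray(dtype="float32", shape=(16,))
signature = describe_call("dot", a=a, b=a)
print(signature)
```

.. raw:: html

    <p>Because the shape is part of such a string, calls to the same function with
    differently shaped arrays can be told apart in the statistics table.</p>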
    <p>For more realistic code, the call graph (and thus the table of function calls) is
    much bigger, so working with the data in tabular form is not very convenient.
    The <cite>accelerate.profiler</cite> module therefore also provides functionality to visualize the
    data. Instead of calling the <cite>print_stats()</cite> method, we may call the
    <cite>accelerate.profiler.plot()</cite> function. When run in an interactive notebook, this
    produces output like this:</p>
    <img alt="../_images/profiling.png" src="../_images/profiling.png" />
    <div class="section" id="the-accelerate-profiler-api">
    <h2>The accelerate.profiler API<a class="headerlink" href="#the-accelerate-profiler-api" title="Permalink to this headline">¶</a></h2>
    <dl class="class">
    <dt id="accelerate.profiler.Profile">
    <em class="property">class </em><code class="descclassname">accelerate.profiler.</code><code class="descname">Profile</code><span class="sig-paren">(</span><em>custom_timer=None</em>, <em>time_unit=None</em>, <em>subcalls=True</em>, <em>builtins=True</em>, <em>signatures=True</em><span class="sig-paren">)</span><a class="headerlink" href="#accelerate.profiler.Profile" title="Permalink to this definition">¶</a></dt>
    <dd><p>Builds a profiler object using the specified timer function.
    The default timer is a fast built-in one based on real time.
    For custom timer functions returning integers, time_unit can
    be a float specifying a scale (i.e. how long each integer unit
    is, in seconds).</p>
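    <p>As a sketch of how an integer timer and its scale fit together, the example below uses
    the standard library&#8217;s <cite>cProfile.Profile</cite>, whose constructor takes the same kind of
    arguments (there the second one is named <cite>timeunit</cite>). That
    <cite>accelerate.profiler.Profile</cite> treats them identically is an assumption based on the
    signature above, so the arguments are passed positionally.</p>

```python
import cProfile
import time


def work():
    # Something measurable: sleep for roughly 50 ms.
    time.sleep(0.05)


# time.perf_counter_ns returns integer nanoseconds, so each unit is 1e-9 s.
# Arguments are passed positionally: (timer function, time unit).
p = cProfile.Profile(time.perf_counter_ns, 1e-9)
p.enable()
work()
p.disable()

# Find the entry for work(); Python functions are keyed by their code object.
entry = next(e for e in p.getstats() if getattr(e.code, "co_name", None) == "work")
seconds = entry.totaltime  # already scaled to seconds by the time unit
print(f"work() took about {seconds:.3f} s")
```

.. raw:: html

    <p>Without the scale, the integer ticks of such a timer would not be interpreted as
    nanoseconds, so the reported times would be off by that factor.</p>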
    <dl class="method">
    <dt id="accelerate.profiler.Profile.clear">
    <code class="descname">clear</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#accelerate.profiler.Profile.clear" title="Permalink to this definition">¶</a></dt>
    <dd><p>Clear all profiling information collected so far.</p>
    </dd></dl>

    <dl class="method">
    <dt id="accelerate.profiler.Profile.disable">
    <code class="descname">disable</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#accelerate.profiler.Profile.disable" title="Permalink to this definition">¶</a></dt>
    <dd><p>Stop collecting profiling information.</p>
    </dd></dl>

    <dl class="method">
    <dt id="accelerate.profiler.Profile.enable">
    <code class="descname">enable</code><span class="sig-paren">(</span><em>subcalls=True</em>, <em>builtins=True</em><span class="sig-paren">)</span><a class="headerlink" href="#accelerate.profiler.Profile.enable" title="Permalink to this definition">¶</a></dt>
    <dd><p>Start collecting profiling information.
    If <cite>subcalls</cite> is True, statistics for each function are also
    recorded separately for each of its callers.
    If <cite>builtins</cite> is True, the time spent in built-in functions
    is recorded separately from that of their caller.</p>
    </dd></dl>
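    <p>The effect of the <cite>builtins</cite> flag can be sketched with the standard library&#8217;s
    <cite>cProfile.Profile.enable()</cite>, which takes the same two keyword arguments. Treating it as
    a stand-in for <cite>accelerate.profiler.Profile.enable()</cite> is an assumption, and the helper
    <cite>builtin_entries</cite> is only illustrative.</p>

```python
import cProfile


def count_items(items):
    # len() is a built-in, so it only shows up when builtins are recorded.
    return len(items)


def builtin_entries(builtins):
    """Profile count_items() and return the entries for built-in functions."""
    p = cProfile.Profile()
    p.enable(subcalls=True, builtins=builtins)
    count_items([1, 2, 3])
    p.disable()
    # Built-in functions are reported with a string instead of a code object.
    return [e.code for e in p.getstats() if isinstance(e.code, str)]


with_builtins = builtin_entries(True)
without_builtins = builtin_entries(False)
print(with_builtins)
print(without_builtins)
```

.. raw:: html

    <p>With <cite>builtins=False</cite> the time spent in <cite>len()</cite> is not lost; it is simply
    attributed to the calling function instead of getting its own row.</p>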

    <dl class="method">
    <dt id="accelerate.profiler.Profile.getstats">
    <code class="descname">getstats</code><span class="sig-paren">(</span><span class="sig-paren">)</span> &rarr; list of profiler_entry objects<a class="headerlink" href="#accelerate.profiler.Profile.getstats" title="Permalink to this definition">¶</a></dt>
    <dd><p>Return all information collected by the profiler.
    Each profiler_entry is a tuple-like object with the
    following attributes:</p>
    <blockquote>
    <div><pre>code          code object
    callcount     how many times this was called
    reccallcount  how many times called recursively
    totaltime     total time in this entry
    inlinetime    inline time in this entry (not in subcalls)
    calls         details of the calls</pre></div></blockquote>
    <p>The calls attribute is either None or a list of
    profiler_subentry objects:</p>
    <blockquote>
    <div><pre>code          called code object
    callcount     how many times this is called
    reccallcount  how many times this is called recursively
    totaltime     total time spent in this call
    inlinetime    inline time (not in further subcalls)</pre></div></blockquote>
    </dd></dl>
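    <p>The attributes listed above can be read as in the following sketch. It uses the standard
    library&#8217;s <cite>cProfile.Profile</cite>, whose <cite>getstats()</cite> returns entries with these same
    attributes; that the entries of <cite>accelerate.profiler.Profile</cite> look identical (apart
    from any signature information) is an assumption.</p>

```python
import cProfile


def square(x):
    return x * x


def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += square(i)
    return total


p = cProfile.Profile()
p.enable()
sum_of_squares(100)
p.disable()

# Index the entries by function name; built-ins carry a string as `code`.
by_name = {getattr(e.code, "co_name", e.code): e for e in p.getstats()}

square_entry = by_name["square"]
print(square_entry.callcount, square_entry.totaltime, square_entry.inlinetime)

# With subcalls enabled (the default), `calls` lists the callees of an entry.
callees = {
    getattr(sub.code, "co_name", sub.code): sub.callcount
    for sub in (by_name["sum_of_squares"].calls or [])
}
print(callees)
```

.. raw:: html

    <p>The difference between <cite>totaltime</cite> and <cite>inlinetime</cite> of an entry is the time
    spent in the functions it called.</p>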

    <dl class="method">
    <dt id="accelerate.profiler.Profile.print_stats">
    <code class="descname">print_stats</code><span class="sig-paren">(</span><em>sort=-1</em><span class="sig-paren">)</span><a class="headerlink" href="#accelerate.profiler.Profile.print_stats" title="Permalink to this definition">¶</a></dt>
    <dd><p>Print a table with profile statistics.</p>
    </dd></dl>
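    <p>The following sketch shows the <cite>sort</cite> argument with the standard library&#8217;s
    <cite>cProfile.Profile.print_stats()</cite>, which has the same default of <cite>-1</cite>. Whether
    <cite>accelerate.profiler</cite> accepts the same sort keys, such as <cite>"cumulative"</cite>, is an
    assumption. The table is captured in a string so that it can be inspected or logged.</p>

```python
import contextlib
import cProfile
import io


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


p = cProfile.Profile()
p.enable()
fib(10)
p.disable()

# Capture the table instead of printing it directly to the terminal.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    p.print_stats(sort="cumulative")  # order rows by cumulative time
report = buffer.getvalue()
print(report)
```

.. raw:: html

    <p>Sorting by cumulative time puts the functions that dominate the run, including the
    time of everything they call, at the top of the table.</p>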

    </dd></dl>

    <dl class="function">
    <dt id="accelerate.profiler.plot">
    <code class="descclassname">accelerate.profiler.</code><code class="descname">plot</code><span class="sig-paren">(</span><em>profiler</em><span class="sig-paren">)</span><a class="headerlink" href="#accelerate.profiler.plot" title="Permalink to this definition">¶</a></dt>
    <dd><p>Generate a visualization of the current profile statistics
    in <cite>profiler</cite>. Currently this is only supported in interactive notebooks.</p>
    </dd></dl>
