====================
Data Scientist tasks
====================

A data scientist is a user who authors notebooks and distributes them to other
users, who then call the notebooks' functions from Excel. This guide covers the
tasks these authors perform.

Registering Excel functions
===========================

To register a Python function so that Fusion exposes it in Excel:

#. Add the ``@fusion.register()`` decorator to the function:

   .. code-block:: python

      from anacondafusion.fusion import fusion

      @fusion.register()
      def add_evens(data):
          # Sum only the even numbers in the input.
          total = 0
          for item in data:
              if item % 2 == 0:
                  total += item
          return total
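
Because the decorated function is ordinary Python, one way to check its logic
before publishing is to test an undecorated copy directly. The sketch below
assumes ``data`` arrives as a flat sequence of numbers; how Fusion converts an
Excel range into Python objects is not described in this guide, so treat that
input shape as an assumption.

.. code-block:: python

    def add_evens(data):
        """Return the sum of the even numbers in ``data``.

        Assumes ``data`` is a flat iterable of numbers (an assumption;
        Fusion's conversion of Excel ranges may differ).
        """
        return sum(item for item in data if item % 2 == 0)


    # Quick checks, run outside Excel without the @fusion.register() decorator.
    print(add_evens([1, 2, 3, 4]))  # 6
    print(add_evens([]))  # 0

Keeping the logic testable this way means the decorator only adds the Excel
registration on top of code you have already checked.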

Passing a list of predefined inputs
===================================

To give Excel users a drop-down list of predefined values for a function
argument, pass those values to the ``args`` argument of the
``@fusion.register()`` decorator.

EXAMPLE:

.. code-block:: python

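    from anacondafusion.fusion import fusion

    algorithms = ['MiniBatchKMeans', 'AffinityPropagation', 'MeanShift',
                  'SpectralClustering', 'Ward', 'AgglomerativeClustering',
                  'Birch', 'DBSCAN']

    @fusion.register(args={'algorithm': {'values': algorithms},
                           'n_clusters': {'values': [3, 4, 5, 6]}})
    def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
        ...  # clustering implementation goes here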

Excel users can then choose ``algorithm`` and ``n_clusters`` from drop-down
lists containing the values you specified:

.. figure:: /img/fusion_clustering_options.png
    :width: 50%

|
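
The ``args`` dictionary maps each parameter name to a dictionary whose
``values`` key lists the allowed options. The drop-down restricts choices in
Excel, but the same function can also be called directly from Python. The sketch
below reuses that structure to validate keyword arguments; ``check_choices`` is
a hypothetical helper written for this example, not part of the Fusion API.

.. code-block:: python

    algorithms = ['MiniBatchKMeans', 'AffinityPropagation', 'MeanShift',
                  'SpectralClustering', 'Ward', 'AgglomerativeClustering',
                  'Birch', 'DBSCAN']

    # Same structure as the ``args`` dictionary passed to @fusion.register().
    ARGS = {'algorithm': {'values': algorithms},
            'n_clusters': {'values': [3, 4, 5, 6]}}


    def check_choices(spec, **kwargs):
        """Raise ValueError if a keyword argument is not one of its allowed values.

        Hypothetical local helper; parameters not listed in ``spec`` are
        accepted unchanged.
        """
        for name, value in kwargs.items():
            allowed = spec.get(name, {}).get('values')
            if allowed is not None and value not in allowed:
                raise ValueError(f'{name}={value!r} is not one of {allowed!r}')


    check_choices(ARGS, algorithm='Birch', n_clusters=4)  # passes silently

Calling ``check_choices(ARGS, **kwargs)`` at the top of the function body would
reject out-of-range values that bypass the Excel drop-down.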

Documenting functions for end users
===================================

You can document your Fusion functions for end users by writing their
docstrings in Markdown.

EXAMPLE:

.. code-block:: python

    from anacondafusion.fusion import fusion

    # algorithms is the list defined in the previous example.
    @fusion.register(args={'algorithm': {'values': algorithms},
                           'n_clusters': {'values': [3, 4, 5, 6]}})
    def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
        """
        Using the clustering function
        -----------------------------

        The clustering function receives `data`, a 2-column table of
        (x, y) points, and applies the selected `algorithm` with
        `n_clusters` clusters.

        The available algorithms are:

        * MiniBatchKMeans
        * AffinityPropagation
        * MeanShift
        * SpectralClustering
        * Ward
        * AgglomerativeClustering
        * Birch
        * DBSCAN

        For more information, see `clustering and scikit-learn
        <http://scikit-learn.org/stable/modules/clustering.html>`_.
        """
        ...  # clustering implementation goes here

When a user clicks the **i** (Information) icon next to the function in the
Fusion pane in Excel, the docstring is displayed as the function's
documentation:

.. figure:: /img/fusion_clustering_docs.png
    :width: 30%

|
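
To check how a docstring reads before publishing, you can print it locally
with the standard library's ``inspect.getdoc``, which strips the indentation
shared by the docstring lines. This shows only the raw text; how the Fusion
pane renders the Markdown is not reproduced here, and the function body is a
placeholder.

.. code-block:: python

    import inspect


    def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
        """
        Using the clustering function
        -----------------------------

        The clustering function receives `data`, a 2-column table of
        (x, y) points, and applies the selected `algorithm`.
        """
        raise NotImplementedError  # placeholder body for this preview


    # inspect.getdoc() removes common indentation and leading blank lines.
    doc = inspect.getdoc(clustering)
    print(doc.splitlines()[0])  # Using the clustering function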

Plotting with Bokeh
===================

To display a Bokeh plot in the Fusion pane, pass the figure to the
``display_plot`` function.

EXAMPLE:

.. code-block:: python

    from bokeh.plotting import figure
    from anacondafusion.fusion import fusion, display_plot

    @fusion.register()
    def plot_example():
        # Build a simple scatter plot. On Bokeh 3.x, use width= and
        # height= instead of plot_width= and plot_height=.
        plot = figure(plot_width=400, plot_height=400)
        plot.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=20,
                    color="navy", alpha=0.5)

        # Show the figure in the Fusion pane.
        display_plot(plot)

|

.. figure:: /img/fusion_display_plot.png
    :width: 50%

|
