Using Anaconda with Cloudera CDH
================================

NOTE: This page has been superseded. Please see https://docs.continuum.io/anaconda-scale/cloudera-cdh

There are two methods of using Anaconda on an existing cluster running
`Cloudera CDH <http://www.cloudera.com/products/apache-hadoop/key-cdh-components.html>`_
(Cloudera's Distribution Including Apache Hadoop):
the `Anaconda parcel for Cloudera CDH <http://docs.continuum.io/anaconda/cloudera>`_
and Anaconda for cluster management. The instructions below
describe how to uninstall the Anaconda parcel from a CDH cluster and transition
to Anaconda for cluster management.

Uninstalling the Anaconda parcel
--------------------------------

If the Anaconda parcel is installed on the CDH cluster, use the following steps
to uninstall the parcel. Otherwise, you can skip to the next section.

#. From the Cloudera Manager Admin Console, click the Parcels indicator in the
   top navigation bar.

#. Click the ``Deactivate`` button to the right of the Anaconda parcel listing.

#. Click ``OK`` on the Deactivate prompt to deactivate the Anaconda parcel and
   restart Spark and related services.

#. Click the arrow to the right of the Anaconda parcel listing and choose
   ``Remove From Hosts``, which opens a confirmation dialog.

#. Confirm the removal. The Anaconda parcel is then removed from the cluster
   nodes.

For more information about managing Cloudera parcels, refer to the
`Cloudera documentation <http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_parcels.html#cmug_topic_7_11_5_unique_1__section_sd4_bzx_bm_unique_1>`_.


Using Anaconda for cluster management
-------------------------------------

Anaconda for cluster management provides additional functionality, including
the ability to manage multiple conda environments and packages (including
Python and R) alongside an existing CDH cluster.

#. Configure the nodes with Anaconda for cluster management using the
   `Bare-metal Cluster Setup instructions <http://docs.continuum.io/anaconda-cluster/create-bare>`_.

#. During this process, you will create a profile and a provider that describe
   the cluster.

#. Provision the cluster using the following command, replacing ``cluster-cdh``
   with the name of your cluster and ``profile-cdh`` with the name of your
   profile:

   .. code-block:: bash

      $ acluster create cluster-cdh -p profile-cdh

#. Submit Spark jobs with the ``PYSPARK_PYTHON`` environment variable set to
   the location of the Anaconda Python interpreter, for example:

   .. code-block:: bash

      $ PYSPARK_PYTHON=/opt/anaconda/bin/python spark-submit pyspark_script.py
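
The submission step can be wrapped in a small shell helper that checks for the
Anaconda interpreter before calling ``spark-submit``. The following is a
minimal sketch, not part of Anaconda for cluster management: the names
``ANACONDA_PYTHON``, ``check_python``, and ``submit_with_anaconda`` are
hypothetical, and the default path ``/opt/anaconda/bin/python`` is assumed
from the example above. The check runs only on the node where the job is
submitted, so it does not confirm that the same path exists on every worker
node.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: verify the Anaconda interpreter before submitting a Spark job.
   # ANACONDA_PYTHON, check_python, and submit_with_anaconda are
   # illustrative names, not part of any Anaconda or Spark tool.

   # Assumed install location; override by exporting ANACONDA_PYTHON.
   ANACONDA_PYTHON="${ANACONDA_PYTHON:-/opt/anaconda/bin/python}"

   # Return 0 if the given path is a regular, executable file.
   check_python() {
       [ -f "$1" ] && [ -x "$1" ]
   }

   # Run spark-submit with PYSPARK_PYTHON pointing at Anaconda.
   # All arguments are passed through to spark-submit unchanged.
   submit_with_anaconda() {
       if ! check_python "$ANACONDA_PYTHON"; then
           echo "No executable Python found at $ANACONDA_PYTHON" >&2
           return 1
       fi
       PYSPARK_PYTHON="$ANACONDA_PYTHON" spark-submit "$@"
   }

After sourcing these definitions, ``submit_with_anaconda pyspark_script.py``
behaves like the command in the last step, but fails early with an error
message if the interpreter is missing on the submitting node.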
