Troubleshooting (AEN 4.2.1)
===========================

.. raw:: html

    <p>This guide describes how to resolve issues that can occur with
    your AEN installation.</p>
    <div class="contents local topic" id="contents">
    <ul class="simple">
    <li><a class="reference internal" href="#general-troubleshooting-steps" id="id20">General troubleshooting steps</a></li>
    <li><a class="reference internal" href="#browser-error-too-many-redirects" id="id21">Browser error: too many redirects</a></li>
    <li><a class="reference internal" href="#error-unix-opt-wakari-wakari-server-etc-supervisor-sock-no-such-file" id="id22">Error: unix:////opt/wakari/wakari-server/etc/supervisor.sock no such file</a></li>
    <li><a class="reference internal" href="#error-data-center-not-found-when-deleting-a-project" id="id23">Error: &#8220;Data Center Not Found&#8221; when deleting a project</a></li>
    <li><a class="reference internal" href="#forgotten-administrator-password" id="id24">Forgotten administrator password</a></li>
    <li><a class="reference internal" href="#log-files-being-deleted" id="id25">Log files being deleted</a></li>
    <li><a class="reference internal" href="#error-this-socket-is-closed" id="id26">Error: This socket is closed</a></li>
    <li><a class="reference internal" href="#service-error-502-cannot-connect-to-the-application-manager" id="id27">Service error 502: Cannot connect to the application manager</a></li>
    <li><a class="reference internal" href="#communication-error-on-amazon-web-services-aws" id="id28">502 communication error on Amazon Web Services (AWS)</a></li>
    <li><a class="reference internal" href="#invalid-username" id="id29">Invalid username</a></li>
    <li><a class="reference internal" href="#notebook-error-cannot-download-notebook-as-pdf-via-latex" id="id30">Notebook Error: Cannot download notebook as PDF via LaTeX</a></li>
    <li><a class="reference internal" href="#unresponsive-wk-server-thread-without-error-messages" id="id31">Unresponsive <code class="docutils literal"><span class="pre">wk-server</span></code> thread without error messages</a></li>
    <li><a class="reference internal" href="#unresponsive-wk-gateway-thread-without-error-messages" id="id32">Unresponsive <code class="docutils literal"><span class="pre">wk-gateway</span></code> thread without error messages</a></li>
    </ul>
    </div>
    <div class="section" id="general-troubleshooting-steps">
    <h2><a class="toc-backref" href="#id20">General troubleshooting steps</a><a class="headerlink" href="#general-troubleshooting-steps" title="Permalink to this headline">¶</a></h2>
    <ol class="arabic simple">
    <li>Clear browser cookies. After you change the AEN configuration
    or upgrade AEN, stale cookies left in the browser can cause
    problems. Clearing the cookies and logging in again can resolve
    them.</li>
    <li><a class="reference internal" href="sys-mgmt/verify-nginx-mongodb.html"><span class="doc">Make sure NGINX and MongoDB are running</span></a>.</li>
    <li>Make sure that AEN services on all nodes are <a class="reference internal" href="sys-mgmt/manage-services.html#verify-services-start-at-boot"><span class="std std-ref">set to start at boot</span></a>.</li>
    <li><a class="reference internal" href="sys-mgmt/manage-services.html"><span class="doc">Make sure that services are running</span></a> as expected. If any services are
    not running or are missing, <a class="reference internal" href="sys-mgmt/manage-services.html#restart-services"><span class="std std-ref">restart them</span></a>.</li>
    <li><a class="reference internal" href="sys-mgmt/manage-services.html#identify-extra-services"><span class="std std-ref">Check for and remove extraneous processes</span></a>.</li>
    <li><a class="reference internal" href="sys-mgmt/check-node-connections.html"><span class="doc">Check the connectivity between nodes</span></a>.</li>
    <li><a class="reference internal" href="install/config/use-config-files.html#check-config-syntax"><span class="std std-ref">Check the configuration file syntax</span></a>.</li>
    <li><a class="reference internal" href="user-mgmt/manage-permissions.html"><span class="doc">Check file ownership</span></a>.</li>
    <li><a class="reference internal" href="user-mgmt/manage-permissions.html"><span class="doc">Verify that POSIX ACLs are enabled</span></a>.</li>
    </ol>
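    <p>For step 2, the following shell function is a minimal sketch that reports whether named processes are running on the current node. It assumes that <code class="docutils literal"><span class="pre">pgrep</span></code> is installed and that the NGINX and MongoDB processes are named <code class="docutils literal"><span class="pre">nginx</span></code> and <code class="docutils literal"><span class="pre">mongod</span></code>. Adjust the names to match your installation.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Report whether each named process is running.
    # Returns 0 if all of them are running, 1 otherwise.
    check_processes() {
        local name status=0
        for name in "$@"; do
            # pgrep -x matches the process name exactly; empty output
            # means that no matching process was found.
            if [ -n "$(pgrep -x "$name")" ]; then
                echo "running: $name"
            else
                echo "NOT running: $name"
                status=1
            fi
        done
        return "$status"
    }

    # Usage on the AEN server node (process names are assumptions):
    # check_processes nginx mongod
    </pre></div>
    </div>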
    </div>
    <div class="section" id="browser-error-too-many-redirects">
    <h2><a class="toc-backref" href="#id21">Browser error: too many redirects</a><a class="headerlink" href="#browser-error-too-many-redirects" title="Permalink to this headline">¶</a></h2>
    <div class="section" id="cause">
    <h3>Cause<a class="headerlink" href="#cause" title="Permalink to this headline">¶</a></h3>
    <p>Browser cookies are out of date.</p>
    </div>
    <div class="section" id="solution">
    <h3>Solution<a class="headerlink" href="#solution" title="Permalink to this headline">¶</a></h3>
    <ol class="arabic simple">
    <li>Log out.</li>
    <li>Clear the browser&#8217;s cookies.</li>
    <li>Clear the browser cache.</li>
    <li>Log in.</li>
    </ol>
    </div>
    </div>
    <div class="section" id="error-unix-opt-wakari-wakari-server-etc-supervisor-sock-no-such-file">
    <h2><a class="toc-backref" href="#id22">Error: unix:////opt/wakari/wakari-server/etc/supervisor.sock no such file</a><a class="headerlink" href="#error-unix-opt-wakari-wakari-server-etc-supervisor-sock-no-such-file" title="Permalink to this headline">¶</a></h2>
    <p>This is a supervisorctl error.</p>
    <div class="section" id="id1">
    <h3>Cause<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h3>
    <p>supervisord is not running on the AEN server.</p>
    </div>
    <div class="section" id="id2">
    <h3>Solution<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h3>
    <p>Ensure that supervisord is included in the crontab. Then restart
    supervisord manually.</p>
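    <p>The following shell function is a minimal sketch of these checks on the server node. It reports whether the socket named in the error message exists, whether a supervisord process is running, and whether the current user&#8217;s crontab mentions supervisord. Run it as root. Matching the string <code class="docutils literal"><span class="pre">supervisord</span></code> is an assumption about how your installation starts the process, and the function only reports status; it does not restart anything.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Report the state of supervisord on the AEN server node.
    # The default socket path comes from the error message; pass another path to override it.
    check_supervisord() {
        local sock="${1:-/opt/wakari/wakari-server/etc/supervisor.sock}"
        # -S is true only if the path exists and is a socket.
        if [ -S "$sock" ]; then
            echo "socket present: $sock"
        else
            echo "socket missing: $sock"
        fi
        # pgrep -f matches against the full command line.
        if [ -n "$(pgrep -f supervisord)" ]; then
            echo "supervisord process: running"
        else
            echo "supervisord process: NOT running"
        fi
        # crontab -l lists the crontab of the current user.
        if crontab -l | grep -q supervisord; then
            echo "crontab entry: found"
        else
            echo "crontab entry: NOT found"
        fi
    }

    # Usage, as root on the server node:
    # check_supervisord
    </pre></div>
    </div>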
    </div>
    </div>
    <div class="section" id="error-data-center-not-found-when-deleting-a-project">
    <h2><a class="toc-backref" href="#id23">Error: &#8220;Data Center Not Found&#8221; when deleting a project</a><a class="headerlink" href="#error-data-center-not-found-when-deleting-a-project" title="Permalink to this headline">¶</a></h2>
    <div class="section" id="id3">
    <h3>Cause<a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h3>
    <p>The data center has been removed.</p>
    </div>
    <div class="section" id="id4">
    <h3>Solution<a class="headerlink" href="#id4" title="Permalink to this headline">¶</a></h3>
    <p>As root, run:</p>
    <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">/</span><span class="n">opt</span><span class="o">/</span><span class="n">wakari</span><span class="o">/</span><span class="n">wakari</span><span class="o">-</span><span class="n">server</span><span class="o">/</span><span class="nb">bin</span><span class="o">/</span><span class="n">wk</span><span class="o">-</span><span class="n">server</span><span class="o">-</span><span class="n">admin</span> <span class="n">remove</span><span class="o">-</span><span class="n">project</span> <span class="o">--</span><span class="n">db</span><span class="o">-</span><span class="n">only</span> <span class="o">&lt;</span><span class="n">user</span><span class="o">&gt;</span> <span class="o">&lt;</span><span class="n">project</span><span class="o">&gt;</span>
    </pre></div>
    </div>
    </div>
    </div>
    <div class="section" id="forgotten-administrator-password">
    <h2><a class="toc-backref" href="#id24">Forgotten administrator password</a><a class="headerlink" href="#forgotten-administrator-password" title="Permalink to this headline">¶</a></h2>
    <ol class="arabic">
    <li><p class="first">Use ssh to log in to the server as root.</p>
    </li>
    <li><p class="first">Run:</p>
    <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">/</span><span class="n">opt</span><span class="o">/</span><span class="n">wakari</span><span class="o">/</span><span class="n">wakari</span><span class="o">-</span><span class="n">server</span><span class="o">/</span><span class="nb">bin</span><span class="o">/</span><span class="n">wk</span><span class="o">-</span><span class="n">server</span><span class="o">-</span><span class="n">admin</span> <span class="n">reset</span><span class="o">-</span><span class="n">password</span> <span class="o">-</span><span class="n">u</span> <span class="n">SOME_USER</span> <span class="o">-</span><span class="n">p</span> <span class="n">SOME_PASSWORD</span>
    </pre></div>
    </div>
    <p>NOTE: Replace SOME_USER with the administrator username and SOME_PASSWORD with the new password.</p>
    </li>
    <li><p class="first">Log in to AEN as the administrator user with the new password.</p>
    </li>
    </ol>
    <p>Alternatively, you can add a new administrator user:</p>
    <ol class="arabic">
    <li><p class="first">Use ssh to log in to the server as root.</p>
    </li>
    <li><p class="first">Run:</p>
    <div class="highlight-default"><div class="highlight"><pre><span></span><span class="o">/</span><span class="n">opt</span><span class="o">/</span><span class="n">wakari</span><span class="o">/</span><span class="n">wakari</span><span class="o">-</span><span class="n">server</span><span class="o">/</span><span class="nb">bin</span><span class="o">/</span><span class="n">wk</span><span class="o">-</span><span class="n">server</span><span class="o">-</span><span class="n">admin</span> <span class="n">add</span><span class="o">-</span><span class="n">user</span> <span class="n">SOME_USER</span> <span class="o">--</span><span class="n">admin</span> <span class="o">-</span><span class="n">p</span> <span class="n">SOME_PASSWORD</span> <span class="o">-</span><span class="n">e</span> <span class="n">YOUR_EMAIL</span>
    </pre></div>
    </div>
    <p>NOTE: Replace SOME_USER with the new username, SOME_PASSWORD with its password, and YOUR_EMAIL with your email address.</p>
    </li>
    <li><p class="first">Log in to AEN as the new administrator user.</p>
    </li>
    </ol>
    </div>
    <div class="section" id="log-files-being-deleted">
    <h2><a class="toc-backref" href="#id25">Log files being deleted</a><a class="headerlink" href="#log-files-being-deleted" title="Permalink to this headline">¶</a></h2>
    <p>Log files are being deleted.</p>
    <p>NOTE: Locations of AEN log files for each process and application
    are shown in the node sections in <a class="reference internal" href="concepts.html"><span class="doc">Concepts</span></a>.</p>
    <div class="section" id="id5">
    <h3>Cause<a class="headerlink" href="#id5" title="Permalink to this headline">¶</a></h3>
    <p>The AEN installers write their logs to
    <code class="docutils literal"><span class="pre">/tmp/wakari_{server,gateway,compute}.log</span></code>. If these log files
    grow too large, they might be deleted.</p>
    </div>
    <div class="section" id="id6">
    <h3>Solution<a class="headerlink" href="#id6" title="Permalink to this headline">¶</a></h3>
    <p>Jupyter Notebook controls how verbose its logs are with the
    <a class="reference external" href="http://jupyter-notebook.readthedocs.io/en/latest/config.html">Application.log_level</a> setting.</p>
    <p>To make the logs less verbose than the default, but still
    informative, set Application.log_level to ERROR.</p>
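    <p>To check whether the installer logs are growing large, you can print their sizes. The following shell function is a minimal sketch that checks the three files matched by <code class="docutils literal"><span class="pre">/tmp/wakari_{server,gateway,compute}.log</span></code>. It assumes GNU <code class="docutils literal"><span class="pre">stat</span></code>, as shipped with CentOS.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Print the size in bytes of each log file, or note that it is missing.
    # With no arguments, it checks the three AEN installer logs in /tmp.
    log_sizes() {
        local f
        if [ "$#" -eq 0 ]; then
            set -- /tmp/wakari_server.log /tmp/wakari_gateway.log /tmp/wakari_compute.log
        fi
        for f in "$@"; do
            if [ -f "$f" ]; then
                # GNU stat: %s is the file size in bytes.
                echo "$f $(stat -c %s "$f") bytes"
            else
                echo "$f missing"
            fi
        done
    }

    # Usage:
    # log_sizes
    </pre></div>
    </div>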
    </div>
    </div>
    <div class="section" id="error-this-socket-is-closed">
    <h2><a class="toc-backref" href="#id26">Error: This socket is closed</a><a class="headerlink" href="#error-this-socket-is-closed" title="Permalink to this headline">¶</a></h2>
    <p>You receive the &#8220;This socket is closed&#8221; error message when you
    try to start an application.</p>
    <div class="section" id="id7">
    <h3>Cause<a class="headerlink" href="#id7" title="Permalink to this headline">¶</a></h3>
    <p>When the supervisord process is killed, output written to
    standard output (<code class="docutils literal"><span class="pre">stdout</span></code>) and standard error (<code class="docutils literal"><span class="pre">stderr</span></code>) is
    held in a pipe that eventually fills up.</p>
    <p>Once the pipe is full, any attempt to start an application causes the
    &#8220;This socket is closed&#8221; error.</p>
    </div>
    <div class="section" id="id8">
    <h3>Solution<a class="headerlink" href="#id8" title="Permalink to this headline">¶</a></h3>
    <p>To prevent this issue:</p>
    <ul class="simple">
    <li>Follow the instructions in <a class="reference internal" href="sys-mgmt/manage-services.html"><span class="doc">Managing services</span></a> to
    stop and restart processes.</li>
    <li>Do not stop or kill supervisord without first stopping
    wk-compute and any other processes that use it.</li>
    </ul>
    <p>To resolve the &#8220;This socket is closed&#8221; error:</p>
    <ol class="arabic">
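    <li><p class="first">Stop wk-compute: find its process ID, for example with <code class="docutils literal"><span class="pre">pgrep</span> <span class="pre">-f</span> <span class="pre">wk-compute</span></code>, and then run <code class="docutils literal"><span class="pre">sudo</span> <span class="pre">kill</span> <span class="pre">-9</span> <span class="pre">&lt;PID&gt;</span></code>.</p>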
    </li>
    <li><p class="first">Restart the supervisord and wk-compute processes:</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span>sudo /etc/init.d/wakari-compute stop
    sudo /etc/init.d/wakari-compute start
    </pre></div>
    </div>
    </li>
    </ol>
    </div>
    </div>
    <div class="section" id="service-error-502-cannot-connect-to-the-application-manager">
    <h2><a class="toc-backref" href="#id27">Service error 502: Cannot connect to the application manager</a><a class="headerlink" href="#service-error-502-cannot-connect-to-the-application-manager" title="Permalink to this headline">¶</a></h2>
    <p>The gateway node displays &#8220;Service Error 502: Can not connect
    to the application manager.&#8221;</p>
    <div class="section" id="id9">
    <h3>Cause<a class="headerlink" href="#id9" title="Permalink to this headline">¶</a></h3>
    <p>A compute node is not responding because the wk-compute process
    has stopped.</p>
    </div>
    <div class="section" id="id10">
    <h3>Solution<a class="headerlink" href="#id10" title="Permalink to this headline">¶</a></h3>
    <p>Stop and then restart the supervisord and wk-compute processes:</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span>sudo /etc/init.d/wakari-compute stop
    sudo /etc/init.d/wakari-compute start
    </pre></div>
    </div>
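    <p>If you want to restart only when the process is actually down, the following shell function is a minimal sketch for the affected compute node. It treats any process whose command line contains <code class="docutils literal"><span class="pre">wk-compute</span></code> as the running service, which is an assumption about how the process appears in the process table, and it uses the same init script as the commands above.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Restart the compute services only if no wk-compute process is found.
    ensure_wk_compute() {
        # pgrep -f matches against the full command line.
        if [ -n "$(pgrep -f wk-compute)" ]; then
            echo "wk-compute is running"
            return 0
        fi
        echo "wk-compute is not running; restarting"
        sudo /etc/init.d/wakari-compute stop
        sudo /etc/init.d/wakari-compute start
    }

    # Usage, on the compute node that is not responding:
    # ensure_wk_compute
    </pre></div>
    </div>
    <p>If <code class="docutils literal"><span class="pre">wk-compute</span></code> is running but not responding, restart it unconditionally with the commands above.</p>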
    </div>
    </div>
    <div class="section" id="communication-error-on-amazon-web-services-aws">
    <h2><a class="toc-backref" href="#id28">502 communication error on Amazon Web Services (AWS)</a><a class="headerlink" href="#communication-error-on-amazon-web-services-aws" title="Permalink to this headline">¶</a></h2>
    <p>You receive the &#8220;502 Communication Error: This gateway could not
    communicate with the Wakari server&#8221; error message.</p>
    <div class="section" id="id11">
    <h3>Cause<a class="headerlink" href="#id11" title="Permalink to this headline">¶</a></h3>
    <p>The AEN gateway cannot communicate with the Wakari server on
    AWS, possibly because of an issue with the Wakari server&#8217;s
    IP address.</p>
    </div>
    <div class="section" id="id12">
    <h3>Solution<a class="headerlink" href="#id12" title="Permalink to this headline">¶</a></h3>
    <p>Configure your AEN gateway to use the DNS hostname of the server
    instead of its IP address. On AWS, this is the DNS hostname of the
    Amazon Elastic Compute Cloud (EC2) instance that runs the server.</p>
    </div>
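    <p>To find the hostname to configure, you can query the EC2 instance metadata service. The following shell functions are a minimal sketch. <code class="docutils literal"><span class="pre">ec2_hostnames</span></code> runs on the instance that hosts the Wakari server and reads the IMDSv1 paths <code class="docutils literal"><span class="pre">public-hostname</span></code> and <code class="docutils literal"><span class="pre">local-hostname</span></code>; an instance that requires IMDSv2 needs a session token first. <code class="docutils literal"><span class="pre">resolves</span></code> runs on the gateway node to confirm that a name resolves. Whether the gateway should use the public or the private hostname depends on your network layout.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># EC2 instance metadata service (IMDSv1) base URL.
    IMDS=http://169.254.169.254/latest/meta-data

    # Run on the Wakari server instance: print its public and private DNS hostnames.
    # -f prints nothing on HTTP errors, for example when there is no public hostname.
    ec2_hostnames() {
        echo "public:  $(curl -sf --max-time 2 "$IMDS/public-hostname")"
        echo "private: $(curl -sf --max-time 2 "$IMDS/local-hostname")"
    }

    # Run on the gateway node: succeed if the given hostname resolves.
    resolves() {
        [ -n "$(getent hosts "$1")" ]
    }

    # Usage (the hostname below is a placeholder):
    # ec2_hostnames
    # resolves ip-10-0-0-12.ec2.internal
    </pre></div>
    </div>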
    </div>
    <div class="section" id="invalid-username">
    <h2><a class="toc-backref" href="#id29">Invalid username</a><a class="headerlink" href="#invalid-username" title="Permalink to this headline">¶</a></h2>
    <div class="section" id="id13">
    <h3>Cause<a class="headerlink" href="#id13" title="Permalink to this headline">¶</a></h3>
    <p>The username breaks one or more of these rules:</p>
    <ul class="simple">
    <li>It must be at least 3 and no more than 25 characters long.</li>
    <li>The first character must be a letter (A-Z) or a digit (0-9).</li>
    <li>The remaining characters can be letters, digits, periods (.),
    underscores (_) or hyphens (-).</li>
    </ul>
    <p>These characters are the portable filename character set that the
    <a class="reference external" href="http://serverfault.com/a/578264/117528">POSIX standard</a> specifies,
    and portable usernames use the same character set.</p>
    </div>
    <div class="section" id="id14">
    <h3>Solution<a class="headerlink" href="#id14" title="Permalink to this headline">¶</a></h3>
    <p>Choose a username that follows the rules above.</p>
    </div>
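    <p>You can check the username rules with a single regular expression. The following bash function is a minimal sketch that assumes letters can be uppercase or lowercase.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Succeed if $1 follows the AEN username rules:
    # 3 to 25 characters; the first is a letter or digit; the rest are letters,
    # digits, periods, underscores, or hyphens.
    valid_aen_username() {
        # One leading letter or digit, then 2 to 24 allowed characters.
        local re='^[A-Za-z0-9][A-Za-z0-9._-]{2,24}$'
        [[ "$1" =~ $re ]]
    }

    # Usage:
    # valid_aen_username jane.doe   # succeeds
    # valid_aen_username _jane      # fails: starts with an underscore
    </pre></div>
    </div>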
    </div>
    <div class="section" id="notebook-error-cannot-download-notebook-as-pdf-via-latex">
    <h2><a class="toc-backref" href="#id30">Notebook Error: Cannot download notebook as PDF via LaTeX</a><a class="headerlink" href="#notebook-error-cannot-download-notebook-as-pdf-via-latex" title="Permalink to this headline">¶</a></h2>
    <div class="section" id="id15">
    <h3>Cause<a class="headerlink" href="#id15" title="Permalink to this headline">¶</a></h3>
    <p>LaTeX is not properly installed.</p>
    </div>
    <div class="section" id="centos-6-solution">
    <h3>CentOS/6 Solution<a class="headerlink" href="#centos-6-solution" title="Permalink to this headline">¶</a></h3>
    <ol class="arabic">
    <li><p class="first">Install TeX Live from the <a class="reference external" href="https://www.tug.org/texlive/quickinstall.html">TUG site</a>.
    Follow the steps described there; the installation can take some time.</p>
    </li>
    <li><p class="first">Add the TeX Live binaries to the <code class="docutils literal"><span class="pre">PATH</span></code> in the file
    <code class="docutils literal"><span class="pre">/etc/profile.d/latex.sh</span></code>. Add the following line, replacing the year and architecture to match your installation:</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span><span class="nv">PATH</span><span class="o">=</span>/usr/local/texlive/2017/bin/x86_64-linux:<span class="nv">$PATH</span>
    </pre></div>
    </div>
    </li>
    <li><p class="first">Restart the compute node.</p>
    </li>
    </ol>
    </div>
    <div class="section" id="centos-7-solution">
    <h3>CentOS/7 Solution<a class="headerlink" href="#centos-7-solution" title="Permalink to this headline">¶</a></h3>
    <ol class="arabic">
    <li><p class="first">Install the missing packages by running the following command as root:</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span>yum install texlive texlive-xetex texlive-xetexconfig texlive-xetex-def texlive-adjustbox texlive-upquote texlive-ulem
    </pre></div>
    </div>
    </li>
    </ol>
    </div>
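    <p>On either CentOS version, you can confirm that the LaTeX tools are on the <code class="docutils literal"><span class="pre">PATH</span></code> before you retry the download. The following shell function is a minimal sketch. By default it looks for <code class="docutils literal"><span class="pre">xelatex</span></code>, which recent versions of nbconvert use for PDF export, and <code class="docutils literal"><span class="pre">pdflatex</span></code>. Run it in a new login shell so that <code class="docutils literal"><span class="pre">/etc/profile.d/latex.sh</span></code> is read.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Report the location of each LaTeX tool, or that it is missing.
    # Returns 1 if any tool is missing.
    check_latex_tools() {
        local tool found missing=0
        if [ "$#" -eq 0 ]; then
            set -- xelatex pdflatex
        fi
        for tool in "$@"; do
            # command -v prints the path of an executable found on PATH.
            if found=$(command -v "$tool"); then
                echo "$tool: $found"
            else
                echo "$tool: not found"
                missing=1
            fi
        done
        return "$missing"
    }

    # Usage:
    # check_latex_tools
    </pre></div>
    </div>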
    </div>
    <div class="section" id="unresponsive-wk-server-thread-without-error-messages">
    <h2><a class="toc-backref" href="#id31">Unresponsive <code class="docutils literal"><span class="pre">wk-server</span></code> thread without error messages</a><a class="headerlink" href="#unresponsive-wk-server-thread-without-error-messages" title="Permalink to this headline">¶</a></h2>
    <div class="section" id="id16">
    <h3>Cause<a class="headerlink" href="#id16" title="Permalink to this headline">¶</a></h3>
    <p>Two things can cause the <code class="docutils literal"><span class="pre">wk-server</span></code> thread to freeze without error messages:</p>
    <ul class="simple">
    <li>LDAP freezing</li>
    <li>MongoDB freezing</li>
    </ul>
    <p>If LDAP or MongoDB is configured with a timeout longer than the Gunicorn
    worker timeout, Gunicorn times out first and kills the worker process that is waiting
    on LDAP or MongoDB. The request then dies without a timeout error being logged.</p>
    </div>
    <div class="section" id="id17">
    <h3>Solution<a class="headerlink" href="#id17" title="Permalink to this headline">¶</a></h3>
    <ol class="arabic simple">
    <li>Check for frozen LDAP or MongoDB server processes.</li>
    <li>You can also set the Gunicorn timeout to more than 30 seconds.</li>
    </ol>
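    <p>To check for a frozen LDAP or MongoDB server, you can run a trivial query under the coreutils <code class="docutils literal"><span class="pre">timeout</span></code> command, which exits with status 124 when the command does not finish in time. The following shell function is a minimal sketch; the <code class="docutils literal"><span class="pre">mongo</span></code> and <code class="docutils literal"><span class="pre">ldapsearch</span></code> commands in the usage comments, and the LDAP URI, are assumptions about your environment.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Run a command with a time limit and report whether it finished.
    # Usage: probe SECONDS COMMAND [ARGS...]
    probe() {
        local secs=$1 rc
        shift
        timeout "$secs" "$@"
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "TIMED OUT after ${secs}s: $*"
        elif [ "$rc" -ne 0 ]; then
            echo "FAILED (exit $rc): $*"
        else
            echo "OK: $*"
        fi
        return "$rc"
    }

    # Usage (hypothetical LDAP URI; run where the tools are installed):
    # probe 5 mongo --quiet --eval 'db.adminCommand({ping: 1})'
    # probe 5 ldapsearch -x -H ldap://ldap.example.com -b '' -s base
    </pre></div>
    </div>
    <p>Gunicorn sets its worker timeout with the <code class="docutils literal"><span class="pre">--timeout</span></code> option, which defaults to 30 seconds. Where AEN passes this option depends on your installation.</p>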
    </div>
    </div>
    <div class="section" id="unresponsive-wk-gateway-thread-without-error-messages">
    <h2><a class="toc-backref" href="#id32">Unresponsive <code class="docutils literal"><span class="pre">wk-gateway</span></code> thread without error messages</a><a class="headerlink" href="#unresponsive-wk-gateway-thread-without-error-messages" title="Permalink to this headline">¶</a></h2>
    <div class="section" id="id18">
    <h3>Cause<a class="headerlink" href="#id18" title="Permalink to this headline">¶</a></h3>
    <p>If TLS is configured with a passphrase-protected private key,
    <code class="docutils literal"><span class="pre">wk-gateway</span></code> freezes without any error messages.</p>
    </div>
    <div class="section" id="id19">
    <h3>Solution<a class="headerlink" href="#id19" title="Permalink to this headline">¶</a></h3>
    <p>Update the TLS configuration so that it does not use a
    passphrase-protected private key.</p>
    </div>
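    <p>The following shell functions are a minimal sketch of that change. <code class="docutils literal"><span class="pre">key_is_encrypted</span></code> looks for the <code class="docutils literal"><span class="pre">ENCRYPTED</span></code> marker that encrypted PEM keys carry in their header, and <code class="docutils literal"><span class="pre">decrypt_key</span></code> writes an unencrypted copy with <code class="docutils literal"><span class="pre">openssl</span> <span class="pre">pkey</span></code>, which prompts for the passphrase. The file names are hypothetical. Keep the unencrypted key readable only by the account that runs <code class="docutils literal"><span class="pre">wk-gateway</span></code>.</p>
    <div class="highlight-bash"><div class="highlight"><pre><span></span># Succeed if the PEM private key file $1 is passphrase protected.
    key_is_encrypted() {
        grep -q ENCRYPTED "$1"
    }

    # Write an unencrypted copy of key $1 to $2. Any extra arguments are passed
    # to openssl, for example -passin to supply the passphrase non-interactively.
    decrypt_key() {
        local in=$1 out=$2
        shift 2
        # umask 077 in a subshell makes a newly created file readable by its owner only.
        ( umask 077; openssl pkey -in "$in" -out "$out" "$@" )
    }

    # Usage (hypothetical file names):
    # key_is_encrypted server-encrypted.key
    # decrypt_key server-encrypted.key server.key
    </pre></div>
    </div>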
    </div>
