Metadata-Version: 2.4
Name: PPanGGOLiN
Version: 2.2.5
Summary: Pangenome analysis suite
Author: Guillaume Gautreau, Adelme Bazin, Jérôme Arnoux, Jean Mainguy
Maintainer: Guillaume Gautreau, Adelme Bazin, Jérôme Arnoux, Jean Mainguy
License: 
          CeCILL FREE SOFTWARE LICENSE AGREEMENT
        
        Version 2.1 dated 2013-06-21
        
        
            Notice
        
        This Agreement is a Free Software license agreement that is the result
        of discussions between its authors in order to ensure compliance with
        the two main principles guiding its drafting:
        
          * firstly, compliance with the principles governing the distribution
            of Free Software: access to source code, broad rights granted to users,
          * secondly, the election of a governing law, French law, with which it
            is conformant, both as regards the law of torts and intellectual
            property law, and the protection that it offers to both authors and
            holders of the economic rights over software.
        
        The authors of the CeCILL (for Ce[a] C[nrs] I[nria] L[ogiciel] L[ibre]) 
        license are: 
        
        Commissariat à l'énergie atomique et aux énergies alternatives - CEA, a
        public scientific, technical and industrial research establishment,
        having its principal place of business at 25 rue Leblanc, immeuble Le
        Ponant D, 75015 Paris, France.
        
        Centre National de la Recherche Scientifique - CNRS, a public scientific
        and technological establishment, having its principal place of business
        at 3 rue Michel-Ange, 75794 Paris cedex 16, France.
        
        Institut National de Recherche en Informatique et en Automatique -
        Inria, a public scientific and technological establishment, having its
        principal place of business at Domaine de Voluceau, Rocquencourt, BP
        105, 78153 Le Chesnay cedex, France.
        
        
            Preamble
        
        The purpose of this Free Software license agreement is to grant users
        the right to modify and redistribute the software governed by this
        license within the framework of an open source distribution model.
        
        The exercising of this right is conditional upon certain obligations for
        users so as to preserve this status for all subsequent redistributions.
        
        In consideration of access to the source code and the rights to copy,
        modify and redistribute granted by the license, users are provided only
        with a limited warranty and the software's author, the holder of the
        economic rights, and the successive licensors only have limited liability.
        
        In this respect, the risks associated with loading, using, modifying
        and/or developing or reproducing the software by the user are brought to
        the user's attention, given its Free Software status, which may make it
        complicated to use, with the result that its use is reserved for
        developers and experienced professionals having in-depth computer
        knowledge. Users are therefore encouraged to load and test the
        suitability of the software as regards their requirements in conditions
        enabling the security of their systems and/or data to be ensured and,
        more generally, to use and operate it in the same conditions of
        security. This Agreement may be freely reproduced and published,
        provided it is not altered, and that no provisions are either added or
        removed herefrom.
        
        This Agreement may apply to any or all software for which the holder of
        the economic rights decides to submit the use thereof to its provisions.
        
        Frequently asked questions can be found on the official website of the
        CeCILL licenses family (http://www.cecill.info/index.en.html) for any 
        necessary clarification.
        
        
            Article 1 - DEFINITIONS
        
        For the purpose of this Agreement, when the following expressions
        commence with a capital letter, they shall have the following meaning:
        
        Agreement: means this license agreement, and its possible subsequent
        versions and annexes.
        
        Software: means the software in its Object Code and/or Source Code form
        and, where applicable, its documentation, "as is" when the Licensee
        accepts the Agreement.
        
        Initial Software: means the Software in its Source Code and possibly its
        Object Code form and, where applicable, its documentation, "as is" when
        it is first distributed under the terms and conditions of the Agreement.
        
        Modified Software: means the Software modified by at least one
        Contribution.
        
        Source Code: means all the Software's instructions and program lines to
        which access is required so as to modify the Software.
        
        Object Code: means the binary files originating from the compilation of
        the Source Code.
        
        Holder: means the holder(s) of the economic rights over the Initial
        Software.
        
        Licensee: means the Software user(s) having accepted the Agreement.
        
        Contributor: means a Licensee having made at least one Contribution.
        
        Licensor: means the Holder, or any other individual or legal entity, who
        distributes the Software under the Agreement.
        
        Contribution: means any or all modifications, corrections, translations,
        adaptations and/or new functions integrated into the Software by any or
        all Contributors, as well as any or all Internal Modules.
        
        Module: means a set of sources files including their documentation that
        enables supplementary functions or services in addition to those offered
        by the Software.
        
        External Module: means any or all Modules, not derived from the
        Software, so that this Module and the Software run in separate address
        spaces, with one calling the other when they are run.
        
        Internal Module: means any or all Module, connected to the Software so
        that they both execute in the same address space.
        
        GNU GPL: means the GNU General Public License version 2 or any
        subsequent version, as published by the Free Software Foundation Inc.
        
        GNU Affero GPL: means the GNU Affero General Public License version 3 or
        any subsequent version, as published by the Free Software Foundation Inc.
        
        EUPL: means the European Union Public License version 1.1 or any
        subsequent version, as published by the European Commission.
        
        Parties: mean both the Licensee and the Licensor.
        
        These expressions may be used both in singular and plural form.
        
        
            Article 2 - PURPOSE
        
        The purpose of the Agreement is the grant by the Licensor to the
        Licensee of a non-exclusive, transferable and worldwide license for the
        Software as set forth in Article 5 <#scope> hereinafter for the whole
        term of the protection granted by the rights over said Software.
        
        
            Article 3 - ACCEPTANCE
        
        3.1 The Licensee shall be deemed as having accepted the terms and
        conditions of this Agreement upon the occurrence of the first of the
        following events:
        
          * (i) loading the Software by any or all means, notably, by
            downloading from a remote server, or by loading from a physical medium;
          * (ii) the first time the Licensee exercises any of the rights granted
            hereunder.
        
        3.2 One copy of the Agreement, containing a notice relating to the
        characteristics of the Software, to the limited warranty, and to the
        fact that its use is restricted to experienced users has been provided
        to the Licensee prior to its acceptance as set forth in Article 3.1
        <#accepting> hereinabove, and the Licensee hereby acknowledges that it
        has read and understood it.
        
        
            Article 4 - EFFECTIVE DATE AND TERM
        
        
              4.1 EFFECTIVE DATE
        
        The Agreement shall become effective on the date when it is accepted by
        the Licensee as set forth in Article 3.1 <#accepting>.
        
        
              4.2 TERM
        
        The Agreement shall remain in force for the entire legal term of
        protection of the economic rights over the Software.
        
        
            Article 5 - SCOPE OF RIGHTS GRANTED
        
        The Licensor hereby grants to the Licensee, who accepts, the following
        rights over the Software for any or all use, and for the term of the
        Agreement, on the basis of the terms and conditions set forth hereinafter.
        
        Besides, if the Licensor owns or comes to own one or more patents
        protecting all or part of the functions of the Software or of its
        components, the Licensor undertakes not to enforce the rights granted by
        these patents against successive Licensees using, exploiting or
        modifying the Software. If these patents are transferred, the Licensor
        undertakes to have the transferees subscribe to the obligations set
        forth in this paragraph.
        
        
              5.1 RIGHT OF USE
        
        The Licensee is authorized to use the Software, without any limitation
        as to its fields of application, with it being hereinafter specified
        that this comprises:
        
         1. permanent or temporary reproduction of all or part of the Software
            by any or all means and in any or all form.
        
         2. loading, displaying, running, or storing the Software on any or all
            medium.
        
         3. entitlement to observe, study or test its operation so as to
            determine the ideas and principles behind any or all constituent
            elements of said Software. This shall apply when the Licensee
            carries out any or all loading, displaying, running, transmission or
            storage operation as regards the Software, that it is entitled to
            carry out hereunder.
        
        
              5.2 ENTITLEMENT TO MAKE CONTRIBUTIONS
        
        The right to make Contributions includes the right to translate, adapt,
        arrange, or make any or all modifications to the Software, and the right
        to reproduce the resulting software.
        
        The Licensee is authorized to make any or all Contributions to the
        Software provided that it includes an explicit notice that it is the
        author of said Contribution and indicates the date of the creation thereof.
        
        
              5.3 RIGHT OF DISTRIBUTION
        
        In particular, the right of distribution includes the right to publish,
        transmit and communicate the Software to the general public on any or
        all medium, and by any or all means, and the right to market, either in
        consideration of a fee, or free of charge, one or more copies of the
        Software by any means.
        
        The Licensee is further authorized to distribute copies of the modified
        or unmodified Software to third parties according to the terms and
        conditions set forth hereinafter.
        
        
                5.3.1 DISTRIBUTION OF SOFTWARE WITHOUT MODIFICATION
        
        The Licensee is authorized to distribute true copies of the Software in
        Source Code or Object Code form, provided that said distribution
        complies with all the provisions of the Agreement and is accompanied by:
        
         1. a copy of the Agreement,
        
         2. a notice relating to the limitation of both the Licensor's warranty
            and liability as set forth in Articles 8 and 9,
        
        and that, in the event that only the Object Code of the Software is
        redistributed, the Licensee allows effective access to the full Source
        Code of the Software for a period of at least three years from the
        distribution of the Software, it being understood that the additional
        acquisition cost of the Source Code shall not exceed the cost of the
        data transfer.
        
        
                5.3.2 DISTRIBUTION OF MODIFIED SOFTWARE
        
        When the Licensee makes a Contribution to the Software, the terms and
        conditions for the distribution of the resulting Modified Software
        become subject to all the provisions of this Agreement.
        
        The Licensee is authorized to distribute the Modified Software, in
        source code or object code form, provided that said distribution
        complies with all the provisions of the Agreement and is accompanied by:
        
         1. a copy of the Agreement,
        
         2. a notice relating to the limitation of both the Licensor's warranty
            and liability as set forth in Articles 8 and 9,
        
        and, in the event that only the object code of the Modified Software is
        redistributed,
        
         3. a note stating the conditions of effective access to the full source
            code of the Modified Software for a period of at least three years
            from the distribution of the Modified Software, it being understood
            that the additional acquisition cost of the source code shall not
            exceed the cost of the data transfer.
        
        
                5.3.3 DISTRIBUTION OF EXTERNAL MODULES
        
        When the Licensee has developed an External Module, the terms and
        conditions of this Agreement do not apply to said External Module, that
        may be distributed under a separate license agreement.
        
        
                5.3.4 COMPATIBILITY WITH OTHER LICENSES
        
        The Licensee can include a code that is subject to the provisions of one
        of the versions of the GNU GPL, GNU Affero GPL and/or EUPL in the
        Modified or unmodified Software, and distribute that entire code under
        the terms of the same version of the GNU GPL, GNU Affero GPL and/or EUPL.
        
        The Licensee can include the Modified or unmodified Software in a code
        that is subject to the provisions of one of the versions of the GNU GPL,
        GNU Affero GPL and/or EUPL and distribute that entire code under the
        terms of the same version of the GNU GPL, GNU Affero GPL and/or EUPL.
        
        
            Article 6 - INTELLECTUAL PROPERTY
        
        
              6.1 OVER THE INITIAL SOFTWARE
        
        The Holder owns the economic rights over the Initial Software. Any or
        all use of the Initial Software is subject to compliance with the terms
        and conditions under which the Holder has elected to distribute its work
        and no one shall be entitled to modify the terms and conditions for the
        distribution of said Initial Software.
        
        The Holder undertakes that the Initial Software will remain ruled at
        least by this Agreement, for the duration set forth in Article 4.2 <#term>.
        
        
              6.2 OVER THE CONTRIBUTIONS
        
        The Licensee who develops a Contribution is the owner of the
        intellectual property rights over this Contribution as defined by
        applicable law.
        
        
              6.3 OVER THE EXTERNAL MODULES
        
        The Licensee who develops an External Module is the owner of the
        intellectual property rights over this External Module as defined by
        applicable law and is free to choose the type of agreement that shall
        govern its distribution.
        
        
              6.4 JOINT PROVISIONS
        
        The Licensee expressly undertakes:
        
         1. not to remove, or modify, in any manner, the intellectual property
            notices attached to the Software;
        
         2. to reproduce said notices, in an identical manner, in the copies of
            the Software modified or not.
        
        The Licensee undertakes not to directly or indirectly infringe the
        intellectual property rights on the Software of the Holder and/or
        Contributors, and to take, where applicable, vis-à-vis its staff, any
        and all measures required to ensure respect of said intellectual
        property rights of the Holder and/or Contributors.
        
        
            Article 7 - RELATED SERVICES
        
        7.1 Under no circumstances shall the Agreement oblige the Licensor to
        provide technical assistance or maintenance services for the Software.
        
        However, the Licensor is entitled to offer this type of services. The
        terms and conditions of such technical assistance, and/or such
        maintenance, shall be set forth in a separate instrument. Only the
        Licensor offering said maintenance and/or technical assistance services
        shall incur liability therefor.
        
        7.2 Similarly, any Licensor is entitled to offer to its licensees, under
        its sole responsibility, a warranty, that shall only be binding upon
        itself, for the redistribution of the Software and/or the Modified
        Software, under terms and conditions that it is free to decide. Said
        warranty, and the financial terms and conditions of its application,
        shall be subject of a separate instrument executed between the Licensor
        and the Licensee.
        
        
            Article 8 - LIABILITY
        
        8.1 Subject to the provisions of Article 8.2, the Licensee shall be
        entitled to claim compensation for any direct loss it may have suffered
        from the Software as a result of a fault on the part of the relevant
        Licensor, subject to providing evidence thereof.
        
        8.2 The Licensor's liability is limited to the commitments made under
        this Agreement and shall not be incurred as a result of in particular:
        (i) loss due the Licensee's total or partial failure to fulfill its
        obligations, (ii) direct or consequential loss that is suffered by the
        Licensee due to the use or performance of the Software, and (iii) more
        generally, any consequential loss. In particular the Parties expressly
        agree that any or all pecuniary or business loss (i.e. loss of data,
        loss of profits, operating loss, loss of customers or orders,
        opportunity cost, any disturbance to business activities) or any or all
        legal proceedings instituted against the Licensee by a third party,
        shall constitute consequential loss and shall not provide entitlement to
        any or all compensation from the Licensor.
        
        
            Article 9 - WARRANTY
        
        9.1 The Licensee acknowledges that the scientific and technical
        state-of-the-art when the Software was distributed did not enable all
        possible uses to be tested and verified, nor for the presence of
        possible defects to be detected. In this respect, the Licensee's
        attention has been drawn to the risks associated with loading, using,
        modifying and/or developing and reproducing the Software which are
        reserved for experienced users.
        
        The Licensee shall be responsible for verifying, by any or all means,
        the suitability of the product for its requirements, its good working
        order, and for ensuring that it shall not cause damage to either persons
        or properties.
        
        9.2 The Licensor hereby represents, in good faith, that it is entitled
        to grant all the rights over the Software (including in particular the
        rights set forth in Article 5 <#scope>).
        
        9.3 The Licensee acknowledges that the Software is supplied "as is" by
        the Licensor without any other express or tacit warranty, other than
        that provided for in Article 9.2 <#good-faith> and, in particular,
        without any warranty as to its commercial value, its secured, safe,
        innovative or relevant nature.
        
        Specifically, the Licensor does not warrant that the Software is free
        from any error, that it will operate without interruption, that it will
        be compatible with the Licensee's own equipment and software
        configuration, nor that it will meet the Licensee's requirements.
        
        9.4 The Licensor does not either expressly or tacitly warrant that the
        Software does not infringe any third party intellectual property right
        relating to a patent, software or any other property right. Therefore,
        the Licensor disclaims any and all liability towards the Licensee
        arising out of any or all proceedings for infringement that may be
        instituted in respect of the use, modification and redistribution of the
        Software. Nevertheless, should such proceedings be instituted against
        the Licensee, the Licensor shall provide it with technical and legal
        expertise for its defense. Such technical and legal expertise shall be
        decided on a case-by-case basis between the relevant Licensor and the
        Licensee pursuant to a memorandum of understanding. The Licensor
        disclaims any and all liability as regards the Licensee's use of the
        name of the Software. No warranty is given as regards the existence of
        prior rights over the name of the Software or as regards the existence
        of a trademark.
        
        
            Article 10 - TERMINATION
        
        10.1 In the event of a breach by the Licensee of its obligations
        hereunder, the Licensor may automatically terminate this Agreement
        thirty (30) days after notice has been sent to the Licensee and has
        remained ineffective.
        
        10.2 A Licensee whose Agreement is terminated shall no longer be
        authorized to use, modify or distribute the Software. However, any
        licenses that it may have granted prior to termination of the Agreement
        shall remain valid subject to their having been granted in compliance
        with the terms and conditions hereof.
        
        
            Article 11 - MISCELLANEOUS
        
        
              11.1 EXCUSABLE EVENTS
        
        Neither Party shall be liable for any or all delay, or failure to
        perform the Agreement, that may be attributable to an event of force
        majeure, an act of God or an outside cause, such as defective
        functioning or interruptions of the electricity or telecommunications
        networks, network paralysis following a virus attack, intervention by
        government authorities, natural disasters, water damage, earthquakes,
        fire, explosions, strikes and labor unrest, war, etc.
        
        11.2 Any failure by either Party, on one or more occasions, to invoke
        one or more of the provisions hereof, shall under no circumstances be
        interpreted as being a waiver by the interested Party of its right to
        invoke said provision(s) subsequently.
        
        11.3 The Agreement cancels and replaces any or all previous agreements,
        whether written or oral, between the Parties and having the same
        purpose, and constitutes the entirety of the agreement between said
        Parties concerning said purpose. No supplement or modification to the
        terms and conditions hereof shall be effective as between the Parties
        unless it is made in writing and signed by their duly authorized
        representatives.
        
        11.4 In the event that one or more of the provisions hereof were to
        conflict with a current or future applicable act or legislative text,
        said act or legislative text shall prevail, and the Parties shall make
        the necessary amendments so as to comply with said act or legislative
        text. All other provisions shall remain effective. Similarly, invalidity
        of a provision of the Agreement, for any reason whatsoever, shall not
        cause the Agreement as a whole to be invalid.
        
        
              11.5 LANGUAGE
        
        The Agreement is drafted in both French and English and both versions
        are deemed authentic.
        
        
            Article 12 - NEW VERSIONS OF THE AGREEMENT
        
        12.1 Any person is authorized to duplicate and distribute copies of this
        Agreement.
        
        12.2 So as to ensure coherence, the wording of this Agreement is
        protected and may only be modified by the authors of the License, who
        reserve the right to periodically publish updates or new versions of the
        Agreement, each with a separate number. These subsequent versions may
        address new issues encountered by Free Software.
        
        12.3 Any Software distributed under a given version of the Agreement may
        only be subsequently distributed under the same version of the Agreement
        or a subsequent version, subject to the provisions of Article 5.3.4
        <#compatibility>.
        
        
            Article 13 - GOVERNING LAW AND JURISDICTION
        
        13.1 The Agreement is governed by French law. The Parties agree to
        endeavor to seek an amicable solution to any disagreements or disputes
        that may arise during the performance of the Agreement.
        
        13.2 Failing an amicable solution within two (2) months as from their
        occurrence, and unless emergency proceedings are necessary, the
        disagreements or disputes shall be referred to the Paris Courts having
        jurisdiction, by the more diligent Party.
        
        
Project-URL: Homepage, https://labgem.genoscope.cns.fr/2023/04/27/ppanggolin/
Project-URL: Repository, https://github.com/labgem/PPanGGOLiN/
Project-URL: Documentation, https://ppanggolin.readthedocs.io
Keywords: Pangenomics,Comparative genomics,Bioinformatics,Prokaryote
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: CEA CNRS Inria Logiciel Libre License, version 2.1 (CeCILL-2.1)
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: numpy<2.0.0,>1.24.0
Requires-Dist: pandas<3.0.0,>=2.0.0
Requires-Dist: tqdm<5.0.0,>=4.0.0
Requires-Dist: tables<4.0.0,>=3.0.0
Requires-Dist: pyrodigal<4.0.0,>=3.0.0
Requires-Dist: networkx<4.0.0,>=3.0.0
Requires-Dist: scipy<2.0.0,>=1.0.0
Requires-Dist: plotly<6.0.0,>=5.0.0
Requires-Dist: gmpy2<3.0.0,>=2.0.0
Requires-Dist: bokeh<4.0.0,>=3.0.0
Provides-Extra: doc
Requires-Dist: sphinx==6.2.1; extra == "doc"
Requires-Dist: sphinx_rtd_theme==1.2.2; extra == "doc"
Requires-Dist: readthedocs-sphinx-search==0.3.2; extra == "doc"
Requires-Dist: sphinx-autobuild==2021.3.14; extra == "doc"
Requires-Dist: myst-parser==2; extra == "doc"
Requires-Dist: docutils==0.18.1; extra == "doc"
Requires-Dist: sphinxcontrib.mermaid==0.9.2; extra == "doc"
Provides-Extra: test
Requires-Dist: pytest==7; extra == "test"
Requires-Dist: black==24.*; extra == "test"
Dynamic: license-file

# PPanGGOLiN: Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors

[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyrodigal/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/labgem/ppanggolin/actions)
[![License](https://anaconda.org/bioconda/ppanggolin/badges/license.svg)](http://www.cecill.info/licences.fr.html)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/ppanggolin?style=flat-square&maxAge=3600&logo=anaconda)](https://anaconda.org/bioconda/ppanggolin)
[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/labgem/ppanggolin/)
[![GitHub issues](https://img.shields.io/github/issues/labgem/ppanggolin.svg?style=flat-square&maxAge=600)](https://github.com/labgem/ppanggolin/issues)
[![Docs](https://img.shields.io/readthedocs/ppanggolin/latest?style=flat-square&maxAge=600)](https://ppanggolin.readthedocs.io)
[![Downloads](https://anaconda.org/bioconda/ppanggolin/badges/downloads.svg)](https://bioconda.github.io/recipes/ppanggolin/README.html#download-stats)

**PPanGGOLiN**
([Gautreau et al. 2020](https://doi.org/10.1371/journal.pcbi.1007732)) is a software suite used to create and manipulate prokaryotic pangenomes from a set of either genomic DNA sequences or provided genome annotations.
It is designed to scale up to tens of thousands of genomes.
Rather than relying on fixed thresholds, it partitions the pangenome using a statistical approach. This lets it work with low-quality data such as *Metagenome-Assembled Genomes (MAGs)* or *Single-cell Amplified Genomes (SAGs)*, taking advantage of large-scale environmental studies and letting users study the pangenomes of uncultivable species.

**PPanGGOLiN** builds pangenomes through a graphical model and a statistical method that partitions gene families into persistent, shell and cloud genomes.
It integrates both information on the presence/absence of protein-coding genes and their genomic neighborhood to build a graph of gene families where each node is a gene family, and each edge is a relation of genetic contiguity.
The partitioning method assumes that two gene families that are consistent neighbors in the graph are more likely to belong to the same partition.
It results in a Partitioned Pangenome Graph (PPG) made of persistent, shell and cloud nodes drawing genomes on rails like a subway map to help biologists navigate the great diversity of microbial life.
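
To make the graph structure concrete, here is a minimal Python sketch of a gene-family graph. It is a toy illustration and not PPanGGOLiN's implementation: the genome and family names are invented, each genome is reduced to a single linear contig, and nodes and edges are simply counted across genomes.

```python
from collections import Counter

# Hypothetical toy input: each genome is an ordered list of gene-family IDs
# along one linear contig (all names are made up for illustration).
genomes = {
    "genome_A": ["fam1", "fam2", "fam3", "fam4"],
    "genome_B": ["fam1", "fam2", "fam4"],
    "genome_C": ["fam1", "fam2", "fam3", "fam4"],
}


def build_family_graph(genomes):
    """Return (node_counts, edge_counts) for a toy gene-family graph.

    node_counts[family] is the number of genomes carrying the family.
    edge_counts[frozenset({a, b})] is the number of genomes in which
    families a and b are direct neighbors on the contig.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for families in genomes.values():
        # Count each family once per genome (presence/absence).
        node_counts.update(set(families))
        # Neighboring families define an undirected contiguity edge.
        adjacent = {
            frozenset((left, right))
            for left, right in zip(families, families[1:])
            if left != right
        }
        edge_counts.update(adjacent)
    return node_counts, edge_counts


nodes, edges = build_family_graph(genomes)
for edge, weight in sorted(edges.items(), key=lambda item: sorted(item[0])):
    print(" - ".join(sorted(edge)), weight)
```

In this toy data, `fam3` is missing from one genome, so the graph contains a bypass edge between `fam2` and `fam4` next to the path through `fam3`. Alternative paths of this kind are what the subway-map picture refers to.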


Moreover, the panRGP method ([Bazin et al. 2020](https://doi.org/10.1093/bioinformatics/btaa792)) included in **PPanGGOLiN** predicts, for each genome, Regions of Genome Plasticity (RGPs) that are clusters of genes made of shell and cloud genomes in the pangenome graph.
Most of them arise from horizontal gene transfer (HGT) and correspond to Genomic Islands (GIs).
RGPs from different genomes are then grouped into spots of insertion based on their conserved flanking persistent genes.


Those RGPs can be further divided into conserved modules by panModule ([Bazin et al. 2021](https://doi.org/10.1101/2021.12.06.471380)). These conserved modules correspond to groups of co-occurring and co-localized genes that are gained or lost together in the variable regions of the pangenome.

Complete documentation is available [here](https://ppanggolin.readthedocs.io).

<!-- ![PPanGGOLiN logo](docs/_static/logo.png) -->

<!-- center the image with html syntax -->
<p align="center">
  <img src="docs/_static/logo.png" alt="logo">
</p>

# Installation

**PPanGGOLiN** can be easily installed via conda from the bioconda channel.

To ensure a smoother installation and avoid conflicting dependencies, it's highly recommended to create a dedicated environment for PPanGGOLiN:

```bash
# Install PPanGGOLiN into a new conda environment
conda create -n ppanggolin -c defaults -c conda-forge -c bioconda ppanggolin

# Check PPanGGOLiN install
conda activate ppanggolin
ppanggolin --version
```

# Quick usage

## Run a complete pangenome analysis

A complete pangenomic analysis with PPanGGOLiN can be performed using the [`all`](https://ppanggolin.readthedocs.io/en/latest/user/QuickUsage/quickAnalyses.html#ppanggolin-complete-workflow-analyses) subcommand. This workflow runs a series of PPanGGOLiN commands to generate a **partitioned pangenome graph** with predicted **RGPs** (Regions of Genome Plasticity), **spots** of insertion and **modules**.


Execute the following command to run the `all` workflow:

```bash
ppanggolin all --fasta GENOMES_FASTA_LIST
```

By default, it uses parameters that we have found to be generally the best for working with species pangenomes. For further customization, you can adjust some parameters directly on the command line. Alternatively, you can use a configuration file to fine-tune the parameters of each subcommand used by the workflow (see [here](https://ppanggolin.readthedocs.io/en/latest/user/practicalInformation.html#configuration-file) for more details).

### Input files

The file `GENOMES_FASTA_LIST` is a tab-separated (TSV) file organized as follows:

1. The first column contains a unique genome name **(without space)**
2. The second column contains the path to the associated FASTA file
3. Circular contig identifiers are indicated in the following columns
4. Each line represents a genome

An [example](testingDataset/genomes.fasta.list) with 50 *Chlamydia trachomatis* genomes can be found in the [testingDataset/](testingDataset/) directory.
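
A list in this format could be generated from a directory of FASTA files with a small shell helper. The sketch below is only an illustration: the function name `make_genome_list` and the `genomes` directory are assumptions, not part of PPanGGOLiN. It uses each file name (without its extension) as the genome name, skips files whose name contains whitespace, and does not fill in the optional circular contig columns.

```bash
# Hypothetical helper (not part of PPanGGOLiN): print one "name<TAB>path" line
# per file with the given extension found in a directory.
make_genome_list() {
    local dir="$1" ext="${2:-fasta}" file name
    for file in "$dir"/*."$ext"; do
        [ -e "$file" ] || continue                 # the glob matched nothing
        name="$(basename "$file" ".$ext")"         # file name without extension
        case "$name" in
            *[[:space:]]*)
                # Genome names must not contain spaces: skip and report.
                echo "skipping '$file': genome name contains whitespace" >&2
                continue ;;
        esac
        printf '%s\t%s\n' "$name" "$file"
    done
}

# Example usage (directory and output names are assumptions):
# make_genome_list genomes fasta > GENOMES_FASTA_LIST
```

Skipped files are reported on standard error, so they can be renamed and the list regenerated.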


You can also give **PPanGGOLiN** your own annotations using *.gff* or *.gbff/.gbk* files instead of *.fasta* files,
such as the ones provided by [Bakta](https://github.com/oschwengers/bakta), with the following command:

```bash
ppanggolin all --anno GENOMES_ANNOTATION_LIST
```

Another [example](testingDataset/genomes.gbff.list) of such a file can be found in the [testingDataset/](testingDataset/) directory.


A minimum of 5 genomes is generally required to perform a pangenomic analysis using the traditional *core genome*/*accessory genome* paradigm.
It is recommended to use at least 15 genomes with genomic variation (and not only SNPs) to obtain robust results with the **PPanGGOLiN** statistical approach.
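
Before launching a run, a genome list can be sanity-checked against the constraints above (unique genome names, at least 5 genomes, ideally 15 or more). The following is a minimal sketch: `check_genome_list` is a hypothetical helper, not a PPanGGOLiN command, and treating fewer than 5 genomes as an error is a choice made for this example.

```bash
# Hypothetical pre-flight check (not part of PPanGGOLiN) for a genome list file.
# Prints the number of genomes; returns non-zero if a genome name is duplicated
# or if fewer than 5 genomes are listed, and warns below 15 genomes.
check_genome_list() {
    local list="$1" count dups
    count="$(grep -c . "$list" || true)"                          # non-empty lines
    dups="$(grep . "$list" | cut -f1 | sort | uniq -d || true)"   # repeated names
    echo "genomes listed: $count"
    if [ -n "$dups" ]; then
        echo "error: duplicated genome names: $dups" >&2
        return 1
    fi
    if [ "$count" -lt 5 ]; then
        echo "error: fewer than 5 genomes" >&2
        return 1
    elif [ "$count" -lt 15 ]; then
        echo "warning: fewer than 15 genomes; results may be less robust" >&2
    fi
}

# Example usage (file name is an assumption):
# check_genome_list GENOMES_FASTA_LIST && ppanggolin all --fasta GENOMES_FASTA_LIST
```

The check only counts genomes and compares names; it cannot tell whether the genomes carry enough genomic variation.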

### Results files

Upon executing the `all` command, multiple output files and graphics are generated (more information [here](https://ppanggolin.readthedocs.io/en/latest/user/QuickUsage/quickAnalyses.html#usual-pangenome-outputs)). Most notably, it writes an HDF5 file (`pangenome.h5`).
This file can be used as input to any of the subcommands to rerun parts of the analysis with different parameters,
write and draw different representations of the pangenome, or perform additional analyses with **PPanGGOLiN**.


## Other Workflow Commands

PPanGGOLiN offers additional workflow commands that perform more specialized functions:

- [**`workflow`**](https://ppanggolin.readthedocs.io/en/latest/user/PangenomeAnalyses/pangenomeAnalyses.html#workflow): Generates a partitioned pangenome graph.
- [**`panrgp`**](https://ppanggolin.readthedocs.io/en/latest/user/RGP/rgpAnalyses.html#panrgp): Combines the `workflow` command with the prediction of RGPs (Regions of Genomic Plasticity) and insertion spots on top of the partitioned pangenome graph.
- [**`panmodule`**](https://ppanggolin.readthedocs.io/en/latest/user/Modules/moduleAnalyses.html#the-panmodule-workflow): Combines the `workflow` command with the prediction of modules on top of the partitioned pangenome graph.

These commands take the same input files as the `all` command.


# Issues, Questions, Remarks
If you have any questions or issues with installing,
using or understanding **PPanGGOLiN**, please do not hesitate to open an issue!
We cannot fix bugs if we do not know about them, and we will try to help you as best we can.

# Citation
If you use this tool for your research, please cite:

> **PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph**
> Gautreau G et al. (2020)
> *PLOS Computational Biology 16(3): e1007732.*
> doi: [10.1371/journal.pcbi.1007732](https://doi.org/10.1371/journal.pcbi.1007732)


If you use this tool to study genomic islands, please cite:


> **panRGP: a pangenome-based method to predict genomic islands and explore their diversity**
> Bazin et al. (2020)
> *Bioinformatics, Volume 36, Issue Supplement_2, Pages i651–i658*
> doi: [10.1093/bioinformatics/btaa792](https://doi.org/10.1093/bioinformatics/btaa792)

If you use this tool to study modules, please cite:

> **panModule: detecting conserved modules in the variable regions of a pangenome graph**
> Bazin et al. (2021)
> *bioRxiv* 
> doi: [10.1101/2021.12.06.471380](https://doi.org/10.1101/2021.12.06.471380)
