Metadata-Version: 2.4
Name: gb-io
Version: 0.3.8
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
License-File: COPYING
Summary: A Python interface to gb-io, a fast GenBank parser and serializer written in Rust.
Keywords: genbank,parser,sequence,record
Author-email: Martin Larralde <martin.larralde@embl.de>
Requires-Python: >=3.7
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/althonos/gb-io.py/
Project-URL: Bug Tracker, https://github.com/althonos/gb-io.py/issues
Project-URL: Changelog, https://github.com/althonos/gb-io.py/blob/master/CHANGELOG.md
Project-URL: Coverage, https://codecov.io/gh/althonos/gb-io.py/
Project-URL: Documentation, https://gb-io.readthedocs.io/
Project-URL: Builds, https://github.com/althonos/gb-io.py/actions/
Project-URL: PyPI, https://pypi.org/project/gb-io
Project-URL: PiWheels, https://www.piwheels.org/project/gb-io/

# 🧬🏦 `gb-io.py` [![Stars](https://img.shields.io/github/stars/althonos/gb-io.py.svg?style=social&maxAge=3600&label=Star)](https://github.com/althonos/gb-io.py/stargazers)

*A Python interface to [`gb-io`], a fast [GenBank] parser and serializer written in [Rust].*

[`gb-io`]: https://crates.io/crates/gb-io
[GenBank]: https://www.ncbi.nlm.nih.gov/genbank/
[Rust]: https://www.rust-lang.org/

[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/gb-io.py/test.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/althonos/gb-io.py/actions)
[![Coverage](https://img.shields.io/codecov/c/gh/althonos/gb-io.py?style=flat-square&maxAge=3600)](https://codecov.io/gh/althonos/gb-io.py/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square&maxAge=2678400)](https://choosealicense.com/licenses/mit/)
[![PyPI](https://img.shields.io/pypi/v/gb-io.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/gb-io)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/gb-io?style=flat-square&maxAge=3600)](https://anaconda.org/bioconda/gb-io)
[![AUR](https://img.shields.io/aur/version/python-gb-io?logo=archlinux&style=flat-square&maxAge=3600)](https://aur.archlinux.org/packages/python-gb-io)
[![Wheel](https://img.shields.io/pypi/wheel/gb-io.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/gb-io/#files)
[![Python Versions](https://img.shields.io/pypi/pyversions/gb-io.svg?style=flat-square&maxAge=3600)](https://pypi.org/project/gb-io/#files)
[![Python Implementations](https://img.shields.io/pypi/implementation/gb-io?style=flat-square&maxAge=3600&label=impl)](https://pypi.org/project/gb-io/#files)
[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/gb-io.py/)
[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400)](https://git.embl.de/larralde/gb-io.py/)
[![GitHub issues](https://img.shields.io/github/issues/althonos/gb-io.py.svg?style=flat-square&maxAge=600)](https://github.com/althonos/gb-io.py/issues)
[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/gb-io.py/blob/master/CHANGELOG.md)
[![Downloads](https://img.shields.io/pypi/dm/gb-io?style=flat-square&color=303f9f&maxAge=86400&label=downloads)](https://pepy.tech/project/gb-io)
[![Docs](https://img.shields.io/readthedocs/gb-io/latest?style=flat-square&maxAge=600)](https://gb-io.readthedocs.io)

## 🗺️ Overview

`gb-io.py` is a Python package that provides an interface to `gb-io`, a very
fast GenBank format parser implemented in Rust by [David Leslie](https://github.com/dlesl).
It is considerably faster than the [Biopython](http://biopython.org/) and
[scikit-bio](http://scikit-bio.org/) parsers.

This library has no external dependencies and is available for all modern
Python versions (3.7+).

To improve performance, the library implements a *copy-on-access* pattern,
so that data is only copied to the Python heap when it is actually accessed,
rather than when the object is created. For instance, if the consumer
of the parser only requires the GenBank features and not the record sequence,
the sequence will not be copied to a Python `bytes` object.
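
The idea can be pictured with a toy, pure-Python sketch. The `LazyRecord` class, its `_raw_sequence` buffer and its `copies` counter are hypothetical stand-ins, not `gb-io.py`'s actual implementation (which keeps the parsed data on the Rust side); the point is only that the `bytes` copy happens the first time `sequence` is read, and never if it is not read at all.

```python
class LazyRecord:
    """Toy record that defers copying its sequence until it is accessed.

    Hypothetical illustration only: `_raw_sequence` stands in for data
    owned by a native parser.
    """

    def __init__(self, name, raw_sequence):
        self.name = name
        self._raw_sequence = raw_sequence  # buffer owned "elsewhere"
        self._sequence = None              # no Python `bytes` copy yet
        self.copies = 0                    # how many copies were made

    @property
    def sequence(self):
        # Copy into an immutable `bytes` object on first access only,
        # then reuse the cached copy on later accesses.
        if self._sequence is None:
            self._sequence = bytes(self._raw_sequence)
            self.copies += 1
        return self._sequence


record = LazyRecord("example", bytearray(b"ATGCGTACGT"))
print(record.name)      # reading the name does not copy the sequence
print(record.copies)    # 0
print(record.sequence)  # first access triggers the copy
print(record.sequence)  # served from the cached copy
print(record.copies)    # 1
```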

## 🔧 Installing

Install the `gb-io` package directly from [PyPI](https://pypi.org/project/gb-io),
which hosts pre-compiled wheels that can be installed with `pip`:
```console
$ pip install gb-io
```

Wheels are provided for common platforms, such as x86-64 Linux, Windows and
macOS, as well as AArch64 Linux and macOS. If no wheel is available, the source
distribution will be downloaded instead, and a local copy of the Rust compiler
will be downloaded to build the package, unless one is already installed on the
host machine.

## 📖 Documentation

A complete [API reference](https://gb-io.readthedocs.io/en/stable/api.html)
can be found in the [online documentation](https://gb-io.readthedocs.io/),
or directly from the command line using
[`pydoc`](https://docs.python.org/3/library/pydoc.html):
```console
$ pydoc gb_io
```

## 💡 Usage

Use the `gb_io.load` function to obtain a list of all GenBank records in a file:
```python
import gb_io

records = gb_io.load("tests/data/AY048670.1.gb")
```

Reading from a file-like object is supported as well, both in text and
binary mode:
```python
import gb_io

# A handle opened in binary mode ("rb") is accepted as well.
with open("tests/data/AY048670.1.gb") as file:
    records = gb_io.load(file)
```

It is also possible to iterate over the records in a file without having
to load the entire file contents into memory, using the `gb_io.iter`
function, which returns an iterator instead of a list:
```python
import gb_io

for record in gb_io.iter("tests/data/AY048670.1.gb"):
    print(record.name, record.sequence[:10])
```
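
To see why iterating keeps memory usage low, here is a toy, pure-Python generator that yields one raw entry at a time by splitting a GenBank stream on the `//` line that terminates each entry. It is only an illustration under that assumption: it does not parse anything, and the hypothetical `iter_raw_records` helper is not how `gb_io.iter` works internally (the streaming there happens in Rust).

```python
import io


def iter_raw_records(handle):
    """Yield the raw text of each GenBank entry, one at a time.

    Only the current entry is held in memory: lines are accumulated
    until the `//` terminator is seen, then emitted and discarded.
    """
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    # Any lines after the last `//` (e.g. trailing blank lines) are dropped.


data = io.StringIO(
    "LOCUS       A    4 bp    DNA\nORIGIN\n        1 acgt\n//\n"
    "LOCUS       B    2 bp    DNA\nORIGIN\n        1 gg\n//\n"
)
for raw in iter_raw_records(data):
    print(raw.splitlines()[0])
```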

You can use the `gb_io.dump` function to write one or more records to a file
(given either as a path or as a file-like handle):
```python
import gb_io

with open("AY048670.1.copy.gb", "wb") as file:
    gb_io.dump(records, file)
```

## 📝 Example

The following small script will extract all the CDS features from a GenBank
file, and write them in FASTA format to an output file:
```python
import gb_io

with open("tests/data/AY048670.1.faa", "w") as dst:
    for record in gb_io.iter("tests/data/AY048670.1.gb"):
        for feature in filter(lambda feat: feat.kind == "CDS", record.features):
            # map each qualifier key to its (string) value
            qualifiers = {q.key: q.value for q in feature.qualifiers}
            dst.write(">{}\n".format(qualifiers["protein_id"]))
            dst.write("{}\n".format(qualifiers["translation"]))
```
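
Note that a dictionary comprehension over the qualifiers keeps only the last value when a key is repeated, which is common for keys such as `/db_xref` or `/note`. If every value is needed, they can be grouped into lists instead. The sketch below uses a hypothetical `Qualifier` named tuple that only mimics the `key` and `value` attributes used in the script, so it runs without `gb_io`; the identifiers are placeholders.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in mimicking the `key`/`value` attributes of a qualifier.
Qualifier = namedtuple("Qualifier", ["key", "value"])


def group_qualifiers(qualifiers):
    """Map each qualifier key to the list of all its values, in order."""
    grouped = defaultdict(list)
    for q in qualifiers:
        grouped[q.key].append(q.value)
    return dict(grouped)


quals = [
    Qualifier("protein_id", "PROT_1"),
    Qualifier("db_xref", "xref:1"),
    Qualifier("db_xref", "xref:2"),
]
print({q.key: q.value for q in quals}["db_xref"])  # xref:2 (last one wins)
print(group_qualifiers(quals)["db_xref"])          # ['xref:1', 'xref:2']
```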

Compared to similar implementations using `Bio.SeqIO.parse`, `Bio.GenBank.parse`
and `Bio.GenBank.Scanner.GenBankScanner.parse_cds_features`, the performance is
as follows (speedups are relative to `SeqIO.parse`):

|               | `gb_io.iter`  | `GenBankScanner` | `GenBank.parse` | `SeqIO.parse` |
| ------------- | ------------- | ---------------- | --------------- | ------------- |
| Time (s)      | **2.264**     | 7.982            | 15.259          | 19.351        |
| Speed (MiB/s) | **136.5**     | 37.1             | 20.5            | 16.2          |
| Speedup       | **x8.55**     | x2.42            | x1.27           | -             |
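
The speedup row can be recomputed from the timings: each entry is the `SeqIO.parse` time divided by the time of the parser in that column, rounded to two decimals. A quick check using only the numbers in the table:

```python
# Wall-clock times (seconds) copied from the benchmark table.
times = {
    "gb_io.iter": 2.264,
    "GenBankScanner": 7.982,
    "GenBank.parse": 15.259,
    "SeqIO.parse": 19.351,
}

# Speedup relative to the slowest parser, Bio.SeqIO.parse.
baseline = times["SeqIO.parse"]
speedups = {name: round(baseline / t, 2) for name, t in times.items()}

for name, s in speedups.items():
    print(f"{name:>15}: x{s}")
```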



## 💭 Feedback

### ⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/gb-io.py/issues) if you need to report
or ask something. If you are filing a bug, please include as much
information as you can about the issue, and try to recreate the same bug
in a minimal, easily reproducible example.

### 🏗️ Contributing

Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/gb-io.py/blob/main/CONTRIBUTING.md)
for more details.

## ⚖️ License

This library is provided under the [MIT License](https://choosealicense.com/licenses/mit/).
The `gb-io` Rust crate was written by [David Leslie](https://github.com/dlesl)
and is licensed under the terms of the [MIT License](https://choosealicense.com/licenses/mit/).
This package vendors the source of several additional packages that are
licensed under the [Apache-2.0](https://choosealicense.com/licenses/apache-2.0/),
[MIT](https://choosealicense.com/licenses/mit/) or
[BSD-3-Clause](https://choosealicense.com/licenses/bsd-3-clause/) licenses;
see the license file distributed with the source copy of each vendored
dependency for more information.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `gb-io` authors](https://github.com/dlesl). It was developed
by [Martin Larralde](https://github.com/althonos/) during his PhD project
at the [European Molecular Biology Laboratory](https://www.embl.de/) in
the [Zeller team](https://github.com/zellerlab).*

