Metadata-Version: 2.4
Name: deepchopper
Version: 1.2.9
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Requires-Dist: torch>=2.6.0
Requires-Dist: torchvision>=0.21.0
Requires-Dist: lightning>=2.1.2
Requires-Dist: torchmetrics>=1.2.1
Requires-Dist: rich>=13.7.0
Requires-Dist: transformers>=4.37.2
Requires-Dist: safetensors>=0.4.2
Requires-Dist: datasets>=3.0.0
Requires-Dist: evaluate>=0.4.3
Requires-Dist: typer>=0.12.0,<0.13.0
Requires-Dist: click>=8.1.0,<8.2.0
Requires-Dist: gradio==5.0.1
Requires-Dist: fastapi==0.112.2
Requires-Dist: scikit-learn>=1.5.2
Requires-Dist: hydra-core>=1.3.2
Requires-Dist: omegaconf>=2.3.0
Requires-Dist: deepchopper-cli>=1.2.6
License-File: LICENSE
Summary: A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing
Keywords: deep learning,bioinformatics,rust
Home-Page: https://github.com/ylab-hi/DeepChopper
Author-email: Yangyang Li <yangyang.li@northwestern.edu>, Ting-you Wang <tywang@northwestern.edu>
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: homepage, https://github.com/ylab-hi/DeepChopper
Project-URL: documentation, https://github.com/ylab-hi/DeepChopper
Project-URL: repository, https://github.com/ylab-hi/DeepChopper
Project-URL: changelog, https://github.com/ylab-hi/DeepChopper/blob/main/README.md

# <img src="./documentation/logo.webp" alt="logo" height="100"/> **DeepChopper** [![social](https://img.shields.io/github/stars/ylab-hi/DeepChopper?style=social)](https://github.com/ylab-hi/DeepChopper/stargazers)

[![pypi](https://img.shields.io/pypi/v/deepchopper.svg)](https://pypi.python.org/pypi/deepchopper)
[![PyPI - Wheel](https://img.shields.io/pypi/wheel/deepchopper)](https://pypi.org/project/deepchopper/#files)
[![license](https://img.shields.io/pypi/l/deepchopper.svg)](https://github.com/ylab-hi/DeepChopper/blob/main/LICENSE)
[![python versions](https://img.shields.io/pypi/pyversions/deepchopper.svg)](https://pypi.python.org/pypi/deepchopper)
[![platform](https://img.shields.io/badge/platform-linux%20%7C%20osx%20%7C%20win-blue)](https://pypi.org/project/deepchopper/#files)
[![Actions status](https://github.com/ylab-hi/DeepChopper/actions/workflows/release-python.yml/badge.svg)](https://github.com/ylab-hi/DeepChopper/actions)
[![Space](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md.svg)](https://huggingface.co/spaces/yangliz5/deepchopper)

<!--toc:start-->

- [ **DeepChopper** ](#-deepchopper-)
  - [🚀 Quick Start: Try DeepChopper Online](#-quick-start-try-deepchopper-online)
  - [📦 Installation](#-installation)
    - [Compatibility and Support](#compatibility-and-support)
      - [PyPI Support](#pypi-support)
  - [🛠️ Usage](#%EF%B8%8F-usage)
    - [Command-Line Interface](#command-line-interface)
    - [Python Library](#python-library)
  - [📚 Cite](#-cite)
  - [🤝 Contribution](#-contribution)
    - [Build Environment](#build-environment)
    - [Install Dependencies](#install-dependencies)
  - [📬 Support](#-support)

<!--toc:end-->

🧬 DeepChopper uses a genomic language model to detect and chop adapter sequences left inside base-called reads, removing the chimeric read artifacts they cause and making downstream sequencing results more reliable.
By integrating seamlessly with existing workflows, DeepChopper provides a robust solution for researchers and bioinformaticians working with Nanopore direct-RNA sequencing data.

📘 **FEATURED:** We provide a comprehensive tutorial that includes an example dataset in our [full documentation](./documentation/tutorial.md).

## 🚀 Quick Start: Try DeepChopper Online

Experience DeepChopper instantly through our user-friendly web interface. No installation required!
Simply click the button below to launch the web application and start exploring DeepChopper's capabilities:

[![Open in Hugging Face Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md.svg)](https://huggingface.co/spaces/yangliz5/deepchopper)

**What you can do online:**

- 📤 Upload your sequencing data
- 🔬 Run DeepChopper's analysis
- 📊 Visualize results
- 🎛️ Experiment with different parameters

Perfect for quick tests or demonstrations! However, for extensive analyses or custom workflows, we recommend installing DeepChopper locally.

> ⚠️ Note: The online version is limited to one FASTQ record at a time and may not be suitable for large-scale projects.
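
Because the online demo handles a single record, it can help to pull one record out of a larger file first. The snippet below is a minimal sketch, not part of DeepChopper: `first_fastq_record` is a hypothetical helper, and it assumes the common four-line FASTQ layout (header, sequence, `+`, quality) with no line wrapping.

```python
from itertools import islice
from typing import Iterable


def first_fastq_record(lines: Iterable[str]) -> str:
    """Return the first FASTQ record as text (hypothetical helper).

    Assumes the 4-line layout: '@' header, sequence, '+' separator, quality.
    """
    record = list(islice(iter(lines), 4))
    if len(record) < 4:
        raise ValueError("input ends before one complete 4-line record")
    if not record[0].startswith("@") or not record[2].startswith("+"):
        raise ValueError("input does not look like 4-line FASTQ")
    return "".join(record)


if __name__ == "__main__":
    # 'raw.fq' is a placeholder path; replace it with your own file.
    with open("raw.fq", encoding="utf-8") as handle:
        print(first_fastq_record(handle), end="")
```

Only the first four lines are read, so this works on large files without loading them into memory.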

## 📦 Installation

DeepChopper can be installed using pip, the Python package installer.
Follow these steps to install:

1. Ensure you have Python 3.10 or later installed on your system.

2. Create a virtual environment (recommended):

   ```bash
   python -m venv deepchopper_env
   source deepchopper_env/bin/activate  # On Windows use `deepchopper_env\Scripts\activate`
   ```

3. Install DeepChopper:

   ```bash
   pip install deepchopper
   ```

4. Verify the installation:

   ```bash
   deepchopper --help
   ```
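
If `deepchopper --help` is not found, a quick check from Python can tell you whether the interpreter meets the 3.10 requirement and whether the package is visible in the active environment. This is a minimal sketch using only the standard library; `check_install` is a hypothetical helper name, not something DeepChopper provides.

```python
import sys
from importlib import metadata


def check_install(package: str = "deepchopper", min_python: tuple = (3, 10)) -> dict:
    """Report Python version suitability and the installed package version."""
    python_ok = sys.version_info >= min_python
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the environment running this script.
        installed_version = None
    return {"python_ok": python_ok, "installed_version": installed_version}


if __name__ == "__main__":
    print(check_install())
```

An `installed_version` of `None` usually means the virtual environment from step 2 is not activated.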

### Compatibility and Support

DeepChopper is designed to work across various platforms and Python versions.
Below is the compatibility matrix for PyPI installations:

#### [PyPI Support][pypi]

| Python Version | Linux x86_64 | macOS Intel | macOS Apple Silicon | Windows x86_64 |
| :------------: | :----------: | :---------: | :-----------------: | :------------: |
| 3.10 | ✅ | ✅ | ✅ | ✅ |
| 3.11 | ✅ | ✅ | ✅ | ✅ |
| 3.12 | ✅ | ✅ | ✅ | ✅ |

🆘 Trouble installing? Check our [Troubleshooting Guide](https://github.com/ylab-hi/DeepChopper/blob/main/documentation/tutorial.md#troubleshooting) or [open an issue](https://github.com/ylab-hi/DeepChopper/issues).

## 🛠️ Usage

For a comprehensive guide, check out our [full tutorial](./documentation/tutorial.md).
Here's a quick overview:

### Command-Line Interface

DeepChopper offers three main commands: `encode`, `predict`, and `chop`.

1. **Encode** your input data:

   ```bash
   deepchopper encode <input.fq>
   ```

2. **Predict** chimera artifacts:

   ```bash
   deepchopper predict <input.parquet> --output predictions
   ```

   Using GPUs? Add the `--gpus` flag:

   ```bash
   deepchopper predict <input.parquet> --output predictions --gpus 2
   ```

3. **Chop** chimera artifacts:

   ```bash
   deepchopper chop <predictions> raw.fq
   ```

   **Memory Optimization:** For large datasets (>5M reads), use the `--chunk-size` parameter to control memory usage:

   ```bash
   # Low memory (~1-2GB): Slower but memory-efficient
   deepchopper chop <predictions> raw.fq --chunk-size 1000

   # Balanced (default, ~5-10GB): Good balance of speed and memory
   deepchopper chop <predictions> raw.fq --chunk-size 10000

   # High performance (~20-50GB): Fastest, requires more memory
   deepchopper chop <predictions> raw.fq --chunk-size 50000
   ```

   The chop command uses **streaming mode** to minimize memory usage. Instead of loading all reads into memory at once (which can require 100GB+ for 20M reads), it processes records in configurable chunks and writes results incrementally.
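
To illustrate the idea behind chunked streaming, here is a minimal Python sketch. It is not DeepChopper's implementation; the function names (`read_fastq`, `in_chunks`, `stream_copy`) are hypothetical, and the code assumes four-line FASTQ records. It only shows why memory use scales with the chunk size rather than with the total number of reads.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Lazily yield records, assuming the 4-line FASTQ layout."""
    it = iter(lines)
    while True:
        block = list(islice(it, 4))
        if len(block) < 4:
            return  # end of input (or a trailing incomplete record)
        header, seq, _plus, qual = (line.rstrip("\n") for line in block)
        yield header, seq, qual


def in_chunks(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group a record stream into lists of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def stream_copy(src: Iterable[str], out: TextIO, chunk_size: int = 10000) -> int:
    """Write records chunk by chunk; only one chunk is held in memory."""
    written = 0
    for chunk in in_chunks(read_fastq(src), chunk_size):
        for header, seq, qual in chunk:
            out.write(f"{header}\n{seq}\n+\n{qual}\n")
        written += len(chunk)
    return written


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nJJJJ\n")
    stream_copy(demo, out=io.StringIO(), chunk_size=1)
```

A real chopping step would transform each record before writing it; the copy here stands in for that work so the memory pattern stays visible.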

Want a GUI? Launch the web interface (note: limited to one FASTQ record at a time):

```bash
deepchopper web
```
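
The `encode`, `predict`, and `chop` steps above can also be chained from a script. The sketch below builds the same command lines shown in this section and runs them with `subprocess`. `build_pipeline` and `run_pipeline` are hypothetical helpers, and the Parquet path written by `encode` is an assumption that the caller must supply, since this README does not state how that file is named.

```python
import subprocess
from typing import List, Optional


def build_pipeline(
    fastq: str,
    parquet: str,
    predictions: str,
    gpus: Optional[int] = None,
    chunk_size: Optional[int] = None,
) -> List[List[str]]:
    """Build the three CLI invocations using only flags shown in this README."""
    predict = ["deepchopper", "predict", parquet, "--output", predictions]
    if gpus is not None:
        predict += ["--gpus", str(gpus)]
    chop = ["deepchopper", "chop", predictions, fastq]
    if chunk_size is not None:
        chop += ["--chunk-size", str(chunk_size)]
    return [["deepchopper", "encode", fastq], predict, chop]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Paths are placeholders; 'raw.parquet' is an assumed encode output name.
    for command in build_pipeline("raw.fq", "raw.parquet", "predictions", gpus=2):
        print(" ".join(command))
```

Printing the commands first, as in the example, is a cheap way to confirm the paths before calling `run_pipeline`.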

### Python Library

Integrate DeepChopper into your Python scripts:

```python
import deepchopper

model = deepchopper.DeepChopper.from_pretrained("yangliz5/deepchopper")
# Your analysis code here
```

## 📚 Cite

If DeepChopper aids your research, please cite [our paper](https://www.biorxiv.org/content/10.1101/2024.10.23.619929v2):

```bibtex
@article {Li2024.10.23.619929,
        author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
        title = {A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing},
        elocation-id = {2024.10.23.619929},
        year = {2024},
        doi = {10.1101/2024.10.23.619929},
        publisher = {Cold Spring Harbor Laboratory},
        abstract = {Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) data can confound transcriptome analyses, yet no existing tools are capable of detecting and removing them due to limitations in basecalling models. We present DeepChopper, a genomic language model that accurately identifies and eliminates adapter sequences within base-called dRNA-seq reads, effectively removing chimeric read artifacts. DeepChopper significantly improves critical downstream analyses, including transcript annotation and gene fusion detection, enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research. Competing Interests: The authors have declared no competing interests.},
        URL = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929},
        eprint = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929.full.pdf},
        journal = {bioRxiv}
}
```

## 🤝 Contribution

We welcome contributions! Here's how to set up your development environment:

### Build Environment

```bash
git clone https://github.com/ylab-hi/DeepChopper.git
cd DeepChopper
conda env create -f environment.yaml
conda activate deepchopper
```

### Install Dependencies

```bash
pip install pipx
pipx install --suffix @master git+https://github.com/python-poetry/poetry.git@master
poetry@master install
```

🎉 Ready to contribute? Check out our [Contribution Guidelines](./CONTRIBUTING.md) to get started!

## 📬 Support

Need help? Have questions?

- 📖 Check our [Documentation](./documentation/tutorial.md)
- 🐛 [Report issues](https://github.com/ylab-hi/DeepChopper/issues)

______________________________________________________________________

DeepChopper is developed with ❤️ by the YLab team.
Happy sequencing! 🧬🔬

[pypi]: https://pypi.python.org/pypi/deepchopper

