Metadata-Version: 2.4
Name: DaisyBlast
Version: 0.2.0
Summary: A Python tool to find, plot, and export synteny blocks from all-vs-all BLAST.
Author-email: Erin Young <eriny@utah.gov>
License: MIT License
        
        Copyright (c) 2025 Young
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/erinyoung/daisyblast
Project-URL: Repository, https://github.com/erinyoung/daisyblast
Keywords: synteny,blast,bioinformatics,genomics,visualization
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Environment :: Console
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: biopython
Requires-Dist: matplotlib
Requires-Dist: pycirclize
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Dynamic: license-file

<div align="center">
  <img src="assets/logo.png" alt="DaisyBlast Logo" width="220">
  <h1>DaisyBlast</h1>
  
  <p>
    <strong>Multi-sample synteny detection via transitive BLAST chaining.</strong>
  </p>

  <a href="https://pypi.org/project/daisyblast/">
    <img src="https://img.shields.io/pypi/v/daisyblast?color=blue&label=pypi%20package" alt="PyPI version">
  </a>
  <a href="https://github.com/erinyoung/daisyblast/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License">
  </a>
</div>

<br>

**DaisyBlast** is a CLI tool for detecting and visualizing synteny blocks (collinear, homologous genomic segments) across multiple FASTA inputs. It performs an all-vs-all BLAST search, “shatters” alignments into non-overlapping windows, and groups them into syntenic blocks using a graph-based approach.

## The Problem
BLAST is the gold standard for pairwise nucleotide comparison (A ↔ B), but each hit relates only two sequences. Analyzing multiple samples requires a broader view that connects these pairwise hits.

DaisyBlast **daisy-chains** these isolated hits using a Union-Find graph algorithm to enforce transitivity. If **A** aligns with **B**, and **B** aligns with **C**, DaisyBlast unifies all three into a single **Synteny Group**. This enables the visualization of conserved structure across *n* inputs, moving beyond simple pairwise limitations.
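
The transitive grouping described above can be illustrated with a minimal Union-Find sketch. This is not DaisyBlast's actual implementation: the `UnionFind` class, the `group_hits` function, and the single-letter labels are hypothetical names chosen for illustration.

```python
class UnionFind:
    """Minimal disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_hits(pairs):
    """Return sorted groups of labels connected by pairwise hits."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = {}
    for item in list(uf.parent):
        groups.setdefault(uf.find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())


# A-B and B-C are separate pairwise hits; D-E is unrelated.
hits = [("A", "B"), ("B", "C"), ("D", "E")]
print(group_hits(hits))  # [['A', 'B', 'C'], ['D', 'E']]
```

A and C never align directly in this example, yet they land in the same group because both are linked to B.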

## Features

- **Automated Pipeline**
  Run BLAST, parse hits, identify synteny groups, and generate plots—all from one command.

- **Graph-Based Grouping**
  Union-Find logic chains collinear hits into multi-sample synteny blocks.

- **Comprehensive Visualizations**
  - **Synteny maps:** Linear and Circos-style circular plots.
  - **Dotplots:** Pairwise and combined alignment geometry.
  - **Coverage summaries:** NCBI-style stacked alignments.

- **Robust to Fragmentation**
  Uses a “shattering” algorithm to create clean, non-overlapping windows from complex, overlapping BLAST outputs.
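
The idea behind shattering can be sketched as splitting overlapping intervals at every alignment boundary. The code below is an illustration under stated assumptions, not DaisyBlast's own algorithm: the `shatter` function is hypothetical, coordinates are treated as half-open `[start, end)`, and details such as the `--min_length` filter are left out.

```python
def shatter(intervals):
    """Split overlapping (start, end) intervals into non-overlapping windows.

    A window is kept only if at least one input interval fully covers it.
    """
    # Every interval boundary becomes a potential window edge.
    breakpoints = sorted({pos for start, end in intervals for pos in (start, end)})
    windows = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        if any(start <= left and right <= end for start, end in intervals):
            windows.append((left, right))
    return windows


# Hypothetical hit coordinates on one sequence.
hits_on_a = [(0, 1000), (400, 1500), (2000, 2500)]
print(shatter(hits_on_a))
# [(0, 400), (400, 1000), (1000, 1500), (2000, 2500)]
```

The uncovered stretch between 1500 and 2000 produces no window, and the two overlapping hits are reduced to three windows that each have a single, unambiguous set of covering alignments.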

---

## Installation

### Prerequisites
- Python ≥ 3.8
- NCBI BLAST+ (`makeblastdb` and `blastn` must be in your PATH)
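
To confirm that the BLAST+ executables are discoverable before installing, a short check like the following may help. It is a sketch that relies only on the standard library's `shutil.which`; the `missing_tools` helper is a hypothetical name.

```python
import shutil


def missing_tools(tools=("makeblastdb", "blastn")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("BLAST+ executables found.")
```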

### Option 1: Install from PyPI (Recommended)

```bash
pip install daisyblast
```

### Option 2: Install from source
```bash
git clone https://github.com/erinyoung/daisyblast.git
cd daisyblast
pip install .
```

### Verify Installation
```bash
daisyblast --help
```

This should print the help menu:

```text
usage: daisyblast [-h] -i INPUT [INPUT ...] [-o OUTPUT_DIR] [-e EVALUE] [--min_pident MIN_PIDENT] [--min_length MIN_LENGTH] [-n NUM_GROUPS]

DaisyBlast: A tool to find and visualize synteny blocks from a single multi-FASTA file.

options:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --input INPUT [INPUT ...]
                        One or more input FASTA files (e.g., contig1.fa contig2.fa).
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to save output .bed and .png files. (Default: daisyblast_results)
  -e EVALUE, --evalue EVALUE
                        E-value cutoff for the self-BLAST search. (Default: 1e-10)
  --min_pident MIN_PIDENT
                        Minimum percent identity for a BLAST hit. (Default: 90.0)
  --min_length MIN_LENGTH
                        Minimum alignment length *after* splitting hits. (Default: 200)
  -n NUM_GROUPS, --num_groups NUM_GROUPS
                        Maximum number of groups in final bedfile (Default: 20)
```
---

## Usage

```bash
daisyblast -i data/contig1.fasta data/contig2.fasta -o results_dir
```

## Quick Start
You can test the installation using the sample data provided in the repository:

```bash
# Run on included test files
daisyblast -i tests/data/test_1.fasta tests/data/test_2.fasta -o test_results
```

---

## Output Overview

### 1. Synteny Maps (Grouped Blocks)
High-level views of conserved regions. Each color corresponds to a unique Synteny Group shared across sequences.

* **Circular Plot:** The query sequence forms the outer ring; colored blocks mark BLAST hits shared by two or more input sequences.
  
  ![Circular Image with Synteny Groups](assets/test_3.fasta___3_circular.png)

* **Linear Map:** Synteny blocks plotted along genomic coordinates.
  
  ![Linear Image with Synteny Groups](assets/test_3.fasta___3_linear.png)

### 2. Alignment Geometry (Dotplots)
Visualizes raw BLAST hits before grouping. Use these to detect inversions (downward diagonals) or indels (gaps).

* **Combined dotplots:** All subjects vs. one query on a single plot.

  ![Combined Dot Plot](assets/Combined_test_3.fasta_3.png)

* **Pairwise dotplots:** One panel per sequence pair.

  ![Pairwise Dot Plot](assets/Pair_test_3.fasta_3_vs_test_5.fasta_2.png)
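
The inversions visible as downward diagonals can also be read from the tabular hits: `blastn` reports a minus-strand hit with a subject start greater than the subject end. The sketch below assumes the standard 12-column `-outfmt 6` layout, which this README does not confirm for `blast_hits.txt`; the `hit_orientations` function and the example rows are hypothetical.

```python
import csv
import io

# Standard 12 columns of BLAST -outfmt 6 (assumed; a custom column set would differ).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hit_orientations(handle):
    """Yield (query, subject, orientation) for each tab-separated hit."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        forward = int(hit["sstart"]) <= int(hit["send"])
        yield hit["qseqid"], hit["sseqid"], "forward" if forward else "inverted"


# Two made-up hits: the second has sstart > send, i.e. a minus-strand match.
example = io.StringIO(
    "a.fa__c1\tb.fa__c1\t98.5\t500\t7\t0\t1\t500\t1\t500\t0.0\t900\n"
    "a.fa__c1\tb.fa__c1\t97.0\t300\t9\t0\t600\t900\t1200\t901\t1e-150\t520\n"
)
for record in hit_orientations(example):
    print(record)
```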


### 3. Coverage Summaries
NCBI-style stacked bar charts showing alignment depth and scoring.
* **Red/Pink:** High-scoring hits (bit score > 80)
* **Blue/Black:** Lower-scoring hits

![Combined Blast Hits](assets/Summary_test_3.fasta___3.png)

### 4. Data Files

| File | Description |
|------|-------------|
| `final_groups.txt` | Final synteny group assignments (`Group_ID Sequence Start End`) |
| `divided.bed` | Shattered genomic windows used in analysis |
| `blast_hits.txt` | Raw BLAST tabular output (`-outfmt 6`) |
| `trimmed_blast.tsv` | BLAST hits trimmed to window boundaries |
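
For downstream analysis, the group assignments can be loaded with a few lines of Python. This is a sketch under assumptions: it treats `final_groups.txt` as four whitespace-separated columns in the order listed above (`Group_ID Sequence Start End`) and skips any line that does not fit, including a possible header. The `read_groups` function and the example rows are hypothetical.

```python
from collections import defaultdict


def read_groups(lines):
    """Map each group ID to a list of (sequence, start, end) tuples."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, a header row, or anything with non-numeric coordinates.
        if len(fields) != 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        group_id, sequence, start, end = fields
        groups[group_id].append((sequence, int(start), int(end)))
    return dict(groups)


# Made-up rows in the assumed layout.
example = [
    "Group_ID Sequence Start End",
    "1 a.fa__c1 0 400",
    "1 b.fa__c1 120 520",
    "2 a.fa__c1 2000 2500",
]
for group_id, members in read_groups(example).items():
    print(group_id, len(members), "member(s)")
```

With a real run, the same function can be given an open file handle, for example `read_groups(open("results_dir/final_groups.txt"))`, where the path depends on the `-o` value used.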

---

## How It Works

| Step | Task | Reason |
|:----:|:-----|:-------|
| **1** | **Rename** | All headers are adjusted to `${filename}__${original_header}` to ensure unique IDs and prevent collisions when using multiple input files. |
| **2** | **BLAST** | Performs an all-vs-all `blastn` search across all input FASTA sequences. |
| **3** | **Shatter** | Parses BLAST hits and breaks genomes into discrete, non-overlapping windows. <br> *Why?* If Overlap(A,B) and Overlap(B,C) differ in size, shattering creates a common-denominator window so the two can be compared cleanly. |
| **4** | **Trim** | Crops each BLAST alignment so it fits strictly within its corresponding shattered window. |
| **5** | **Group** | Applies a Union-Find graph algorithm to chain windows into synteny blocks. <br> *Logic:* If A ↔ B and B ↔ C, DaisyBlast groups A, B, and C together. |
| **6** | **Visualize** | Generates synteny maps, dotplots, and coverage summaries using `matplotlib` and `pycirclize`. |
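
The renaming in step 1 can be sketched as follows. This is an illustration, not the package's code: `renamed_fasta_lines` is a hypothetical helper, and it assumes that the filename part keeps its extension and that the full original header text is preserved after the `__` separator.

```python
from pathlib import Path


def renamed_fasta_lines(path, lines):
    """Yield FASTA lines with headers rewritten as <filename>__<original_header>."""
    prefix = Path(path).name  # assumption: filename including its extension
    for line in lines:
        if line.startswith(">"):
            yield f">{prefix}__{line[1:].strip()}"
        else:
            yield line.rstrip("\n")


# Two files that both contain a record named "chr1" no longer collide.
for source in ("data/contig1.fasta", "data/contig2.fasta"):
    print(list(renamed_fasta_lines(source, [">chr1\n", "ACGT\n"])))
```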

---

## AI Attribution
Please note that portions of this codebase were written with the assistance of **Google Gemini** to accelerate development. The package logo was also AI-generated using Gemini's image creation tools.

## Citation
If you use DaisyBlast in your research, please cite:

> **DaisyBlast: Multi-sample synteny detection via transitive BLAST chaining.**
> GitHub repository: https://github.com/erinyoung/daisyblast

## License
Distributed under the MIT License. See `LICENSE` for more information.
