Ver 1.3.6 (20250225) ---------------------------------------
Minor fix. dfast_results.json has been renamed as dfast_record, which is generated using the DR_tools module. (Python>=3.10, experimental implementation)

Ver 1.3.5 (20250203) ---------------------------------------
Minor fix. dfast_results.json will be generated in the result file, which contains information required for DDBJ submission (experimental implementation)

Ver 1.3.4 (20241220) ---------------------------------------
Minor fix. phone and fax are no longer required in DDBJ submission file, so they have been removed.

Ver 1.3.3 (20241007) ---------------------------------------
Minor fix. Inference will not be described when cyanobase_aa is used as the reference database.

Ver 1.3.2 (20240824) ---------------------------------------
Minor fix for DDBJ submission. ff_definition has been changed.


Ver 1.3.1 (20240530) ---------------------------------------
The "country" qualifier in the DDBJ submission file has been changed to "geo_loc_name" to be compliant to the INSDC requirements (https://www.ddbj.nig.ac.jp/news/en/2024-05-09-e.html)


Ver 1.3.0 (20240509) ---------------------------------------
Preliminary implementation for AMR/VFG annotation.
Use '--amr' to enable this function.



Ver 1.2.21 (20240208) ---------------------------------------
Adjusted to changes in the SeqFeature specifications of Biopython.


Ver 1.2.19 (20230515) ---------------------------------------
Accepts a gzip-compressed FASTA file as input.

20230404 ---------------------------------------

The reference protein data have been updated. The DFAST default reference database now consists of mainly from RefSeq WP proteins.
The latest version is "20230404".
You can update it with the following command.
dfast_file_downloader.py --protein dfast


Ver 1.2.0 (20190124) ---------------------------------------
 

# Import annotation from GFF

    Genes annotated in the GFF format can be imported with using --gff option.
    By default, CDS features are imported. Other features such as tRNA and rRNA can be imported by changing the config file.
    The sequence names in the genome FASTA file and those in the GFF files must be identical.
    If you use --gff option, --use_original_name option is automatically set to enabled.

    Note that this option is preliminary, so the result may not be compliant to DDBJ data submission format.

    dfast --genome example/sample.genome.fna --gff example/sample.prodigal.gff 

    default configuration is as below:
        {
            # Preliminary implementation.
            "tool_name": "GFF_import",
            "enabled": False,
            "options": {
                # "targets": ["CDS", "tRNA", "rRNA"],
                "targets": ["CDS"],  # feature types to be imported
                "gff_file_name": None,
            },
        },


# Database name and the pident (percent of identity) threshold for homology search can be specified.
    The corresponding part in the config file is as follows:
        {   
            # Sequence homology search against the default DB.
            "component_name": "DBsearch",
            "enabled": True,
            "options": {
                # "cpu": 2,  # Uncomment this to set the component-specific number of CPUs, or the global setting is used.
                "skipAnnotatedFeatures": True,
                "evalue_cutoff": 1e-6,
                "qcov_cutoff": 75,
                "scov_cutoff": 75,
                "pident_cutoff": 0,   # NEW! cutoff for identity percentage (default:0)
                "aligner": "ghostx", 
                "aligner_options": {},  # Normally, leave this empty. (Current version does not use this option.)
                "database": "@@APP_ROOT@@/db/protein/DFAST-default.ref",
                "db_name": "DFAST-default",    # NEW! Database name will appear in /note qualifier
            }
        }    
    
    Accordingly, the output format is slightly changed.
        (Before)
                     /note="Partial hit; WP_003131759.1
                     methylphosphotriester-DNA alkyltransferase (Lactococcus
                     lactis subsp. lactis Il1403) [pid:52.0%, q_cov:99.0%,
                     s_cov:54.8%, Eval:6.9e-26]"

        (After)
                     /note="DFAST-default:WP_003131759.1
                     methylphosphotriester-DNA alkyltransferase (Lactococcus
                     lactis subsp. lactis Il1403) [pid:52.0%, q_cov:99.0%,
                     s_cov:54.8%, Eval:6.9e-26, partial hit]"

                     /note="DFAST-default:WP_000531736.1 cadmium resistance
                     protein CadD (Streptococcus agalactiae 2603V/R) [pid:57.6%,
                     q_cov:100.0%, s_cov:99.5%, Eval:7.2e-64, low identity]"
    
    The default cutoff value for pident is 0(%), so no filtering is performed.


# Command line option for cutoff values for default homology search (--threshold)
    format: --threshold pident,q_cov,s_cov,e_value
    default: 0,75,75,1e-6
    example: --threshold 80,,,1e-10
             (Default values are used if not specified)
    

# Command line option for cutoff values for additional homology search (--database)
    In the previous versions, --database option simply accepts the path to the reference database.
        e.g. --database db/protein/DFAST-ECOLI.ref
    Now, database name and cutoff values can be specified.
        Format: db_path[,db_name[,pident,q_cov,s_cov,e_value]]
        e.g. --database your_database.ref,your_DBNAME
        e.g. --database your_database.ref,your_DBNAME,90,80,80,1e-10
    
    If you have a reference database from very closely-related species, using more stringent thresholds is recommendable.
    (Predicted protein sequences are queried against the additional database first, then against the default database.)      
        e.g. dfast --genome your.genome.fna --threshold 0,75,75,1e-6 --database db/protein/DFAST-ECOLI.ref,ECOLI,90,85,85,1e-10
    
    
# New commandline options:
    --no_cds    Disable CDS prediction
    
    --no_rrna    Disable rRNA prediction
    
    --no_trna    Disable tRNA prediction
    
    --no_crispr    Disable CRISPR prediction