RNAlib-2.2.10
Functions to Read/Write several File Formats for RNA Sequences, Structures, and Alignments

Files

file  file_formats.h
 Functions dealing with file formats for RNA sequences, and structures.
 
file  file_formats_msa.h
 Functions dealing with file formats for Multiple Sequence Alignments (MSA)
 
file  ribo.h
 Parse RiboSum Scoring Matrices for Covariance Scoring of Alignments.
 

Macros

#define VRNA_OPTION_MULTILINE   32U
 Tell a function that an input is assumed to span several lines.

#define VRNA_CONSTRAINT_MULTILINE   32U
 Parse multiline constraint.

#define VRNA_FILE_FORMAT_MSA_CLUSTAL   1U
 Option flag indicating ClustalW formatted files.

#define VRNA_FILE_FORMAT_MSA_STOCKHOLM   2U
 Option flag indicating Stockholm 1.0 formatted files.

#define VRNA_FILE_FORMAT_MSA_FASTA   4U
 Option flag indicating FASTA (Pearson) formatted files.

#define VRNA_FILE_FORMAT_MSA_MAF   8U
 Option flag indicating MAF formatted files.

#define VRNA_FILE_FORMAT_MSA_DEFAULT
 Option flag indicating the set of default file formats.

#define VRNA_FILE_FORMAT_MSA_NOCHECK   4096U
 Option flag to disable validation of the alignment.

#define VRNA_FILE_FORMAT_MSA_UNKNOWN   8192U
 Return flag of vrna_file_msa_detect_format() to indicate an unknown or malformed alignment.
 

Functions

void vrna_file_helixlist (const char *seq, const char *db, float energy, FILE *file)
 Print a secondary structure as helix list.

void vrna_file_connect (const char *seq, const char *db, float energy, const char *identifier, FILE *file)
 Print a secondary structure as connect table.

void vrna_file_bpseq (const char *seq, const char *db, FILE *file)
 Print a secondary structure in bpseq format.

void vrna_file_json (const char *seq, const char *db, double energy, const char *identifier, FILE *file)
 Print a secondary structure in JSON format.

unsigned int vrna_file_fasta_read_record (char **header, char **sequence, char ***rest, FILE *file, unsigned int options)
 Get a (fasta) data set from a file or stdin.

char * vrna_extract_record_rest_structure (const char **lines, unsigned int length, unsigned int option)
 Extract a dot-bracket structure string from a (multiline) character array.

int vrna_file_SHAPE_read (const char *file_name, int length, double default_value, char *sequence, double *values)
 Read data from a given SHAPE reactivity input file.

vrna_plist_t * vrna_file_constraints_read (const char *filename, unsigned int length, unsigned int options)
 Read constraints from an input file.

void vrna_extract_record_rest_constraint (char **cstruc, const char **lines, unsigned int option)
 Extract a hard constraint encoded as pseudo dot-bracket string.

unsigned int read_record (char **header, char **sequence, char ***rest, unsigned int options)
 Get a data record from stdin.

int vrna_file_msa_read (const char *filename, char ***names, char ***aln, char **id, char **structure, unsigned int options)
 Read a multiple sequence alignment from file.

int vrna_file_msa_read_record (FILE *fp, char ***names, char ***aln, char **id, char **structure, unsigned int options)
 Read a multiple sequence alignment from file handle.

unsigned int vrna_file_msa_detect_format (const char *filename, unsigned int options)
 Detect the format of a multiple sequence alignment file.

float ** readribosum (char *name)
 Read a RiboSum or other user-defined scoring matrix and store it in global memory.
 

Detailed Description

Macro Definition Documentation

#define VRNA_OPTION_MULTILINE   32U

#include <ViennaRNA/file_formats.h>

Tell a function that an input is assumed to span several lines.

A function that accepts this flag as an input option may also return it to indicate that it has read data spanning multiple lines.

See also
vrna_extract_record_rest_structure(), vrna_file_fasta_read_record()
#define VRNA_CONSTRAINT_MULTILINE   32U
#define VRNA_FILE_FORMAT_MSA_CLUSTAL   1U

#include <ViennaRNA/file_formats_msa.h>

Option flag indicating ClustalW formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_STOCKHOLM   2U

#include <ViennaRNA/file_formats_msa.h>

Option flag indicating Stockholm 1.0 formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_FASTA   4U

#include <ViennaRNA/file_formats_msa.h>

Option flag indicating FASTA (Pearson) formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_MAF   8U

#include <ViennaRNA/file_formats_msa.h>

Option flag indicating MAF formatted files.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_DEFAULT

#include <ViennaRNA/file_formats_msa.h>

Value:
( \
VRNA_FILE_FORMAT_MSA_CLUSTAL \
| VRNA_FILE_FORMAT_MSA_STOCKHOLM \
| VRNA_FILE_FORMAT_MSA_FASTA \
| VRNA_FILE_FORMAT_MSA_MAF \
)

Option flag indicating the set of default file formats.

See also
vrna_file_msa_read(), vrna_file_msa_read_record(), vrna_file_msa_detect_format()
#define VRNA_FILE_FORMAT_MSA_NOCHECK   4096U

#include <ViennaRNA/file_formats_msa.h>

Option flag to disable validation of the alignment.

See also
vrna_file_msa_read(), vrna_file_msa_read_record()
#define VRNA_FILE_FORMAT_MSA_UNKNOWN   8192U

#include <ViennaRNA/file_formats_msa.h>

Return flag of vrna_file_msa_detect_format() to indicate an unknown or malformed alignment.

See also
vrna_file_msa_detect_format()
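
The format flags listed above occupy distinct bits, so several of them can be combined with bitwise OR and separated again with a mask. The following is a minimal sketch of that idea, not library code: the flag values are copied from this page so that the fragment compiles on its own (a real program would include ViennaRNA/file_formats_msa.h instead), and the helper msa_format_count() is a hypothetical name introduced only for illustration.

```c
#include <assert.h>

/* Flag values copied from the macro list on this page. */
#define VRNA_FILE_FORMAT_MSA_CLUSTAL    1U
#define VRNA_FILE_FORMAT_MSA_STOCKHOLM  2U
#define VRNA_FILE_FORMAT_MSA_FASTA      4U
#define VRNA_FILE_FORMAT_MSA_MAF        8U
#define VRNA_FILE_FORMAT_MSA_NOCHECK    4096U
#define VRNA_FILE_FORMAT_MSA_UNKNOWN    8192U

/* Hypothetical helper: count how many file format bits are set in options. */
static unsigned int
msa_format_count(unsigned int options)
{
  const unsigned int format_mask = VRNA_FILE_FORMAT_MSA_CLUSTAL |
                                   VRNA_FILE_FORMAT_MSA_STOCKHOLM |
                                   VRNA_FILE_FORMAT_MSA_FASTA |
                                   VRNA_FILE_FORMAT_MSA_MAF;
  unsigned int       bits   = options & format_mask;
  unsigned int       count  = 0;

  /* masking must have removed behavior flags such as NOCHECK */
  assert((bits & VRNA_FILE_FORMAT_MSA_NOCHECK) == 0);

  while (bits) {
    count += bits & 1U;
    bits  >>= 1;
  }
  return count;
}
```

A check such as msa_format_count(options) == 1 could be used before calling a function that expects exactly one format flag.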

Function Documentation

void vrna_file_helixlist ( const char *  seq,
const char *  db,
float  energy,
FILE *  file 
)

#include <ViennaRNA/file_formats.h>

Print a secondary structure as helix list.

Parameters
seq     The RNA sequence
db      The structure in dot-bracket format
energy  Free energy of the structure in kcal/mol
file    The file handle used to print to (defaults to stdout if file == NULL)
void vrna_file_connect ( const char *  seq,
const char *  db,
float  energy,
const char *  identifier,
FILE *  file 
)

#include <ViennaRNA/file_formats.h>

Print a secondary structure as connect table.

Connect table file format looks like this:

300  ENERGY = 7.0  example
  1 G       0    2   22    1
  2 G       1    3   21    2

where the headerline is followed by 6 columns with:

  1. Base number: index n
  2. Base (A, C, G, T, U, X)
  3. Index n-1 (0 if first nucleotide)
  4. Index n+1 (0 if last nucleotide)
  5. Number of the base to which n is paired. No pairing is indicated by 0 (zero).
  6. Natural numbering.
Parameters
seq         The RNA sequence
db          The structure in dot-bracket format
energy      The free energy of the structure
identifier  An optional identifier for the sequence
file        The file handle used to print to (defaults to stdout if file == NULL)
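
To make the six-column layout concrete, here is a minimal, self-contained sketch that derives the pairing partners from a dot-bracket string and prints one row per nucleotide. It is an illustrative stand-in and not the RNAlib implementation: the helper names build_pair_table() and print_connect_rows() are hypothetical, only '(' , ')' and unpaired characters are handled, the header line is omitted, and the column widths are an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of RNAlib.
 * Fill pair[1..n] with the 1-based pairing partner of each position
 * (0 = unpaired). pair must provide n + 1 entries, where n = strlen(db).
 * Returns 0 on success, -1 on unbalanced brackets or allocation failure.
 */
static int
build_pair_table(const char *db, size_t *pair)
{
  assert(db != NULL && pair != NULL);

  size_t  n       = strlen(db);
  size_t  top     = 0;
  int     status  = 0;
  size_t  *stack  = malloc((n + 1) * sizeof *stack);

  if (stack == NULL)
    return -1;

  for (size_t i = 0; i <= n; i++)
    pair[i] = 0;

  for (size_t i = 1; i <= n && status == 0; i++) {
    if (db[i - 1] == '(') {
      stack[top++] = i;               /* remember the opening position */
    } else if (db[i - 1] == ')') {
      if (top == 0) {
        status = -1;                  /* closing bracket without partner */
      } else {
        size_t j = stack[--top];
        pair[i] = j;
        pair[j] = i;
      }
    }
  }

  if (top != 0)
    status = -1;                      /* unmatched opening brackets remain */

  free(stack);
  return status;
}

/* Print the six columns described above, one row per nucleotide. */
static void
print_connect_rows(const char *seq, const size_t *pair, FILE *out)
{
  size_t n = strlen(seq);

  if (out == NULL)
    out = stdout;

  for (size_t i = 1; i <= n; i++)
    fprintf(out, "%5zu %c %7zu %4zu %4zu %4zu\n",
            i, seq[i - 1], i - 1, (i < n) ? i + 1 : (size_t)0, pair[i], i);
}
```

For the sequence GGGAAACCC with structure (((...))), the first row would list base 1 as paired with base 9.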
void vrna_file_bpseq ( const char *  seq,
const char *  db,
FILE *  file 
)

#include <ViennaRNA/file_formats.h>

Print a secondary structure in bpseq format.

Parameters
seq   The RNA sequence
db    The structure in dot-bracket format
file  The file handle used to print to (defaults to stdout if file == NULL)
void vrna_file_json ( const char *  seq,
const char *  db,
double  energy,
const char *  identifier,
FILE *  file 
)

#include <ViennaRNA/file_formats.h>

Print a secondary structure in JSON format.

Parameters
seq         The RNA sequence
db          The structure in dot-bracket format
energy      The free energy
identifier  An identifier for the sequence
file        The file handle used to print to (defaults to stdout if file == NULL)
unsigned int vrna_file_fasta_read_record ( char **  header,
char **  sequence,
char ***  rest,
FILE *  file,
unsigned int  options 
)

#include <ViennaRNA/file_formats.h>

Get a (fasta) data set from a file or stdin.

This function may be used to obtain complete datasets from a filehandle or stdin. A dataset is always defined to contain at least a sequence. If data starts with a fasta header, i.e. a line like

>some header info 

then vrna_file_fasta_read_record() will assume that the sequence that follows the header may span over several lines. To disable this behavior and to assign a single line to the argument 'sequence' one can pass VRNA_INPUT_NO_SPAN in the 'options' argument. If no fasta header is read in the beginning of a data block, a sequence must not span over multiple lines!
Unless the options VRNA_INPUT_NOSKIP_COMMENTS or VRNA_INPUT_NOSKIP_BLANK_LINES are passed, a sequence may be interrupted by lines starting with a comment character or empty lines.
A sequence is regarded as completely read if one of the following holds: it was assumed not to span multiple lines, a secondary structure or structure constraint follows the sequence on the next line, or a new header marks the beginning of a new sequence.
All lines following the sequence (this includes comments) that do not initiate a new dataset according to the above definition are available through the line-array 'rest'. Here one can usually find the structure constraint or other information belonging to the current dataset. Filling of 'rest' may be prevented by passing VRNA_INPUT_NO_REST to the options argument.

Note
This function will exit any program with an error message if no sequence could be read!
This function is NOT threadsafe! It uses a global variable to store information about the next data block.

The main purpose of this function is to make it easy to parse blocks of data in the header of a loop, where all calculations for the current data block are done inside the loop. The loop may then be left on certain return values, e.g.:

char  *id, *seq, **rest;
int   i;

id    = seq = NULL;
rest  = NULL;

/* read records from stdin (file == NULL) until an error, EOF, or quit request */
while (!(vrna_file_fasta_read_record(&id, &seq, &rest, NULL, 0) & (VRNA_INPUT_ERROR | VRNA_INPUT_QUIT))) {
  if (id)
    printf("%s\n", id);

  printf("%s\n", seq);

  if (rest) {
    for (i = 0; rest[i]; i++) {
      printf("%s\n", rest[i]);
      free(rest[i]);
    }
  }

  /* release the buffers that belong to this record */
  free(rest);
  free(seq);
  free(id);
}

In the example above, the while loop will be terminated when vrna_file_fasta_read_record() returns either an error, EOF, or a user initiated quit request.
As long as data is read from stdin (we are passing NULL as the file pointer), the id is printed if it is available for the current block of data. The sequence will be printed in any case and if some more lines belong to the current block of data each line will be printed as well.

Note
Do not forget to free the memory occupied by header, sequence and rest!
Parameters
header    A pointer which will be set such that it points to the header of the record
sequence  A pointer which will be set such that it points to the sequence of the record
rest      A pointer which will be set such that it points to an array of lines which also belong to the record
file      A file handle to read from (if NULL, this function reads from stdin)
options   Some options which may be passed to alter the behavior of the function, use 0 for no options
Returns
A flag with information about what the function actually did read
char* vrna_extract_record_rest_structure ( const char **  lines,
unsigned int  length,
unsigned int  option 
)

#include <ViennaRNA/file_formats.h>

Extract a dot-bracket structure string from a (multiline) character array.

This function extracts a dot-bracket structure string from the 'rest' array as returned by vrna_file_fasta_read_record() and returns it. All occurrences of comments within the 'lines' array will be skipped as long as they do not break the structure string. If no structure could be read, this function returns NULL.

Precondition
The argument 'lines' has to be a 2-dimensional character array as obtained by vrna_file_fasta_read_record()
See also
vrna_file_fasta_read_record()
Parameters
lines   The (multiline) character array to be parsed
length  The assumed length of the dot-bracket string (passing a value < 1 results in no length limit)
option  Some options which may be passed to alter the behavior of the function, use 0 for no options
Returns
The dot-bracket string read from lines or NULL
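
The idea of assembling one structure string from several record lines can be sketched without the library. The fragment below is an illustrative stand-in, not the RNAlib implementation: join_structure_lines() is a hypothetical name, the comment character '#' and the restriction to the characters '.', '(' and ')' are assumptions, and the length and option arguments of the real function are not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of RNAlib.
 * Concatenate the leading dot-bracket part of every non-comment line in a
 * NULL-terminated array of lines. Assumption: comment lines start with '#'.
 * Returns a newly allocated string (caller frees it), or NULL if no
 * structure characters were found or allocation failed.
 */
static char *
join_structure_lines(const char *const *lines)
{
  static const char alphabet[] = ".()";
  size_t            total = 0, pos = 0;
  char              *db;

  assert(lines != NULL);

  /* first pass: determine the total structure length */
  for (size_t k = 0; lines[k]; k++)
    if (lines[k][0] != '#')
      total += strspn(lines[k], alphabet);

  if (total == 0)
    return NULL;

  db = malloc(total + 1);
  if (db == NULL)
    return NULL;

  /* second pass: copy the dot-bracket prefix of each structure line */
  for (size_t k = 0; lines[k]; k++) {
    if (lines[k][0] != '#') {
      size_t len = strspn(lines[k], alphabet);
      memcpy(db + pos, lines[k], len);
      pos += len;
    }
  }

  db[pos] = '\0';
  return db;
}
```

Anything after the first non-structure character on a line, such as a trailing energy value, is ignored by this sketch.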
int vrna_file_SHAPE_read ( const char *  file_name,
int  length,
double  default_value,
char *  sequence,
double *  values 
)

#include <ViennaRNA/file_formats.h>

Read data from a given SHAPE reactivity input file.

This function parses the information in the given file and stores the result in the preallocated string sequence and the double array values.

Parameters
file_name      Path to the constraints file
length         Length of the sequence (file entries exceeding this limit will cause an error)
default_value  Value for missing indices
sequence       Pointer to an array used for storing the sequence obtained from the SHAPE reactivity file
values         Pointer to an array used for storing the values obtained from the SHAPE reactivity file
vrna_plist_t* vrna_file_constraints_read ( const char *  filename,
unsigned int  length,
unsigned int  options 
)

#include <ViennaRNA/file_formats.h>

Read constraints from an input file.

This function reads constraint definitions from a file and converts them into an array of vrna_plist_t data structures. The data fields of each individual returned plist entry may adopt the following configurations:

  • plist.i == plist.j → single nucleotide constraint
  • plist.i != plist.j → base pair constraint
  • plist.i == 0 → end of list
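
The three configurations above suggest a simple way to walk the returned list: iterate until an entry with i == 0 is reached and classify each entry by comparing i and j. The sketch below illustrates this with a hypothetical stand-in type, constraint_entry, that contains only the two fields discussed here; the real vrna_plist_t is declared by RNAlib and has further members. The helper count_constraints() is likewise an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the i and j fields of a plist entry. */
typedef struct {
  int i;
  int j;
} constraint_entry;

/* Result of the classification below. */
typedef struct {
  size_t  single; /* entries with i == j */
  size_t  pairs;  /* entries with i != j */
} constraint_counts;

/* Walk the list until the end marker (i == 0) and classify each entry. */
static constraint_counts
count_constraints(const constraint_entry *list)
{
  constraint_counts c = { 0, 0 };

  assert(list != NULL);

  for (const constraint_entry *e = list; e->i != 0; e++) {
    if (e->i == e->j)
      c.single++; /* single nucleotide constraint */
    else
      c.pairs++;  /* base pair constraint */
  }

  return c;
}
```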
void vrna_extract_record_rest_constraint ( char **  cstruc,
const char **  lines,
unsigned int  option 
)

#include <ViennaRNA/file_formats.h>

Extract a hard constraint encoded as pseudo dot-bracket string.

Deprecated:
Use vrna_extract_record_rest_structure() instead!
Precondition
The argument 'lines' has to be a 2-dimensional character array as obtained by vrna_file_fasta_read_record()
See also
vrna_file_fasta_read_record(), VRNA_CONSTRAINT_DB_PIPE, VRNA_CONSTRAINT_DB_DOT, VRNA_CONSTRAINT_DB_X, VRNA_CONSTRAINT_DB_ANG_BRACK, VRNA_CONSTRAINT_DB_RND_BRACK
Parameters
cstruc  A pointer to a character array that is used as pseudo dot-bracket output
lines   A 2-dimensional character array with the extension lines from the FASTA input
option  The option flags that define the behavior and recognition pattern of this function
unsigned int read_record ( char **  header,
char **  sequence,
char ***  rest,
unsigned int  options 
)

#include <ViennaRNA/file_formats.h>

Get a data record from stdin.

Deprecated:
This function is deprecated! Use vrna_file_fasta_read_record() as a replacement.
int vrna_file_msa_read ( const char *  filename,
char ***  names,
char ***  aln,
char **  id,
char **  structure,
unsigned int  options 
)

#include <ViennaRNA/file_formats_msa.h>

Read a multiple sequence alignment from file.

This function reads the (first) multiple sequence alignment from an input file. The alignment is split into the sequence ids/names and the actual sequence data, which are stored in memory as two arrays. If the alignment file format allows for additional information, such as an ID of the entire alignment or consensus structure information, this data is retrieved as well and made available. The options parameter specifies the set of alignment file formats that should be used to retrieve the data. If 0 is passed as options, the list of alignment file formats defaults to VRNA_FILE_FORMAT_MSA_DEFAULT.

Currently, the list of parsable multiple sequence alignment file formats consists of:

  • ClustalW format (VRNA_FILE_FORMAT_MSA_CLUSTAL)
  • Stockholm 1.0 format (VRNA_FILE_FORMAT_MSA_STOCKHOLM)
  • FASTA (Pearson) format (VRNA_FILE_FORMAT_MSA_FASTA)
  • MAF format (VRNA_FILE_FORMAT_MSA_MAF)

Note
After successfully reading an alignment, this function performs a validation of the data that includes uniqueness of the sequence identifiers, and equal sequence lengths. This check can be deactivated by passing VRNA_FILE_FORMAT_MSA_NOCHECK in the options parameter.
See also
vrna_file_msa_read_record(), VRNA_FILE_FORMAT_MSA_CLUSTAL, VRNA_FILE_FORMAT_MSA_STOCKHOLM, VRNA_FILE_FORMAT_MSA_FASTA, VRNA_FILE_FORMAT_MSA_MAF, VRNA_FILE_FORMAT_MSA_DEFAULT, VRNA_FILE_FORMAT_MSA_NOCHECK
Parameters
filename   The name of the input file that contains the alignment
names      An address to the pointer where sequence identifiers should be written to
aln        An address to the pointer where aligned sequences should be written to
id         An address to the pointer where the alignment ID should be written to (may be NULL)
structure  An address to the pointer where consensus structure information should be written to (may be NULL)
options    Options to manipulate the behavior of this function
Returns
The number of sequences in the alignment, or -1 if no alignment record could be found
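
The validation mentioned in the note above (unique sequence identifiers and equal sequence lengths) can be illustrated with a small standalone check. This is a sketch of the stated criteria only, not the check RNAlib actually performs; msa_looks_valid() is a hypothetical helper that operates on arrays shaped like the names and aln outputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of RNAlib.
 * Returns 1 if all n_seq identifiers are unique and all aligned
 * sequences have the same length, 0 otherwise.
 */
static int
msa_looks_valid(const char *const *names, const char *const *aln, int n_seq)
{
  assert(names != NULL && aln != NULL);

  if (n_seq < 1)
    return 0;

  size_t width = strlen(aln[0]);

  for (int s = 0; s < n_seq; s++) {
    /* every row of the alignment must have the same number of columns */
    if (strlen(aln[s]) != width)
      return 0;

    /* identifiers must not repeat */
    for (int t = s + 1; t < n_seq; t++)
      if (strcmp(names[s], names[t]) == 0)
        return 0;
  }

  return 1;
}
```

The quadratic identifier comparison keeps the sketch short; for large alignments a sorted copy or a hash set would be the more usual choice.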
int vrna_file_msa_read_record ( FILE *  fp,
char ***  names,
char ***  aln,
char **  id,
char **  structure,
unsigned int  options 
)

#include <ViennaRNA/file_formats_msa.h>

Read a multiple sequence alignment from file handle.

Similar to vrna_file_msa_read(), this function reads a multiple sequence alignment from an input file handle. Because it operates on a file handle, this function is not limited to the first alignment record, but allows for looping over all alignments within the input.

The alignment is split into the sequence ids/names and the actual sequence data, which are stored in memory as two arrays. If the alignment file format allows for additional information, such as an ID of the entire alignment or consensus structure information, this data is retrieved as well and made available. The options parameter specifies the alignment file format used to retrieve the data. A single format must be specified here; see vrna_file_msa_detect_format() for help with determining the correct MSA file format.

Currently, the list of parsable multiple sequence alignment file formats consists of:

  • ClustalW format (VRNA_FILE_FORMAT_MSA_CLUSTAL)
  • Stockholm 1.0 format (VRNA_FILE_FORMAT_MSA_STOCKHOLM)
  • FASTA (Pearson) format (VRNA_FILE_FORMAT_MSA_FASTA)
  • MAF format (VRNA_FILE_FORMAT_MSA_MAF)

Note
After successfully reading an alignment, this function performs a validation of the data that includes uniqueness of the sequence identifiers, and equal sequence lengths. This check can be deactivated by passing VRNA_FILE_FORMAT_MSA_NOCHECK in the options parameter.
See also
vrna_file_msa_read(), vrna_file_msa_detect_format(), VRNA_FILE_FORMAT_MSA_CLUSTAL, VRNA_FILE_FORMAT_MSA_STOCKHOLM, VRNA_FILE_FORMAT_MSA_FASTA, VRNA_FILE_FORMAT_MSA_MAF, VRNA_FILE_FORMAT_MSA_DEFAULT, VRNA_FILE_FORMAT_MSA_NOCHECK
Parameters
fp         The file pointer the data will be retrieved from
names      An address to the pointer where sequence identifiers should be written to
aln        An address to the pointer where aligned sequences should be written to
id         An address to the pointer where the alignment ID should be written to (may be NULL)
structure  An address to the pointer where consensus structure information should be written to (may be NULL)
options    Options to manipulate the behavior of this function
Returns
The number of sequences in the alignment, or -1 if no alignment record could be found
unsigned int vrna_file_msa_detect_format ( const char *  filename,
unsigned int  options 
)

#include <ViennaRNA/file_formats_msa.h>

Detect the format of a multiple sequence alignment file.

This function attempts to determine the format of a file that supposedly contains a multiple sequence alignment (MSA). This is useful in cases where an MSA file contains more than a single record and therefore vrna_file_msa_read() cannot be applied, since it only retrieves the first. Here, one can try to guess the correct file format using this function and then loop over the file, record by record, using one of the low-level record retrieval functions for the corresponding MSA file format.

Note
This function parses the entire first record within the specified file. As a result, it returns VRNA_FILE_FORMAT_MSA_UNKNOWN not only if it can't detect the file's format, but also in cases where the file doesn't contain sequences!
See also
vrna_file_msa_read(), vrna_file_stockholm_read_record(), vrna_file_clustal_read_record(), vrna_file_fasta_read_record()
Parameters
filename  The name of the input file that contains the alignment
options   Options to manipulate the behavior of this function
Returns
The MSA file format, or VRNA_FILE_FORMAT_MSA_UNKNOWN
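
As a rough illustration of format guessing, the sketch below inspects only the first line of an input and maps well-known format signatures to the flags documented on this page. It is deliberately much simpler than vrna_file_msa_detect_format(), which according to the note above parses the entire first record. The helper guess_msa_format() is hypothetical, the flag values are copied from this page so the fragment compiles on its own, and the signatures used (a leading "CLUSTAL", "# STOCKHOLM 1.0", ">" or "##maf") are common conventions of the respective formats rather than the library's detection rules.

```c
#include <assert.h>
#include <string.h>

/* Flag values copied from the macro list on this page. */
#define VRNA_FILE_FORMAT_MSA_CLUSTAL    1U
#define VRNA_FILE_FORMAT_MSA_STOCKHOLM  2U
#define VRNA_FILE_FORMAT_MSA_FASTA      4U
#define VRNA_FILE_FORMAT_MSA_MAF        8U
#define VRNA_FILE_FORMAT_MSA_UNKNOWN    8192U

/* Hypothetical helper: guess the MSA format from the first line only. */
static unsigned int
guess_msa_format(const char *first_line)
{
  assert(first_line != NULL);

  if (strncmp(first_line, "CLUSTAL", 7) == 0)
    return VRNA_FILE_FORMAT_MSA_CLUSTAL;

  if (strncmp(first_line, "# STOCKHOLM 1.0", 15) == 0)
    return VRNA_FILE_FORMAT_MSA_STOCKHOLM;

  if (first_line[0] == '>')
    return VRNA_FILE_FORMAT_MSA_FASTA;

  if (strncmp(first_line, "##maf", 5) == 0)
    return VRNA_FILE_FORMAT_MSA_MAF;

  return VRNA_FILE_FORMAT_MSA_UNKNOWN;
}
```

A header-only check like this cannot tell whether the record actually contains sequences, which is one reason a full parse of the first record is the more reliable approach.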