halapi
Hierarchical Alignment Format API
hal::ColumnIterator Class Reference (abstract)

#include <halColumnIterator.h>

Public Types

typedef std::vector< hal::DNAIteratorConstPtr > DNASet
 
typedef std::map< const hal::Sequence *, DNASet *, SequenceLess > ColumnMap
 

Public Member Functions

virtual void toRight () const =0
 
virtual void toSite (hal_index_t columnIndex, hal_index_t lastIndex, bool clearCache=false) const =0
 
virtual bool lastColumn () const =0
 
virtual const hal::Genome * getReferenceGenome () const =0
 
virtual const hal::Sequence * getReferenceSequence () const =0
 
virtual hal_index_t getReferenceSequencePosition () const =0
 
virtual const ColumnMap * getColumnMap () const =0
 
virtual hal_index_t getArrayIndex () const =0
 
virtual void defragment () const =0
 
virtual bool isCanonicalOnRef () const =0
 
virtual void print (std::ostream &os) const =0
 

Friends

class counted_ptr< ColumnIterator >
 
class counted_ptr< const ColumnIterator >
 

Detailed Description

Interface for a column iterator that allows traditional MAF-like (left-to-right) parsing of a HAL alignment. Columns are iterated with respect to a specified reference genome. This is not the most efficient way to explore the HAL structure, which is designed for bottom-up and/or top-down traversal.

Member Function Documentation

virtual void hal::ColumnIterator::defragment ( ) const
pure virtual

As the iterator moves along, it keeps a column map entry for each sequence visited. This works well except in extreme cases, such as iterating over entire fly genomes, where tens of thousands of empty entries can accumulate for all the different scaffolds even though only a handful are needed at any given time. In these cases, calling this method every 1M bases or so reduces memory use and speeds up queries on the column map. Perhaps this should eventually be built in and made transparent?
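The periodic-defragment advice can be sketched as a loop that calls defragment() every N columns. This is a toy model, not HAL code: ToyColumnIterator, walkWithDefrag, and the scaffold bookkeeping are hypothetical, and the toy's defragment() simply drops entries for scaffolds the iterator has already passed, standing in for the empty entries described above. The interval is scaled down from 1M so the effect is easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical toy stand-in for hal::ColumnIterator (not the HAL API).
// It keeps one column-map entry per scaffold it has touched, which is the
// kind of growth that defragment() is meant to control.
class ToyColumnIterator {
 public:
  ToyColumnIterator(std::int64_t first, std::int64_t last,
                    std::int64_t scaffoldLength)
      : pos_(first), last_(last), scaffoldLength_(scaffoldLength) {
    touch();
  }
  void toRight() {
    ++pos_;
    touch();
  }
  bool lastColumn() const { return pos_ >= last_; }
  // Toy defragment(): drop entries for scaffolds already passed, standing in
  // for the stale, empty entries described in the documentation.
  void defragment() {
    const std::int64_t current = pos_ / scaffoldLength_;
    for (auto it = columnMap_.begin(); it != columnMap_.end();) {
      if (it->first < current) {
        it = columnMap_.erase(it);
      } else {
        ++it;
      }
    }
  }
  std::size_t columnMapSize() const { return columnMap_.size(); }

 private:
  void touch() { ++columnMap_[pos_ / scaffoldLength_]; }
  std::int64_t pos_, last_, scaffoldLength_;
  std::map<std::int64_t, std::int64_t> columnMap_;  // scaffold id -> columns seen
};

// Walk [first, last], calling defragment() every `interval` columns
// (interval == 0 disables it); returns the final column-map size.
std::size_t walkWithDefrag(std::int64_t first, std::int64_t last,
                           std::int64_t scaffoldLength, std::int64_t interval) {
  ToyColumnIterator it(first, last, scaffoldLength);
  std::int64_t visited = 0;
  while (true) {
    // ... process the current column here ...
    ++visited;
    if (interval > 0 && visited % interval == 0) it.defragment();
    if (it.lastColumn()) break;
    it.toRight();
  }
  return it.columnMapSize();
}
```

Without defragmenting, walking 1,000 columns over 10-column scaffolds leaves 100 entries in the toy map; defragmenting every 100 columns leaves only the current scaffold's entry.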

virtual hal_index_t hal::ColumnIterator::getArrayIndex ( ) const
pure virtual

Get the index of the column in the reference genome's array

virtual const ColumnMap* hal::ColumnIterator::getColumnMap ( ) const
pure virtual

Get a pointer to the column map
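A sketch of how a ColumnMap with the shape of the typedefs above might be traversed: each key is a sequence, and each value is a DNASet holding one DNA iterator per base of that sequence in the current column (more than one when a duplication is present). ToySequence, ToyDNAIterator, ToySequenceLess, and basesPerGenome are hypothetical stand-ins; the real hal::Sequence, hal::DNAIterator, and SequenceLess differ, and the ordering used here (genome name, then sequence name) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal stand-ins for hal::Sequence and hal::DNAIterator;
// the real HAL classes expose much richer interfaces.
struct ToySequence {
  std::string genome;
  std::string name;
};
struct ToyDNAIterator {
  long arrayIndex;  // position in the genome's array
  bool reversed;    // aligned on the reverse strand?
};
typedef std::shared_ptr<const ToyDNAIterator> ToyDNAIteratorConstPtr;

// Assumed ordering for the toy: genome name, then sequence name.
struct ToySequenceLess {
  bool operator()(const ToySequence* a, const ToySequence* b) const {
    if (a->genome != b->genome) return a->genome < b->genome;
    return a->name < b->name;
  }
};

// Same shape as the DNASet and ColumnMap typedefs in the class.
typedef std::vector<ToyDNAIteratorConstPtr> DNASet;
typedef std::map<const ToySequence*, DNASet*, ToySequenceLess> ColumnMap;

// Count the bases each genome contributes to one column. Empty DNASets
// (stale entries for sequences not present in this column) are skipped.
std::map<std::string, std::size_t> basesPerGenome(const ColumnMap* cmap) {
  std::map<std::string, std::size_t> counts;
  for (ColumnMap::const_iterator i = cmap->begin(); i != cmap->end(); ++i) {
    const DNASet* dnaSet = i->second;
    if (dnaSet == nullptr || dnaSet->empty()) continue;
    counts[i->first->genome] += dnaSet->size();
  }
  return counts;
}
```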

virtual const hal::Genome* hal::ColumnIterator::getReferenceGenome ( ) const
pure virtual

Get a pointer to the reference genome for the column iterator

virtual const hal::Sequence* hal::ColumnIterator::getReferenceSequence ( ) const
pure virtual

Get a pointer to the reference sequence for the column iterator

virtual hal_index_t hal::ColumnIterator::getReferenceSequencePosition ( ) const
pure virtual

Get the position in the reference sequence. NOTE: this seems to return the next position rather than the current one. This must be reviewed, as it is concerning.

virtual bool hal::ColumnIterator::isCanonicalOnRef ( ) const
pure virtual

Check whether the column iterator's left-most reference coordinate is within the iterator's range, i.e., whether the column is "canonical". This can be used to ensure that the same reference position is not sampled by different iterators covering distinct ranges. If there are no duplications, this function always returns true.
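The de-duplication idea can be illustrated with a toy model in which each reference position maps to the left-most reference coordinate of its column. This is a hypothetical sketch, not the HAL implementation: countCanonicalColumns and the leftmostRef vector are illustrative names, and a std::set stands in for the iterator's visited-column cache. A duplication that places positions 2 and 7 in the same column is reached by iterators over both [0, 4] and [5, 9], but only the first treats it as canonical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy model: leftmostRef[p] is the left-most reference
// coordinate of the column containing reference position p. Without
// duplications, leftmostRef[p] == p; a duplication makes two positions
// share one column (and so one left-most coordinate).
//
// Count the columns an iterator over [start, last] keeps if it only
// accepts canonical columns, i.e. those whose left-most reference
// coordinate lies inside [start, last].
std::size_t countCanonicalColumns(const std::vector<long>& leftmostRef,
                                  long start, long last) {
  std::set<long> visited;  // stands in for the iterator's visited-column cache
  std::size_t count = 0;
  for (long p = start; p <= last; ++p) {
    // Column id = its left-most reference coordinate.
    const long column = leftmostRef[static_cast<std::size_t>(p)];
    if (!visited.insert(column).second) continue;  // column already visited
    const bool canonical = column >= start && column <= last;
    if (canonical) ++count;
  }
  return count;
}
```

Summing the canonical counts over disjoint ranges then counts each column exactly once, which is the property the method is meant to support.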

virtual bool hal::ColumnIterator::lastColumn ( ) const
pure virtual

Use this method to bound iteration loops. When the column iterator is retrieved from the sequence or genome, the last column is specified. toRight() can then be called until lastColumn() returns true.
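A minimal sketch of the bounded loop, using a hypothetical ToyColumnIterator in place of a real hal::ColumnIterator (obtaining one from a hal::Sequence or hal::Genome is not shown). It assumes lastColumn() becomes true while the iterator sits on the final column, so each column is processed before the check; if the real semantics differ, the check would move accordingly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy iterator over reference columns [first, last].
// Assumption: lastColumn() is true while the iterator sits on the final
// column, so the loop below processes that column before stopping.
class ToyColumnIterator {
 public:
  ToyColumnIterator(long first, long last) : pos_(first), last_(last) {}
  void toRight() { ++pos_; }
  bool lastColumn() const { return pos_ >= last_; }
  long position() const { return pos_; }

 private:
  long pos_;
  long last_;
};

// Bound the loop with lastColumn(), advancing with toRight().
std::vector<long> visitColumns(long first, long last) {
  ToyColumnIterator it(first, last);
  std::vector<long> visited;
  while (true) {
    visited.push_back(it.position());  // process the current column
    if (it.lastColumn()) break;
    it.toRight();
  }
  return visited;
}
```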

virtual void hal::ColumnIterator::print ( std::ostream &  os) const
pure virtual

Print the contents of the column iterator.

virtual void hal::ColumnIterator::toRight ( ) const
pure virtual

Move the column iterator one column to the right along the reference genome sequence.

virtual void hal::ColumnIterator::toSite ( hal_index_t  columnIndex,
hal_index_t  lastIndex,
bool  clearCache = false 
) const
pure virtual

Move the column iterator to an arbitrary site in the genome, effectively resetting the iterator (a convenience function that avoids creating new iterators in some cases).

Parameters
columnIndex  position of the column in forward genome coordinates
lastIndex  last column position (for iteration); must be greater than columnIndex
clearCache  clear the cache that prevents columns from being visited twice. If not set to true, it is possible that the iterator ends up not at "columnIndex" but at the next unvisited column.
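The clearCache caveat can be modelled with a toy iterator that records visited columns. This is a hypothetical sketch of the documented contract, not the HAL implementation: ToyColumnIterator is illustrative, it enforces the lastIndex > columnIndex precondition with an exception (the real class may handle it differently), and its toRight() does not skip cached columns.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical toy modelling the documented toSite() contract (not the HAL
// implementation). The iterator caches visited columns; unless clearCache
// is true, toSite() skips forward to the next unvisited column.
class ToyColumnIterator {
 public:
  void toSite(long columnIndex, long lastIndex, bool clearCache = false) {
    // The toy enforces the documented precondition with an exception.
    if (lastIndex <= columnIndex) {
      throw std::invalid_argument("lastIndex must be greater than columnIndex");
    }
    if (clearCache) visited_.clear();
    pos_ = columnIndex;
    last_ = lastIndex;
    while (pos_ < last_ && visited_.count(pos_) != 0) ++pos_;
    visited_.insert(pos_);
  }
  // Simplification: toRight() does not skip cached columns.
  void toRight() { visited_.insert(++pos_); }
  bool lastColumn() const { return pos_ >= last_; }
  long position() const { return pos_; }

 private:
  long pos_ = 0;
  long last_ = 0;
  std::set<long> visited_;
};
```

After iterating columns 10 to 20, calling toSite(12, 30) in this toy lands on column 21, the first unvisited one; toSite(12, 30, true) lands on 12 as requested.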

The documentation for this class was generated from the following file: