#include <halColumnIterator.h>
Public Types | |
typedef std::vector < hal::DNAIteratorConstPtr > | DNASet |
typedef std::map< const hal::Sequence *, DNASet *, SequenceLess > | ColumnMap |
Public Member Functions | |
virtual void | toRight () const =0 |
virtual void | toSite (hal_index_t columnIndex, hal_index_t lastIndex, bool clearCache=false) const =0 |
virtual bool | lastColumn () const =0 |
virtual const hal::Genome * | getReferenceGenome () const =0 |
virtual const hal::Sequence * | getReferenceSequence () const =0 |
virtual hal_index_t | getReferenceSequencePosition () const =0 |
virtual const ColumnMap * | getColumnMap () const =0 |
virtual hal_index_t | getArrayIndex () const =0 |
virtual void | defragment () const =0 |
virtual bool | isCanonicalOnRef () const =0 |
virtual void | print (std::ostream &os) const =0 |
Friends | |
class | counted_ptr< ColumnIterator > |
class | counted_ptr< const ColumnIterator > |
Interface for a column iterator that allows traditional MAF-like (left-to-right) parsing of a HAL alignment. Columns are iterated with respect to a specified reference genome. This is not the most efficient way to explore the HAL structure, which is designed for bottom-up and/or top-down traversal.
virtual void defragment () const | pure virtual |
As we iterate along, we keep a column map entry for each sequence visited. This works well except in extreme cases, such as iterating over entire fly genomes, where tens of thousands of empty entries can accumulate for all the different scaffolds even though only a handful are needed at any given time. Under these circumstances, calling this method every 1M bases or so helps reduce memory use and speeds up queries on the column map. Perhaps this should eventually be built in and made transparent?
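The pruning idea described above can be sketched with a toy model. Everything in this block is hypothetical: `ToyColumnMap`, `dropEmptyEntries` and `peakEntries` are illustrative stand-ins, not part of the HAL API, and the real `defragment()` may work differently internally. The sketch only shows why pruning empty entries at a fixed interval keeps the map small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy column map (assumption): sequence name -> bases in the current column.
// The real ColumnMap is keyed by const hal::Sequence* and holds DNASet*.
using ToyColumnMap = std::map<std::string, std::vector<char>>;

// Erase every entry whose set is empty; returns how many were removed.
std::size_t dropEmptyEntries(ToyColumnMap& columnMap) {
    std::size_t removed = 0;
    for (auto it = columnMap.begin(); it != columnMap.end();) {
        if (it->second.empty()) {
            it = columnMap.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}

// Simulate visiting numColumns columns, each on a new scaffold, which leaves
// the previous scaffold's entry empty. Prune every `interval` columns
// (0 means never prune). Returns the largest map size observed.
std::size_t peakEntries(std::size_t numColumns, std::size_t interval) {
    ToyColumnMap columnMap;
    std::string previous;
    std::size_t peak = 0;
    for (std::size_t i = 0; i < numColumns; ++i) {
        if (!previous.empty()) {
            columnMap[previous].clear();  // entry stays in the map, but empty
        }
        const std::string current = "scaffold" + std::to_string(i);
        columnMap[current].push_back('A');
        peak = std::max(peak, columnMap.size());
        if (interval > 0 && (i + 1) % interval == 0) {
            dropEmptyEntries(columnMap);
        }
        previous = current;
    }
    return peak;
}
```

With a real iterator, the equivalent step would be to count columns in the iteration loop and call defragment() once the count reaches the chosen interval.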
virtual hal_index_t getArrayIndex () const | pure virtual |
Get the index of the column in the reference genome's array
virtual const ColumnMap * getColumnMap () const | pure virtual |
Get a pointer to the column map
virtual const hal::Genome * getReferenceGenome () const | pure virtual |
Get a pointer to the reference genome for the column iterator
virtual const hal::Sequence * getReferenceSequence () const | pure virtual |
Get a pointer to the reference sequence for the column iterator
virtual hal_index_t getReferenceSequencePosition () const | pure virtual |
Get the position in the reference sequence. NOTE: this seems to return the next position rather than the current one. This needs to be reviewed, as it is concerning.
virtual bool isCanonicalOnRef () const | pure virtual |
Check whether the column iterator's left-most reference coordinate is within the iterator's range, i.e. whether the column is "canonical". This can be used to ensure that the same reference position does not get sampled by different iterators covering distinct ranges. If there are no duplications, this function always returns true.
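The canonical check can be illustrated with a small sketch based only on the description above. The function `isCanonical` and the type `toy_index_t` are hypothetical names introduced here. The sketch assumes a column is represented by the list of reference positions it contains (more than one when the reference is duplicated) and that the range is inclusive; the actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hal_index_t.
using toy_index_t = std::int64_t;

// A column is "canonical" for the inclusive range [first, last] when its
// left-most reference position falls inside that range.
bool isCanonical(const std::vector<toy_index_t>& refPositions,
                 toy_index_t first, toy_index_t last) {
    assert(!refPositions.empty());
    const toy_index_t leftmost =
        *std::min_element(refPositions.begin(), refPositions.end());
    return first <= leftmost && leftmost <= last;
}
```

In this model, a column that holds reference positions 5 and 120 because of a duplication is canonical for an iterator covering 0 to 99 but not for one covering 100 to 199, so only the first iterator would sample it.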
virtual bool lastColumn () const | pure virtual |
Use this method to bound iteration loops. When the column iterator is retrieved from the sequence or genome, the last column is specified. toRight() can then be called until lastColumn() is true.
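The loop shape this implies can be sketched as follows. `ToyColumnIterator` is a hypothetical stand-in that mimics only the three methods used here; it is not hal::ColumnIterator, whose methods are const and which is obtained from a sequence or genome rather than constructed directly. The point of the sketch is the ordering: process the current column, test lastColumn(), and only then call toRight().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hal_index_t.
using toy_index_t = std::int64_t;

// Toy iterator over the inclusive column range [first, last].
class ToyColumnIterator {
public:
    ToyColumnIterator(toy_index_t first, toy_index_t last)
        : pos_(first), last_(last) {
        assert(first <= last);
    }
    void toRight() { ++pos_; }
    bool lastColumn() const { return pos_ == last_; }
    toy_index_t getArrayIndex() const { return pos_; }

private:
    toy_index_t pos_;
    toy_index_t last_;
};

// Visit every column, including the last one, and return the indices seen.
std::vector<toy_index_t> visitColumns(ToyColumnIterator it) {
    std::vector<toy_index_t> seen;
    while (true) {
        seen.push_back(it.getArrayIndex());  // process the current column
        if (it.lastColumn()) {
            break;  // stop before stepping past the end of the range
        }
        it.toRight();
    }
    return seen;
}
```

Testing lastColumn() after processing rather than before ensures that the final column is handled and that toRight() is never called past the end of the range.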
virtual void print (std::ostream &os) const | pure virtual |
Print contents of column iterator
virtual void toRight () const | pure virtual |
Move column iterator one column to the right along the reference genome sequence
virtual void toSite (hal_index_t columnIndex, hal_index_t lastIndex, bool clearCache=false) const | pure virtual |
Move column iterator to arbitrary site in genome – effectively resetting the iterator (convenience function to avoid creation of new iterators in some cases).
columnIndex | position of column in forward genome coordinates |
lastIndex | last column position (for iteration). Must be greater than columnIndex |
clearCache | clear the cache that prevents columns from being visited twice. If not set to true, it is possible that the iterator ends up not at "columnIndex" but at the next unvisited column. |
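The effect of clearCache can be illustrated with a toy model of the behaviour described in the parameter table. `ToyCachedIterator` and its visited-set cache are assumptions made for illustration; the real iterator's cache is not documented here and is likely more involved. The sketch only demonstrates the documented outcome: without clearing the cache, a repeated toSite() may land on the next unvisited column.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-in for hal_index_t.
using toy_index_t = std::int64_t;

// Toy iterator that remembers visited columns so none is visited twice.
class ToyCachedIterator {
public:
    void toSite(toy_index_t columnIndex, toy_index_t lastIndex,
                bool clearCache = false) {
        assert(lastIndex > columnIndex);  // documented precondition
        if (clearCache) {
            visited_.clear();
        }
        pos_ = columnIndex;
        last_ = lastIndex;
        // Skip forward over columns that are still marked as visited.
        while (pos_ < last_ && visited_.count(pos_) > 0) {
            ++pos_;
        }
        visited_.insert(pos_);
    }
    toy_index_t getArrayIndex() const { return pos_; }

private:
    toy_index_t pos_ = 0;
    toy_index_t last_ = 0;
    std::set<toy_index_t> visited_;
};
```

In this model, calling toSite(10, 20) twice leaves the iterator at column 11 the second time, whereas toSite(10, 20, true) returns it to column 10.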