TabularMSA.consensus()
Compute the majority consensus sequence for this MSA.
State: Experimental as of 0.4.1.
The majority consensus sequence contains the most common character at each position in this MSA. Ties will be broken in an arbitrary manner.
Returns: Sequence
Notes
The majority consensus sequence will use this MSA’s default gap character (dtype.default_gap_char) to represent gap majority at a position, regardless of the gap characters present at that position.
Different gap characters at a position are not treated as distinct characters. All gap characters at a position contribute to that position’s gap consensus.
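The gap handling described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: the function name majority_consensus and the constants GAP_CHARS and DEFAULT_GAP are hypothetical, and the gap alphabet is assumed to be that of DNA ('-' and '.', with '-' as the default).

```python
from collections import Counter

# Assumptions for illustration: DNA-style gap characters and default gap.
GAP_CHARS = {'-', '.'}
DEFAULT_GAP = '-'


def majority_consensus(rows):
    """Return the most common character per column, pooling all gap characters."""
    consensus = []
    for column in zip(*rows):
        # Map every gap character to the default gap so they are counted together.
        counts = Counter(DEFAULT_GAP if ch in GAP_CHARS else ch for ch in column)
        # Ties resolve to whichever entry Counter returns first; the documented
        # behavior only promises that ties are broken arbitrarily.
        consensus.append(counts.most_common(1)[0][0])
    return ''.join(consensus)


result = majority_consensus(['AC---', 'AT-C.', 'TT-CG'])
print(result)
```

In the last column, '-' and '.' are pooled into a single gap count of two, which outvotes the single 'G', so the position is reported with the default gap character.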
Examples
>>> from skbio import DNA, TabularMSA
>>> sequences = [DNA('AC---'),
...              DNA('AT-C.'),
...              DNA('TT-CG')]
>>> msa = TabularMSA(sequences,
...                  positional_metadata={'prob': [2, 1, 2, 3, 5]})
>>> msa.consensus()
DNA
-----------------------------
Positional metadata:
    'prob': <dtype: int64>
Stats:
    length: 5
    has gaps: True
    has degenerates: False
    has non-degenerates: True
    GC-content: 33.33%
-----------------------------
0 AT-C-
Note that the last position in the MSA has more than one type of gap character. These are not treated as distinct characters; both types of gap characters contribute to the position’s consensus. Also note that DNA.default_gap_char is used to represent gap majority at a position ('-').