Additional utilities
This module defines miscellaneous utility functions that are public to users.

calcTree(names, distance_matrix, method='upgma', linkage=False)
Given a distance matrix, creates and returns a tree structure.
Parameters:
  names (list, ndarray) – a list of names
  distance_matrix (ndarray) – a square matrix whose size equals the length of the ensemble (the number of names). An error is raised if its dimensions do not match the number of names.
  method (str) – method used for constructing the tree. Acceptable options are "upgma", "nj", or methods supported by linkage() such as "single", "average", "ward", etc. Default is "upgma".
  linkage (bool) – whether the linkage matrix is returned. Note that NJ trees do not support linkage.
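To illustrate what method="upgma" computes, here is a minimal UPGMA sketch in plain NumPy. It is not ProDy's implementation: the helper upgma_newick is hypothetical, and it returns a Newick string rather than a Tree object. At each step it merges the closest pair of clusters, places the new node at half their distance, and sets the distance to the merged cluster as the size-weighted average of the old distances.

```python
import numpy as np


def upgma_newick(names, distance_matrix):
    """Hypothetical helper: build a UPGMA tree and return it as Newick text."""
    D = np.asarray(distance_matrix, dtype=float)
    n = len(names)
    if D.shape != (n, n):
        raise ValueError("distance_matrix shape must match the number of names")
    # Active clusters: id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (str(names[i]), 1, 0.0) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): D[i, j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        ti, si, hi = clusters.pop(i)
        tj, sj, hj = clusters.pop(j)
        h = d / 2.0  # ultrametric height of the new node
        subtree = f"({ti}:{h - hi:g},{tj}:{h - hj:g})"
        # Size-weighted average distance from the merged cluster to the others
        new_d = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            new_d[k] = (si * dik + sj * djk) / (si + sj)
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        for k, v in new_d.items():
            dist[(k, next_id)] = v  # k < next_id, so the key stays ordered
        clusters[next_id] = (subtree, si + sj, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For example, upgma_newick(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]) returns "(C:3,(A:1,B:1):2);".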

clusterMatrix(distance_matrix=None, similarity_matrix=None, labels=None, return_linkage=None, **kwargs)
Cluster a distance matrix using scipy.cluster.hierarchy and return the sorted matrix, the indices used for sorting, the sorted labels (if labels are passed), and the linkage matrix (if return_linkage is True).
Parameters:
  distance_matrix (ndarray) – an N-by-N matrix containing some measure of distance, such as 1. - seqid_matrix (Hamming distance), RMSDs, or distances in PCA space
  similarity_matrix (ndarray) – an N-by-N matrix containing some measure of similarity, such as sequence identity, mode-mode overlap, or spectral overlap. Each element will be subtracted from 1. to get a distance, so make sure this is reasonable.
  labels (list) – labels for each matrix row that can be returned sorted
  no_plot (bool) – if True, don't plot the dendrogram. Default is True.
  reversed (bool) – if set to True, the sorting indices will be reversed.
Other arguments for linkage() and dendrogram() can also be provided and will be taken as kwargs.
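The clustering-and-sorting step can be sketched with SciPy directly. This is an assumption-laden illustration, not clusterMatrix itself: cluster_and_sort and its reverse flag are hypothetical, and average linkage is an arbitrary default. It uses the real SciPy functions squareform (to get the condensed form linkage() expects), linkage, and leaves_list (the dendrogram leaf order).

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def cluster_and_sort(distance_matrix=None, similarity_matrix=None, labels=None,
                     method="average", reverse=False):
    """Hypothetical helper: hierarchically cluster a square matrix and sort it."""
    if distance_matrix is None:
        if similarity_matrix is None:
            raise ValueError("provide distance_matrix or similarity_matrix")
        # Similarities are turned into distances as 1. - similarity
        distance_matrix = 1.0 - np.asarray(similarity_matrix, dtype=float)
    D = np.asarray(distance_matrix, dtype=float)
    # linkage() expects the condensed (upper-triangle) form of a distance matrix
    Z = linkage(squareform(D), method=method)
    # The dendrogram leaf order gives the sorting indices
    indices = leaves_list(Z)
    if reverse:
        indices = indices[::-1]
    sorted_matrix = D[np.ix_(indices, indices)]
    sorted_labels = None if labels is None else [labels[i] for i in indices]
    return sorted_matrix, indices, sorted_labels, Z
```

Because leaves_list follows the dendrogram, members of each cluster end up in contiguous rows and columns of the sorted matrix.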

showLines(*args, **kwargs)
Show 1D data using plot().
Parameters:
  x (ndarray) – (optional) x coordinates. x can be a 1D array or a 2D matrix of column vectors.
  y (ndarray) – data array. y can be a 1D array or a 2D matrix of column vectors.
  dy (ndarray) – an array of variances of y which will be plotted as a band along y. It should have the same shape as y.
  lower (ndarray) – an array of lower bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with upper.
  upper (ndarray) – an array of upper bounds which will be plotted as a band along y. It should have the same shape as y and should be paired with lower.
  alpha (float) – the transparency of the band(s) for plotting dy.
  beta (float) – the transparency of the band(s) for plotting miny and maxy.
  ticklabels (list) – user-defined tick labels for the x-axis.
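The band options can be pictured with plain Matplotlib. This sketch is not showLines: plot_with_bands is hypothetical, handles only 1D y, and treats dy directly as the half-width of the band (y - dy to y + dy); how showLines scales the variances is not stated here. It uses the real Axes.plot and Axes.fill_between calls.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_with_bands(x, y, dy=None, lower=None, upper=None, alpha=0.5, beta=0.25,
                    ax=None):
    """Hypothetical helper: draw a 1D curve with optional shaded bands."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if ax is None:
        _, ax = plt.subplots()
    (line,) = ax.plot(x, y)
    color = line.get_color()  # bands reuse the curve's color
    if dy is not None:
        # Assumption: the band spans y - dy to y + dy
        dy = np.asarray(dy, dtype=float)
        ax.fill_between(x, y - dy, y + dy, color=color, alpha=alpha)
    if (lower is None) != (upper is None):
        raise ValueError("lower and upper must be given together")
    if lower is not None:
        ax.fill_between(x, lower, upper, color=color, alpha=beta)
    return ax
```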

showMatrix(matrix, x_array=None, y_array=None, **kwargs)
Show a matrix using imshow(). Curves on the x- and y-axes can be added.
Parameters:
  matrix (ndarray) – matrix to be displayed
  x_array (ndarray) – data to be plotted above the matrix
  y_array (ndarray) – data to be plotted on the left side of the matrix
  percentile (float) – a percentile threshold to remove outliers, i.e. only showing data within the p-th to (100-p)-th percentile
  interactive (bool) – turn on or off the interactive options
  xtickrotation (float) – how much to rotate the x-tick labels, in degrees. Default is 0.
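The percentile window can be computed with NumPy alone. This is a sketch of the idea, not showMatrix's code: percentile_window is hypothetical, and clipping the data is one plausible way to suppress outliers.

```python
import numpy as np


def percentile_window(matrix, p):
    """Hypothetical helper: keep data between the p-th and (100-p)-th percentile."""
    if not 0 <= p < 50:
        raise ValueError("p must be in [0, 50)")
    data = np.asarray(matrix, dtype=float)
    vmin, vmax = np.percentile(data, [p, 100 - p])
    # Values outside the window are clipped so outliers do not dominate the colormap
    return np.clip(data, vmin, vmax), float(vmin), float(vmax)
```

The limits can also be passed as the vmin and vmax keyword arguments of Matplotlib's imshow(), which leaves the data untouched and only changes the color scale.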

reorderMatrix(names, matrix, tree, axis=None)
Reorder a matrix based on a tree and return the reordered matrix and the indices for reordering other things.
Parameters:
  names (list) – a list of names associated with the rows of the matrix. These names must match the ones used to generate the tree.
  matrix (ndarray) – any square matrix
  tree (Tree) – any tree from calcTree()
  axis (int) – the axis along which the matrix should be reordered. Default is None, which reorders along all axes.
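Once the leaf order of a tree is known, the reordering reduces to NumPy indexing. In this sketch the hypothetical helper reorder_by_leaves takes that leaf order as a plain list of names. If the tree is a Biopython Bio.Phylo tree (an assumption here), the order could be read with [leaf.name for leaf in tree.get_terminals()].

```python
import numpy as np


def reorder_by_leaves(names, matrix, leaf_order, axis=None):
    """Hypothetical helper: reorder a matrix to follow a tree's leaf order."""
    if set(leaf_order) != set(names):
        raise ValueError("names must match the labels used to build the tree")
    # Map each leaf name back to its row in the original matrix
    position = {name: i for i, name in enumerate(names)}
    indices = np.array([position[name] for name in leaf_order])
    M = np.asarray(matrix)
    if axis is None:
        reordered = M[np.ix_(indices, indices)]  # rows and columns together
    else:
        reordered = np.take(M, indices, axis=axis)  # a single axis only
    return reordered, indices
```

The returned indices can then be applied to anything else aligned with the rows, such as labels or per-structure properties.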

findSubgroups(tree, c, method='naive', **kwargs)
Divide tree into subgroups using a criterion method and a cutoff c. Returns a list of lists with labels divided into subgroups.
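This section does not describe what method='naive' does, so the sketch below swaps in a different, well-known technique: cutting a SciPy hierarchy at distance c with fcluster(..., criterion="distance"). It starts from a distance matrix rather than a tree, and subgroups_by_cutoff is hypothetical; it only shows the shape of the result, a list of lists of labels.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def subgroups_by_cutoff(names, distance_matrix, c, method="average"):
    """Hypothetical helper: split labels into groups by cutting a tree at c."""
    Z = linkage(squareform(np.asarray(distance_matrix, dtype=float)), method=method)
    # Observations whose cophenetic distance is at most c share a flat cluster
    cluster_ids = fcluster(Z, t=c, criterion="distance")
    groups = {}
    for name, cid in zip(names, cluster_ids):
        groups.setdefault(cid, []).append(name)
    return list(groups.values())
```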

getLinkage(names, tree)
Obtain the linkage() matrix encoding tree.
Parameters:
  names (list, ndarray) – a list of names; the order determines the values in the linkage matrix
  tree (Tree) – tree to be converted
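The role of names is clearer once the SciPy linkage format is spelled out: each row is [index1, index2, distance, count], leaves are numbered 0 to n-1 in the order of names, and the k-th merge creates cluster n + k. The sketch below encodes a small tree written as nested tuples (left, right, height), a hypothetical stand-in for a Tree object; nested_to_linkage is also hypothetical, and storing the node height in the distance column is an assumption.

```python
import numpy as np


def nested_to_linkage(names, tree):
    """Hypothetical helper: encode a binary tree as a SciPy-style linkage matrix.

    A tree is either a leaf name (str) or a tuple (left, right, height).
    """
    index = {name: i for i, name in enumerate(names)}  # order of names fixes leaf ids
    n = len(names)
    rows = []

    def visit(node):
        # Returns (cluster id, number of leaves under node)
        if isinstance(node, str):
            return index[node], 1
        left, right, height = node
        left_id, left_count = visit(left)
        right_id, right_count = visit(right)
        rows.append([left_id, right_id, float(height), left_count + right_count])
        # The k-th merge (0-based) creates cluster id n + k
        return n + len(rows) - 1, left_count + right_count

    visit(tree)
    return np.array(rows, dtype=float)
```

Rows come out in post-order, so they may be ordered differently from the output of SciPy's linkage(), which lists merges by increasing distance. Permuting names changes the index columns, which is why the order of names matters.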

getTreeFromLinkage(names, linkage)
Obtain the tree encoded by linkage.
Parameters:
  names (list, ndarray) – a list of names; the order should correspond to the values in linkage
  linkage (ndarray) – linkage matrix
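Decoding goes the other way: walk the rows in order, since each row may refer to leaves (ids below n) or to clusters created by earlier rows (id n + k). In this sketch, linkage_to_nested is hypothetical and returns nested tuples (left, right, height) instead of a Tree object. SciPy's own scipy.cluster.hierarchy.to_tree performs a similar decoding into ClusterNode objects.

```python
import numpy as np


def linkage_to_nested(names, linkage_matrix):
    """Hypothetical helper: decode a SciPy-style linkage matrix into nested tuples."""
    Z = np.asarray(linkage_matrix, dtype=float)
    n = len(names)
    if Z.shape != (n - 1, 4):
        raise ValueError("linkage must have shape (len(names) - 1, 4)")
    nodes = list(names)  # ids 0..n-1 are the leaves, in the order of names
    for left, right, height, _count in Z:
        # Row k creates node n + k from two existing nodes
        nodes.append((nodes[int(left)], nodes[int(right)], float(height)))
    return nodes[-1]  # the last merge is the root
```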

clusterSubfamilies(similarities, n_clusters=0, linkage='all', method='tsne', cutoff=0.0, **kwargs)
Perform clustering based on members of the ensemble projected into a reduced dimension.
Parameters:
  similarities (ndarray) – a matrix of similarities for each structure in the ensemble, such as an RMSD matrix, dynamics-based spectral overlap, or sequence similarity
  n_clusters (int) – the number of clusters to generate. If 0, will scan a range of numbers of clusters and return the best one based on the highest silhouette score. Default is 0.
  linkage (str, list, tuple, ndarray) – if "all", will test all linkage types (ward, average, complete, single). Otherwise, will use only the one(s) given as input. Default is "all".
  method (str) – if set to "spectral", will generate a Kirchhoff matrix based on the cutoff value given and use that as input for clustering instead of the values themselves. Default is "tsne".
  cutoff (float) – only used if method is set to "spectral". This value is used for generating the Kirchhoff matrix used for spectral clustering. Default is 0.0.
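The n_clusters=0 and linkage="all" scan can be sketched as follows, under clear assumptions. The input is coordinates already projected into the reduced space (for example a t-SNE embedding); the t-SNE and spectral/Kirchhoff steps are not reproduced. Clustering uses SciPy's agglomerative linkage with fcluster(..., criterion="maxclust"), and the silhouette score is reimplemented in NumPy to stay self-contained (scikit-learn's silhouette_score is the usual library routine). Both silhouette and scan_clusterings are hypothetical helpers.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def silhouette(X, labels):
    """Mean silhouette score with Euclidean distances (singletons score 0)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == other].mean()  # mean distance to nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def scan_clusterings(X, max_clusters=6,
                     methods=("ward", "average", "complete", "single")):
    """Hypothetical helper: pick the (linkage, cluster count) with best silhouette."""
    X = np.asarray(X, dtype=float)
    best = (-np.inf, None, None, None)  # (score, method, n_clusters, labels)
    for method in methods:
        Z = linkage(X, method=method)
        for k in range(2, min(max_clusters, len(X) - 1) + 1):
            labels = fcluster(Z, t=k, criterion="maxclust")
            n_found = len(np.unique(labels))  # maxclust may yield fewer than k
            if n_found < 2:
                continue
            score = silhouette(X, labels)
            if score > best[0]:
                best = (score, method, n_found, labels)
    return best
```

Coordinates are used instead of a similarity matrix because ward linkage in SciPy assumes Euclidean observation vectors.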