Pfam Access Functions

This module defines functions for interfacing with the Pfam database.

searchPfam(query, **kwargs)

Returns Pfam search results as a dictionary. Keys are the matching Pfam accessions, and each maps to the E-value and the alignment start and end residue positions.

Parameters:
  • query (str) – UniProt ID or PDB identifier, with or without a chain identifier, e.g. '1mkp' or '1mkpA'. For a PDB identifier, the UniProt ID of the specified chain, or of the first protein chain if no chain is given, is used to search the Pfam database.
  • timeout (int) – timeout for blocking connection attempt in seconds, default is 60
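As a rough illustration of the query forms described above, the sketch below splits a PDB-style query into its ID and chain parts. The helper split_pdb_query and the wrapper search_pfam are hypothetical names, not part of ProDy, and the length-based rule (4 characters for a PDB ID, 5 for a PDB ID plus chain) is an assumption drawn from the examples '1mkp' and '1mkpA'. The wrapper needs ProDy and network access, so it is defined but not called here.

```python
def split_pdb_query(query):
    """Split a PDB-style query into (pdb_id, chain).

    Hypothetical helper, not part of ProDy. Assumes a 4-character PDB ID
    optionally followed by one chain character, as in '1mkp' or '1mkpA'.
    Returns None for anything else (for example a UniProt ID).
    """
    query = query.strip()
    if len(query) == 4:
        return query.lower(), None
    if len(query) == 5:
        return query[:4].lower(), query[4]
    return None


def search_pfam(query, timeout=60):
    """Thin wrapper around ProDy's searchPfam (hypothetical wrapper name).

    ProDy is imported lazily so that split_pdb_query works without it.
    """
    from prody import searchPfam
    return searchPfam(query, timeout=timeout)
```

With ProDy installed and a network connection, search_pfam('1mkpA') would be expected to return the dictionary described above, keyed by Pfam accession.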
fetchPfamMSA(acc, alignment='seed', compressed=False, **kwargs)

Returns a path to the downloaded Pfam MSA file.

Parameters:
  • acc (str) – Pfam ID or Accession Code
  • alignment (str) – alignment type, one of 'full', 'seed' (default), 'ncbi', 'metagenomics', 'rp15', 'rp35', 'rp55', 'rp75' or 'uniprot', where rp stands for representative proteomes. Note that InterPro Pfam appears to make only seed alignments easily accessible in most cases.
  • compressed (bool) – gzip the downloaded MSA file, default is False
  • timeout (int) – timeout for blocking connection attempt in seconds, default is 60
  • outname (str) – output filename, default is built from the input as 'acc_alignment.format'
  • folder (str) – output folder, default is '.'
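The default naming pattern 'acc_alignment.format' and the list of accepted alignment types can be sketched as below. This is only an illustration, not ProDy's implementation: default_msa_path is a hypothetical helper, the 'sth' extension stands in for the unspecified format, and appending '.gz' for compressed files is an assumption.

```python
import os

# Alignment types listed in the parameter description above.
ALIGNMENT_TYPES = ('full', 'seed', 'ncbi', 'metagenomics',
                   'rp15', 'rp35', 'rp55', 'rp75', 'uniprot')


def default_msa_path(acc, alignment='seed', extension='sth',
                     folder='.', compressed=False):
    """Build an output path following the 'acc_alignment.format' pattern.

    Hypothetical helper. The default extension and the '.gz' suffix
    are assumptions made for illustration.
    """
    if alignment not in ALIGNMENT_TYPES:
        raise ValueError('alignment must be one of ' +
                         ', '.join(ALIGNMENT_TYPES))
    filename = '{0}_{1}.{2}'.format(acc, alignment, extension)
    if compressed:
        filename += '.gz'
    return os.path.join(folder, filename)
```

For example, default_msa_path('PF00069', folder='msa') joins the folder 'msa' with the filename 'PF00069_seed.sth'.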
parsePfamPDBs(query, data=[], **kwargs)

Returns a list of AtomGroup objects containing sections of chains that correspond to a particular Pfam domain family. These sections are defined by alignment start and end residue numbers.

Parameters:
  • query (str) – Pfam ID, UniProt ID or PDB ID. If a PDB ID is provided, the corresponding UniProt ID is used. If this returns multiple matches, then start or end must also be provided. This query is also used for label refinement of the Pfam domain MSA.
  • data (list) – If given, the data list from the Pfam mapping table is returned through this argument.
  • start (int) – Residue number for defining the start of the domain. The Pfam domain that starts closest to this will be selected. Default is 1.
  • end (int) – Residue number for defining the end of the domain. The Pfam domain that ends closest to this will be selected.
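The "closest start" and "closest end" selection described for start and end can be sketched as follows. This is a minimal illustration under stated assumptions, not ProDy's code: closest_domain is a hypothetical helper, domains are represented as plain (start, end) tuples, and giving end precedence over start when both are supplied is an assumption.

```python
def closest_domain(domains, start=1, end=None):
    """Pick the (start, end) domain nearest to the requested boundary.

    Hypothetical helper. `domains` is a non-empty list of
    (start, end) residue-number tuples. If `end` is given, the domain
    whose end is closest to it is chosen; otherwise the domain whose
    start is closest to `start` is chosen. Ties go to the first match.
    """
    if end is not None:
        return min(domains, key=lambda d: abs(d[1] - end))
    return min(domains, key=lambda d: abs(d[0] - start))


# Made-up domain boundaries, for illustration only.
example_domains = [(5, 120), (150, 290), (310, 400)]
first = closest_domain(example_domains)             # start defaults to 1
middle = closest_domain(example_domains, start=160)
last = closest_domain(example_domains, end=395)
```

Here first is (5, 120), middle is (150, 290) and last is (310, 400).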