PDB File Header
This module defines functions for parsing header data from PDB files.
class Chemical(resname)
A data structure for storing information on chemical components (or heterogens) in PDB structures. A Chemical instance has the following attributes, each given with its type and, where applicable, the PDB record it is read from:

- resname (str): residue name (or chemical component identifier); HET
- name (str): chemical name; HETNAM
- chain (str): chain identifier; HET
- resnum (int): residue (or sequence) number; HET
- icode (str): insertion code; HET
- natoms (int): number of atoms present in the structure; HET
- description (str): description of the chemical component; HET
- synonyms (list): synonyms; HETSYN
- formula (str): chemical formula; FORMUL
- pdbentry (str): PDB entry that chemical data is extracted from

Chemical class instances can be obtained as follows:

    In [1]: from prody import *
    In [2]: chemical = parsePDBHeader('1zz2', 'chemicals')[0]
    In [3]: chemical
    Out[3]: <Chemical: B11 (1ZZ2_A_362)>
    In [4]: chemical.name
    Out[4]: 'N-[3-(4-FLUOROPHENOXY)PHENYL]-4-[(2-HYDROXYBENZYL)AMINO]PIPERIDINE-1-SULFONAMIDE'
    In [5]: chemical.natoms
    Out[5]: 33
    In [6]: len(chemical)
    Out[6]: 33
chain
- chain identifier

description
- description of the chemical component

formula
- chemical formula

icode
- insertion code

name
- chemical name

natoms
- number of atoms present in the structure

pdbentry
- PDB entry that chemical data is extracted from

resname
- residue name (or chemical component identifier)

resnum
- residue (or sequence) number

synonyms
- list of synonyms

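Most Chemical attributes come from HET records, which use fixed columns. The sketch below is not ProDy's parser; it only illustrates how one HET line could be sliced into the resname, chain, resnum, icode, natoms, and description fields, assuming the column layout of the wwPDB format description (hetID in columns 8-10, chain in 13, sequence number in 14-17, insertion code in 18, atom count in 21-25, text in 31-70). HetRecord and parse_het_line are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HetRecord:
    """Hypothetical container mirroring the HET-derived Chemical attributes."""
    resname: str
    chain: str
    resnum: int
    icode: str
    natoms: int
    description: str


def parse_het_line(line: str) -> HetRecord:
    """Parse one HET record using fixed columns (1-based in the format spec)."""
    # "HET " with a trailing space excludes HETNAM, HETSYN and HETATM lines.
    if not line.startswith("HET "):
        raise ValueError("not a HET record: %r" % line[:6])
    line = line.ljust(80)  # pad short lines so every slice is defined
    return HetRecord(
        resname=line[7:10].strip(),       # columns 8-10
        chain=line[12],                   # column 13
        resnum=int(line[13:17]),          # columns 14-17
        icode=line[17].strip(),           # column 18
        natoms=int(line[20:25]),          # columns 21-25
        description=line[30:70].strip(),  # columns 31-70
    )


# Sample line built column by column from the 1ZZ2 values shown above;
# the description text is left blank.
sample = "HET    " + "B11" + "  " + "A" + " 362" + " " + "  " + "   33"
record = parse_het_line(sample)
print(record.resname, record.chain, record.resnum, record.natoms)
```

Name, synonym and formula data live in separate HETNAM, HETSYN and FORMUL records, so a fuller sketch would need a second pass that joins those lines by component identifier.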
class Polymer(chid)
A data structure for storing information on polymer components (protein or nucleic acid) of PDB structures. A Polymer instance has the following attributes, each given with its type and, where applicable, the PDB record it is read from:

- chid (str): chain identifier
- name (str): name of the polymer (macromolecule); COMPND
- fragment (str): specifies a domain or region of the molecule; COMPND
- synonyms (list): synonyms for the polymer; COMPND
- ec (list): associated Enzyme Commission numbers; COMPND
- engineered (bool): indicates whether the polymer was produced using recombinant technology or by purely chemical synthesis; COMPND
- mutation (bool): indicates whether a mutation is present; COMPND
- comments (str): additional comments
- sequence (str): polymer chain sequence; SEQRES
- dbrefs (list): sequence database records, see DBRef; DBREF[1|2] and SEQADV
- modified (list): modified residues; MODRES. When modified residues are present, each is represented as (resname, chid, resnum, icode, stdname, comment)
- pdbentry (str): PDB entry that polymer data is extracted from

Polymer class instances can be obtained as follows:

    In [1]: polymer = parsePDBHeader('2k39', 'polymers')[0]
    In [2]: polymer
    Out[2]: <Polymer: UBIQUITIN (2K39_A)>
    In [3]: polymer.pdbentry
    Out[3]: '2K39'
    In [4]: polymer.chid
    Out[4]: 'A'
    In [5]: polymer.name
    Out[5]: 'UBIQUITIN'
    In [6]: polymer.sequence
    Out[6]: 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'
    In [7]: len(polymer.sequence)
    Out[7]: 76
    In [8]: len(polymer)
    Out[8]: 76
    In [9]: dbref = polymer.dbrefs[0]
    In [10]: dbref.database
    Out[10]: 'UniProt'
    In [11]: dbref.accession
    Out[11]: 'P62972'
    In [12]: dbref.idcode
    Out[12]: 'UBIQ_XENLA'
chid
- chain identifier

comments
- additional comments

dbrefs
- sequence database reference records

ec
- list of associated Enzyme Commission numbers

engineered
- indicates whether the molecule was produced using recombinant technology or by purely chemical synthesis

fragment
- specifies a domain or region of the molecule

modified
- modified residues

mutation
- indicates whether a mutation is present

name
- name of the polymer (macromolecule)

pdbentry
- PDB entry that polymer data is extracted from

sequence
- polymer chain sequence

synonyms
- list of synonyms for the molecule

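The sequence attribute is built from SEQRES records. As an illustration only, assuming that residue names start at column 20 and are separated by single spaces, and that any nonstandard name (including nucleotides) maps to X, a one-letter sequence per chain could be assembled as follows. THREE_TO_ONE, sequences_from_seqres, and seqres_line are hypothetical helpers; ProDy's own handling of modified and nucleic acid residues may differ.

```python
# Standard amino acids only; any other residue name maps to "X" (an assumption).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_seqres(lines):
    """Return {chain identifier: one-letter sequence} from SEQRES lines."""
    residues = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chid = line[11]  # column 12: chain identifier
        # Columns 20 onward: residue names separated by single spaces.
        residues.setdefault(chid, []).extend(line[19:].split())
    return {
        chid: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chid, names in residues.items()
    }


def seqres_line(serial, chid, num_res, names):
    """Hypothetical helper that lays out one SEQRES line in fixed columns."""
    return f"SEQRES {serial:>3} {chid} {num_res:>4}  " + " ".join(names)


lines = [
    seqres_line(1, "A", 76, "MET GLN ILE PHE VAL LYS THR LEU THR GLY LYS THR ILE".split()),
    seqres_line(2, "A", 76, "THR LEU GLU VAL GLU PRO SER ASP THR ILE GLU ASN VAL".split()),
]
seqs = sequences_from_seqres(lines)
print(seqs["A"])
```

The two lines hold the first 26 residues of the ubiquitin sequence shown in the 2K39 example, so the printed string should match the start of polymer.sequence.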
class DBRef
A data structure for storing references to sequence databases for polymer components in PDB structures. Information is parsed from the DBREF[1|2] and SEQADV records in the PDB header.
accession
- database accession code

database
- sequence database, one of UniProt, GenBank, Norine, UNIMES, or PDB

dbabbr
- database abbreviation, one of UNP, GB, NORINE, UNIMES, or PDB

diff
- list of differences between PDB and database sequences, each given as (resname, resnum, icode, dbResname, dbResnum, comment)

first
- initial residue numbers, given as (resnum, icode, dbnum)

idcode
- database identification code, such as the entry name in UniProt

last
- ending residue numbers, given as (resnum, icode, dbnum)

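The first and last tuples record where the PDB segment starts and ends in both numbering schemes. A minimal sketch, assuming a contiguous, gap-free alignment with blank insertion codes (differences recorded in diff are ignored), maps a PDB residue number onto the database numbering with one constant offset. pdb_to_db_number is a hypothetical helper and the numbers in the example are made up.

```python
from typing import Optional, Tuple

# Same layout as DBRef.first and DBRef.last: (resnum, icode, dbnum)
Bound = Tuple[int, str, int]


def pdb_to_db_number(resnum: int, first: Bound, last: Bound) -> Optional[int]:
    """Map a PDB residue number onto database numbering via a constant offset.

    Assumes the aligned segment is contiguous in both numberings and that
    insertion codes are blank; returns None outside the segment.
    """
    if last[0] - first[0] != last[2] - first[2]:
        raise ValueError("segment lengths differ; a constant offset does not apply")
    if not first[0] <= resnum <= last[0]:
        return None
    return resnum - first[0] + first[2]


# Made-up segment: PDB residues 5-50 correspond to database residues 30-75.
first = (5, "", 30)
last = (50, "", 75)
print(pdb_to_db_number(7, first, last))   # 32
print(pdb_to_db_number(60, first, last))  # None
```

When the two segment lengths disagree, the sketch refuses to guess; a real mapping would have to consult the SEQADV-derived differences instead.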
parsePDBHeader(pdb, *keys)
Returns the header data dictionary for pdb. This function is equivalent to parsePDB(pdb, header=True, model=0, meta=False); likewise, pdb may be an identifier or a filename. When a single key such as 'chemicals' or 'polymers' is passed, the value for that key is returned instead of the whole dictionary, as in the Chemical and Polymer examples above.

Header records that are parsed are listed below, each with its dictionary key(s) in parentheses and a description:

- HEADER (classification, deposition_date, identifier): molecule classification, deposition date, and PDB identifier
- TITLE (title): title for the experiment or analysis
- SPLIT (split): list of PDB entries that make up the whole structure when combined with this one
- COMPND (polymers): see Polymer
- EXPDTA (experiment): information about the experiment
- NUMMDL (n_models): number of models
- MDLTYP (model_type): additional structural annotation
- AUTHOR (authors): list of contributors
- JRNL (reference): reference information dictionary with the following keys:
  - authors: list of authors
  - title: title of the article
  - editors: list of editors
  - issn: ISSN of the publication
  - reference: journal, volume, issue, etc.
  - publisher: publisher information
  - pmid: PubMed identifier
  - doi: digital object identifier
- DBREF[1|2] (polymers): see Polymer and DBRef
- SEQADV (polymers): see Polymer
- SEQRES (polymers): see Polymer
- MODRES (polymers): see Polymer
- HELIX (polymers): see Polymer
- SHEET (polymers): see Polymer
- HET (chemicals): see Chemical
- HETNAM (chemicals): see Chemical
- HETSYN (chemicals): see Chemical
- FORMUL (chemicals): see Chemical
- REMARK 2 (resolution): resolution of structures, when applicable
- REMARK 4 (version): PDB file version
- REMARK 350 (biomoltrans): biomolecular transformation lines (unprocessed)
- REMARK 900 (related_entries): related entries in the PDB or EMDB

Header records that are not parsed are: OBSLTE, CAVEAT, SOURCE, KEYWDS, REVDAT, SPRSDE, SSBOND, LINK, CISPEP, CRYST1, ORIGX1, ORIGX2, ORIGX3, MTRIX1, MTRIX2, MTRIX3, and REMARK records not listed above.

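The HEADER record fills three dictionary keys at once. As a sketch under the assumption that the fixed columns from the wwPDB format description apply (classification in columns 11-50, deposition date in 51-59, ID code in 63-66), the record could be split as follows. parse_header_record is a hypothetical function, not part of ProDy, and the sample record is invented rather than taken from a real entry.

```python
def parse_header_record(line: str) -> dict:
    """Split a HEADER record into classification, deposition_date, identifier."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    line = line.ljust(80)  # pad short lines so every slice is defined
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "identifier": line[62:66].strip(),       # columns 63-66
    }


# Hypothetical record, laid out column by column.
sample = "HEADER    " + "HYDROLASE".ljust(40) + "01-JAN-00" + "   " + "1ABC"
print(parse_header_record(sample))
```

The date is kept as the raw DD-MMM-YY string here; whether and how ProDy converts it is not covered by this sketch.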
assignSecstr(header, atoms, coil=True)
Assign secondary structure from the header dictionary to atoms. header must be a dictionary parsed using parsePDB(). atoms may be an instance of AtomGroup, Selection, Chain, or Residue. ProDy can be configured to automatically parse and assign secondary structure information using the confProDy(auto_secondary=True) command. See also the confProDy() function.

Single-letter codes from the Dictionary of Protein Secondary Structure (DSSP) are used for assignments:

- G = 3-turn helix (3₁₀ helix). Min length 3 residues.
- H = 4-turn helix (alpha helix). Min length 4 residues.
- I = 5-turn helix (pi helix). Min length 5 residues.
- T = hydrogen bonded turn (3, 4 or 5 turn)
- E = extended strand in parallel and/or anti-parallel beta-sheet conformation. Min length 2 residues.
- B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
- S = bend (the only non-hydrogen-bond based assignment).
- C = residues not in any of the above conformations.

See http://en.wikipedia.org/wiki/Protein_secondary_structure#The_DSSP_code for more details.

The following PDB helix classes are omitted (PDB class number in parentheses):

- Right-handed omega (2)
- Right-handed gamma (4)
- Left-handed alpha (6)
- Left-handed omega (7)
- Left-handed gamma (8)
- 2-7 ribbon/helix (9)
- Polyproline (10)

Secondary structures are assigned to all atoms in a residue. Amino acid residues without any secondary structure assignment in the header section will be assigned coil (C) conformation. This can be prevented by passing the coil=False argument.

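A residue-level sketch of the assignment rules described above follows. Helix classes 1 (right-handed alpha), 3 (right-handed pi), and 5 (right-handed 3₁₀) are the PDB classes missing from the omitted list, and mapping them to H, I, and G is an assumption based on the DSSP codes listed here; strands from SHEET records become E, and remaining residues get C only when coil is true (the sketch uses an empty string otherwise, also an assumption). assign_secstr, its (chain, resnum) residue keys, and the tuple layouts are hypothetical; insertion codes and per-atom assignment are ignored.

```python
from typing import Dict, Iterable, List, Tuple

# PDB HELIX class number -> DSSP code. Classes absent from this map are skipped.
HELIX_CLASS_TO_DSSP = {1: "H", 3: "I", 5: "G"}

Residue = Tuple[str, int]  # (chain identifier, residue number)


def assign_secstr(
    residues: Iterable[Residue],
    helices: List[Tuple[str, int, int, int]],  # (chain, start, end, helix class)
    strands: List[Tuple[str, int, int]],       # (chain, start, end)
    coil: bool = True,
) -> Dict[Residue, str]:
    """Return a DSSP-style code for every residue."""
    default = "C" if coil else ""
    codes = {res: default for res in residues}
    for chid, start, end, helix_class in helices:
        code = HELIX_CLASS_TO_DSSP.get(helix_class)
        if code is None:
            continue  # omitted helix class, e.g. polyproline (10)
        for resnum in range(start, end + 1):
            if (chid, resnum) in codes:
                codes[(chid, resnum)] = code
    for chid, start, end in strands:
        for resnum in range(start, end + 1):
            if (chid, resnum) in codes:
                codes[(chid, resnum)] = "E"
    return codes


residues = [("A", n) for n in range(1, 11)]
helices = [("A", 2, 5, 1), ("A", 7, 8, 10)]  # the class 10 helix is skipped
strands = [("A", 9, 10)]
codes = assign_secstr(residues, helices, strands)
print("".join(codes[r] for r in residues))  # CHHHHCCCEE
```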
buildBiomolecules(header, atoms, biomol=None)
Returns atoms after applying biomolecular transformations from the header dictionary. Biomolecular transformations are applied to all coordinate sets in the molecule.

Some PDB files contain transformations for more than one biomolecule. A specific set of transformations can be chosen using the biomol argument. Transformation sets are identified by numbers, e.g. "1", "2", ...

If multiple biomolecular transformations are provided in the header dictionary, biomolecules will be returned as AtomGroup instances in a list.

If the resulting biomolecule has more than 26 chains, the molecular assembly will be split into multiple AtomGroup instances, each containing at most 26 chains. These AtomGroup instances will be returned in a tuple.

Note that atoms in biomolecules are ordered according to chain identifiers. When multiple chains in a biomolecule have the same chain identifier, they are given different segment names to distinguish them.
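Each REMARK 350 BIOMT operator is a 3x4 matrix whose first three columns form a rotation R and whose last column is a translation t, so a transformed coordinate is x' = Rx + t. The pure-Python sketch below applies each operator to a coordinate list to produce one copy per operator, and groups chain identifiers into chunks of at most 26, mirroring the splitting rule above. It is not ProDy's implementation, which works on AtomGroup objects and all of their coordinate sets; apply_biomt, build_assembly, and split_chains are hypothetical names, and the operators and chain labels in the example are made up.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
# One BIOMT operator as a 3x4 matrix: rotation in the first three columns,
# translation in the fourth.
Matrix34 = Sequence[Sequence[float]]


def apply_biomt(coords: Sequence[Vec], op: Matrix34) -> List[Vec]:
    """Return x' = R x + t for every coordinate."""
    return [
        tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in op)
        for x, y, z in coords
    ]


def build_assembly(coords: Sequence[Vec], ops: Sequence[Matrix34]) -> List[List[Vec]]:
    """One transformed copy of the coordinates per operator."""
    return [apply_biomt(coords, op) for op in ops]


def split_chains(chain_ids: Sequence[str], limit: int = 26) -> List[List[str]]:
    """Group chain identifiers into chunks of at most `limit` chains."""
    return [list(chain_ids[i:i + limit]) for i in range(0, len(chain_ids), limit)]


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shift_x = [[1, 0, 0, 10], [0, 1, 0, 0], [0, 0, 1, 0]]  # translate by 10 along x
copies = build_assembly([(1.0, 2.0, 3.0)], [identity, shift_x])
print(copies)  # [[(1.0, 2.0, 3.0)], [(11.0, 2.0, 3.0)]]

chain_ids = [f"chain{i}" for i in range(60)]
print([len(chunk) for chunk in split_chains(chain_ids)])  # [26, 26, 8]
```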
