PDB files¶

This example demonstrates how to use the flexible PDB fetcher, fetchPDB(). Valid inputs are a PDB identifier, e.g. 2k39, or a list of PDB identifiers, e.g. ["2k39", "1mkp", "1etc"]. Compressed PDB files (pdb.gz) are saved to the current working directory or to a target folder.

Fetch PDB files¶

Single file¶

We start by importing everything from the ProDy package:

In [1]: from prody import *

The function will return a filename if the download is successful.

In [2]: filename = fetchPDB('5uoj')

In [3]: filename
Out[3]: '5uoj.pdb.gz'

Multiple files¶

This function also accepts a list of PDB identifiers:

In [4]: filenames = fetchPDB(['5uoj', '1r39', '@!~#'])

In [5]: filenames
Out[5]: ['5uoj.pdb.gz', '1r39.pdb.gz', None]

For failed downloads, None is returned (or, when a list is passed, the returned list contains None in place of the filename).

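No folder name was passed in this case, so the files are saved in the current working directory. If a folder name is passed with the folder keyword argument, files are saved in that folder, which is created if it does not exist.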

ProDy will give you a report of download results and return a list of filenames. The report will be printed on the screen, which in this case would be:

@> 5uoj (./5uoj.pdb.gz) is found in the target directory.
@> @!~# is not a valid identifier.
@> 1r39 downloaded (./1r39.pdb.gz)
@> PDB download completed (1 found, 1 downloaded, 1 failed).
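
Because failed downloads appear as None in the returned list, it is often convenient to separate them from the successful ones before parsing. The following is a minimal sketch in plain Python; the helper name split_results is hypothetical and not part of ProDy, and the input lists are copied from In [4] and Out[5] above rather than produced by a live download.

```python
def split_results(identifiers, filenames):
    """Pair each identifier with its fetchPDB() result and separate failures."""
    downloaded = {}
    failed = []
    for pdb_id, fname in zip(identifiers, filenames):
        if fname is None:
            # fetchPDB() reports a failed download as None
            failed.append(pdb_id)
        else:
            downloaded[pdb_id] = fname
    return downloaded, failed


# Values copied from the session above (In [4] and Out[5])
identifiers = ['5uoj', '1r39', '@!~#']
filenames = ['5uoj.pdb.gz', '1r39.pdb.gz', None]

downloaded, failed = split_results(identifiers, filenames)
print(downloaded)
print(failed)
```

In a real session the second argument would simply be the list returned by fetchPDB() for the same identifiers.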

Parse PDB files¶

ProDy offers a fast and flexible PDB parser, parsePDB(). The parser can read well-defined subsets of atoms, specific chains, or specific models (in NMR structures), which speeds up parsing. This example shows how to use the flexible parsing options.

Three types of input are accepted:

PDB file path, e.g. "../1MKP.pdb"

compressed (gzipped) PDB file path, e.g. "5uoj.pdb.gz"

PDB identifier, e.g. 2k39

Output is an AtomGroup instance that stores atomic data and can be used as input to functions and classes for dynamics analysis.

Parse a file¶

You can parse PDB files by passing a filename (gzipped files are handled). We do so after downloading a PDB file (see Fetch PDB files for more information):

In [6]: fetchPDB('5uoj')
Out[6]: '5uoj.pdb.gz'

In [7]: atoms = parsePDB('5uoj.pdb.gz')

In [8]: atoms
Out[8]: <AtomGroup: 5uoj (3138 atoms)>

The parser returns an AtomGroup instance.

Also note that the time it took to parse the file is printed on the screen. This includes the time that it takes to evaluate coordinate lines and build an AtomGroup instance and excludes the time spent on reading the file from disk.

Use an identifier¶

PDB files can also be parsed by simply passing an identifier. The parser will look for a PDB file that matches the given identifier in the current working directory. If a matching file is not found, ProDy will download it from the PDB FTP server automatically and save it in the current working directory.

In [9]: atoms = parsePDB('1mkp')

In [10]: atoms
Out[10]: <AtomGroup: 1mkp (1183 atoms)>

Subsets of atoms¶

Parser can be used to parse backbone or Cα atoms:

In [11]: backbone = parsePDB('1mkp', subset='bb')

In [12]: backbone
Out[12]: <AtomGroup: 1mkp_bb (576 atoms)>

In [13]: calpha = parsePDB('1mkp', subset='ca')

In [14]: calpha
Out[14]: <AtomGroup: 1mkp_ca (144 atoms)>

Specific chains¶

Parser can be used to parse a specific chain from a PDB file:

In [15]: chA = parsePDB('3mkb', chain='A')

In [16]: chA
Out[16]: <AtomGroup: 3mkbA (1198 atoms)>

In [17]: chC = parsePDB('3mkb', chain='C')

In [18]: chC
Out[18]: <AtomGroup: 3mkbC (1189 atoms)>

Multiple chains can also be parsed in the same way:

In [19]: chAC = parsePDB('3mkb', chain='AC')

In [20]: chAC
Out[20]: <AtomGroup: 3mkbAC (2387 atoms)>

Specific models¶

Parser can be used to parse a specific model from a file:

In [21]: model10 = parsePDB('2k39', model=10)

In [22]: model10
Out[22]: <AtomGroup: 2k39 (1231 atoms)>

Alternate locations¶

When a PDB file contains alternate locations for some of the atoms, by default alternate locations with indicator A are parsed.

In [23]: altlocA = parsePDB('1ejg')

In [24]: altlocA
Out[24]: <AtomGroup: 1ejg (637 atoms)>

Specific alternate locations can be parsed as follows:

In [25]: altlocB = parsePDB('1ejg', altloc='B')

In [26]: altlocB
Out[26]: <AtomGroup: 1ejg (634 atoms)>

Note that the number of atoms differs between the two atom groups. This is because the residue types of the atoms with alternate locations are different.

Also, all alternate locations can be parsed as follows:

In [27]: all_altlocs = parsePDB('1ejg', altloc=True)

In [28]: all_altlocs
Out[28]: <AtomGroup: 1ejg (637 atoms; active #0 of 3 coordsets)>

Note that this time the parser returned three coordinate sets, one for each alternate location indicator found in this file (A, B, C). When parsing multiple alternate locations, the parser expects the same residue type for each atom with an alternate location. If residue names differ, a warning message is printed.

Composite arguments¶

Parser can be used to parse coordinates from a specific model for a subset of atoms of a specific chain:

In [29]: composite = parsePDB('2k39', model=10, chain='A', subset='ca')

In [30]: composite
Out[30]: <AtomGroup: 2k39A_ca (76 atoms)>
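
To make the chain, subset, and altloc options more concrete, they can be thought of as tests on fixed columns of each ATOM record. The sketch below is plain Python and is not ProDy's implementation: the helpers make_atom and keep are hypothetical, the records are toy data with zeroed coordinates rather than lines from a real entry, and it assumes that atoms without an alternate location indicator are always kept. The backbone set of N, CA, C, and O is consistent with the counts above (576 backbone atoms for 144 Cα atoms in 1mkp).

```python
BACKBONE = {'N', 'CA', 'C', 'O'}


def make_atom(serial, name, altloc, resname, chain, resnum):
    """Build a toy ATOM record in PDB fixed-column layout (coordinates are zeros)."""
    return ('ATOM  %5d  %-3s%1s%3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f'
            % (serial, name, altloc, resname, chain, resnum,
               0.0, 0.0, 0.0, 1.0, 0.0))


def keep(line, chain=None, subset=None, altloc='A'):
    """Decide whether one ATOM record passes the chain, subset, and altloc filters."""
    name = line[12:16].strip()   # columns 13-16: atom name
    loc = line[16]               # column 17: alternate location indicator
    chid = line[21]              # column 22: chain identifier
    if chain is not None and chid not in chain:
        return False
    if subset == 'ca' and name != 'CA':
        return False
    if subset == 'bb' and name not in BACKBONE:
        return False
    # Assumption: atoms with a blank indicator are kept for any altloc choice
    return loc == ' ' or loc == altloc


# Toy records: one alanine in chain A (CB has altlocs A and B), one glycine CA in chain B
records = [
    make_atom(1, 'N', ' ', 'ALA', 'A', 1),
    make_atom(2, 'CA', ' ', 'ALA', 'A', 1),
    make_atom(3, 'C', ' ', 'ALA', 'A', 1),
    make_atom(4, 'O', ' ', 'ALA', 'A', 1),
    make_atom(5, 'CB', 'A', 'ALA', 'A', 1),
    make_atom(6, 'CB', 'B', 'ALA', 'A', 1),
    make_atom(7, 'CA', ' ', 'GLY', 'B', 1),
]

n_bb_A = sum(keep(r, chain='A', subset='bb') for r in records)
n_ca_AB = sum(keep(r, chain='AB', subset='ca') for r in records)
n_altB = sum(keep(r, altloc='B') for r in records)
print(n_bb_A, n_ca_AB, n_altB)
```

Combining keyword arguments in parsePDB() corresponds to requiring all of these tests to pass for the same record.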

Header data¶

PDB parser can be used to extract header data in a dict from PDB files as follows:

In [31]: ubi, header = parsePDB('1ubi', header=True)

In [32]: list(header)
Out[32]: 
['A',
 'related_entries',
 'sheet',
 'classification',
 'reference',
 'title',
 'sheet_range',
 'polymers',
 'resolution',
 'space_group',
 'helix_range',
 'chemicals',
 'experiment',
 'helix',
 'version',
 'authors',
 'identifier',
 'deposition_date',
 'biomoltrans']

In [33]: header['experiment']
Out[33]: 'X-RAY DIFFRACTION'

In [34]: header['resolution']
Out[34]: 1.8

It is also possible to parse only header data by passing model=0 as an argument:

In [35]: header = parsePDB('1ubi', header=True, model=0)

or by using the parsePDBHeader() function:

In [36]: header = parsePDBHeader('1ubi')
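
Since the header is an ordinary dict, it can be used to screen structures before any coordinates are parsed. The following is a minimal sketch: the helper passes_quality_filter and the 2.0 cutoff are hypothetical choices for illustration, and the example dict reproduces only the two values shown in Out[33] and Out[34], not the full header.

```python
def passes_quality_filter(header, max_resolution=2.0):
    """Return True for X-ray entries at or below the resolution cutoff."""
    if header.get('experiment') != 'X-RAY DIFFRACTION':
        return False
    resolution = header.get('resolution')
    # Entries without a recorded resolution are rejected
    return resolution is not None and resolution <= max_resolution


# Only the two header values shown above for 1ubi are reproduced here
header_1ubi = {'experiment': 'X-RAY DIFFRACTION', 'resolution': 1.8}

print(passes_quality_filter(header_1ubi))
print(passes_quality_filter(header_1ubi, max_resolution=1.5))
```

Using dict.get() rather than indexing keeps the check from raising an error if a key is absent from a given header.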

Write PDB file¶

PDB files can be written using the writePDB() function. This example shows how to write PDB files for AtomGroup instances and subsets of atoms.

Write all atoms¶

All atoms in an AtomGroup can be written in PDB format as follows:

In [37]: writePDB('MKP3.pdb', atoms)
Out[37]: 'MKP3.pdb'

Upon successful writing of the PDB file, the filename is returned.

Write a subset¶

It is also possible to write subsets of atoms in PDB format:

In [38]: alpha_carbons = atoms.select('calpha')

In [39]: writePDB('1mkp_ca.pdb', alpha_carbons)
Out[39]: '1mkp_ca.pdb'

In [40]: backbone = atoms.select('backbone')

In [41]: writePDB('1mkp_bb.pdb', backbone)
Out[41]: '1mkp_bb.pdb'
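
One detail worth guarding against when writing subsets is that select() returns None when no atoms match the selection string. The sketch below wraps the two steps in a small helper; write_selection is a hypothetical name, not a ProDy function, and ProDy is imported only when there is something to write.

```python
def write_selection(atoms, selstr, filename):
    """Write the atoms matching selstr to filename; return None if nothing matches."""
    selection = atoms.select(selstr)
    if selection is None:
        # Nothing matched, so there is nothing to write
        return None
    # Imported here so the helper can be defined without ProDy being loaded
    from prody import writePDB
    return writePDB(filename, selection)


# Usage with the AtomGroup from this example, assuming ProDy is installed:
#   write_selection(atoms, 'calpha', '1mkp_ca.pdb')
```

As with writePDB() itself, the helper returns the output filename on success.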