# Structure Comparison¶

This section shows how to find identical or similar protein chains in two PDB files and align them.

`proteins`

module contains functions for matching and mapping
chains. Results can be used for RMSD fitting and PCA analysis.

Output will be `AtomMap`

instances that can be used as input
to ProDy classes and functions.

## Match chains¶

We start by importing everything from the ProDy package:

```
In [1]: from prody import *
In [2]: from pylab import *
In [3]: ion()
```

Matching chains is useful when comparing two structures. We will find matching chains in two different HIV Reverse Transcriptase structures.

First we define a function that prints information on paired (matched) chains:

```
In [4]: def printMatch(match):
...: print('Chain 1 : {}'.format(match[0]))
...: print('Chain 2 : {}'.format(match[1]))
...: print('Length : {}'.format(len(match[0])))
...: print('Seq identity: {}'.format(match[2]))
...: print('Seq overlap : {}'.format(match[3]))
...: print('RMSD : {}\n'.format(calcRMSD(match[0], match[1])))
...:
```

Now let’s parse bound RT structure 1vrt and unbound structure 1dlo:

```
In [5]: bound = parsePDB('1vrt')
In [6]: unbound = parsePDB('1dlo')
```

Let’s verify that these structures are not aligned:

```
In [7]: showProtein(unbound, bound);
In [8]: legend();
```

We find matching chains as follows:

```
In [9]: matches = matchChains(bound, unbound)
In [10]: for match in matches:
....: printMatch(match)
....:
Chain 1 : AtomMap Chain B from 1vrt -> Chain B from 1dlo
Chain 2 : AtomMap Chain B from 1dlo -> Chain B from 1vrt
Length : 400
Seq identity: 99.2518703242
Seq overlap : 96
RMSD : 110.45149192
Chain 1 : AtomMap Chain A from 1vrt -> Chain A from 1dlo
Chain 2 : AtomMap Chain A from 1dlo -> Chain A from 1vrt
Length : 524
Seq identity: 99.0458015267
Seq overlap : 94
RMSD : 142.084163869
```

This resulted in two matches. Chains A and B of two structures are paired. These chains contain only Cα atoms:

```
In [11]: match[0][0].iscalpha
Out[11]: True
In [12]: match[0][1].iscalpha
Out[12]: True
```

For a structural alignment based on both chains, we merge these matches as follows:

```
In [13]: bound_ca = matches[0][0] + matches[1][0]
In [14]: bound_ca
Out[14]: <AtomMap: (AtomMap Chain B from 1vrt -> Chain B from 1dlo) + (AtomMap Chain A from 1vrt -> Chain A from 1dlo) from 1vrt (924 atoms)>
In [15]: unbound_ca = matches[0][1] + matches[1][1]
In [16]: unbound_ca
Out[16]: <AtomMap: (AtomMap Chain B from 1dlo -> Chain B from 1vrt) + (AtomMap Chain A from 1dlo -> Chain A from 1vrt) from 1dlo (924 atoms)>
```

Let’s calculate RMSD:

```
In [17]: calcRMSD(bound_ca, unbound_ca)
Out[17]: 129.34348658001392
```

We find the transformation that minimizes RMSD between these two selections and apply it to unbound structure:

```
In [18]: calcTransformation(unbound_ca, bound_ca).apply(unbound);
In [19]: calcRMSD(bound_ca, unbound_ca)
Out[19]: 6.002074746562535
```

Let’s see the aligned structures now:

```
In [20]: showProtein(unbound, bound);
In [21]: legend();
```

By default, `matchChains()`

function matches Cα atoms.
*subset* argument allows for matching larger numbers of atoms.
We can match backbone atoms as follows:

```
In [22]: matches = matchChains(bound, unbound, subset='bb')
In [23]: for match in matches:
....: printMatch(match)
....:
Chain 1 : AtomMap Chain B from 1vrt -> Chain B from 1dlo
Chain 2 : AtomMap Chain B from 1dlo -> Chain B from 1vrt
Length : 1600
Seq identity: 99.2518703242
Seq overlap : 96
RMSD : 1.71102621571
Chain 1 : AtomMap Chain A from 1vrt -> Chain A from 1dlo
Chain 2 : AtomMap Chain A from 1dlo -> Chain A from 1vrt
Length : 2096
Seq identity: 99.0458015267
Seq overlap : 94
RMSD : 7.78386812028
```

Or, we can match all atoms as follows:

```
In [24]: matches = matchChains(bound, unbound, subset='all')
In [25]: for match in matches:
....: printMatch(match)
....:
Chain 1 : AtomMap Chain B from 1vrt -> Chain B from 1dlo
Chain 2 : AtomMap Chain B from 1dlo -> Chain B from 1vrt
Length : 3225
Seq identity: 99.2518703242
Seq overlap : 96
RMSD : 2.20947196284
Chain 1 : AtomMap Chain A from 1vrt -> Chain A from 1dlo
Chain 2 : AtomMap Chain A from 1dlo -> Chain A from 1vrt
Length : 4159
Seq identity: 99.0458015267
Seq overlap : 94
RMSD : 7.83814068858
```

## Map onto a chain¶

Mapping is different from matching. When chains are matched, all matching
atoms are returned as `AtomMap`

instances. When atoms
are mapped onto a *chain*, missing atoms are replaced by dummy atoms. The
length of the mapping is equal to the length of *chain*. Mapping is used
particularly useful in assembling coordinate data for the analysis of
heterogeneous datasets (see Ensemble Analysis).

Let’s map bound structure onto unbound chain A (subunit p66):

```
In [26]: def printMapping(mapping):
....: print('Mapped chain : {}'.format(mapping[0]))
....: print('Target chain : {}'.format(mapping[1]))
....: print('Mapping length : {}'.format(len(mapping[0])))
....: print('# of mapped atoms: {}'.format(mapping[0].numMapped()))
....: print('# of dummy atoms : {}'.format(mapping[0].numDummies()))
....: print('Sequence identity: {}'.format(mapping[2]))
....: print('Sequence overlap : {}\n'.format(mapping[3]))
....:
```

```
In [27]: unbound_hv = unbound.getHierView()
In [28]: unbound_A = unbound_hv['A']
In [29]: mappings = mapOntoChain(bound, unbound_A)
In [30]: for mapping in mappings:
....: printMapping(mapping)
....:
Mapped chain : AtomMap Chain A from 1vrt -> Chain A from 1dlo
Target chain : AtomMap Chain A from 1dlo -> Chain A from 1vrt
Mapping length : 4370
# of mapped atoms: 4159
# of dummy atoms : 211
Sequence identity: 99
Sequence overlap : 94
Mapped chain : AtomMap Chain B from 1vrt -> Chain A from 1dlo
Target chain : AtomMap Chain A from 1dlo -> Chain B from 1vrt
Mapping length : 4370
# of mapped atoms: 3209
# of dummy atoms : 1161
Sequence identity: 99
Sequence overlap : 72
```

`mapOntoChain()`

mapped only Cα atoms. *subset* argument allows for
matching larger numbers of atoms. We can map backbone atoms as follows:

```
In [31]: mappings = mapOntoChain(bound, unbound_A, subset='bb')
In [32]: for mapping in mappings:
....: printMapping(mapping)
....:
Mapped chain : AtomMap Chain A from 1vrt -> Chain A from 1dlo
Target chain : AtomMap Chain A from 1dlo -> Chain A from 1vrt
Mapping length : 2224
# of mapped atoms: 2096
# of dummy atoms : 128
Sequence identity: 99
Sequence overlap : 94
Mapped chain : AtomMap Chain B from 1vrt -> Chain A from 1dlo
Target chain : AtomMap Chain A from 1dlo -> Chain B from 1vrt
Mapping length : 2224
# of mapped atoms: 1604
# of dummy atoms : 620
Sequence identity: 99
Sequence overlap : 72
```

Or, we can map all atoms as follows:

```
In [33]: mappings = mapOntoChain(bound, unbound_A, subset='all')
In [34]: for mapping in mappings:
....: printMapping(mapping)
....:
Mapped chain : AtomMap Chain A from 1vrt -> Chain A from 1dlo
Target chain : AtomMap Chain A from 1dlo -> Chain A from 1vrt
Mapping length : 4370
# of mapped atoms: 4159
# of dummy atoms : 211
Sequence identity: 99
Sequence overlap : 94
Mapped chain : AtomMap Chain B from 1vrt -> Chain A from 1dlo
Target chain : AtomMap Chain A from 1dlo -> Chain B from 1vrt
Mapping length : 4370
# of mapped atoms: 3209
# of dummy atoms : 1161
Sequence identity: 99
Sequence overlap : 72
```