Comparing PPIs
The PPIRef package provides wrappers for iAlign and US-align, as well as iDist, their scalable approximation (used to construct the PPIRef dataset), for comparing PPI structures. It also provides a Foldseek-MM comparator and a sequence identity comparator, which compares PPIs by their sequences.
📌 The wrappers for iAlign and US-align require these tools to be installed. Please refer to the Reference API documentation for details.
[2]:
from ppiref.comparison import IAlign, USalign, IDist, SequenceIdentityComparator, FoldseekMMComparator
from ppiref.extraction import PPIExtractor
from ppiref.definitions import PPIREF_TEST_DATA_DIR
# Suppress BioPython warnings
import warnings
from Bio import BiopythonWarning
warnings.simplefilter('ignore', BiopythonWarning)
# Suppress Graphein log
from loguru import logger
logger.disable('graphein')
Prepare near-duplicate PPIs from Figure 1 in the “Learning to design protein-protein interactions with enhanced generalization” paper.

[4]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(out_dir=ppi_dir, kind='heavy', radius=6., bsa=False)
extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/1p7z.pdb', partners=['A', 'C'])
extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/3p9r.pdb', partners=['B', 'D'])
ppis = [ppi_dir / 'p7/1p7z_A_C.pdb', ppi_dir / 'p9/3p9r_B_D.pdb']
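The two paths above suggest that extracted PPIs are stored in subdirectories named after the middle two characters of the PDB ID, with the partner chains appended to the file name. The following is a minimal sketch of that convention, inferred only from the two example paths; `ppi_path` is a hypothetical helper and not part of PPIRef.

```python
from pathlib import Path


def ppi_path(ppi_dir, pdb_id, partners):
    """Build the expected path of an extracted PPI file.

    Assumed layout (inferred from the two example paths only):
    <ppi_dir>/<middle two characters of the PDB ID>/<pdb_id>_<chain>_<chain>.pdb
    """
    subdir = pdb_id[1:3]  # e.g. '1p7z' -> 'p7'
    name = "_".join([pdb_id, *partners]) + ".pdb"
    return Path(ppi_dir) / subdir / name


# Reproduce the two relative paths used in the cell above.
print(ppi_path("ppi_dir", "1p7z", ["A", "C"]).as_posix())  # ppi_dir/p7/1p7z_A_C.pdb
print(ppi_path("ppi_dir", "3p9r", ["B", "D"]).as_posix())  # ppi_dir/p9/3p9r_B_D.pdb
```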
Example 1. Compare PPIs with iAlign. iAlign is the original adaptation of TM-align to protein-protein interfaces. TM-align is based on 3D alignment of protein structures. A high IS-score and a low P-value produced by iAlign indicate high similarity.
[4]:
ialign = IAlign()
ialign.compare(*ppis)
[4]:
{'PPI0': '1p7z_A_C',
'PPI1': '3p9r_B_D',
'IS-score': 0.95822,
'P-value': 8.22e-67,
'Z-score': 152.167,
'Number of aligned residues': 249,
'Number of aligned contacts': 347,
'RMSD': 0.37,
'Seq identity': 0.992}
Example 2. Compare PPIs with US-align. US-align is a more recent adaptation of TM-align, designed as a universal comparison method for different kinds of macromolecules. High TM-scores in both directions (TM1 and TM2) indicate high similarity.
[5]:
usalign = USalign()
usalign.compare(*ppis)
[5]:
{'PPI0': '1p7z_A_C',
'PPI1': '3p9r_B_D',
'TM1': 0.984,
'TM2': 0.984,
'RMSD': 0.35,
'ID1': 0.979,
'ID2': 0.979,
'IDali': 0.993,
'L1': 289,
'L2': 289,
'Lali': 285}
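The output above can be summarized with a little arithmetic. The sketch below works on a plain dictionary copied from the output and does not call US-align. It assumes that ID1, ID2 and IDali are the same count of identical aligned residues divided by L1, L2 and Lali, respectively; the helper names are hypothetical.

```python
# Values copied from the US-align output above.
result = {
    "TM1": 0.984, "TM2": 0.984, "RMSD": 0.35,
    "ID1": 0.979, "ID2": 0.979, "IDali": 0.993,
    "L1": 289, "L2": 289, "Lali": 285,
}


def symmetric_tm(res):
    """Return the smaller of the two TM-scores, so that a high value
    means high similarity in both directions."""
    return min(res["TM1"], res["TM2"])


def identical_residues(res):
    """Estimate the number of identical aligned residues as IDali * Lali
    (assumes IDali is normalized by the aligned length)."""
    return round(res["IDali"] * res["Lali"])


n_ident = identical_residues(result)
print(symmetric_tm(result))              # 0.984
print(n_ident)                           # 283
print(round(n_ident / result["L1"], 3))  # 0.979, matching ID1
```

Under this assumption the three identity values are consistent with each other: 283 identical residues out of 285 aligned gives IDali, and the same 283 out of 289 gives ID1 and ID2.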
Example 3. Compare PPIs with Foldseek-MM. Foldseek-MM is designed to compare protein-protein complexes by applying Foldseek to all partners and finding the best-scoring alignment of the whole complexes. Here, we use the method to compare protein-protein interfaces, similar to Foldseek-MM in the interface mode. Like US-align, Foldseek-MM produces TM-scores, one normalized by the query length and one by the target length. High TM-scores indicate high similarity.
[6]:
foldseek_mm = FoldseekMMComparator()
foldseek_mm.compare(*ppis)
[6]:
{'PPI0': '1p7z_A_C',
'PPI1': '3p9r_B_D',
'Foldseek-MM TM-score (normalized by query PPI0 length)': 0.98084,
'Foldseek-MM TM-score (normalized by target PPI1 length)': 0.98084,
'Matched chains in the query PPI0 complex': 'A,C',
'Matched chains in the target PPI1 complex': 'D,B'}
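The two "Matched chains" entries can be turned into an explicit chain-to-chain mapping. This is a small sketch operating on a dictionary copied from the output above; it assumes that the two comma-separated lists are paired by position, and `chain_mapping` is a hypothetical helper.

```python
# Relevant entries copied from the Foldseek-MM output above.
result = {
    "Matched chains in the query PPI0 complex": "A,C",
    "Matched chains in the target PPI1 complex": "D,B",
}


def chain_mapping(res):
    """Map each query chain to its matched target chain.

    Assumption: the two comma-separated lists are aligned by position.
    """
    query = res["Matched chains in the query PPI0 complex"].split(",")
    target = res["Matched chains in the target PPI1 complex"].split(",")
    if len(query) != len(target):
        raise ValueError("Query and target chain lists differ in length")
    return dict(zip(query, target))


print(chain_mapping(result))  # {'A': 'D', 'C': 'B'}
```

Read this way, chain A of 1p7z is matched to chain D of 3p9r, and chain C to chain B.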
Example 4. Compare PPIs by maximum pairwise sequence identity. High sequence identity indicates high similarity. Comparing PPIs based on sequences requires a path to the directory storing the complete PDB files from which the PPIs were extracted.
[6]:
seqid = SequenceIdentityComparator(pdb_dir=PPIREF_TEST_DATA_DIR / 'pdb')
seqid.compare(*ppis)
[6]:
{'PPI0': '1p7z_A_C',
'PPI1': '3p9r_B_D',
'Maximum pairwise sequence identity': 0.9944979367262724}
Example 5. Compare PPIs with iDist. iDist is an efficient approximation of 3D alignment-based methods. A low iDist distance indicates high similarity (a distance below 0.04 is considered near-duplicate for interfaces extracted with a 6 Å radius).
[7]:
idist = IDist()
idist.compare(*ppis)
[7]:
{'PPI0': '1p7z_A_C', 'PPI1': '3p9r_B_D', 'iDist': 0.0034661771664121184}
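The near-duplicate criterion quoted above can be applied directly to the returned dictionary. The sketch below uses the 0.04 threshold from the text; the helper name is hypothetical, and the threshold should be treated as specific to 6 Å interfaces.

```python
# Threshold quoted in the text for interfaces extracted with a 6 Å radius.
NEAR_DUPLICATE_THRESHOLD = 0.04


def is_near_duplicate(res, threshold=NEAR_DUPLICATE_THRESHOLD):
    """Return True if the iDist distance is below the threshold."""
    return res["iDist"] < threshold


# Dictionary copied from the iDist output above.
result = {"PPI0": "1p7z_A_C", "PPI1": "3p9r_B_D", "iDist": 0.0034661771664121184}
print(is_near_duplicate(result))  # True
```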
Example 6. Compare PPIs pairwise with iDist. Pairwise comparison in parallel is available for other methods as well but does not scale to large datasets.
[8]:
idist = IDist(max_workers=2)
idist.compare_all_against_all(ppis, ppis)
Embedding PPIs (2 processes): 100%|██████████| 2/2 [00:04<00:00, 2.49s/it]
[8]:
| | PPI0 | PPI1 | iDist |
|---|---|---|---|
| 0 | 1p7z_A_C | 1p7z_A_C | 0.000000 |
| 1 | 1p7z_A_C | 3p9r_B_D | 0.003466 |
| 2 | 3p9r_B_D | 1p7z_A_C | 0.003466 |
| 3 | 3p9r_B_D | 3p9r_B_D | 0.000000 |
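An all-against-all table contains self-comparisons and both orderings of every pair. The sketch below reduces such a table to unique near-duplicate pairs. It works on rows copied from the table above rather than on the returned object, and `near_duplicate_pairs` is a hypothetical helper, not a PPIRef function.

```python
# Rows (PPI0, PPI1, iDist) copied from the table above.
rows = [
    ("1p7z_A_C", "1p7z_A_C", 0.000000),
    ("1p7z_A_C", "3p9r_B_D", 0.003466),
    ("3p9r_B_D", "1p7z_A_C", 0.003466),
    ("3p9r_B_D", "3p9r_B_D", 0.000000),
]


def near_duplicate_pairs(rows, threshold=0.04):
    """Return unique unordered pairs of distinct PPIs closer than threshold."""
    pairs = set()
    for ppi0, ppi1, dist in rows:
        # Skip self-comparisons; sorting merges (a, b) and (b, a).
        if ppi0 != ppi1 and dist < threshold:
            pairs.add(tuple(sorted((ppi0, ppi1))))
    return sorted(pairs)


print(near_duplicate_pairs(rows))  # [('1p7z_A_C', '3p9r_B_D')]
```

If the result is a pandas DataFrame with the columns shown (an assumption based on the rendered table), the same rows could be obtained with `df.itertuples(index=False)`.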