Comparing PPIs

The PPIRef package provides wrappers for iAlign, US-align, and Foldseek-MM, as well as iDist, a scalable approximation of the alignment-based methods that was used to construct the PPIRef dataset, for comparing PPI structures. It also provides a sequence identity comparator for comparing PPIs by their sequences.

📌 The wrappers for iAlign and US-align require these tools to be installed. Please refer to the Reference API documentation for details.

[2]:
from ppiref.comparison import IAlign, USalign, IDist, SequenceIdentityComparator, FoldseekMMComparator
from ppiref.extraction import PPIExtractor
from ppiref.definitions import PPIREF_TEST_DATA_DIR

# Suppress BioPython warnings
import warnings
from Bio import BiopythonWarning
warnings.simplefilter('ignore', BiopythonWarning)

# Suppress Graphein log
from loguru import logger
logger.disable('graphein')

Prepare near-duplicate PPIs from Figure 1 in the “Learning to design protein-protein interactions with enhanced generalization” paper.


[4]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(out_dir=ppi_dir, kind='heavy', radius=6., bsa=False)
extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/1p7z.pdb', partners=['A', 'C'])
extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/3p9r.pdb', partners=['B', 'D'])
ppis = [ppi_dir / 'p7/1p7z_A_C.pdb', ppi_dir / 'p9/3p9r_B_D.pdb']
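The output paths above appear to follow a pattern: a subdirectory named after the two middle characters of the PDB ID, then a file named from the PDB ID and the partner chains. The following minimal sketch reproduces that pattern. The helper `ppi_path` is hypothetical and not part of PPIRef, and the layout is inferred only from the two paths shown in this example.

```python
from pathlib import Path


def ppi_path(ppi_dir, pdb_id, partners):
    """Hypothetical helper (not part of PPIRef): build the expected PPI file path.

    Assumes the layout inferred from the example above:
    <ppi_dir>/<middle two characters of PDB ID>/<pdb_id>_<chain>_<chain>.pdb
    """
    pdb_id = pdb_id.lower()
    file_name = f"{pdb_id}_{'_'.join(partners)}.pdb"
    return Path(ppi_dir) / pdb_id[1:3] / file_name


# Rebuild the two paths used in this tutorial
for pdb_id, partners in [('1p7z', ['A', 'C']), ('3p9r', ['B', 'D'])]:
    print(ppi_path('ppi_dir', pdb_id, partners))
```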

Example 1. Compare PPIs with iAlign. iAlign is the original adaptation of TM-align to protein-protein interfaces. TM-align is based on the 3D alignment of protein structures. A high IS-score and a low P-value from iAlign indicate high similarity.

[4]:
ialign = IAlign()
ialign.compare(*ppis)
[4]:
{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'IS-score': 0.95822,
 'P-value': 8.22e-67,
 'Z-score': 152.167,
 'Number of aligned residues': 249,
 'Number of aligned contacts': 347,
 'RMSD': 0.37,
 'Seq identity': 0.992}

Example 2. Compare PPIs with US-align. US-align is a more recent adaptation of TM-align, designed as a universal comparison method for different kinds of macromolecules. High TM-scores in both directions (TM1 and TM2) indicate high similarity.

[5]:
usalign = USalign()
usalign.compare(*ppis)
[5]:
{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'TM1': 0.984,
 'TM2': 0.984,
 'RMSD': 0.35,
 'ID1': 0.979,
 'ID2': 0.979,
 'IDali': 0.993,
 'L1': 289,
 'L2': 289,
 'Lali': 285}
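The three identity values in this output can be related with simple arithmetic. The sketch below assumes the usual US-align convention, which is not stated in this tutorial: `IDali` is the number of identical aligned residues divided by the alignment length `Lali`, while `ID1` and `ID2` divide the same count by the full lengths `L1` and `L2`.

```python
# Values copied from the US-align output above
L1, L2, Lali = 289, 289, 285
IDali = 0.993

# Assumption: IDali = n_identical / Lali, so the count can be recovered by rounding
n_identical = round(IDali * Lali)

# Assumption: ID1 and ID2 normalize the same count by the full lengths
id1 = n_identical / L1
id2 = n_identical / L2

print(n_identical, round(id1, 3), round(id2, 3))  # 283 0.979 0.979
```

Under these assumptions the recovered values match the reported `ID1` and `ID2` of 0.979. The gap between `IDali` and `ID1` comes from the four residues in each structure that are not aligned.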

Example 3. Compare PPIs with Foldseek-MM. Foldseek-MM compares protein-protein complexes by applying Foldseek to all partners and finding the best-scoring alignment of the whole complexes. Here, we use the method to compare protein-protein interfaces, similarly to Foldseek-MM in interface mode. Like iAlign and US-align, Foldseek-MM produces a TM-score. A high TM-score indicates high similarity.

[6]:
foldseek_mm = FoldseekMMComparator()
foldseek_mm.compare(*ppis)
[6]:
{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'Foldseek-MM TM-score (normalized by query PPI0 length)': 0.98084,
 'Foldseek-MM TM-score (normalized by target PPI1 length)': 0.98084,
 'Matched chains in the query PPI0 complex': 'A,C',
 'Matched chains in the target PPI1 complex': 'D,B'}
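The two "Matched chains" fields can be combined into a chain-to-chain mapping. The sketch below assumes that the two comma-separated lists are paired by position (first query chain with first target chain, and so on). This seems the natural reading of the output but is not confirmed here. The function `chain_mapping` is a hypothetical helper, not part of PPIRef.

```python
# Relevant fields copied from the Foldseek-MM output above
result = {
    'Matched chains in the query PPI0 complex': 'A,C',
    'Matched chains in the target PPI1 complex': 'D,B',
}


def chain_mapping(result):
    """Hypothetical helper: pair query chains with target chains by position."""
    query = result['Matched chains in the query PPI0 complex'].split(',')
    target = result['Matched chains in the target PPI1 complex'].split(',')
    return dict(zip(query, target))


print(chain_mapping(result))  # {'A': 'D', 'C': 'B'}
```

Under this reading, chain A of 1p7z is matched to chain D of 3p9r and chain C to chain B, so the partner order in the two PPI names is swapped.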

Example 4. Compare by maximum pairwise sequence identity. High sequence identity indicates high similarity. Comparing PPIs by sequence requires the path to the directory containing the complete PDB files from which the PPIs were extracted.

[6]:
seqid = SequenceIdentityComparator(pdb_dir=PPIREF_TEST_DATA_DIR / 'pdb')
seqid.compare(*ppis)
[6]:
{'PPI0': '1p7z_A_C',
 'PPI1': '3p9r_B_D',
 'Maximum pairwise sequence identity': 0.9944979367262724}
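To illustrate what "maximum pairwise" means, the sketch below scores every chain of one PPI against every chain of the other and keeps the best score. It is a toy illustration only: the sequences are made up, the `identity` function assumes pre-aligned sequences of equal length, and neither function reflects how `SequenceIdentityComparator` is actually implemented.

```python
from itertools import product


def identity(seq_a, seq_b):
    """Toy identity: fraction of matching positions in two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError('This toy function expects pre-aligned, equal-length sequences')
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def max_pairwise_identity(seqs0, seqs1):
    """Best identity over all chain pairs, one chain taken from each PPI."""
    return max(identity(a, b) for a, b in product(seqs0, seqs1))


# Made-up chain sequences, for illustration only
ppi0 = {'A': 'MKTAYIAK', 'C': 'GSHMLEDP'}
ppi1 = {'B': 'MKTAYIAR', 'D': 'GSHQLEDP'}

print(max_pairwise_identity(ppi0.values(), ppi1.values()))  # 0.875
```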

Example 5. Compare with iDist. iDist is an efficient approximation of 3D alignment-based methods. A low iDist distance indicates high similarity (a distance below 0.04 is considered a near-duplicate for interfaces extracted with a 6 Å radius).

[7]:
idist = IDist()
idist.compare(*ppis)
[7]:
{'PPI0': '1p7z_A_C', 'PPI1': '3p9r_B_D', 'iDist': 0.0034661771664121184}
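The 0.04 threshold mentioned above can be applied directly to the returned dictionary. The helper `is_near_duplicate` below is a hypothetical convenience function, not part of PPIRef, and the threshold applies to 6 Å interfaces as stated in the text.

```python
# Near-duplicate threshold for 6 Å interfaces, as stated in the text above
NEAR_DUPLICATE_IDIST = 0.04


def is_near_duplicate(result, threshold=NEAR_DUPLICATE_IDIST):
    """Hypothetical helper: flag a comparison result as a near-duplicate pair."""
    return result['iDist'] < threshold


# Result copied from the iDist output above
result = {'PPI0': '1p7z_A_C', 'PPI1': '3p9r_B_D', 'iDist': 0.0034661771664121184}
print(is_near_duplicate(result))  # True
```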

Example 6. Compare PPIs pairwise with iDist. Parallel pairwise comparison is available for the other methods as well, but they do not scale to large datasets.

[8]:
idist = IDist(max_workers=2)
idist.compare_all_against_all(ppis, ppis)
Embedding PPIs (2 processes): 100%|██████████| 2/2 [00:04<00:00,  2.49s/it]
[8]:
       PPI0      PPI1     iDist
0  1p7z_A_C  1p7z_A_C  0.000000
1  1p7z_A_C  3p9r_B_D  0.003466
2  3p9r_B_D  1p7z_A_C  0.003466
3  3p9r_B_D  3p9r_B_D  0.000000
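An all-against-all table contains self-comparisons and both orderings of each pair, so it usually needs filtering before near-duplicates are listed. The sketch below works on plain row dictionaries copied from the output above. If the result is a pandas DataFrame named `df`, `df.to_dict('records')` produces rows of this shape. The function `near_duplicate_pairs` is a hypothetical helper, and the 0.04 threshold is the one quoted for 6 Å interfaces.

```python
# Rows copied from the pairwise iDist output above
rows = [
    {'PPI0': '1p7z_A_C', 'PPI1': '1p7z_A_C', 'iDist': 0.000000},
    {'PPI0': '1p7z_A_C', 'PPI1': '3p9r_B_D', 'iDist': 0.003466},
    {'PPI0': '3p9r_B_D', 'PPI1': '1p7z_A_C', 'iDist': 0.003466},
    {'PPI0': '3p9r_B_D', 'PPI1': '3p9r_B_D', 'iDist': 0.000000},
]


def near_duplicate_pairs(rows, threshold=0.04):
    """Hypothetical helper: unique, unordered near-duplicate pairs without self-matches."""
    pairs = set()
    for row in rows:
        a, b = row['PPI0'], row['PPI1']
        if a != b and row['iDist'] < threshold:
            # Sorting makes (a, b) and (b, a) collapse into one entry
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


print(near_duplicate_pairs(rows))  # [('1p7z_A_C', '3p9r_B_D')]
```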