{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Comparing PPIs\n", "\n", "The PPIRef package provides wrappers for [iAlign](https://doi.org/10.1093/bioinformatics/btq404) and [US-align](https://www.biorxiv.org/content/10.1101/2022.04.18.488565v1), as well as their scalable approximation [iDist](https://arxiv.org/pdf/2310.18515.pdf) (used to construct the PPIRef dataset) for comparing PPI structures. Additionally, it provides a sequence identity comparator to compare PPIs by their sequences.\n", "\n", "> 📌 The wrappers for iAlign and US-align require these tools to be installed. Please refer to the Reference API documentation for details." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from ppiref.comparison import IAlign, USalign, IDist, SequenceIdentityComparator, FoldseekMMComparator\n", "from ppiref.extraction import PPIExtractor\n", "from ppiref.definitions import PPIREF_TEST_DATA_DIR\n", "\n", "# Suppress BioPython warnings\n", "import warnings\n", "from Bio import BiopythonWarning\n", "warnings.simplefilter('ignore', BiopythonWarning)\n", "\n", "# Suppress Graphein log\n", "from loguru import logger\n", "logger.disable('graphein')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Prepare near-duplicate PPIs from Figure 1 in the [\"Learning to design protein-protein interactions with enhanced generalization\"](https://arxiv.org/pdf/2310.18515.pdf) paper." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'\n", "extractor = PPIExtractor(out_dir=ppi_dir, kind='heavy', radius=6., bsa=False)\n", "extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/1p7z.pdb', partners=['A', 'C'])\n", "extractor.extract(PPIREF_TEST_DATA_DIR / 'pdb/3p9r.pdb', partners=['B', 'D'])\n", "ppis = [ppi_dir / 'p7/1p7z_A_C.pdb', ppi_dir / 'p9/3p9r_B_D.pdb']" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 1.** Compare PPIs with [iAlign](https://doi.org/10.1093/bioinformatics/btq404). iAlign is the original adaptation of [TM-align](https://doi.org/10.1093/nar/gki524) to protein-protein interfaces. TM-align is based on 3D alignment of protein structures. A high `IS-score` and a low `P-value` from iAlign indicate high similarity." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PPI0': '1p7z_A_C',\n", " 'PPI1': '3p9r_B_D',\n", " 'IS-score': 0.95822,\n", " 'P-value': 8.22e-67,\n", " 'Z-score': 152.167,\n", " 'Number of aligned residues': 249,\n", " 'Number of aligned contacts': 347,\n", " 'RMSD': 0.37,\n", " 'Seq identity': 0.992}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ialign = IAlign()\n", "ialign.compare(*ppis)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 2.** Compare PPIs with [US-align](https://www.biorxiv.org/content/10.1101/2022.04.18.488565v1). US-align is a more recent adaptation of [TM-align](https://doi.org/10.1093/nar/gki524), designed as a universal comparison method for different kinds of macromolecules. High TM-scores in both directions (`TM1` and `TM2`) indicate high similarity." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PPI0': '1p7z_A_C',\n", " 'PPI1': '3p9r_B_D',\n", " 'TM1': 0.984,\n", " 'TM2': 0.984,\n", " 'RMSD': 0.35,\n", " 'ID1': 0.979,\n", " 'ID2': 0.979,\n", " 'IDali': 0.993,\n", " 'L1': 289,\n", " 'L2': 289,\n", " 'Lali': 285}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usalign = USalign()\n", "usalign.compare(*ppis)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example 3.** Compare PPIs with [Foldseek-MM](https://www.biorxiv.org/content/10.1101/2024.04.14.589414v1). Foldseek-MM is designed to compare protein-protein complexes by applying Foldseek to all partners and finding the best-scoring alignment of the whole complexes. Here, we use the method to compare protein-protein interfaces, similar to Foldseek-MM in the [interface mode](https://github.com/steineggerlab/foldseek/pull/330). Similar to iAlign and US-align, Foldseek-MM produces a TM-score. A high TM-score indicates high similarity.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PPI0': '1p7z_A_C',\n", " 'PPI1': '3p9r_B_D',\n", " 'Foldseek-MM TM-score (normalized by query PPI0 length)': 0.98084,\n", " 'Foldseek-MM TM-score (normalized by target PPI1 length)': 0.98084,\n", " 'Matched chains in the query PPI0 complex': 'A,C',\n", " 'Matched chains in the target PPI1 complex': 'D,B'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foldseek_mm = FoldseekMMComparator()\n", "foldseek_mm.compare(*ppis)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 4.** Compare by maximum pairwise sequence identity. High sequence identity indicates high similarity. Comparing PPIs based on sequences requires the path to the directory containing the complete PDB files from which the PPIs were extracted." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PPI0': '1p7z_A_C',\n", " 'PPI1': '3p9r_B_D',\n", " 'Maximum pairwise sequence identity': 0.9944979367262724}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seqid = SequenceIdentityComparator(pdb_dir=PPIREF_TEST_DATA_DIR / 'pdb')\n", "seqid.compare(*ppis)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 5.** Compare with [iDist](https://arxiv.org/pdf/2310.18515.pdf). iDist is an efficient approximation of 3D alignment-based methods. A low iDist distance indicates high similarity (a distance below 0.04 is considered near-duplicate for interfaces extracted with a 6 Å distance cutoff)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'PPI0': '1p7z_A_C', 'PPI1': '3p9r_B_D', 'iDist': 0.0034661771664121184}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idist = IDist()\n", "idist.compare(*ppis)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 6.** Compare PPIs pairwise with iDist. Parallel pairwise comparison is also available for the other methods, but it does not scale to large datasets." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Embedding PPIs (2 processes): 0%| | 0/2 [00:00" ] }, { "data": { "text/html": [ "<table>\n", "  <thead>\n", "    <tr><th></th><th>PPI0</th><th>PPI1</th><th>iDist</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>1p7z_A_C</td><td>1p7z_A_C</td><td>0.000000</td></tr>\n", "    <tr><th>1</th><td>1p7z_A_C</td><td>3p9r_B_D</td><td>0.003466</td></tr>\n", "    <tr><th>2</th><td>3p9r_B_D</td><td>1p7z_A_C</td><td>0.003466</td></tr>\n", "    <tr><th>3</th><td>3p9r_B_D</td><td>3p9r_B_D</td><td>0.000000</td></tr>\n", "  </tbody>\n", "</table>" ], "text/plain": [ " PPI0 PPI1 iDist\n", "0 1p7z_A_C 1p7z_A_C 0.000000\n", "1 1p7z_A_C 3p9r_B_D 0.003466\n", "2 3p9r_B_D 1p7z_A_C 0.003466\n", "3 3p9r_B_D 3p9r_B_D 0.000000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idist = IDist(max_workers=2)\n", "idist.compare_all_against_all(ppis, ppis)" ] } ], "metadata": { "kernelspec": { "display_name": "ppiref", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }