{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Retrieving PPIs\n", "\n", "The package lets you search the Protein Data Bank (PDB) for protein-protein interactions (PPIs) similar to a query PPI. The search can be based on either the interface structure or the protein sequence of interest." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from ppiref.comparison import IDist\n", "from ppiref.retrieval import MMSeqs2PPIRetriever\n", "from ppiref.definitions import PPIREF_DATA_DIR, PPIREF_TEST_DATA_DIR\n", "import pandas as pd\n", "\n", "# Suppress Graphein logging output\n", "from loguru import logger\n", "logger.disable('graphein')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we use near-duplicate homooligomeric PPIs that involve different sequences (taken from Figure 3 of the [\"Revealing data leakage in protein interaction benchmarks\"](https://arxiv.org/abs/2404.10457) paper). We search the PDB for PPIs similar to one of the entries (1k3f), aiming to retrieve another one (1k9s).\n", "\n", "
Interface-based search results (iDist):

|   | PPI | iDist |
|---|---|---|
| 0 | 1k3f_C_E | 0.000000 |
| 1 | 1k3f_A_D | 0.019316 |
| 2 | 1u1g_C_D | 0.029032 |
| 3 | 1sj9_A_F | 0.029668 |
| 4 | 8a7d_C_Q | 0.029722 |
| 5 | 5efo_A_B | 0.029956 |
| 6 | 2hrd_A_F | 0.030052 |
| 7 | 1sj9_B_D | 0.030148 |
| 8 | 1u1e_C_D | 0.030332 |
| 9 | 1u1d_C_D | 0.030373 |


Sequence-based search results:

|   | PPI | Sequence similarity | Chain |
|---|---|---|---|
| 0 | 1u1c_A_B | 1.0 | A |
| 1 | 1u1c_A_C | 1.0 | A |
| 2 | 1rxs_M_m | 1.0 | m |
| 3 | 1rxs_N_m | 1.0 | m |
| 4 | 1rxu_E_F | 1.0 | F |
| 5 | 1rxu_A_F | 1.0 | F |
| 6 | 1u1e_C_D | 1.0 | D |
| 7 | 1u1e_D_E | 1.0 | D |
| 8 | 1rxs_M_o | 1.0 | o |
| 9 | 1rxs_O_o | 1.0 | o |
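
To check whether a given PDB entry appears among the retrieved hits, the PPI identifiers can be split into their parts. The sketch below assumes the `<pdb id>_<chain>_<chain>` format seen in the tables above; the helper names `split_ppi_id` and `hits_from_entry` are hypothetical and not part of the package.

```python
def split_ppi_id(ppi_id):
    """Split an ID such as '1k3f_C_E' into (pdb_id, chain_a, chain_b).

    Assumes exactly two underscores, as in the IDs shown in the tables.
    """
    pdb_id, chain_a, chain_b = ppi_id.split('_')
    return pdb_id, chain_a, chain_b


def hits_from_entry(hits, pdb_id):
    """Return the hits that belong to the given PDB entry (case-insensitive)."""
    target = pdb_id.lower()
    return [h for h in hits if split_ppi_id(h)[0].lower() == target]


# PPI IDs copied from the iDist table above
idist_hits = [
    '1k3f_C_E', '1k3f_A_D', '1u1g_C_D', '1sj9_A_F', '8a7d_C_Q',
    '5efo_A_B', '2hrd_A_F', '1sj9_B_D', '1u1e_C_D', '1u1d_C_D',
]

# List every hit that comes from the query entry itself
print(hits_from_entry(idist_hits, '1k3f'))
```

The same helper can be pointed at any other entry of interest, for example the target 1k9s, to see whether it is among a list of hits.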