{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Extracting PPIs\n", "\n", "The ``ppiref.extraction.PPIExtractor`` class enables extracting protein-protein interactions (PPIs) from PDB files based on inter-atomic distances. This is how the PPIRef dataset was created." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from ppiref.extraction import PPIExtractor\n", "from ppiref.definitions import PPIREF_TEST_DATA_DIR" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Prepare a .pdb file. In this example, we will use the [1bui.pdb](https://www.rcsb.org/structure/1bui) file from the Protein Data Bank which contains three interacting proteins: staphylokinase (chain C, pink), microplasmin (blue, chain A), and microplasmin (green, chain B). Further we will extract different types of protein-protein interfaces from the file.\n", "\n", "

\n", " \n", "

" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "pdb_file = PPIREF_TEST_DATA_DIR / 'pdb/1bui.pdb'" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Initialize PPI extractor based on 10A contacts between heavy atoms. Additionally, calculate buried surface area (BSA) of PPIs (slow)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'\n", "extractor = PPIExtractor(\n", " out_dir=ppi_dir,\n", " kind='heavy',\n", " radius=10.,\n", " bsa=True # buried surface area calculation is slow\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 1.** Extract all contact-based dimeric PPIs from a PDB file. This will extract three interfaces: A-C, A-B, and B-C." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "extractor.extract(pdb_file)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 2.** Extract all contact-based dimeric PPIs between a subset of chains from a PDB file. In this example, this will lead to the same result as in Example 1 but may be useful for complexes containing more chains." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "extractor.extract(pdb_file, partners=['A', 'B', 'C'])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 3.** Extract a contact-based PPI between two specified chains (dimer)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "extractor.extract(pdb_file, partners=['A', 'C'])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 4.** Extract a contact-based PPI between three specified chains (trimer)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'\n", "extractor = PPIExtractor(\n", " out_dir=ppi_dir,\n", " join=True # enables joining all pairwise dimeric interfaces into a single oligomeric interface\n", ")\n", "extractor.extract(pdb_file, partners=['A', 'B', 'C'])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 5.** Extract a complete dimer complex by setting high expansion radius around interface (for example purposes)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "ppi_complexes_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir_complexes'\n", "extractor_complexes = PPIExtractor(\n", " out_dir=ppi_complexes_dir,\n", " kind='heavy',\n", " radius=6.,\n", " expansion_radius=1_000_000.\n", ")\n", "extractor_complexes.extract(pdb_file, partners=['A', 'C'])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "**Example 6.** Extract all PPIs from all .pdb files in a directory in parallel." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Collecting input files: 100%|██████████| 8/8 [00:00<00:00, 3921.28it/s]\n", "Filtering input files with pattern '.*\\.pdb: 100%|██████████| 8/8 [00:00<00:00, 11052.18it/s]\n", "Filtering processed files: 100%|██████████| 8/8 [00:00<00:00, 85163.53it/s]\n", " 0%| | 0/1 [00:00