Extracting PPIs

The ppiref.extraction.PPIExtractor class enables extracting protein-protein interactions (PPIs) from PDB files based on inter-atomic distances. This is how the PPIRef dataset was created.

[2]:
from ppiref.extraction import PPIExtractor
from ppiref.definitions import PPIREF_TEST_DATA_DIR

Prepare a .pdb file. In this example, we use the 1bui.pdb file from the Protein Data Bank, which contains three interacting proteins: staphylokinase (chain C, pink), microplasmin (chain A, blue), and microplasmin (chain B, green). Below, we extract different types of protein-protein interfaces from this file.

[3]:
pdb_file = PPIREF_TEST_DATA_DIR / 'pdb/1bui.pdb'

Initialize a PPI extractor based on 10 Å contacts between heavy atoms. Additionally, calculate the buried surface area (BSA) of each PPI (slow).

[4]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(
    out_dir=ppi_dir,
    kind='heavy',
    radius=10.,
    bsa=True  # buried surface area calculation is slow
)
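
To make the idea of a distance-based contact concrete, here is a minimal, library-independent sketch. It is not the PPIExtractor implementation: the toy atom records, the `interface_residues` helper, and the choice to report whole residues are all assumptions made for illustration. It only shows what "heavy atoms within a radius" can mean in practice.

```python
import math

# Hypothetical toy atoms: (chain, residue number, element, x, y, z).
atoms = [
    ("A", 1, "C", 0.0, 0.0, 0.0),
    ("A", 2, "N", 30.0, 0.0, 0.0),
    ("A", 2, "H", 8.0, 0.0, 0.0),   # hydrogen: ignored for heavy-atom contacts
    ("C", 5, "O", 6.0, 0.0, 0.0),
    ("C", 6, "C", 50.0, 0.0, 0.0),
]

def interface_residues(atoms, chain_a, chain_b, radius=10.0):
    """Return residues that have a heavy atom within `radius` of the other chain."""
    heavy = [a for a in atoms if a[2] != "H"]
    side_a = [a for a in heavy if a[0] == chain_a]
    side_b = [a for a in heavy if a[0] == chain_b]
    contacts = set()
    for a in side_a:
        for b in side_b:
            # Euclidean distance between the two atom coordinates
            if math.dist(a[3:], b[3:]) <= radius:
                contacts.add((a[0], a[1]))
                contacts.add((b[0], b[1]))
    return sorted(contacts)

print(interface_residues(atoms, "A", "C"))  # [('A', 1), ('C', 5)]
```

The nested loop is quadratic in the number of atoms, which is fine for a toy example; real structures would usually call for a spatial index such as a k-d tree.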

Example 1. Extract all contact-based dimeric PPIs from a PDB file. This will extract three interfaces: A-C, A-B, and B-C.

[5]:
extractor.extract(pdb_file)
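
The three interfaces correspond to the three unordered pairs of chains A, B, and C. As a small illustration (plain Python, not the library's code), the candidate pairs can be enumerated as follows; whether a candidate pair yields an interface depends on whether the two chains are actually in contact.

```python
from itertools import combinations

chains = ["A", "B", "C"]

# Every unordered pair of chains is a candidate dimeric interface.
pairs = ["-".join(pair) for pair in combinations(chains, 2)]
print(pairs)  # ['A-B', 'A-C', 'B-C']
```

For n chains there are n(n-1)/2 candidate pairs, so the number of candidates grows quickly for larger complexes.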

Example 2. Extract all contact-based dimeric PPIs between a subset of chains from a PDB file. In this example, this will lead to the same result as in Example 1 but may be useful for complexes containing more chains.

[6]:
extractor.extract(pdb_file, partners=['A', 'B', 'C'])

Example 3. Extract a contact-based PPI between two specified chains (dimer).

[7]:
extractor.extract(pdb_file, partners=['A', 'C'])

Example 4. Extract a contact-based PPI between three specified chains (trimer).

[8]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(
    out_dir=ppi_dir,
    join=True  # enables joining all pairwise dimeric interfaces into a single oligomeric interface
)
extractor.extract(pdb_file, partners=['A', 'B', 'C'])
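
One way to picture joining is as a union of the pairwise interfaces. The sketch below is only a conceptual illustration with made-up residue sets; how PPIExtractor joins interfaces internally is not shown in this notebook, and the `pairwise` data is hypothetical.

```python
# Hypothetical interface residues (chain, residue number) for each chain pair.
pairwise = {
    ("A", "B"): {("A", 10), ("B", 42)},
    ("A", "C"): {("A", 10), ("A", 11), ("C", 7)},
    ("B", "C"): {("B", 42), ("C", 8)},
}

# Joining: take the union of all pairwise interfaces.
joined = set().union(*pairwise.values())

# Name the joined interface after the PDB ID and the sorted chain IDs.
chains = sorted({chain for chain, _ in joined})
name = "1bui_" + "_".join(chains)
print(name, len(joined))  # 1bui_A_B_C 5
```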

Example 5. Extract a complete dimer complex by setting a very large expansion radius around the interface (for illustration purposes).

[9]:
ppi_complexes_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir_complexes'
extractor_complexes = PPIExtractor(
    out_dir=ppi_complexes_dir,
    kind='heavy',
    radius=6.,
    expansion_radius=1_000_000.
)
extractor_complexes.extract(pdb_file, partners=['A', 'C'])

Example 6. Extract all PPIs from all .pdb files in a directory in parallel.

[10]:
extractor = PPIExtractor(out_dir=ppi_dir, max_workers=2)
pdb_dir = PPIREF_TEST_DATA_DIR / 'pdb'
extractor.extract_parallel(pdb_dir)
Collecting input files: 100%|██████████| 8/8 [00:00<00:00, 3921.28it/s]
Filtering input files with pattern '.*\.pdb: 100%|██████████| 8/8 [00:00<00:00, 11052.18it/s]
Filtering processed files: 100%|██████████| 8/8 [00:00<00:00, 85163.53it/s]
  0%|          | 0/1 [00:00<?, ?it/s]
[06/21/24 20:33:25] WARNING  To use the Graphein submodule      embeddings.py:34
                             graphein.protein.features.sequence
                             .embeddings, you need to install:
                             torch
                             To do so, use the following
                             command: conda install -c pytorch
                             torch
                    WARNING  To use the Graphein submodule      embeddings.py:45
                             graphein.protein.features.sequence
                             .embeddings, you need to install:
                             biovec
                             biovec cannot be installed via
                             conda
                             Alternatively, you can install
                             graphein with the extras:

                             pip install graphein[extras]
[06/21/24 20:33:26] WARNING  To use the Graphein submodule   visualisation.py:36
                             graphein.protein.visualisation,
                             you need to install: pytorch3d
                             To do so, use the following
                             command: conda install -c
                             pytorch3d pytorch3d
                    WARNING  To use the Graphein submodule          meshes.py:30
                             graphein.protein.meshes, you need to
                             install: pytorch3d
                             To do so, use the following command:
                             conda install -c pytorch3d pytorch3d
100%|██████████| 1/1 [00:06<00:00,  6.26s/it]
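
The log shows the general pattern: collect input files, filter them by a file-name pattern, and distribute the work across workers. The following standard-library sketch mimics that pattern under stated assumptions: `process` is a hypothetical stand-in for per-file extraction, the file names are placeholders, and a thread pool is used for simplicity (the library may well use processes instead).

```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process(path):
    """Hypothetical stand-in for per-file extraction; returns the file stem."""
    return path.stem

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Create placeholder inputs; only the .pdb files should be processed.
    for file_name in ["1bui.pdb", "1k3f.pdb", "notes.txt"]:
        (tmp_dir / file_name).touch()

    # Keep only files whose names match the pattern.
    pattern = re.compile(r".*\.pdb")
    inputs = sorted(p for p in tmp_dir.iterdir() if pattern.fullmatch(p.name))

    # Process the files with two workers; map preserves input order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(process, inputs))

print(results)  # ['1bui', '1k3f']
```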

Print all the extracted files.

[11]:
for path in ppi_dir.rglob('*.pdb'):
    print(path.relative_to(ppi_dir.parent))
ppi_dir/k3/1k3f_B_D.pdb
ppi_dir/k3/1k3f_B_F.pdb
ppi_dir/k3/1k3f_D_E.pdb
ppi_dir/k3/1k3f_B_C.pdb
ppi_dir/k3/1k3f_D_F.pdb
ppi_dir/k3/1k3f_C_F.pdb
ppi_dir/k3/1k3f_A_D.pdb
ppi_dir/k3/1k3f_A_E.pdb
ppi_dir/k3/1k3f_C_E.pdb
ppi_dir/k3/1k3f_A_B.pdb
ppi_dir/k3/1k3f_E_F.pdb
ppi_dir/k3/1k3f_A_C.pdb
ppi_dir/p7/1p7z_B_D.pdb
ppi_dir/p7/1p7z_B_C.pdb
ppi_dir/p7/1p7z_C_D.pdb
ppi_dir/p7/1p7z_A_D.pdb
ppi_dir/p7/1p7z_A_B.pdb
ppi_dir/p9/3p9r_B_C.pdb
ppi_dir/p9/3p9r_A_C.pdb
ppi_dir/p9/3p9r_A_B.pdb
ppi_dir/p9/3p9r_C_D.pdb
ppi_dir/p9/3p9r_A_D.pdb
ppi_dir/0g/10gs_A_B.pdb
ppi_dir/a0/1a0n_A_B.pdb
ppi_dir/a0/1a02_F_J.pdb
ppi_dir/a0/1a02_F_N.pdb
ppi_dir/a0/1a02_J_N.pdb
ppi_dir/ah/1ahw_A_C.pdb
ppi_dir/ah/1ahw_E_F.pdb
ppi_dir/ah/1ahw_A_B.pdb
ppi_dir/ah/1ahw_A_F.pdb
ppi_dir/ah/1ahw_D_F.pdb
ppi_dir/ah/1ahw_B_C.pdb
ppi_dir/ah/1ahw_D_E.pdb
ppi_dir/bu/1bui_A_B_C.pdb
ppi_dir/bu/1bui_A_C.pdb
ppi_dir/bu/1bui_A_B.pdb
ppi_dir/bu/1bui_B_C.pdb
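
In this listing, each file name consists of the PDB ID followed by the chain IDs, and each subdirectory is named after the two middle characters of the PDB ID. The sketch below parses such paths and groups the interfaces by PDB ID. It relies only on the naming pattern visible in the output; `parse_ppi_path` is a hypothetical helper, not part of ppiref.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# A few paths copied from the listing.
paths = [
    "ppi_dir/bu/1bui_A_B_C.pdb",
    "ppi_dir/bu/1bui_A_C.pdb",
    "ppi_dir/0g/10gs_A_B.pdb",
    "ppi_dir/a0/1a02_F_J.pdb",
]

def parse_ppi_path(path):
    """Split '<pdb_id>_<chain>_<chain>...' into the PDB ID and a tuple of chains."""
    stem = PurePosixPath(path).stem
    pdb_id, *chains = stem.split("_")
    return pdb_id, tuple(chains)

by_pdb = defaultdict(list)
for path in paths:
    pdb_id, chains = parse_ppi_path(path)
    # The subdirectory matches the two middle characters of the PDB ID.
    assert PurePosixPath(path).parent.name == pdb_id[1:3]
    by_pdb[pdb_id].append(chains)

print(dict(by_pdb))
```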