Extracting PPIs
The ppiref.extraction.PPIExtractor class enables extracting protein-protein interactions (PPIs) from PDB files based on inter-atomic distances. This is how the PPIRef dataset was created.
[2]:
from ppiref.extraction import PPIExtractor
from ppiref.definitions import PPIREF_TEST_DATA_DIR
Prepare a .pdb file. This example uses the 1bui.pdb file from the Protein Data Bank, which contains three interacting proteins: staphylokinase (chain C, pink), microplasmin (chain A, blue), and microplasmin (chain B, green). The examples below extract different types of protein-protein interfaces from this file.

[3]:
pdb_file = PPIREF_TEST_DATA_DIR / 'pdb/1bui.pdb'
Initialize a PPI extractor based on 10 Å contacts between heavy atoms. Additionally, calculate the buried surface area (BSA) of each PPI (slow).
[4]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(
    out_dir=ppi_dir,
    kind='heavy',
    radius=10.,
    bsa=True  # buried surface area calculation is slow
)
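To make the distance criterion concrete, here is a minimal pure-Python sketch of a contact test between two chains. It is not the PPIExtractor implementation: the function name `chains_in_contact`, the inclusive cutoff, and the toy coordinates are all assumptions made for illustration, and a real implementation would likely use a spatial index instead of a brute-force loop.

```python
import math
from itertools import product

def chains_in_contact(coords_a, coords_b, radius=10.0):
    """Return True if any atom of chain A is within `radius` Å of any atom of chain B.

    The inclusive comparison (<=) is an assumption of this sketch.
    """
    return any(math.dist(a, b) <= radius for a, b in product(coords_a, coords_b))

# Toy heavy-atom coordinates in Å (hypothetical values, not taken from 1bui.pdb)
chain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
chain_b = [(9.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
chain_c = [(50.0, 0.0, 0.0)]

print(chains_in_contact(chain_a, chain_b))  # closest pair is 7.5 Å apart
print(chains_in_contact(chain_a, chain_c))  # closest pair is 48.5 Å apart
```

With the 10 Å cutoff, the first pair of toy chains counts as interacting and the second does not.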
Example 1. Extract all contact-based dimeric PPIs from a PDB file. This will extract three interfaces: A-C, A-B, and B-C.
[5]:
extractor.extract(pdb_file)
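The three interfaces correspond to the three unordered pairs of chains A, B, and C. As a small illustration (plain Python, independent of ppiref), the candidate pairs can be enumerated as follows. A candidate pair yields a PPI only if the two chains are actually in contact, so larger complexes may produce fewer interfaces than candidate pairs.

```python
from itertools import combinations
from math import comb

chains = ['A', 'B', 'C']  # chain IDs in 1bui.pdb

# Every unordered pair of distinct chains is a candidate dimeric interface
candidate_pairs = list(combinations(chains, 2))
print(candidate_pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# The number of candidate pairs grows quadratically with the number of chains
print(comb(6, 2))  # 15 candidate pairs for a six-chain complex
```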
Example 2. Extract all contact-based dimeric PPIs between a subset of chains from a PDB file. Here the result is the same as in Example 1, but restricting the chains is useful for complexes that contain more chains.
[6]:
extractor.extract(pdb_file, partners=['A', 'B', 'C'])
Example 3. Extract a contact-based PPI between two specified chains (dimer).
[7]:
extractor.extract(pdb_file, partners=['A', 'C'])
Example 4. Extract a contact-based PPI between three specified chains (trimer).
[8]:
ppi_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir'
extractor = PPIExtractor(
    out_dir=ppi_dir,
    join=True  # enables joining all pairwise dimeric interfaces into a single oligomeric interface
)
extractor.extract(pdb_file, partners=['A', 'B', 'C'])
Example 5. Extract a complete dimer complex by setting a very high expansion radius around the interface (for illustration purposes).
[9]:
ppi_complexes_dir = PPIREF_TEST_DATA_DIR / 'ppi_dir_complexes'
extractor_complexes = PPIExtractor(
    out_dir=ppi_complexes_dir,
    kind='heavy',
    radius=6.,
    expansion_radius=1_000_000.
)
extractor_complexes.extract(pdb_file, partners=['A', 'C'])
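The following sketch illustrates why a very large expansion radius yields the complete complex. It assumes that expansion means adding every residue within `expansion_radius` of an interface residue; the exact semantics in ppiref may differ, and the helper `expand_selection` and the toy coordinates are hypothetical.

```python
import math

def expand_selection(residues, seed_ids, expansion_radius):
    """Return IDs of residues within `expansion_radius` Å of any seed residue (seeds included)."""
    seeds = [residues[i] for i in seed_ids]
    return {
        res_id
        for res_id, xyz in residues.items()
        if any(math.dist(xyz, s) <= expansion_radius for s in seeds)
    }

# Hypothetical residue centers in Å, keyed by (chain, residue number)
residues = {
    ('A', 1): (0.0, 0.0, 0.0),
    ('A', 2): (40.0, 0.0, 0.0),
    ('C', 1): (4.0, 0.0, 0.0),
    ('C', 2): (80.0, 0.0, 0.0),
}
interface = {('A', 1), ('C', 1)}

# No expansion: only the interface residues themselves are kept
print(sorted(expand_selection(residues, interface, 0.0)))
# Huge expansion: every residue of both chains is kept
print(sorted(expand_selection(residues, interface, 1_000_000.0)))
```

Any radius larger than the diameter of the complex has the same effect, so the value 1,000,000 simply acts as "infinity".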
Example 6. Extract all PPIs from all .pdb files in a directory in parallel.
[10]:
extractor = PPIExtractor(out_dir=ppi_dir, max_workers=2)
pdb_dir = PPIREF_TEST_DATA_DIR / 'pdb'
extractor.extract_parallel(pdb_dir)
Collecting input files: 100%|██████████| 8/8 [00:00<00:00, 3921.28it/s]
Filtering input files with pattern '.*\.pdb: 100%|██████████| 8/8 [00:00<00:00, 11052.18it/s]
Filtering processed files: 100%|██████████| 8/8 [00:00<00:00, 85163.53it/s]
0%| | 0/1 [00:00<?, ?it/s]
[06/21/24 20:33:25] WARNING (embeddings.py:34)
To use the Graphein submodule graphein.protein.features.sequence.embeddings, you need to install: torch
To do so, use the following command: conda install -c pytorch torch
WARNING (embeddings.py:45)
To use the Graphein submodule graphein.protein.features.sequence.embeddings, you need to install: biovec
biovec cannot be installed via conda
Alternatively, you can install graphein with the extras: pip install graphein[extras]
[06/21/24 20:33:26] WARNING (visualisation.py:36)
To use the Graphein submodule graphein.protein.visualisation, you need to install: pytorch3d
To do so, use the following command: conda install -c pytorch3d pytorch3d
WARNING (meshes.py:30)
To use the Graphein submodule graphein.protein.meshes, you need to install: pytorch3d
To do so, use the following command: conda install -c pytorch3d pytorch3d
100%|██████████| 1/1 [00:06<00:00, 6.26s/it]
Print all the extracted files.
[11]:
for path in ppi_dir.rglob('*.pdb'):
    print(path.relative_to(ppi_dir.parent))
ppi_dir/k3/1k3f_B_D.pdb
ppi_dir/k3/1k3f_B_F.pdb
ppi_dir/k3/1k3f_D_E.pdb
ppi_dir/k3/1k3f_B_C.pdb
ppi_dir/k3/1k3f_D_F.pdb
ppi_dir/k3/1k3f_C_F.pdb
ppi_dir/k3/1k3f_A_D.pdb
ppi_dir/k3/1k3f_A_E.pdb
ppi_dir/k3/1k3f_C_E.pdb
ppi_dir/k3/1k3f_A_B.pdb
ppi_dir/k3/1k3f_E_F.pdb
ppi_dir/k3/1k3f_A_C.pdb
ppi_dir/p7/1p7z_B_D.pdb
ppi_dir/p7/1p7z_B_C.pdb
ppi_dir/p7/1p7z_C_D.pdb
ppi_dir/p7/1p7z_A_D.pdb
ppi_dir/p7/1p7z_A_B.pdb
ppi_dir/p9/3p9r_B_C.pdb
ppi_dir/p9/3p9r_A_C.pdb
ppi_dir/p9/3p9r_A_B.pdb
ppi_dir/p9/3p9r_C_D.pdb
ppi_dir/p9/3p9r_A_D.pdb
ppi_dir/0g/10gs_A_B.pdb
ppi_dir/a0/1a0n_A_B.pdb
ppi_dir/a0/1a02_F_J.pdb
ppi_dir/a0/1a02_F_N.pdb
ppi_dir/a0/1a02_J_N.pdb
ppi_dir/ah/1ahw_A_C.pdb
ppi_dir/ah/1ahw_E_F.pdb
ppi_dir/ah/1ahw_A_B.pdb
ppi_dir/ah/1ahw_A_F.pdb
ppi_dir/ah/1ahw_D_F.pdb
ppi_dir/ah/1ahw_B_C.pdb
ppi_dir/ah/1ahw_D_E.pdb
ppi_dir/bu/1bui_A_B_C.pdb
ppi_dir/bu/1bui_A_C.pdb
ppi_dir/bu/1bui_A_B.pdb
ppi_dir/bu/1bui_B_C.pdb
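In the listing above, each file is named `<pdb_id>_<chain>_..._<chain>.pdb` and is stored in a subdirectory named after the two middle characters of the PDB ID. This layout is inferred from the output only and is not a documented guarantee. Under that assumption, a short sketch for grouping extracted PPIs by their source PDB entry could look like this (the helper `parse_ppi_path` is hypothetical and not part of ppiref):

```python
from collections import defaultdict
from pathlib import PurePosixPath

def parse_ppi_path(path):
    """Split '<pdb_id>_<chain>_..._<chain>.pdb' into (pdb_id, chains)."""
    pdb_id, *chains = PurePosixPath(path).stem.split('_')
    return pdb_id, tuple(chains)

# A few paths copied from the listing above
paths = [
    'ppi_dir/bu/1bui_A_B_C.pdb',
    'ppi_dir/bu/1bui_A_C.pdb',
    'ppi_dir/0g/10gs_A_B.pdb',
]

by_pdb = defaultdict(list)
for p in paths:
    pdb_id, chains = parse_ppi_path(p)
    # Check the inferred layout: subdirectory == two middle characters of the PDB ID
    assert PurePosixPath(p).parent.name == pdb_id[1:3]
    by_pdb[pdb_id].append(chains)

print(dict(by_pdb))
```

Files with more than two chain IDs, such as 1bui_A_B_C.pdb, are the joined oligomeric interfaces from Example 4.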