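OpenFold
OpenFold is an open-source reproduction of AlphaFold2 implemented in Python and PyTorch. It aims to make protein structure prediction more reproducible, scalable, and research-friendly. Researchers can train models and run inference without depending on DeepMind's original code, and can modify the model for their own work.
OpenFold can be imported as a Python package by other Python projects, or driven through its standalone scripts for practical tasks, which makes it flexible to use.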
1. Installation
git clone https://github.com/aqlaboratory/openfold.git
cd openfold
conda env create -f environment.yml
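After creating the environment, a quick check can confirm that the packages this guide relies on are importable. The sketch below is a minimal, stdlib-only example; the helper name `check_packages` is hypothetical, and it assumes the conda environment has been activated and OpenFold has been installed into it.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "torch" and "openfold" are the packages used by the Python example in this guide.
    for name, found in check_packages(["torch", "openfold"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A package reported as missing usually means the wrong environment is active or the installation step did not complete.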
2. Download the alignment databases
bash scripts/download_alphafold_dbs.sh data/
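The databases are large, so it is worth confirming that the download finished before starting inference. The following is a rough sketch, not part of OpenFold: the helper `missing_paths` is hypothetical, and the relative paths are taken from the monomer inference command in this guide, so they may differ for other database versions.

```python
from pathlib import Path

# Relative paths as they appear in the monomer command of this guide (assumed layout).
MONOMER_DB_PATHS = [
    "pdb_mmcif/mmcif_files",
    "uniref90/uniref90.fasta",
    "mgnify/mgy_clusters_2018_12.fa",
    "pdb70",
    "uniclust30/uniclust30_2018_08",
    "bfd",
]


def missing_paths(data_dir, relative_paths):
    """Return the entries of relative_paths that do not exist under data_dir."""
    base = Path(data_dir)
    return [rel for rel in relative_paths if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data", MONOMER_DB_PATHS)
    if missing:
        print("Missing database paths:", ", ".join(missing))
    else:
        print("All expected database paths are present.")
```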
3. Download the model parameters
bash scripts/download_openfold_params.sh openfold/resources
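To see which checkpoints were downloaded, the parameter directory can be listed from Python. This is a small sketch; `list_checkpoints` is a hypothetical helper, and the directory `openfold/resources/openfold_params` is assumed from the checkpoint path used in the monomer command of this guide.

```python
from pathlib import Path


def list_checkpoints(params_dir):
    """Return the sorted names of the .pt files found directly in params_dir."""
    directory = Path(params_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.pt"))


if __name__ == "__main__":
    # Assumed location, based on the checkpoint path used later in this guide.
    for name in list_checkpoints("openfold/resources/openfold_params"):
        print(name)
```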
4. Structure prediction
Monomer inference
python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_ptm" \
    --model_device "cuda:0" \
    --output_dir ./ \
    --openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_2.pt
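Because the command has many path arguments, it can be convenient to assemble it in Python and change only the base directories. The sketch below uses only the flags and paths from the monomer command above; the function `build_monomer_command` and its parameter names are hypothetical, not part of OpenFold.

```python
import shlex


def build_monomer_command(
    fasta_dir,
    checkpoint_path,
    output_dir="./",
    data_dir="data",
    bin_dir="lib/conda/envs/openfold_venv/bin",
    config_preset="model_1_ptm",
    model_device="cuda:0",
):
    """Assemble the monomer inference command as an argument list."""
    return [
        "python3", "run_pretrained_openfold.py",
        fasta_dir,
        f"{data_dir}/pdb_mmcif/mmcif_files/",
        "--uniref90_database_path", f"{data_dir}/uniref90/uniref90.fasta",
        "--mgnify_database_path", f"{data_dir}/mgnify/mgy_clusters_2018_12.fa",
        "--pdb70_database_path", f"{data_dir}/pdb70/pdb70",
        "--uniclust30_database_path",
        f"{data_dir}/uniclust30/uniclust30_2018_08/uniclust30_2018_08",
        "--bfd_database_path",
        f"{data_dir}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
        "--jackhmmer_binary_path", f"{bin_dir}/jackhmmer",
        "--hhblits_binary_path", f"{bin_dir}/hhblits",
        "--hhsearch_binary_path", f"{bin_dir}/hhsearch",
        "--kalign_binary_path", f"{bin_dir}/kalign",
        "--config_preset", config_preset,
        "--model_device", model_device,
        "--output_dir", output_dir,
        "--openfold_checkpoint_path", checkpoint_path,
    ]


if __name__ == "__main__":
    cmd = build_monomer_command(
        "fasta_dir",
        "openfold/resources/openfold_params/finetuning_ptm_2.pt",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To launch it from the OpenFold repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.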
Multimer inference
python3 run_pretrained_openfold.py \
    fasta_dir \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
    --pdb_seqres_database_path data/pdb_seqres/pdb_seqres.txt \
    --uniref30_database_path data/uniref30/UniRef30_2021_03 \
    --uniprot_database_path data/uniprot/uniprot.fasta \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hmmsearch_binary_path lib/conda/envs/openfold_venv/bin/hmmsearch \
    --hmmbuild_binary_path lib/conda/envs/openfold_venv/bin/hmmbuild \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_multimer_v3" \
    --model_device "cuda:0" \
    --output_dir ./
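The two commands differ mainly in their databases and in the value of --config_preset. As a rough illustration, the preset can be chosen from the number of sequences in the input FASTA file. This sketch assumes that the chains of a complex are stored as separate records in one FASTA file; the helpers `read_fasta` and `suggest_preset` are hypothetical, and the rule is a heuristic, not OpenFold's own logic.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def suggest_preset(records):
    """Suggest one of the presets used in this guide from the number of chains."""
    if not records:
        raise ValueError("no sequences found")
    return "model_1_ptm" if len(records) == 1 else "model_1_multimer_v3"


if __name__ == "__main__":
    example = ">chain_A\nMSEQNNTEMTFQ\n>chain_B\nIQRIYTKDISFE\n"
    records = read_fasta(example)
    print(len(records), "chain(s) ->", suggest_preset(records))
```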
Python example code
import torch
from openfold.config import model_config
from openfold.model.model import AlphaFold
from openfold.np.protein import to_pdb
# NOTE: process_fasta, the "model_state_dict" key, and the to_pdb arguments
# follow the original example; check them against your installed OpenFold
# version (run_pretrained_openfold.py builds features through
# openfold.data.data_pipeline and openfold.data.feature_pipeline).
from openfold.utils.feats import process_fasta

# === 1. Prepare the input sequence ===
fasta_path = "example.fasta"  # Input file, in the format:
# >test_seq
# MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQGKKTKF...

# === 2. Configure the model ===
model_name = "model_1_ptm"
ckpt_path = "path/to/model_1_ptm.pt"  # Downloaded model parameters
device = "cuda:0"
config = model_config(model_name)
model = AlphaFold(config)
params = torch.load(ckpt_path)
model.load_state_dict(params["model_state_dict"])
model = model.to(device)
model.eval()

# === 3. Preprocess the sequence ===
feature_dict = process_fasta(fasta_path, is_multimer=False)
# Add a batch dimension and move every feature tensor to the GPU.
batch = {k: torch.as_tensor(v).unsqueeze(0).to(device) for k, v in feature_dict.items()}

# === 4. Run inference ===
with torch.no_grad():
    output = model(batch)

# === 5. Save the predicted structure ===
protein = to_pdb(output, chain_index=batch["asym_id"][0])
with open("predicted_structure.pdb", "w") as f:
    f.write(protein)
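Once a PDB file has been written, a quick sanity check is to count the residues in each chain and compare the result with the input sequence length. The sketch below parses the fixed-column PDB format with the standard library only; `residues_per_chain` is a hypothetical helper, and it assumes one CA atom per residue in the ATOM records.

```python
from collections import Counter
from pathlib import Path


def residues_per_chain(pdb_text):
    """Count residues per chain by counting CA atoms in ATOM records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout: atom name in columns 13-16, chain ID in column 22.
        if line.startswith("ATOM") and len(line) > 21 and line[12:16].strip() == "CA":
            counts[line[21]] += 1
    return dict(counts)


if __name__ == "__main__":
    pdb_path = Path("predicted_structure.pdb")  # output file name used in this guide
    if pdb_path.exists():
        for chain, n in sorted(residues_per_chain(pdb_path.read_text()).items()):
            print(f"chain {chain!r}: {n} residues")
    else:
        print(f"{pdb_path} not found")
```

A residue count that does not match the input sequence length suggests the prediction or the file writing step did not complete as expected.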
References:
aqlaboratory/openfold (GitHub): Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
aqlaboratory/openfold (GitHub): openfold/docs/source/original_readme.md, main branch