Generally speaking, inverse folding models predict protein sequences from protein backbone structures. While they are often used in protein design pipelines to generate candidate sequences for designed binders, their probabilistic nature also makes them attractive for a variety of other tasks, such as predicting the effect of residue substitutions.
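As a minimal sketch of the substitution-scoring idea: an inverse folding model outputs, at each backbone position, a probability distribution over amino acids, and a substitution can be scored by the log-likelihood ratio between the mutant and wild-type residues. The probabilities below are illustrative placeholders, not real model outputs.

```python
import math

# Toy per-position amino-acid probabilities, standing in for the
# conditional distribution p(aa | backbone) that an inverse folding
# model (e.g. ProteinMPNN or ESM-IF1) emits at each residue.
# These numbers are made up for illustration.
probs = {
    10: {"A": 0.60, "G": 0.25, "S": 0.15},
    11: {"L": 0.70, "I": 0.20, "V": 0.10},
}

def substitution_score(pos: int, wt: str, mut: str) -> float:
    """Log-likelihood ratio log p(mut) - log p(wt) at one position.

    Positive values mean the model assigns higher probability to the
    mutant residue than to the wild type, given the backbone.
    """
    p = probs[pos]
    return math.log(p[mut]) - math.log(p[wt])

# A -> G at position 10: the model prefers A, so the score is negative.
score = substitution_score(10, "A", "G")
```

In practice the per-position distributions would come from the model's output logits (e.g. via a softmax over the amino-acid vocabulary) rather than a hand-written table.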

Table of contents
  1. Methods
    1. ProteinMPNN
    2. LigandMPNN
    3. ESM-IF1
    4. PiFold
    5. GraDe_IF
    6. ProRefiner
    7. nanand2/proteins
  2. Benchmarks
    1. ProteinInvBench
  3. Additional resources

Methods

ProteinMPNN

ProteinMPNN: Robust deep learning–based protein sequence design using ProteinMPNN

Paper GitHub Colab HuggingFace YouTube

LigandMPNN

LigandMPNN: Atomic context-conditioned protein sequence design using LigandMPNN

Paper GitHub Colab YouTube

ESM-IF1

ESM-IF1: Learning inverse folding from millions of predicted structures

Paper GitHub Colab

This model predicts protein sequences from backbone atom coordinates and was trained on AlphaFold2-predicted structures. It consists of invariant geometric input processing layers followed by a sequence-to-sequence transformer, and it can predict sequences for partially masked structures.

PiFold

PiFold: Toward effective and efficient protein inverse folding

Paper GitHub Colab

GraDe_IF

GraDe_IF: Graph Denoising Diffusion for Inverse Protein Folding

Paper GitHub

ProRefiner

ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention

Paper GitHub Colab CodeOcean

nanand2/proteins

Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models

Paper GitHub YouTube

Benchmarks

ProteinInvBench

ProteinInvBench: Benchmarking Protein Inverse Folding on Diverse Tasks, Models, and Metrics

Paper GitHub

Additional resources

  • Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement