Inverse folding models predict protein sequences from protein backbones. While they are often used in protein design pipelines to generate multiple candidate sequences for designed binders, their probabilistic nature also makes them attractive for other tasks, such as predicting the effect of residue substitutions.
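
As an illustration of the substitution-effect use case, the sketch below scores a mutation as the log-odds of the mutant residue against the wild-type residue at one position. It assumes the model exposes per-position amino-acid probabilities conditioned on the backbone. The function name `substitution_score` and the toy probabilities are hypothetical and do not come from any of the tools listed here.

```python
import math
from typing import Dict, List


def substitution_score(
    position_probs: List[Dict[str, float]],
    position: int,
    wild_type: str,
    mutant: str,
) -> float:
    """Return log p(mutant) - log p(wild_type) at a 0-based position.

    Negative values mean the model finds the mutant less compatible
    with the backbone than the wild-type residue.
    """
    probs = position_probs[position]
    return math.log(probs[mutant]) - math.log(probs[wild_type])


# Hypothetical per-position probabilities for a 2-residue toy example.
# A real model would output a distribution over all 20 amino acids.
toy_probs = [
    {"A": 0.7, "G": 0.2, "V": 0.1},
    {"A": 0.1, "G": 0.1, "V": 0.8},
]

score = substitution_score(toy_probs, position=0, wild_type="A", mutant="G")
print(f"A1G log-odds: {score:.3f}")
```

Each model exposes its probabilities differently, so the input format above is only a stand-in for whatever a given tool returns.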

Table of contents

Methods
1. ProteinMPNN
2. LigandMPNN
3. ESM-IF1
4. PiFold
5. GraDe_IF
6. ProRefiner
7. nanand2/proteins
Benchmarks
1. ProteinInvBench
Additional resources

Methods

ProteinMPNN

ProteinMPNN: Robust deep learning–based protein sequence design using ProteinMPNN

Paper GitHub Colab HuggingFace YouTube

LigandMPNN

LigandMPNN: Atomic context-conditioned protein sequence design using LigandMPNN

Paper GitHub Colab YouTube

ESM-IF1

ESM-IF1: Learning inverse folding from millions of predicted structures

Paper GitHub Colab

This model predicts protein sequences from backbone atom coordinates and was trained on AlphaFold2 (AF2) predicted structures. It consists of invariant geometric input processing layers followed by a sequence-to-sequence transformer, and it can also predict sequences for partially masked structures.
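
To make "partially masked structures" concrete, here is a minimal sketch of how a span of backbone coordinates might be masked before being passed to a model. It assumes that masked residues are marked by setting their N, CA and C coordinates to infinity. That convention, the helper name `mask_span` and the toy coordinates are assumptions made for illustration, so check the ESM repository for the actual input format. Nested lists are used to keep the example dependency-free. In practice the coordinates would more likely be an array of shape (L, 3, 3).

```python
import math
from typing import List

# One residue = three backbone atoms (N, CA, C), each an [x, y, z] triple.
Residue = List[List[float]]


def mask_span(coords: List[Residue], start: int, end: int) -> List[Residue]:
    """Return a copy of coords with residues in [start, end) set to infinity."""
    if not 0 <= start <= end <= len(coords):
        raise ValueError("span must lie within the structure")
    # Copy every atom so the caller's structure is left untouched.
    masked = [[list(atom) for atom in residue] for residue in coords]
    for i in range(start, end):
        masked[i] = [[math.inf] * 3 for _ in range(3)]
    return masked


# Toy 4-residue backbone with made-up coordinates (not a real protein).
backbone = [
    [[float(i), 0.0, 0.0], [float(i), 1.0, 0.0], [float(i), 1.0, 1.0]]
    for i in range(4)
]

masked = mask_span(backbone, start=1, end=3)
print(masked[1][0])  # N atom of the first masked residue
```

Returning a copy rather than editing in place keeps the original backbone available for comparing sampled sequences against the native one.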