Structure-based
Foldseek
Fast and accurate protein structure search with Foldseek
Progres
Fast protein structure searching using structure graph embeddings
Averaging protein embedding vectors over a whole protein can introduce biases and cause a substantial loss of information. This is partly because most proteins are composed of multiple domains and disordered regions that change during evolution. Progres uses individual domains as query structures, which can be obtained with tools such as Merizo, SWORD2 and Chainsaw. The structure embeddings Progres uses for domain similarity search come from a graph neural network trained with supervised contrastive learning to produce a low-dimensional embedding of protein structure.
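The information loss from whole-protein averaging can be illustrated with a small toy sketch. This is not Progres code: the two-dimensional per-residue vectors, the domain boundaries, and the helper names `mean_pool` and `cosine` are all assumptions made up for illustration, and real embeddings would come from a trained model rather than hand-written numbers.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical per-residue embeddings for a two-domain protein:
# the first three residues belong to one domain, the last three to another.
residues = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3

# Hypothetical domain boundaries, as a segmentation tool might report them.
domain_bounds = [(0, 3), (3, 6)]

# Whole-protein pooling blends both domains into a single vector.
whole = mean_pool(residues)

# Per-domain pooling keeps one vector per domain.
per_domain = [mean_pool(residues[start:end]) for start, end in domain_bounds]

# A query that matches the first domain exactly.
query = [1.0, 0.0]
whole_score = cosine(query, whole)
domain_scores = [cosine(query, d) for d in per_domain]

print(f"whole-protein score: {whole_score:.3f}")
print("per-domain scores:", [round(s, 3) for s in domain_scores])
```

In this toy setting the query matches its domain perfectly when domains are embedded separately, while the whole-protein average gives a weaker, blended score, which is the motivation for searching at the domain level.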
Protein Language Model-based
PROST
Improved global protein homolog detection with major gains in function identification
PLMSearch
PLMSearch: Protein language model powers accurate and fast sequence search for remote homology
Paper GitHub PLMSearch PLMAlign