General research interests

Developing ML models: Learning from protein language models (PLMs) and predicted structures

Protein phosphorylation is a post-translational modification that can alter the structure and function of proteins. PhosBoost is a machine learning approach that leverages protein language models and gradient boosting trees to predict protein phosphorylation from experimentally derived data. PhosBoost offers improved performance when recall is prioritized while consistently providing more confident probability scores.

Developing workflows: Pan-genome protein-protein interaction networks

We deployed the PPI clustering algorithm ClusterONE to identify numerous predicted-PPI clusters that were functionally annotated using gene ontology (GO) functional enrichment, demonstrating a diverse range of enriched GO terms across different clusters. We show that the functionally annotated PPI clusters establish a useful framework for protein function prediction and prioritization of candidate genes of interest.
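As a rough illustration of the GO functional enrichment step, the sketch below scores one cluster with a one-sided hypergeometric test. It is not the workflow's actual implementation: the function names, protein IDs, and term labels are hypothetical, and a real analysis would also correct for multiple testing across terms and clusters.

```python
from math import comb


def hypergeom_enrichment_p(cluster_size, hits, background_size, annotated_total):
    """Upper-tail hypergeometric p-value: P(X >= hits).

    X is the number of annotated proteins seen when drawing `cluster_size`
    proteins without replacement from a background of `background_size`
    proteins, of which `annotated_total` carry the term.
    """
    max_hits = min(cluster_size, annotated_total)
    total = comb(background_size, cluster_size)
    tail = sum(
        comb(annotated_total, k)
        * comb(background_size - annotated_total, cluster_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def enrich_cluster(cluster, term_to_proteins, background):
    """Return {term: p-value} for every term with at least one hit in the cluster."""
    background = set(background)
    cluster = set(cluster) & background
    results = {}
    for term, proteins in term_to_proteins.items():
        annotated = set(proteins) & background
        hits = len(cluster & annotated)
        if hits == 0:
            continue  # terms absent from the cluster cannot be enriched
        results[term] = hypergeom_enrichment_p(
            len(cluster), hits, len(background), len(annotated)
        )
    return results


# Hypothetical toy data: ten background proteins and two made-up terms.
background = {f"P{i}" for i in range(1, 11)}
term_to_proteins = {
    "term_A": {"P1", "P2", "P3"},
    "term_B": {"P4", "P9"},
}
cluster = {"P1", "P2", "P3"}

p_values = enrich_cluster(cluster, term_to_proteins, background)
```

In this toy case only term_A is scored, because term_B has no members in the cluster. Terms with small p-values after multiple-testing correction would be the candidates for annotating the cluster.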

Analysis of protein-ligand interactions: Molecular docking and dynamics simulations

Combining predicted protein structures of metabolic enzymes with generative molecular docking is a powerful way to understand the substrate specificity and promiscuity that enable the metabolic diversity observed in plants.

Metabolomic profiling of plant tissues: From extraction to multi-omic data analyses

Plant specialized metabolites play important roles in mediating beneficial interactions with microbes and insects and in preventing damage from pests and disease. Extraction and profiling of plant metabolites are vital for untangling the basis of these complex ecological and agronomic interactions.

Association mapping: Discovering the genetic basis for plant phenotypic diversity

Connecting traits to causal genes through association studies can accelerate translational research and provide insight into the underlying biological processes.

Comparative genomics: Unraveling complex patterns of duplications, deletions, and mutations in syntenic loci

Comparative genomics and phylogenomics approaches are useful for understanding complex patterns of gene duplication, deletion, and rearrangement. A better understanding of these events can shed light on gene expression regulation and neofunctionalization across the genetic diversity of plants.