I’m interested in applying advanced neural network and machine learning models to study the brain and behavior. Currently, I’m working on two main projects. The first explores how odorous molecules interact with the mosquito olfactory system. To study this, I develop multimodal transformer models to predict odorant–olfactory receptor binding by integrating low-rank adapted protein and chemical foundation models. I’m also building custom chemical foundation models using graph neural networks (GNNs) specifically tailored for olfaction.
The second project focuses on using GNNs to study the collective behavior of schooling fish. In this work, I design GNNs that predict temporal, graph-structured data while simultaneously inferring dynamic edge relationships between individuals.
In my free time, I like to make music, do graphic design, and run.
ABSTRACT: Featurizing odorants to enable robust prediction of their properties is difficult due to the complex activation patterns that odorants evoke in the olfactory system. Structurally similar odorants can elicit distinct activation patterns in both the sensory periphery (i.e., at the receptor level) and downstream brain circuits (i.e., at a perceptual level). Despite efforts to design or discover features for odorants to better predict how they activate the olfactory system, we lack a universally accepted way to featurize odorants. In this work, we demonstrate that feature-based approaches that rely on pre-trained foundation models do not significantly outperform classical hand-designed features on odorant-receptor binding tasks, but that targeted foundation model fine-tuning can increase model performance beyond these limits. To show this, we introduce a new model that creates olfaction-specific representations: LoRA-based Odorant-Receptor Affinity prediction with CROSS-attention (LORAX). We compare existing chemical foundation model representations to hand-designed physicochemical descriptors using feature-based methods and identify large information overlap between these representations, highlighting the necessity of fine-tuning to generate novel and superior odorant representations. We show that LORAX produces a feature space more closely aligned with olfactory neural representation, enabling it to outperform existing models on predictive tasks.
Preprint: https://www.biorxiv.org/content/10.1101/2025.11.04.686628v2
ABSTRACT: The National Cancer Institute (NCI) Program for Natural Product Discovery is a new initiative aimed at creating new technologies for natural product-based drug discovery. Here, we present the development of a neural network-based bioinformatics platform for visualization and analysis of natural product high-throughput screening data using the NCI’s 60 human tumor cell anticancer drug screen. We demonstrate how the tool enables visualization of similar patterns of response that can be parsed both chemically and taxonomically, grouping NCI-60 biological profiles in one easy-to-use bioinformatics interface.
ABSTRACT: Nanoporous materials (NPMs) selectively adsorb and concentrate gases into their pores and thus could be used to store, capture, and sense many different gases. Modularly synthesized classes of NPMs, such as covalent organic frameworks (COFs), offer a large number of candidate structures for each adsorption task. A complete NPM-property table, containing measurements of relevant adsorption properties in candidate NPMs, would enable the matching of NPMs with adsorption tasks. However, in practice, the NPM-property matrix is only partially observed (incomplete); many different properties of many different NPMs have not been measured. The idea in this work is to leverage the observed (NPM, property) values to impute the missing ones. Similarly, commercial recommendation systems impute missing entries in an incomplete product–customer ratings matrix to recommend products to customers. We demonstrate a COF recommendation system to match COFs with adsorption tasks by training a low-rank model of an incomplete COF–adsorption-property matrix constructed from simulated uptakes of CH4, H2O, H2S, Xe, Kr, CO2, N2, O2, and H2 at various conditions. A low-rank model of the COF–adsorption-property matrix, fit to the observed (COF, adsorption property) values, provides (i) predictions of the missing (COF, adsorption property) values and (ii) a “map” of COFs, wherein COFs, represented as points, with similar (dissimilar) adsorption properties congregate (separate). The COF recommendation system is able to rank COFs reasonably well for most of the adsorption properties, but imputation performance diminishes precipitously when the fraction of missing entries exceeds 60%. The concepts in our COF recommendation system can be applied broadly to impute missing data pertaining to many different materials and properties.
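The imputation idea in the abstract above can be illustrated with a minimal sketch of low-rank matrix completion. This is not the paper's implementation: the optimizer (plain stochastic gradient descent), the hyperparameters, the function names `fit_low_rank` and `predict`, and the toy numbers are all assumptions chosen for illustration. Each row (standing in for a COF) and each column (standing in for an adsorption property) gets a short latent vector, and the vectors are fit using only the observed entries.

```python
import random

def fit_low_rank(observed, n_rows, n_cols, rank=2, lr=0.02, reg=0.01,
                 epochs=3000, seed=0):
    """Fit A[i][j] ~ dot(U[i], V[j]) from observed (i, j, value) triples.

    Hypothetical helper: plain SGD on squared error with L2 regularization.
    """
    rng = random.Random(seed)
    # Small random initial latent vectors for rows (U) and columns (V).
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, a in observed:
            err = a - sum(U[i][k] * V[j][k] for k in range(rank))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on 0.5*err^2 + 0.5*reg*(u^2 + v^2).
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    """Impute entry (i, j) as the inner product of the latent vectors."""
    return sum(u * v for u, v in zip(U[i], V[j]))

# Toy data (made up): an exactly rank-1 4x3 matrix with two hidden entries.
row_scale = [1.0, 2.0, 3.0, 4.0]
col_scale = [1.0, 0.5, 2.0]
hidden = {(0, 2), (3, 0)}
observed = [(i, j, row_scale[i] * col_scale[j])
            for i in range(4) for j in range(3) if (i, j) not in hidden]

U, V = fit_low_rank(observed, n_rows=4, n_cols=3, rank=1)
for i, j in sorted(hidden):
    print((i, j), "true:", row_scale[i] * col_scale[j],
          "imputed:", round(predict(U, V, i, j), 2))
```

In this sketch the rows of `U` play the role of the "map" described in the abstract: rows with similar observed values end up with nearby latent vectors. As the abstract notes for the real data, such imputation degrades as the fraction of missing entries grows.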