I’m interested in applying advanced neural network and machine learning models to study the brain and behavior. Currently, I’m working on two main projects. The first explores how odorous molecules interact with the mosquito olfactory system. To study this, I develop multimodal transformer models to predict odorant–olfactory receptor binding by integrating low-rank adapted protein and chemical foundation models. I’m also building custom chemical foundation models using graph neural networks (GNNs) specifically tailored for olfaction.
The second project focuses on using GNNs to study the collective behavior of schooling fish. In this work, I design GNNs that predict temporal, graph-structured data while simultaneously inferring dynamic edge relationships between individuals.
In my free time, I like to make music, do graphic design, and run.
ABSTRACT: Foundation models are large neural networks pre-trained on unlabeled data that learn rich, generalizable embeddings that enhance performance on downstream tasks. These models are particularly valuable when data is limited, where they mitigate overfitting and improve generalization. There has been significant progress in the development and application of foundation models to problems in biochemistry, but these insights have largely been overlooked in neuroscience. We present results adapting biochemical foundation models to problems in olfaction. Several aspects of olfactory neuroscience support their use. Olfactory datasets that record how an odorant interacts with olfactory receptors (ORs) are small, on the order of hundreds of odorants, which stands in stark contrast to the hypothesized billions of odorants that exist. This discrepancy suggests chemical and/or protein foundation models, trained on millions of unlabeled chemicals or proteins respectively, can aid prediction of which odorants and ORs will pair. We adapted several chemical and protein foundation models to the task of odorant-OR pair prediction across three datasets from different species, leading to three main findings. First, molecular information alone was insufficient to accurately predict olfactory receptor neuron activity, suggesting that individual neuron selectivity cannot be captured from molecular features alone. Second, integrating protein embeddings from protein foundation models drastically increased performance, suggesting that multimodal models, which integrate chemical and protein data, are critical for accurate predictions. Third, although we applied several chemical foundation models, no single model achieved superior performance, suggesting that further advances in self-supervised methods for constructing chemical foundation models could improve performance on this task.
ABSTRACT: The National Cancer Institute (NCI) Program for Natural Product Discovery is a new initiative aimed at creating new technologies for natural product-based drug discovery. Here, we present the development of a neural network-based bioinformatics platform for visualization and analysis of natural product high-throughput screening data using the NCI’s 60 human tumor cell anticancer drug screen. We demonstrate how the tool enables visualization of similar patterns of response that can be parsed both chemically and taxonomically, grouping NCI-60 biological profiles in one easy-to-use bioinformatics interface.
ABSTRACT: Nanoporous materials (NPMs) selectively adsorb and concentrate gases into their pores and thus could be used to store, capture, and sense many different gases. Modularly synthesized classes of NPMs, such as covalent organic frameworks (COFs), offer a large number of candidate structures for each adsorption task. A complete NPM-property table, containing measurements of relevant adsorption properties in candidate NPMs, would enable the matching of NPMs with adsorption tasks. However, in practice, the NPM-property matrix is only partially observed (incomplete); many different properties of many different NPMs have not been measured. The idea in this work is to leverage the observed (NPM, property) values to impute the missing ones. Similarly, commercial recommendation systems impute missing entries in an incomplete product–customer ratings matrix to recommend products to customers. We demonstrate a COF recommendation system to match COFs with adsorption tasks by training a low-rank model of an incomplete COF–adsorption-property matrix constructed from simulated uptakes of CH4, H2O, H2S, Xe, Kr, CO2, N2, O2, and H2 at various conditions. A low-rank model of the COF–adsorption-property matrix, fit to the observed (COF, adsorption property) values, provides (i) predictions of the missing (COF, adsorption property) values and (ii) a “map” of COFs, wherein COFs, represented as points, with similar (dissimilar) adsorption properties congregate (separate). The COF recommendation system is able to rank COFs reasonably well for most of the adsorption properties, but imputation performance diminishes precipitously when the fraction of missing entries exceeds 60%. The concepts in our COF recommendation system can be applied broadly to impute missing data pertaining to many different materials and properties.
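The imputation idea in the abstract above can be illustrated with a minimal sketch. It assumes a small synthetic matrix in place of the COF–adsorption-property data, and it fits the low-rank model by alternating ridge-regularized least squares, which is one common choice and not necessarily the fitting procedure used in the paper. The function name `fit_low_rank`, the matrix sizes, the rank, and the regularization strength are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_low_rank(M, observed, rank=2, reg=1e-3, n_iters=100, seed=0):
    """Fit M ~ U @ V.T using only entries where `observed` is True.

    Alternating least squares: hold V fixed and solve a small ridge
    regression for each row of U, then hold U fixed and do the same
    for each row of V.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n_rows):
            cols = observed[i]
            if cols.any():
                Vo = V[cols]
                U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ M[i, cols])
        for j in range(n_cols):
            rows = observed[:, j]
            if rows.any():
                Uo = U[rows]
                V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ M[rows, j])
    return U, V


# Synthetic stand-in for a (material x property) matrix of true rank 2.
rng = np.random.default_rng(1)
M_true = rng.normal(size=(40, 2)) @ rng.normal(size=(20, 2)).T

# Hide roughly 30% of the entries; the model never sees their values.
observed = rng.random(M_true.shape) < 0.7
M_observed = np.where(observed, M_true, 0.0)

U, V = fit_low_rank(M_observed, observed, rank=2)
M_hat = U @ V.T  # imputes every entry, including the hidden ones

# Compare imputation error on hidden entries to a trivial all-zeros guess.
rmse_missing = np.sqrt(np.mean((M_hat - M_true)[~observed] ** 2))
rmse_baseline = np.sqrt(np.mean(M_true[~observed] ** 2))
print(f"RMSE on hidden entries: {rmse_missing:.4f}")
print(f"RMSE of all-zeros guess: {rmse_baseline:.4f}")
```

In this sketch, each row of `U` is a low-dimensional latent vector for one material, so plotting those rows gives the kind of "map" the abstract describes, with materials of similar properties landing near each other. Raising the hidden fraction in `observed` is a simple way to explore how imputation quality falls off as more entries go missing.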