publications
2024
- [CogSci] Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction. Sreejan Kumar, Raja Marjieh, Byron Zhang, and 5 more authors. Proceedings of the Annual Meeting of the Cognitive Science Society, 2024
2023
- [Frontiers] CSSQ: a ChIP-seq signal quantifier pipeline. Ashwath Kumar, Michael Y. Hu, Yajun Mei, and 1 more author. Frontiers in Cell and Developmental Biology, 2023
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) has revolutionized the study of epigenomes, and the massive increase in ChIP-seq datasets calls for robust and user-friendly computational tools for quantitative ChIP-seq. Quantitative ChIP-seq comparisons have been challenging due to the noise and variation inherent to ChIP-seq and epigenomes. By employing innovative statistical approaches specially catered to ChIP-seq data distribution and sophisticated simulations along with extensive benchmarking studies, we developed and validated CSSQ as a nimble statistical analysis pipeline capable of differential binding analysis across ChIP-seq datasets with high confidence and sensitivity and low false discovery rate with any defined regions. CSSQ models ChIP-seq data as a finite mixture of Gaussians that faithfully reflects ChIP-seq data distribution. By a combination of Anscombe transformation, k-means clustering, and estimated maximum normalization, CSSQ minimizes noise and bias from experimental variations. Further, CSSQ utilizes a non-parametric approach and incorporates comparisons under the null hypothesis by unaudited column permutation to perform robust statistical tests to account for fewer replicates of ChIP-seq datasets. In sum, we present CSSQ as a powerful statistical computational pipeline tailored for ChIP-seq data quantitation and a timely addition to the tool kits of differential binding analysis to decipher epigenomes.
- [TMLR] Latent State Models of Training Dynamics. Michael Y. Hu, Angelica Chen, Naomi Saphra, and 1 more author. Transactions on Machine Learning Research, 2023
2022
- [NeurIPS] Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines. Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, and 7 more authors. NeurIPS, 2022
Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs induced to generate such tasks guides them toward more human-like inductive biases. Human-generated language descriptions and program induction models that add new learned primitives both contain abstract concepts that can compress description length. Co-training on these representations results in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without learned primitives), suggesting that the abstraction supported by these representations is key.
- [ACL Workshop] Using Natural Language to Guide Meta-Learning Agents towards Human-like Inductive Biases. Sreejan Kumar, Ishita Dasgupta, Michael Hu, and 6 more authors. In ACL Workshop on Learning with Natural Language Supervision, 2022
- [Nucleic Acids] rRNA expansion segment 7 in eukaryotes: from Signature Fold to tentacles. Marcin Biesiada, Michael Y. Hu, Loren Dean Williams, and 2 more authors. Nucleic Acids Research, Oct 2022
The ribosomal core is universally conserved across the tree of life. However, eukaryotic ribosomes contain diverse rRNA expansion segments (ESs) on their surfaces. Sites of ES insertions are predicted from sites of insertion of micro-ESs in archaea. Expansion segment 7 (ES7) is one of the most diverse regions of the ribosome, emanating from a short stem loop and ranging to over 750 nucleotides in mammals. We present secondary and full-atom 3D structures of ES7 from species spanning eukaryotic diversity. Our results are based on experimental 3D structures, the accretion model of ribosomal evolution, phylogenetic relationships, multiple sequence alignments, RNA folding algorithms and 3D modeling by RNAComposer. ES7 contains a distinct motif, the ‘ES7 Signature Fold’, which is generally invariant in 2D topology and 3D structure in all eukaryotic ribosomes. We establish a model in which ES7 developed over evolution through a series of elementary and recursive growth events. The data are sufficient to support an atomic-level accretion path for rRNA growth. The non-monophyletic distribution of some ES7 features across the phylogeny suggests acquisition via convergent processes. And finally, illustrating the power of our approach, we constructed the 2D and 3D structure of the entire LSU rRNA of Mus musculus.
2021
- [NeurIPS] Safe Reinforcement Learning with Natural Language Constraints. Tsung-Yen Yang, Michael Hu, Yinlam Chow, and 2 more authors. NeurIPS, Oct 2021
While safe reinforcement learning (RL) holds great promise for many practical applications like robotics or autonomous cars, current approaches require specifying constraints in mathematical form. Such specifications demand domain expertise, limiting the adoption of safe RL. In this paper, we propose learning to interpret natural language constraints for safe RL. To this end, we first introduce HazardWorld, a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text. We then develop an agent with a modular architecture that can interpret and adhere to such textual constraints while learning new tasks. Our model consists of (1) a constraint interpreter that encodes textual constraints into spatial and temporal representations of forbidden states, and (2) a policy network that uses these representations to produce a policy achieving minimal constraint violations during training. Across different domains in HazardWorld, we show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches. However, in terms of absolute performance, HazardWorld still poses significant challenges for agents to learn efficiently, motivating the need for future work.
2019
- [J Mol Bio] G-Quadruplexes in Human Ribosomal RNA. Santi Mestre-Fos, Petar I. Penev, Suttipong Suttapitugsakul, and 6 more authors. Journal of Molecular Biology, Oct 2019
rRNA is the single most abundant polymer in most cells. Mammalian rRNAs are nearly twice as large as those of prokaryotes. Differences in rRNA size are due to expansion segments, which contain extended tentacles in metazoans. Here we show that the terminus of an rRNA tentacle of Homo sapiens contains 10 tandem G-tracts that form highly stable G-quadruplexes in vitro. We characterized rRNA of the H. sapiens large ribosomal subunit by computation, circular dichroism, UV melting, fluorescent probes, nuclease accessibility, electrophoretic mobility shifts, and blotting. We investigated Expansion Segment 7 (ES7), oligomers derived from ES7, intact 28S rRNA, 80S ribosomes, and polysomes. We used mass spectrometry to identify proteins that bind to rRNA G-quadruplexes in cell lysates. These proteins include helicases (DDX3, CNBP, DDX21, DDX17) and heterogeneous nuclear ribonucleoproteins. Finally, by multiple sequence alignments, we observe that G-quadruplex-forming sequences are a general feature of LSU rRNA of Chordata but not, as far as we can tell, of other species. Chordata ribosomes present polymorphic tentacles with the potential to switch between inter- and intramolecular G-quadruplexes. To our knowledge, G-quadruplexes have not been reported previously in ribosomes.
- [Cell Reports] Ultra-High-Frequency Reprogramming of Individual Long-Term Hematopoietic Stem Cells Yields Low Somatic Variant Induced Pluripotent Stem Cells. Kai Wang, Anthony K. Guzman, Zi Yan, and 10 more authors. Cell Reports, Oct 2019
Efficiency of reprogramming of human cells into induced pluripotent stem cells (iPSCs) has remained low. We report that individual adult human CD49f+ long-term hematopoietic stem cells (LT-HSCs) can be reprogrammed into iPSCs at close to 50% efficiency using Sendai virus transduction. This exquisite sensitivity to reprogramming is specific to LT-HSCs, since it progressively decreases in committed progenitors. LT-HSC reprogramming can follow multiple paths and is most efficient when transduction is performed after the cells have exited G0. Sequencing of 75 paired skin fibroblasts/LT-HSC samples collected from nine individuals revealed that LT-HSCs contain a lower load of somatic single-nucleotide variants (SNVs) and indels than skin fibroblasts and accumulate about 12 SNVs/year. Mutation analysis revealed that LT-HSCs and fibroblasts have very different somatic mutation signatures and that somatic mutations in iPSCs generally exist prior to reprogramming. LT-HSCs may become the preferred cell source for the production of clinical-grade iPSCs.
2017
- [Biochemistry] Eukaryotic Ribosomal Expansion Segments as Antimicrobial Targets. Lizzette M. Gómez Ramos, Natalya N. Degtyareva, Nicholas A. Kovacs, and 8 more authors. Biochemistry, Oct 2017. PMID: 28895721
Diversity in eukaryotic rRNA structure and function offers possibilities of therapeutic targets. Unlike ribosomes of prokaryotes, eukaryotic ribosomes contain species-specific rRNA expansion segments (ESs) with idiosyncratic structures and functions that are essential and specific to some organisms. Here we investigate expansion segment 7 (ES7), one of the largest and most variable expansions of the eukaryotic ribosome. We hypothesize that ES7 of the pathogenic fungus Candida albicans (ES7CA) could be a prototypic drug target. We show that isolated ES7CA folds reversibly to a native-like state. We developed a fluorescence displacement assay using an RNA binding fluorescent probe, F-neo. F-neo binds tightly to ES7CA with a Kd of 2.5 × 10⁻⁹ M but binds weakly to ES7 of humans (ES7HS) with a Kd estimated to be greater than 7 μM. The fluorescence displacement assay was used to investigate the affinities of a library of peptidic aminosugar conjugates (PAs) for ES7CA. For conjugates with highest affinities for ES7CA (NeoRH, NeoFH, and NeoYH), the lowest dose needed to induce mortality in C. albicans (minimum inhibitory concentration, MIC) was determined. PAs with the lowest MIC values were tested for cytotoxicity in HEK293T cells. Molecules with high affinity for ES7CA in vitro induce mortality in C. albicans but not in HEK293T cells. The results are consistent with the hypothesis that ESs represent useful targets for chemotherapeutics directed against eukaryotic pathogens.