I'm a recent graduate of Columbia University's Master of Science in Data Science (MSDS) program. I'm currently looking for a full-time job, so if you are looking for a bioinformatics scientist, computational biologist, or data scientist, feel free to contact me through any of the links on this page. I did a computational biology internship at Roche Diagnostics, and I'm currently working as a Research Assistant in the Jovanovic Lab at Columbia University. I try to share all my code and learnings on GitHub. However, I'm not able to share code related to ongoing papers or projects in the Jovanovic Lab, or code related to projects I worked on at Roche Diagnostics.
Improved the Size-Exclusion Chromatography Toolkit (SECAT) by ~40% through optimizations and multiprocessing support for non-parametric LOESS normalization.
A space where I apply my data science frameworks and skills to contemporary questions.
Research Assistant | Jan 2022 — Current
Jovanovic Lab @ Columbia University
Working with protein turnover data to identify differentially expressed proteins and differential complex turnover in primary versus metastatic pancreatic cancer cells using Python with the following packages: bokeh, networkx, scipy, pandas, matplotlib, numpy.
Optimized the Size-Exclusion Chromatography Toolkit (SECAT) by reducing memory-intensive data processing and adding asynchronous concurrency with Python's multiprocessing package, and improved the user and developer experience via conda environments.
Performing outlier analysis for differentially phosphorylated proteins in primary pancreatic cancer cells versus metastatic liver cancer cells using R and Python with the following packages: blacksheep, gseapy, scipy, pandas, numpy, and ssGSEA2.0.
Computational Biology Intern | May 2022 — Aug 2022
Roche Molecular Diagnostics, Research and Early Development
Implemented convex-optimization-based gene prioritization with a PyTorch neural network module to explore novel polygenic risk scoring for Alzheimer’s Disease using 3 data modalities. Worked on a multidisciplinary approach using molecular biology, computer science, and data science.
Independently developed a graph neural network classifier to determine whether an individual has Alzheimer’s Disease based on a heterogeneous graph with 4 node modalities and 3 edge modalities.
Created machine learning operations (MLOps) infrastructure to automate graph neural network training, validation, testing, and inference via PyTorch Lightning.
Capstone Project Member | Sep 2022 — Dec 2022
Columbia University (Data Science Institute)
Transformed Jupyter notebooks for scRNA preprocessing and analysis into Python scripts to improve the reproducibility and scalability of NGS analysis.
Created Docker containers for CellPhoneDB, Tangram, and SCRAN/Liger analyses to improve the portability of the NGS analyses across Windows, Linux servers, and macOS.
Performed CPU and memory profiling via scalene and used the insights to make several memory and file I/O improvements, including vectorizing counts normalization code to improve speed by 98.1%, switching from CSV to Parquet file format to reduce file size by up to 97%, and dynamically switching between regular and compressed sparse row (CSR) representations based on the sparsity of counts matrices.
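
As a rough illustration of two of the ideas above (vectorized counts normalization and sparsity-based switching between dense and CSR storage), here is a minimal sketch. It is not the capstone project's actual code: the function names, the `target_sum` scaling convention, and the 0.5 sparsity cutoff are all assumptions chosen for the example.

```python
import numpy as np
from scipy import sparse

# Hypothetical cutoff: use CSR when at least this fraction of entries is zero.
SPARSITY_THRESHOLD = 0.5


def normalize_counts(counts, target_sum=1e4):
    """Scale each row (cell) to `target_sum` total counts without a Python loop."""
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # keep all-zero rows as zeros instead of dividing by zero
    return counts / totals * target_sum


def choose_representation(matrix, threshold=SPARSITY_THRESHOLD):
    """Return CSR if the (non-empty, 2-D) matrix is sparse enough, else a dense array."""
    if sparse.issparse(matrix):
        nonzero = matrix.count_nonzero()
    else:
        matrix = np.asarray(matrix)
        nonzero = np.count_nonzero(matrix)
    sparsity = 1.0 - nonzero / (matrix.shape[0] * matrix.shape[1])
    if sparsity >= threshold:
        return sparse.csr_matrix(matrix)
    return matrix.toarray() if sparse.issparse(matrix) else matrix
```

In practice the cutoff would be tuned from profiling results rather than fixed at 0.5, since the break-even point depends on matrix shape and the operations applied afterwards.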