Applied Statistics graduate skilled in pharma data analysis, big data management, machine learning, statistics, and mathematics. Strong experience building business analytics dashboards in Python.
⦿ Performed territory-level analysis of 2021 United States sales data and reported summaries of sales, patients, HCPs, call plans, call activities, and attainment by territory using SQL and Excel.
⦿ Identified influential HCPs for targeting by implementing a business-rule-based model on IQVIA APLD data using SQL.
⦿ Conducted Grid Analysis on patients; summarized and reported the number of patients in each combination of LOT and regimen using SQL and Excel.
⦿ Conducted NPP Analysis on a novel product’s marketing data and created data visualizations in Python showing the number of HCPs reached/engaged and the marketing costs by marketing channel, channel-vendor, and channel-vendor-product.
⦿ Implemented Python code for building business analytics dashboards and parameterized the code to enable automatic monthly updates.
Digital Health & Wellness Firm
Data Analyst Intern (Jul 2020 – Oct 2020)
⦿ Analyzed customers’ behaviors, features, demographics, and KPIs such as drop-off rate and average session time.
⦿ Created data visualizations such as maps and line graphs, and reported the findings of the customer analysis.
⦿ Created and managed multiple data pipelines to AWS S3 buckets; managed data warehouses including Azure Blob Storage and Azure Data Lake; queried large-scale data using MS SQL Server, AWS Redshift, and AWS Glue.
⦿ Automated the ETL process and integrated large-scale data from Mixpanel, Mailchimp, and Pipeliner into Azure and AWS by creating workflows in Tray.io.
⦿ Processed and mapped large-scale data, prepared it for machine learning modeling, performed data mining, and created data visualizations in Enterprise Power BI to study user behavior.
Data Scientist Co-op (Jan 2019 – May 2019)
⦿ Conducted lead scoring by training machine learning models, including Logistic Regression, Random Forest, Extreme Gradient Boosting, and Neural Network, in Python. The best model achieved an AUC of 0.87.
⦿ Conducted exploratory data analysis, identified the most effective marketing channel, and analyzed the response rates of different marketing activities; created data visualizations and reported the findings.
⦿ Cleansed, normalized, and prepared a dataset of 1 million rows for modeling.
⦿ Presented the project and reported development progress, in writing and in person, to both data scientists and non-technical managers.
⦿ Predicted the probability of a patient developing breast cancer by training Random Forest, Support Vector Machine, Decision Tree, and K-Means Clustering models in Python (scikit-learn).
Machine Learning Modeling for Letter Recognition, DePaul University (Jan 2020 – Apr 2020)
⦿ Trained a series of Decision Tree and K-Nearest Neighbor models with different configurations in Python, evaluated them, and selected the best model according to the misclassification matrix.
Bachelor of Science: Applied Mathematics
University of Washington (2013 – 2018)