Profile

Data Science

Associate Consultant, Data Science C00272

Senior Associate Consultant with interest in exploring recent advances in data science, AIML, and programming. Holds a Ph.D. from The Pennsylvania State University with a major in Industrial Engineering & Operations Research and two minors in Statistics and Mathematics.

Industry and Project Experience

KMK Consulting, Inc., Associate Manager, Data Science Jan 2022 – Present

Pharma Consulting Firm, Senior Decision Scientist (R&D) Aug 2020 – Jan 2022

⦿ Improved the accuracy of a new-customer discovery model by 50% and the speed by 2000% through simplifying the overcomplicated classification model, removing improper statistical tests, adding regularization terms, and combining cross-validation with downsampling.
⦿ Reduced AWS EC2 server cost by cutting usage time to 33% of its original level, using an in-house map-reduce strategy in R that enabled three data-intensive machine learning projects to run simultaneously.
⦿ Automated several standard payer-stream reports end to end for Pharma B using R Shiny. For some reports, refresh time dropped from hours to minutes; others now refresh on their own with no human intervention. Clients praised the reports for being not only automated but also improved, with cleaner and more concise KPI presentation, which enabled them to make quicker and smarter decisions.
⦿ Developed a set of R packages for Pharma B that help new team members get up to speed quickly. For example, these packages let us connect to databases, pull up commonly used SQL snippets, refresh standard reports, generate dashboards, and call third-party APIs with single lines of code. They have freed the team from repetitive tasks to focus on more creative ones.
⦿ Improved the performance and user experience of a legacy payer-contracting application, written in R Shiny and SQL, by adding custom R packages as patches, refining the underlying infrastructure, and introducing new features. All client complaints went away after these changes.
⦿ Developed a Natural Language Processing (NLP) application in R to help Pharma C’s HR department identify risky items.

⦿ Helped cultivate a data-centric culture within the firm by recommending learning resources on data science and software engineering.
⦿ Won the 2020 Best Trainee Award.

Pharma Consulting Firm, Senior Decision Scientist (Consulting) Aug 2020 – Present

⦿ Secured a new brand-support project from Pharma A by demonstrating extensive analytical skills and experience.
⦿ Successfully delivered a doctors & hospitals segmentation project to Pharma B by collaborating with a team of eight.
⦿ Led a payer-stream consulting team of five to support Pharma B. Responded promptly to high volumes of client requests. Communicated effectively with clients to understand underlying business problems, proactively cleared roadblocks, and assigned tasks to offshore colleagues to form a seamless global workflow. Quickly incorporated client feedback, iterated, and delivered satisfactory results on time or ahead of schedule.
⦿ Helped team members keep working efficiently with workload-control techniques. Broke client requests into actionable sub-tasks, prioritized them, and focused on the bottleneck problems. Managed client expectations and re-prioritized ongoing requests when the team was working at its full capacity, protecting team members’ mental health and creativity, resulting in high-quality deliverables that clients appreciated.
⦿ Extensive knowledge of commercial Life Sciences data sets, such as IQVIA NSP, DDD, Xponent (PlanTrak), NPA, LAAD, and Veeva promotional.

University, Graduate Research Assistant 2016 – 2020

⦿ Developed a set of algorithms to compute nonparametric, distribution-free confidence regions on the optima of multidimensional functions fitted from data, and (non)parametric Bayesian credible regions on the same. The accompanying R package, OptimaRegion, has been used by nearly 20,000 users worldwide to date. These algorithms also have the potential to systematically solve the hyperparameter tuning problem in machine learning by combining parallelizable grid search methods with metamodel-based, sequential search methods.
⦿ Investigated the US Industrial Engineering & Operations Research faculty hiring network with a latent variable exponential random graph model.
⦿ Applied statistical process control and shape analysis to point-cloud data to improve the quality of 3D-Printing products.

University, Graduate Researcher 2014 – 2015

⦿ Used Java to formulate the Toyota Kanban production system as a Markov Chain and maximized its profits with Simulated Annealing.
⦿ Built a database with SQL for a medical equipment company based in China to manage sales transactions.

University, Instructor of Stochastic Models in Operations Research (IE 425) 2019 – 2020

⦿ One of three Ph.D. instructors in this world-renowned IE department, and the only one of the three in charge of a core course.
⦿ Provided 132 senior undergraduates with a solid analytical foundation in conditional probability, Poisson processes, Markov chains, queuing theory, inventory theory, dynamic programming, and basic Bayesian statistics via MCMC, by collaborating with two teaching assistants and utilizing independently developed lectures, computer simulations, homework assignments, and exams.
⦿ Learned to explain complicated concepts in simple terms.

Technical Skills

Advanced R:
o Machine learning and statistical inference: tidymodels, torch, stats, infer, rstanarm, rjags
o Time series and forecasting: forecast, tsibble, fable
o High-performance computing: foreach, future, Rfast, Rcpp
o Data engineering: DBI, sparklyr, dplyr, data.table, arrow
o Data visualization and communication: ggplot2, rmarkdown, flexdashboard, highcharter, DT, gt
o Software engineering: devtools, shiny, plumber, repository and library management, app deployment, version control
SQL: logical execution order, temp tables/views, CTEs, window functions, performance improvement
Python: pyspark, pandas, numpy, scikit-learn, keras
Hands-on experience with distributed computing tools – Databricks (Apache Spark), H2O, AWS.
Data sets: IQVIA NSP, DDD, Xponent (PlanTrak), NPA, LAAD, and Veeva promotional.

Education

Ph.D., Industrial Engineering & Operations Research, Minors in Statistics and Mathematics
The Pennsylvania State University (2015 – 2020)
MS, Industrial Engineering
University of Missouri – Columbia, Columbia, MO (2013 – 2015)
BS, Electrical Engineering, BA, Financial Management
East China University of Science and Technology Shanghai (2013)