MADHUMITHA GANESAN
Data Scientist, Machine Learning, Data Engineer
Website | LinkedIn | E: gmadhu89@gmail.com | M: 312-871-9027 | Santa Clara, California - 95050
SUMMARY
Data Scientist / Data Engineer with 11 years of professional experience in building data-intensive applications, solving complex
modeling and scalability problems in diverse business domains. Proficient in data processing (ETL), predictive data modeling, machine
learning pipeline implementation, and visualizations. Enthusiastic, fast learner focused on delivering efficient and measurable results.
WORK EXPERIENCE
Coupa Software San Mateo, CA
Coupa is a technology provider for business spend management products, including supply chain design, planning, and procurement
Data Scientist Applied Research Feb 2022 - Present
Developed a recommendation engine to generate network prescriptions that increased cost savings by 16% across multiple
supply chain customer models
Developed regression models (XGBoost/RF/LightGBM) to predict transportation cost for a lane. Improved model accuracy by
7% through effective tuning and statistical data analysis best practices
Implemented Shapley attribution to report the key features influencing inventory costs, making model results explainable
Deployed an end-to-end recommendation pipeline on AWS
Researched and prototyped heuristic approaches to prescribe lower cost paths for a supply chain network
Applied time series analysis to forecast supply chain demand for optimized planning and management of inventory
Steelcase Inc. Atlanta, GA
Steelcase is a manufacturer of furniture, seating, and storage systems for office spaces
Data Scientist Intern May 2021 - Aug 2021
Collaborated with quality and engineering teams to define product metrics used to engineer features for a classification model
Created a Random Forest defect detection model that reduced defects by 84 basis points for corporate custom sales orders
Cognizant Technology Solutions (CCC Information Services) Chicago, IL
Cognizant is a multi-national technology services and consulting company
Associate Data Scientist Jan 2019 - Oct 2020
Analyzed telematics (driving patterns) data and prototyped models (SVM/RF) that flagged possible auto collisions to aid insurance
companies in setting optimal premium pricing. The prototype improved customer retention by 3%
Developed a machine learning model to predict the valuation of an auto claim from images of the damaged vehicle
Senior Data Engineer Jul 2010 - Dec 2018
Led and managed the migration of the enterprise data warehouse from Redbrick to Oracle 11g, overseeing a team of 8 members
Designed and implemented data pipelines using PySpark to process 70 million insurance claims from multiple data sources and
store them in HDFS. The pipeline ran 50% faster than the traditional process
Improved the data pipeline architecture to include logging, checkpointing for re-runs, and monitoring for faster debugging
Optimized PySpark transformations through efficient caching, broadcasting, and use of expressions
Partnered with product management to develop conceptual, logical, and physical data models for the client's
automobile claim workflow
Parallelized an ETL workflow that ingested transactional data from a web application with over 8,000 daily active users. This reduced
process completion time by 4 hours
Initiated and implemented a reusable data pipeline for historical data backfilling
Implemented dashboards and custom visualizations comparing KPIs against industry benchmarks, which increased customer
subscriptions by 6%
SKILLS
Machine Learning / Programming: MLflow, MLOps, NumPy, pandas, scikit-learn, PyTorch, Python, R, Unix Shell Scripting
Data Science: Hypothesis Testing, Statistical Data Analysis, Analytical Modeling, Model Deployment, Assessment and Validation
Big Data Engineering: Hadoop (HDFS), Kafka, Hive, Sqoop, PySpark, SQL, PL/SQL, Sybase, Oracle
ETL/Business Intelligence: Tableau, D3, IBM DataStage, Pentaho, MicroStrategy, SQL Server Reporting Services, Data Modeling
Product Management Framework / Versioning / Deployment: Agile/Scrum, Waterfall, Jira, VersionOne, Git, Jenkins
Cloud Technologies: AWS S3, EC2
Business Domains: Auto Insurance, Life Insurance, Supply Chain, Manufacturing
EDUCATION
Georgia Institute of Technology, Atlanta, USA December, 2021
Master of Science, Computational Data Analytics GPA: 3.9
Anna University, Chennai, India May, 2010
Bachelor of Engineering, Electronics and Communication
PROJECTS AND CERTIFICATIONS
Gamble Game Simulation: Programmed a gambling game based on die rolls and performed Monte Carlo simulation and distribution tests
Big Data Pipeline for COVID-19 Analysis: End-to-end data pipeline using NiFi, AWS, Kafka, PySpark, and HDFS to analyze trends
and impacts in a COVID-19 dataset, with results visualized in Tableau
CCA 175: Cloudera Spark and Hadoop Certified Developer (2020)