MADHUMITHA GANESAN
Data Scientist, Machine Learning, Data Engineer
Website | LinkedIn | E: gmadhu89@gmail.com | M: 312-871-9027 | Santa Clara, California - 95050
SUMMARY
Data Scientist / Data Engineer with 11 years of professional experience in building data-intensive applications and solving complex
modeling and scalability problems in diverse business domains. Proficient in data processing (ETL), predictive data modeling, machine
learning pipeline implementation, and visualization. Enthusiastic, fast learner focused on delivering efficient and measurable results.
WORK EXPERIENCE
Coupa Software San Mateo, CA
Coupa is a technology provider for business spend management products including supply chain design planning, procurement
Data Scientist – Applied Research Feb 2022 - Present
• Developed a recommendation engine to generate network prescriptions that increased cost savings by 16% across multiple
supply chain customer models
• Developed regression models (XGBoost/RF/LightGBM) to predict transportation cost for a lane. Improved model accuracy by
7% using effective tuning and statistical data analysis best practices
• Implemented Shapley attribution to report key features influencing inventory costs, making model results explainable
• Deployed an end-to-end recommendation pipeline on AWS
• Researched and prototyped heuristic approaches to prescribe lower cost paths for a supply chain network
• Applied time series analysis to forecast supply chain demand for optimized planning and management of inventory
Steelcase Inc. Atlanta, GA
Steelcase is a manufacturer of furniture, seating, and storage systems for office spaces
Data Scientist Intern May 2021 - Aug 2021
• Collaborated with quality and engineering teams to define product metrics used in feature engineering for a classification model
• Created a Random Forest defect detection model that reduced defects by 84 basis points for corporate custom sales orders
Cognizant Technology Solutions (CCC Information Services) Chicago, IL
Cognizant is a multi-national technology services and consulting company
Associate Data Scientist Jan 2019 - Oct 2020
• Analyzed telematics (driving patterns) data and prototyped models (SVM/RF) that flagged possible auto collisions to aid insurance
companies in setting optimal premium pricing. The prototype improved customer retention by 3%
• Developed a machine learning model to predict the valuation of an auto claim from images of the damaged vehicle
Senior Data Engineer Jul 2010 - Dec 2018
• Led the migration of the enterprise data warehouse from Redbrick to Oracle 11g, overseeing a team of 8 members
• Designed and implemented data pipelines using PySpark to process 70 million insurance claims from multiple data sources and
store them in HDFS. The pipeline was 50% faster than the traditional process
• Improved data pipeline architecture to include logging, checkpointing for re-runs, and monitoring for faster debugging
• Optimized PySpark transformations through efficient caching, broadcasting, and use of expressions
• Partnered with product management to develop conceptual, logical, and physical data models for the client’s
automobile claim workflow
• Parallelized an ETL workflow that ingested transactional data from a web application with over 8,000 daily active users, reducing
process completion time by 4 hours
• Initiated and implemented a reusable data pipeline for historical data backfilling
• Implemented dashboards and custom visualizations comparing KPIs against industry benchmarks, which increased customer
subscriptions by 6%
SKILLS
Machine Learning / Programming: MLflow, MLOps, NumPy, pandas, scikit-learn, PyTorch, Python, R, Unix Shell Scripting
Data Science: Hypothesis Testing, Statistical Data Analysis, Analytical Modeling, Model Deployment, Assessment and Validation
Big Data Engineering: Hadoop (HDFS), Kafka, Hive, Sqoop, PySpark, SQL, PL/SQL, Sybase, Oracle
ETL/Business Intelligence: Tableau, D3, IBM DataStage, Pentaho, MicroStrategy, SQL Server Reporting Services, Data Modeling
Product Management Framework / Versioning / Deployment: Agile/Scrum, Waterfall, Jira, VersionOne, Git, Jenkins
Cloud Technologies: AWS S3, EC2
Business Domains: Auto Insurance, Life Insurance, Supply Chain, Manufacturing
EDUCATION
Georgia Institute of Technology, Atlanta, USA December 2021
Master of Science, Computational Data Analytics GPA: 3.9
Anna University, Chennai, India May 2010
Bachelor of Engineering, Electronics and Communication
PROJECTS AND CERTIFICATIONS
• Gamble Game Simulation: Programmed a die-roll gambling game and performed Monte Carlo simulation and distribution tests
• Big Data Pipeline for COVID-19 Analysis: End-to-end data pipeline using NiFi, AWS, Kafka, PySpark, and HDFS to analyze trends
and impacts in a COVID-19 dataset, with results visualized in Tableau
• CCA 175: Cloudera Spark and Hadoop Certified Developer (2020)