MADHUMITHA GANESAN

Data Scientist, Machine Learning, Data Engineer

Website | LinkedIn | E: gmadhu89@gmail.com | M: 312-871-9027 | Santa Clara, California 95050

SUMMARY

Data Scientist / Data Engineer with 11 years of professional experience building data-intensive applications and solving complex modeling and scalability problems in diverse business domains. Proficient in data processing (ETL), predictive data modeling, machine learning pipeline implementation, and visualization. Enthusiastic, fast learner focused on delivering efficient and measurable results.

WORK EXPERIENCE

Coupa Software San Mateo, CA

Coupa is a technology provider of business spend management products, including supply chain design and planning, and procurement

Data Scientist – Applied Research Feb 2022 - Present

• Developed a recommendation engine to generate network prescriptions that increased cost savings by 16% across multiple supply chain customer models

• Developed regression models (XGBoost/RF/LightGBM) to predict per-lane transportation cost. Improved model accuracy by 7% through effective tuning and statistical data analysis best practices

• Implemented Shapley attribution to report the key features influencing inventory costs, making model results explainable

• Deployed an end-to-end recommendation pipeline on AWS

• Researched and prototyped heuristic approaches to prescribe lower-cost paths for a supply chain network

• Applied time series analysis to forecast supply chain demand for optimized planning and management of inventory

Steelcase Inc. Atlanta, GA

Steelcase is a manufacturer of furniture, seating, and storage systems for office spaces

Data Scientist Intern May 2021 - Aug 2021

• Collaborated with quality and engineering teams to define product metrics used in feature engineering for a classification model

• Created a Random Forest defect detection model that reduced defects by 84 basis points for corporate custom sales orders

Cognizant Technology Solutions (CCC Information Services) Chicago, IL

Cognizant is a multi-national technology services and consulting company

Associate Data Scientist Jan 2019 - Oct 2020

• Analyzed telematics (driving patterns) data and prototyped models (SVM/RF) that flagged possible auto collisions to aid insurance companies in setting optimal premium pricing. The prototype improved customer retention by 3%

• Developed a machine learning model to predict the valuation of an auto claim from images of the damaged vehicle

Senior Data Engineer Jul 2010 - Dec 2018

• Led the migration of the enterprise data warehouse from Redbrick to Oracle 11g, overseeing a team of 8 members

• Designed and implemented PySpark data pipelines to process 70 million insurance claims from multiple data sources and store them in HDFS. The pipeline ran 50% faster than the traditional process

• Improved the data pipeline architecture by adding logging, checkpointing for re-runs, and monitoring for faster debugging

• Optimized PySpark transformations through efficient caching, broadcasting, and use of expressions

• Partnered with product management to develop conceptual, logical, and physical data models for the client’s automobile claim workflow

• Parallelized an ETL workflow that ingested transactional data from a web application with over 8,000 daily active users, reducing process completion time by 4 hours

• Initiated and implemented a reusable data pipeline for historical data backfilling

• Implemented dashboards and custom visualizations comparing KPIs against industry benchmarks, which increased customer subscriptions by 6%

SKILLS

Machine Learning / Programming: MLflow, MLOps, NumPy, pandas, scikit-learn, PyTorch, Python, R, Unix Shell Scripting

Data Science: Hypothesis Testing, Statistical Data Analysis, Analytical Modeling, Model Deployment, Assessment and Validation

Big Data Engineering: Hadoop (HDFS), Kafka, Hive, Sqoop, PySpark, SQL, PL/SQL, Sybase, Oracle

ETL/Business Intelligence: Tableau, D3, IBM DataStage, Pentaho, MicroStrategy, SQL Server Reporting Services, Data Modeling

Product Management Framework / Versioning / Deployment: Agile/Scrum, Waterfall, Jira, Version One, Git, Jenkins

Cloud Technologies: AWS S3, EC2

Business Domains: Auto Insurance, Life Insurance, Supply Chain, Manufacturing

EDUCATION

Georgia Institute of Technology, Atlanta, USA December, 2021

Master of Science, Computational Data Analytics GPA: 3.9

Anna University, Chennai, India May, 2010

Bachelor of Engineering, Electronics and Communication

PROJECTS AND CERTIFICATIONS

• Gamble Game Simulation: Programmed a dice-roll gambling game and performed Monte Carlo simulation and distribution tests

• Big Data Pipeline for COVID-19 Analysis: Built an end-to-end data pipeline using NiFi, AWS, Kafka, PySpark, and HDFS to analyze trends and impacts in a COVID-19 dataset, with results visualized in Tableau

• CCA 175: Cloudera Spark and Hadoop Certified Developer (2020)