About Me

Data Scientist with 2+ years of experience shipping ML models, data pipelines, and analytics solutions. At Paisabazaar (India's largest credit marketplace), I built XGBoost and CatBoost models on 500K+ samples that increased customer acquisition by 28% and reduced loan defaults by 32%, and I helped scale a financial product from $1.2M to $11M in 3 months. I've processed 30M+ daily records, built PySpark data marts, and deployed end-to-end ML systems on AWS ECS Fargate. I hold an MS in Business Analytics from Northeastern University (GPA 3.8).

2+ Years Experience

28% Customer Acq. Lift

32% Default Reduction

3.8/4.0 GPA @ Northeastern

Open to Work, Data Scientist · ML Engineer · Business Analyst

Skills

Languages

Data manipulation, scripting, and database querying

Frameworks & Libraries

Machine learning, deep learning, and AI frameworks

BI & Visualization

Interactive dashboards, KPI tracking, and data storytelling

Tools

Version control, IDEs, and development environments

Cloud & Data

Cloud infrastructure, data warehousing, and ETL pipelines

Education & Experience

For more information, have a look at my curriculum vitae.

Northeastern University September 2024 - December 2025

Graduate Teaching Assistant

Python SQL R Gemini Claude BigQuery Snowflake
Board Americas June 2025 - August 2025

Analytics Consultant Intern

Board ERP FP&A Data Modeling ETL
Northeastern University September 2024 - December 2025

M.S. Business Analytics (GPA 3.8)
Varahe Analytics March 2024 - June 2024

Data Engineer

Python AWS PostgreSQL MongoDB Tableau Selenium
Indira Gandhi National Open University August 2022 - August 2023

PG Diploma Computer Science
Paisabazaar February 2022 - March 2024

Data Scientist

Python XGBoost CatBoost PySpark Hive Power BI
University of Delhi July 2018 - July 2021

B.Sc. Physics

Projects

⭐ Featured

XGBoost FastAPI AWS ECS MLflow Docker

Healthcare Fraud Detection

End-to-end ML pipeline for Medicare fraud detection across 5,400+ providers. Engineered 44 predictive features, tuned hyperparameters with Optuna, built a FastAPI + Gradio UI, and set up CI/CD to AWS ECS Fargate.

🎯 95.5% ROC-AUC 🔍 88% Recall

View Code

GCP Terraform dbt Airflow Kestra

NYC Taxi Data Pipeline

Ingested 20M+ taxi trip records into GCS and BigQuery. Provisioned GCP infrastructure with Terraform, built 10+ Kestra workflow orchestrations, and developed dbt models for analytics-ready schemas.

📊 20M+ Records 💰 91% Query Cost Reduction

View Code

LightGBM SHAP Scikit-learn

Insurance Cross-Sell Prediction

LightGBM classification model on 380K customer records with SMOTE for class imbalance. Applied SHAP for model interpretation, identifying key predictors of purchase intent.

📈 85% AUC 🛒 18% Conversion Lift

View Code

Airflow AWS S3 Selenium

ETL Pipeline Orchestration

Scalable ETL pipeline using Selenium + Pandas, storing 1000+ daily records in AWS S3. Automated with Airflow DAGs on EC2 with error handling and retry logic.

✅ 99% Reliability 🔄 Daily Auto-Refresh

View Code
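
For readers curious what the error handling and retry logic in this pipeline might look like, here is a minimal sketch in plain Python. It is not the project's code and does not use Airflow's own retry settings; the names `with_retries` and `flaky_extract`, the attempt count, and the delay values are all hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(task: Callable[[], T], max_attempts: int = 3,
                 base_delay: float = 0.0) -> T:
    """Run `task`, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical extract step that fails twice before succeeding.
calls = {"count": 0}


def flaky_extract() -> List[str]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["record-1", "record-2"]


records = with_retries(flaky_extract, max_attempts=3)
```

In an orchestrator such as Airflow, the same idea is usually expressed through task-level retry configuration rather than a hand-written loop.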

PuLP Streamlit LP Optimization

Supply Chain Optimization

Linear programming model using PuLP to optimize shipping across a 3-warehouse network. Interactive Streamlit dashboard with what-if analysis for scenario planning.

💰 25% Cost Reduction 🏭 3 Warehouses

View Code

Gemini API LangChain LaTeX

Resume Tailoring Automation

Automated resume customization using Gemini API with keyword extraction and ATS optimization. End-to-end pipeline from JD parsing to PDF generation via LaTeX.

⚡ 90% Time Saved 🤖 AI-Powered

Private

Scikit-learn Docker TMDB API

Movie Recommendation System

Content-based engine using KNN with cosine similarity on sparse feature vectors from 3000+ movies. Dockerized Streamlit app with CI/CD via GitHub Actions and TMDB API integration.

🎬 3000+ Movies 🚀 CI/CD Deployed

View Code
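
The idea behind a content-based engine of this kind, nearest neighbors ranked by cosine similarity over sparse feature vectors, can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the project's implementation: the toy catalog, the feature names, and the helpers `cosine_similarity` and `nearest_neighbors` are hypothetical, and sparse vectors are represented as plain dictionaries.

```python
import math
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]  # feature name -> weight; missing keys are zero


def cosine_similarity(a: SparseVector, b: SparseVector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_title: str, catalog: Dict[str, SparseVector],
                      k: int = 2) -> List[Tuple[str, float]]:
    """Return the k titles most similar to `query_title`, best first."""
    query = catalog[query_title]
    scored = [
        (title, cosine_similarity(query, vector))
        for title, vector in catalog.items()
        if title != query_title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy catalog with made-up titles and features.
catalog = {
    "Movie A": {"sci-fi": 1.0, "space": 1.0},
    "Movie B": {"sci-fi": 1.0, "space": 1.0, "drama": 1.0},
    "Movie C": {"romance": 1.0, "drama": 1.0},
}

neighbors = nearest_neighbors("Movie A", catalog, k=2)
```

Here "Movie B" ranks first for "Movie A" because the two share both of Movie A's features, while "Movie C" shares none.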

SARIMA Monte Carlo Pandas

Inventory Optimization Forecasting

12-week purchase order forecasting model for retail using ensemble SARIMA and Monte Carlo simulation. Risk-based reorder points across 508 ASINs with statistical validation.

🎯 92% Accuracy 📉 35% Fewer Stockouts

Private
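
As a rough illustration of the Monte Carlo part of this approach, the sketch below simulates demand over a replenishment lead time and reads a risk-based reorder point off the simulated distribution. It is a simplified stand-in, not the project's code: weekly demand is assumed to be normally distributed (in place of SARIMA forecast output), and the function names, demand parameters, and service levels are hypothetical.

```python
import random
from typing import List


def simulate_lead_time_demand(mean_weekly: float, std_weekly: float,
                              lead_time_weeks: int, n_sims: int,
                              seed: int = 0) -> List[float]:
    """Simulate total demand over the lead time, one total per simulation run."""
    rng = random.Random(seed)  # seeded so the runs are reproducible
    totals = []
    for _ in range(n_sims):
        # Assumption: independent normal weekly demand, floored at zero.
        total = sum(max(0.0, rng.gauss(mean_weekly, std_weekly))
                    for _ in range(lead_time_weeks))
        totals.append(total)
    return totals


def reorder_point(simulated: List[float], service_level: float) -> float:
    """Stock level that covers demand in `service_level` of the simulated runs."""
    ordered = sorted(simulated)
    index = min(len(ordered) - 1, int(service_level * len(ordered)))
    return ordered[index]


# Hypothetical item: about 100 units per week, 2-week lead time.
demand = simulate_lead_time_demand(mean_weekly=100.0, std_weekly=20.0,
                                   lead_time_weeks=2, n_sims=5000, seed=42)
rop_95 = reorder_point(demand, service_level=0.95)
rop_50 = reorder_point(demand, service_level=0.50)
```

A higher service level pushes the reorder point further into the upper tail of simulated demand, which is the trade-off between stockout risk and holding more inventory.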

Contact

MS Business Analytics from Northeastern (GPA 3.8) with 2+ years at Paisabazaar building ML models on 500K+ samples and scaling products to $11M. Actively seeking Data Scientist, ML Engineer, and Business Analyst roles. Based in Boston, MA - open to relocation.

Let's Connect

Hi, I'm Varun Nayyar,
