Hi, I'm Varun Nayyar,

MS Business Analytics @ Northeastern · Scaled a product from $1.2M to $11M with ML

About Me

Varun Nayyar

Data Scientist with 2+ years of experience shipping ML models, data pipelines, and analytics solutions that drive real business impact. At Paisabazaar (India's largest credit marketplace), I built XGBoost and CatBoost models on 500K+ samples that increased customer acquisition by 28% and reduced loan defaults by 32%, and helped scale a financial product from $1.2M to $11M in 3 months. I've processed 30M+ daily records, built PySpark data marts, and deployed end-to-end ML systems on AWS ECS Fargate. MS in Business Analytics from Northeastern University (GPA 3.8).

2+ Years Experience
0+ Projects Shipped
28% Customer Acq. Lift
32% Default Reduction
3.8/4.0 GPA @ Northeastern
Open to Work: Data Scientist · ML Engineer · Business Analyst

Skills

Languages

Python R SQL Spark Java JSON

Data manipulation, scripting, and database querying

Frameworks & Libraries

NumPy Pandas Scikit-learn XGBoost LightGBM PyTorch HuggingFace LangChain

Machine learning, deep learning, and AI frameworks

BI & Visualization

Tableau Power BI Looker Matplotlib Seaborn Alteryx

Interactive dashboards, KPI tracking, and data storytelling

Tools

Git Bash Excel VS Code Jupyter Streamlit Board ERP

Version control, IDEs, and development environments

Cloud & Data

AWS GCP Snowflake BigQuery Databricks dbt Airflow

Cloud infrastructure, data warehousing, and ETL pipelines

Education & Experience

For more information, have a look at my curriculum vitae.

Projects

NYC Taxi Data Pipeline
GCP Terraform dbt Airflow Kestra

Ingested 20M+ taxi trip records into GCS and BigQuery. Provisioned GCP infrastructure with Terraform, built 10+ Kestra workflow orchestrations, and developed dbt models for analytics-ready schemas.

📊 20M+ Records 💰 91% Query Cost Reduction
Insurance Cross-Sell Prediction
LightGBM SHAP Scikit-learn

LightGBM classification model on 380K customer records with SMOTE for class imbalance. Applied SHAP for model interpretation, identifying key predictors of purchase intent.

📈 85% AUC 🛒 18% Conversion Lift
ETL Pipeline Orchestration
Airflow AWS S3 Selenium

Scalable ETL pipeline using Selenium + Pandas, storing 1000+ daily records in AWS S3. Automated with Airflow DAGs on EC2 with error handling and retry logic.

✅ 99% Reliability 🔄 Daily Auto-Refresh
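
The retry logic mentioned above can be illustrated with a minimal plain-Python sketch. This is not the pipeline's actual code: Airflow tasks have their own retry settings, and the function name `run_with_retries`, its parameters, and the exponential backoff schedule are all assumptions made for illustration.

```python
import time


def run_with_retries(task, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on failure, wait and try again up to `retries` more times.

    Hypothetical helper: the delay doubles after each failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt >= retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.
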
Supply Chain Optimization
PuLP Streamlit LP Optimization

Linear programming model using PuLP to optimize shipping across a 3-warehouse network. Interactive Streamlit dashboard with what-if analysis for scenario planning.

💰 25% Cost Reduction 🏭 3 Warehouses
Resume Tailoring Automation
Gemini API LangChain LaTeX

Automated resume customization using Gemini API with keyword extraction and ATS optimization. End-to-end pipeline from JD parsing to PDF generation via LaTeX.

⚡ 90% Time Saved 🤖 AI-Powered
Movie Recommendation System
Scikit-learn Docker TMDB API

Content-based engine using KNN with cosine similarity on sparse feature vectors from 3000+ movies. Dockerized Streamlit app with CI/CD via GitHub Actions and TMDB API integration.

🎬 3000+ Movies 🚀 CI/CD Deployed
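
As a rough illustration of KNN with cosine similarity on sparse feature vectors, here is a small standard-library sketch. It is not the project's code (which lists Scikit-learn): the dict-based sparse representation, the function names, and the four-movie toy catalogue are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {feature: weight} dicts."""
    # Iterate over the smaller dict; only shared features add to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_movies(title, vectors, k=2):
    """Return the k titles most similar to `title` (brute-force KNN)."""
    query = vectors[title]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in vectors.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# Hypothetical toy catalogue: genre tags as binary features.
MOVIES = {
    "Space Saga": {"sci-fi": 1.0, "adventure": 1.0},
    "Star Quest": {"sci-fi": 1.0, "adventure": 1.0, "comedy": 1.0},
    "Laugh Riot": {"comedy": 1.0, "adventure": 1.0},
    "Courtroom": {"drama": 1.0},
}

print(nearest_movies("Space Saga", MOVIES, k=2))  # → ['Star Quest', 'Laugh Riot']
```

Storing only the non-zero features keeps each comparison proportional to the number of tags a movie actually has, which is the point of a sparse representation.
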
Inventory Optimization
SARIMA Monte Carlo Pandas

Inventory Optimization Forecasting

12-week purchase order forecasting model for retail using ensemble SARIMA and Monte Carlo simulation. Risk-based reorder points across 508 ASINs with statistical validation.

🎯 92% Accuracy 📉 35% Fewer Stockouts
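
To make the idea of a risk-based reorder point concrete, here is a minimal Monte Carlo sketch using only the standard library. It is an illustration under stated assumptions, not the project's model: weekly demand is drawn from a normal distribution as a stand-in for the SARIMA forecast, and the function name, parameters, and numbers are hypothetical.

```python
import random


def simulate_reorder_point(weekly_mean, weekly_std, lead_time_weeks,
                           service_level=0.95, n_sims=10_000, seed=0):
    """Estimate the stock level that covers lead-time demand in
    `service_level` of simulated scenarios."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = []
    for _ in range(n_sims):
        # Assumption: independent normal weekly demand, clipped at zero
        # because demand cannot be negative.
        total = sum(
            max(0.0, rng.gauss(weekly_mean, weekly_std))
            for _ in range(lead_time_weeks)
        )
        totals.append(total)
    totals.sort()
    # Empirical quantile of simulated lead-time demand.
    index = min(n_sims - 1, int(service_level * n_sims))
    return totals[index]


# Hypothetical item: about 100 units/week, 2-week lead time.
reorder_at_95 = simulate_reorder_point(100, 20, 2, service_level=0.95)
reorder_at_50 = simulate_reorder_point(100, 20, 2, service_level=0.50)
```

A higher service level pushes the reorder point further above average lead-time demand, and that gap is the safety stock held against stockouts.
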

Contact

MS Business Analytics from Northeastern (GPA 3.8) with 2+ years at Paisabazaar building ML models on 500K+ samples and scaling products to $11M. Actively seeking Data Scientist, ML Engineer, and Business Analyst roles. Based in Boston, MA - open to relocation.