Instacart Reorder Prediction
About Me
Pratham Kamble
London, UK
Tech + Data Science = Me.
- I drive meaningful outcomes with every project I touch.
- I simplify the complex so everyone can grasp it.
- I create clear, beautiful data visuals.
Instacart Reorder Classification
Objective:
Predict whether a previously purchased product will be reordered in a customer's next order.
1. Huge dataset: 1.3 million orders, 50,000 products, 3.4 million total past orders
2. Bananas are the most reordered product—everyone loves bananas!
3. Shopping peaks on weekends and between 10 AM and 4 PM
4. Fresh fruits, vegetables, and dairy are the most frequently reordered aisles
5. Items added to the cart first are much more likely to be reordered
6. The XGBoost model predicts reorders with 80% recall, catching 4 out of 5 actual repeat buys
7. Feature engineering (user habits + product loyalty) boosted accuracy to 71%
8. A neural network was slower and did not outperform XGBoost on this task
9. SHAP analysis: product name, reorder history, and cart position are the most important features
10. The model struggles most with "sometimes" products, those that are neither always nor never reordered
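The "user habits + product loyalty" features from point 7 can be illustrated as simple reorder rates. This is a minimal, hypothetical sketch: it assumes prior order lines shaped like the public Instacart `order_products__prior` table (product_id, reordered) already joined to `orders` to get user_id, and the function name `reorder_rate_features` and the toy rows are illustrative, not the project's actual pipeline.

```python
from collections import defaultdict

# Hypothetical toy rows: (user_id, product_id, reordered flag) from prior orders.
# The column meanings mirror the public Instacart files; the values are made up.
PRIOR_LINES = [
    (1, "banana", 0),
    (1, "banana", 1),
    (1, "milk", 0),
    (2, "banana", 0),
    (2, "banana", 1),
    (2, "bread", 0),
]


def reorder_rate_features(lines):
    """Return (user_reorder_rate, product_reorder_rate) dictionaries.

    user_reorder_rate: share of a user's purchased items that were reorders
    (a simple "user habit" feature).
    product_reorder_rate: share of a product's purchases that were reorders
    (a simple "product loyalty" feature).
    """
    user_counts = defaultdict(lambda: [0, 0])     # user_id -> [reorders, total lines]
    product_counts = defaultdict(lambda: [0, 0])  # product_id -> [reorders, total lines]
    for user_id, product_id, reordered in lines:
        user_counts[user_id][0] += reordered
        user_counts[user_id][1] += 1
        product_counts[product_id][0] += reordered
        product_counts[product_id][1] += 1

    user_rate = {u: r / n for u, (r, n) in user_counts.items()}
    product_rate = {p: r / n for p, (r, n) in product_counts.items()}
    return user_rate, product_rate


if __name__ == "__main__":
    users, products = reorder_rate_features(PRIOR_LINES)
    print("user reorder rates:", users)
    print("product reorder rates:", products)
```

In a real pipeline these rates would be joined back onto each (user, product) candidate row as model inputs alongside features such as cart position.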
Future Improvement
- While XGBoost performed well, hyperparameter tuning could further improve performance. Techniques such as Bayesian optimization or grid search could refine learning rates, tree depth, and regularization parameters to enhance generalization.
- Threshold optimization may also help reduce misclassifications in borderline cases.
- Additionally, segmentation-based modeling, where separate models are trained for high-reorder and low-reorder products, could better capture different shopping behaviors.
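Threshold optimization can be sketched as a sweep over candidate cut-offs on held-out predicted probabilities, keeping the one with the best F1 score. This is a minimal sketch that assumes you already have validation labels and predicted reorder probabilities (for example, the positive-class column of an XGBoost classifier's `predict_proba` output). The F1 objective, the 0.05 step, the toy data, and the helper names `precision_recall_f1` and `best_threshold` are illustrative choices, not part of the original project.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Precision, recall and F1 when predicting 'reorder' for prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(y_true, y_prob, step=0.05):
    """Sweep thresholds strictly between 0 and 1; return (threshold, f1) with the highest F1."""
    best = (0.5, -1.0)
    n_steps = int(round(1 / step))
    for i in range(1, n_steps):
        t = round(i * step, 4)  # round to avoid float noise such as 0.30000000000000004
        _, _, f1 = precision_recall_f1(y_true, y_prob, t)
        if f1 > best[1]:
            best = (t, f1)
    return best


if __name__ == "__main__":
    # Hypothetical validation labels (1 = reordered) and model probabilities.
    y_val = [1, 1, 1, 0, 0, 0]
    p_val = [0.9, 0.45, 0.4, 0.35, 0.2, 0.1]
    print("default 0.5:", precision_recall_f1(y_val, p_val, 0.5))
    print("best threshold:", best_threshold(y_val, p_val))
```

Optimizing F1 trades precision against recall; if missed reorders are costlier than false alarms, a recall-weighted objective could be swapped in instead.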
Contact Me
LinkedIn: www.linkedin.com/in/prathamskk/
Explore My Other Projects
-
Web Scraping, BigQuery, Data Pipeline, Topic Modelling, Looker, K-Means, GCP, Vertex AI, Gemini
A tool built for Sense Worldwide, an innovation consulting company, that collects and analyzes social media conversations to identify trends and patterns, and presents key findings through easy-to-use interactive charts and reports.
-
React, Vite, Firebase, NoSQL, GCP
A food ordering app with real-time order tracking, built for our college festival, that served 800+ orders and onboarded 600+ users in a single day.
-
Food Fiesta: Landing Website
HTML, CSS, JavaScript, Parcel, Bootstrap
A vibrant website promoting our college's Food Fiesta event and our new food ordering app, with details about the festival, featured food items, and easy ways to order through the app.
-
XGBoost, EDA, Python, Data Visualization, Machine Learning
Leveraged XGBoost and customer purchase history to predict whether products will be reordered with 70% accuracy, analyzing 3 million orders and 50,000 products to help stores manage inventory better and improve the shopping experience.
-
React, Vite, TypeScript, AWS, DynamoDB, Voice Transcription
A training platform that helps nurses improve their patient handoff communication skills through practice scenarios, instant feedback, and progress tracking. It includes voice recording that automatically converts speech to text for easier review.
-
Web Scraping, Machine Learning, Python, Pandas, Regression, Random Forest, Hyperparameter Tuning
Built a predictive model analyzing 9000+ Udemy courses to forecast enrollment numbers using features like course pricing, content length, and instructor ratings. Used Random Forest regression to help course creators optimize their offerings.
-
AI Competitor Intelligence Tool
RAG, Gen AI, LLM, MCP, Streamlit, RAG Evaluation, Spark
Designed an AI RAG system that analyzes ~3 million tweets to understand social media customer support. Optimized the Python pipeline by converting it to Spark, reducing processing time from 2 hours to 5 minutes! Built a user-friendly web interface for the tool using Streamlit.
-
OpenCV, YOLOv8, Deep Learning, Python, Data Augmentation, Dataset Generation
Built a real-time object detection system at BARC Robotics using YOLOv8 and OpenCV. Calibrated cameras for position measurement and improved accuracy by training on real and synthetic images.
-
Azure Data Lake + ETL Pipeline
Azure Databricks, ETL, ADLS Gen2, Data Lake, Spark
A modern data platform on Azure that processes e-commerce data through automated pipelines. Azure Data Factory and Databricks transform raw data into clean, organized layers, and data marts are implemented with dbt.