Instacart Reorder Prediction

About Me

Pratham Kamble

London, UK

Tech + Data Science = Me.

  • I drive meaningful outcomes with every project I touch.

  • I simplify the complex so everyone can grasp it.

  • I create clear, beautiful data visuals.


Instacart Reorder Classification

Objective:

Predict whether a previously purchased product will be reordered in a customer's next order.
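One common way to frame this as binary classification is to treat every (user, product) pair from a user's prior orders as a candidate, labelled 1 if that product appears in the user's next order and 0 otherwise. The sketch below assumes that framing. It uses plain Python, and the names `build_reorder_labels`, `prior_pairs` and `next_orders` are hypothetical. It illustrates the idea and is not the project's actual pipeline.

```python
def build_reorder_labels(prior_pairs, next_orders):
    """Label each (user, product) pair seen in a user's prior orders.

    prior_pairs: iterable of (user_id, product_id) tuples, one per prior order line
    next_orders: dict mapping user_id -> set of product_ids in that user's next order
    Returns a dict mapping (user_id, product_id) -> 1 if reordered next time, else 0.
    """
    labels = {}
    # Deduplicate: a product bought many times is still one candidate per user
    for user_id, product_id in set(prior_pairs):
        next_basket = next_orders.get(user_id, set())
        labels[(user_id, product_id)] = int(product_id in next_basket)
    return labels


prior = [(1, "banana"), (1, "milk"), (1, "banana"), (2, "eggs")]
next_baskets = {1: {"banana", "bread"}, 2: set()}
print(build_reorder_labels(prior, next_baskets))
```

Under this framing, a product that shows up only in the next order (here, user 1's bread) is never a candidate, because buying it for the first time is not a reorder.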


1. Huge dataset: 1.3 million orders, 50,000 products, and 3.4 million total past orders

Dataset Overview



2. Bananas are the most reordered product—everyone loves bananas!

Banana Popularity Chart


3. Shopping peaks on weekends and between 10 AM–4 PM

Order Timing Chart


4. Fresh fruits, vegetables, and dairy are the most frequently reordered aisles

Top Aisles Chart


5. Items added to the cart first are much more likely to be reordered

Add-to-Cart Order Chart


6. The XGBoost model predicts reorders with 80% recall, catching 4 out of every 5 actual repeat purchases

XGBoost Performance


7. Feature engineering (user habits + product loyalty) boosted accuracy to 71%

Feature Engineering Impact


8. Neural network was slower and didn't beat XGBoost for this task

Model Comparison


9. SHAP analysis: product name, reorder history, and cart position are the most important features

SHAP Feature Importance


10. The model struggles most with "sometimes" products: those that are neither always nor never reordered

Error Analysis
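Findings 5 and 7 identify cart position, user habits and product loyalty as useful signals. The sketch below shows one way such features could be aggregated from prior order lines. It assumes rows shaped like the Instacart `order_products` table joined to `orders`, with fields `user_id`, `product_id`, `add_to_cart_order` and `reordered`. The function and feature names (`engineer_features`, `up_times_ordered`, `user_reorder_ratio`, and so on) are hypothetical and are not the ones used in the report.

```python
from collections import defaultdict


def engineer_features(order_lines):
    """Aggregate per-(user, product) features from prior order lines.

    order_lines: list of dicts with keys user_id, product_id,
                 add_to_cart_order (1 = first item added) and reordered (0/1).
    Returns a dict mapping (user_id, product_id) -> dict of features.
    """
    prod_total, prod_reorders = defaultdict(int), defaultdict(int)  # product loyalty
    user_total, user_reorders = defaultdict(int), defaultdict(int)  # user habits
    pair_count = defaultdict(int)      # how often this user bought this product
    pair_cart_sum = defaultdict(int)   # running sum of cart positions for the pair

    for row in order_lines:
        u, p = row["user_id"], row["product_id"]
        prod_total[p] += 1
        prod_reorders[p] += row["reordered"]
        user_total[u] += 1
        user_reorders[u] += row["reordered"]
        pair_count[(u, p)] += 1
        pair_cart_sum[(u, p)] += row["add_to_cart_order"]

    features = {}
    for (u, p), n in pair_count.items():
        features[(u, p)] = {
            "up_times_ordered": n,
            "up_mean_cart_position": pair_cart_sum[(u, p)] / n,
            "user_reorder_ratio": user_reorders[u] / user_total[u],
            "product_reorder_rate": prod_reorders[p] / prod_total[p],
        }
    return features


lines = [
    {"user_id": 1, "product_id": "banana", "add_to_cart_order": 1, "reordered": 0},
    {"user_id": 1, "product_id": "milk", "add_to_cart_order": 2, "reordered": 0},
    {"user_id": 1, "product_id": "banana", "add_to_cart_order": 1, "reordered": 1},
    {"user_id": 2, "product_id": "banana", "add_to_cart_order": 3, "reordered": 0},
]
feats = engineer_features(lines)
print(feats[(1, "banana")])
```

Each feature dict can then become one row of the training matrix for a gradient-boosted model such as XGBoost. The labels would come from the user's next order.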


Future Improvement

  • While XGBoost performed well, hyperparameter tuning could further improve performance. Techniques such as Bayesian optimization or grid search could refine learning rates, tree depth, and regularization parameters to enhance generalization.
  • Threshold optimization may also help reduce misclassifications in borderline cases.
  • Additionally, segmentation-based modeling, where separate models are trained for high-reorder and low-reorder products, could better capture different shopping behaviors.
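Threshold optimization can be as simple as sweeping the cut-off applied to the model's predicted reorder probabilities and keeping the value that maximizes a chosen metric on a validation set. The sketch below assumes F1 as that metric, because the report does not name one. It uses plain Python, and `precision_recall_f1` and `best_threshold` are hypothetical helpers. In practice the probabilities would come from the trained XGBoost model.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(y_true, y_prob, candidates=None):
    """Return the (threshold, F1) pair with the highest F1 over candidate cut-offs."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        # Predict "reorder" when the probability reaches the threshold
        y_pred = [1 if p >= t else 0 for p in y_prob]
        _, _, f1 = precision_recall_f1(y_true, y_pred)
        if f1 > best_f1:  # strict ">" keeps the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


y_val = [1, 1, 0, 0]
p_val = [0.9, 0.6, 0.4, 0.1]
print(best_threshold(y_val, p_val))
```

Lowering the threshold trades precision for recall. If missed repeat purchases cost more than false alarms, a recall-weighted objective such as F-beta with beta > 1 may be a better target than F1.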
View Full Report Here


Contact Me

☎️: +44 78189 61950

📧: prathamskk@gmail.com

LinkedIn: www.linkedin.com/in/prathamskk/


Explore My Other Projects

  • SLT: Social Listening Tool

    Web Scraping, BigQuery, Data Pipeline, Topic Modelling, Looker, K-Means, GCP, Vertex AI, Gemini


    A powerful tool built for Sense Worldwide, an innovation consulting company, that collects and analyzes social media conversations to identify trends and patterns, presenting key findings through easy-to-use interactive charts and reports.


    View Project

  • Zaika: A Food Ordering App

    React, Vite, Firebase, NoSQL, GCP


    Built for our college festival, a food ordering app with real-time order tracking that handled 800+ orders and onboarded 600+ users in a single day.


    View Project

  • Food Fiesta: Landing Website

    HTML, CSS, JavaScript, Parcel, Bootstrap


    A vibrant website promoting our college's Food Fiesta event and our new food ordering app, with details about the festival, featured food items, and easy ways to order through the app.


    View Project

  • Instacart Reorder Prediction

    XGBoost, EDA, Python, Data Visualization, Machine Learning


    Leveraged XGBoost and customer purchase history to predict product reorder probability with 70% accuracy, analyzing 3 million orders and 50,000 products to help stores manage inventory better and improve the shopping experience.


    View Project

  • LearnSBAR: Training Platform

    React, Vite, TypeScript, AWS, DynamoDB, Voice Transcription


    A training platform that helps nurses practice and improve their patient handoff communication skills through practice scenarios, instant feedback, and progress tracking. Features voice recording capabilities that automatically convert speech to text for easier review.


    View Project

  • Udemy Enrollment Prediction

    Web Scraping, Machine Learning, Python, Pandas, Regression, Random Forest, Hyperparameter Tuning


    Built a predictive model analyzing 9000+ Udemy courses to forecast enrollment numbers using features like course pricing, content length, and instructor ratings. Used Random Forest regression to help course creators optimize their offerings.


    View Project

  • AI Competitor Intelligence Tool

    RAG, Gen AI, LLM, MCP, Streamlit, RAG Evaluation, Spark


    Designed an AI RAG system to analyze ~3 million tweets and understand customer support on social media. Optimized the Python pipeline by converting it to Spark, reducing processing time from 2 hours to 5 minutes. Built a user-friendly Streamlit web interface for the tool.


    View Project

  • Real Time Object Detection

    OpenCV, YOLOv8, Deep Learning, Python, Data Augmentation, Dataset Generation


    Built a real-time object detection system at BARC Robotics using YOLOv8 and OpenCV. Calibrated cameras for position measurement and improved accuracy by training on real and synthetic images.


    View Project

  • Azure Data Lake + ETL Pipeline

    Azure, Databricks, ETL, ADLS Gen2, Data Lake, Spark


    A modern data platform on Azure cloud that processes e-commerce data through automated pipelines. Azure Data Factory and Databricks transform raw data into clean, organized layers, and data marts are implemented with dbt.


    View Project