About Me

Pratham Kamble

London, UK

Tech + Data Science = Me.

  • I drive meaningful outcomes with every project I touch.

  • I simplify the complex so everyone can grasp it.

  • I create clear, beautiful data visuals.


AI-Powered Competitor Intelligence Tool

Project Overview

When I started this project, my goal was to build an AI tool that could automatically find competitors and analyze how companies interact with their customers online. Here's how I did it, step by step:


System Architecture & Design

First, I designed the system's architecture. I decided to use two main AI agents:

  • Competitor Finder Agent: Finds competitors in a given industry.
  • Customer Interaction Analysis Agent: Analyzes how companies interact with customers.

This modular design allows each agent to focus on a specific task, while also enabling future expansion with more features or data sources.

Figure: System Architecture Diagram (two agents, system architecture, MCP architecture)


Data Collection & Processing

Next, I took a Kaggle dataset of about 3 million customer support tweets. Each tweet records who wrote it, when, and what it said.

Processing Steps:

  • Used Apache Spark to process the data.
  • Filtered for tweets that were customer questions ("inbound") and not replies, to find the start of each conversation.
  • Randomly selected 10,000 conversations for manageability.
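A minimal sketch of the filter-and-sample logic, written in plain Python rather than Spark so it runs without a cluster. The field names (`inbound`, `in_response_to_tweet_id`), the sample rows, and the seed are assumptions for illustration, not the project's actual schema or code.

```python
import random

# Hypothetical rows; the field names are assumptions about the dataset.
tweets = [
    {"tweet_id": 1, "author_id": "customer_1", "inbound": True,
     "in_response_to_tweet_id": None, "text": "My order is late!"},
    {"tweet_id": 2, "author_id": "AcmeSupport", "inbound": False,
     "in_response_to_tweet_id": 1, "text": "Sorry about that. Please DM us."},
    {"tweet_id": 3, "author_id": "customer_1", "inbound": True,
     "in_response_to_tweet_id": 2, "text": "Sent, thanks."},
    {"tweet_id": 4, "author_id": "customer_2", "inbound": True,
     "in_response_to_tweet_id": None, "text": "App keeps crashing on login."},
]


def conversation_starters(rows):
    """Keep inbound tweets that are not replies to any other tweet."""
    return [
        row for row in rows
        if row["inbound"] and row["in_response_to_tweet_id"] is None
    ]


def sample_starters(rows, k, seed=42):
    """Randomly pick at most k starters; a fixed seed keeps the sample repeatable."""
    starters = conversation_starters(rows)
    rng = random.Random(seed)
    return rng.sample(starters, min(k, len(starters)))


starters = conversation_starters(tweets)
sample = sample_starters(tweets, k=10_000)
```

Tweet 3 is inbound but is a reply, so it is excluded. Only tweets 1 and 4 count as conversation starts.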

Figure: Dataset head (what the data looked like)


Conversation Thread Extraction

For each starting tweet, I built the full conversation by following the reply chain. For example, if a customer tweeted "My order is late!" and the company replied, I linked those together, and kept following the thread.
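Following a reply chain can be sketched as a breadth-first walk from the starting tweet. This is an illustration under assumptions: the field names and sample rows are hypothetical, and ordering sibling replies by tweet ID is a simplification that may not match the project's ordering.

```python
from collections import defaultdict, deque

# Hypothetical rows; the field names are assumptions about the dataset.
tweets = [
    {"tweet_id": 1, "author_id": "customer_1",
     "in_response_to_tweet_id": None, "text": "My order is late!"},
    {"tweet_id": 2, "author_id": "AcmeSupport",
     "in_response_to_tweet_id": 1, "text": "Sorry about that. Please DM us."},
    {"tweet_id": 3, "author_id": "customer_1",
     "in_response_to_tweet_id": 2, "text": "Sent, thanks."},
    {"tweet_id": 4, "author_id": "customer_2",
     "in_response_to_tweet_id": None, "text": "App keeps crashing on login."},
]


def index_replies(rows):
    """Map each parent tweet ID to the IDs of the tweets replying to it."""
    replies = defaultdict(list)
    for row in rows:
        parent = row["in_response_to_tweet_id"]
        if parent is not None:
            replies[parent].append(row["tweet_id"])
    return replies


def build_thread(start_id, rows):
    """Collect the starting tweet and every tweet reachable through replies."""
    by_id = {row["tweet_id"]: row for row in rows}
    replies = index_replies(rows)
    thread, queue, seen = [], deque([start_id]), set()
    while queue:
        tweet_id = queue.popleft()
        if tweet_id in seen or tweet_id not in by_id:
            continue  # skip duplicates and replies missing from the data
        seen.add(tweet_id)
        thread.append(by_id[tweet_id])
        queue.extend(sorted(replies.get(tweet_id, [])))
    return thread


def thread_to_text(thread):
    """Flatten a thread into one block of text, one line per tweet."""
    return "\n".join(f"{t['author_id']}: {t['text']}" for t in thread)


thread = build_thread(1, tweets)
print(thread_to_text(thread))
```

The flattened text is the kind of single-row conversation that could then be stored and embedded.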

  • Stored conversations in Parquet format for fast access.
  • Used Gemini's embedding model to convert each conversation into a vector (for semantic search).
  • Saved the vectors in PgVector, a PostgreSQL extension for vector storage and similarity search.
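To show what semantic search over stored vectors means, here is a small sketch that ranks conversations by cosine similarity to a query vector. The three-dimensional vectors and conversation labels are made up for illustration; real embeddings have far more dimensions, and in the project the ranking would be done by the vector database rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), conv_id)
        for conv_id, vec in store.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [conv_id for _, conv_id in scored[:k]]


# Hypothetical toy embeddings, one per conversation.
store = {
    "late_delivery": [0.9, 0.1, 0.0],
    "login_crash": [0.0, 0.2, 0.9],
    "refund_request": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded user question
results = top_k(query, store, k=2)
```

Here `results` is `["late_delivery", "refund_request"]`, since those two vectors point in nearly the same direction as the query.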

Figure: Example conversation 1, an identified conversation between a user and a company (AppleSupport)

Figure: Threads example, the transformed data where each row represents an entire conversation thread


AI Agent Prompt Engineering

With the data ready, I focused on the AI agents.

Customer Interaction Analysis Agent

  • Wrote a detailed prompt specifying what to look for: handling negative feedback, public/private replies, tone, helpfulness, etc.
  • Step-by-step instructions: extract info → analyze → write a report with examples.
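A prompt with this shape could be assembled from a criteria list and the retrieved conversations. The wording below is a hypothetical sketch based only on the criteria and steps named above, not the project's actual prompt.

```python
# Criteria taken from the list above; the prompt wording is an assumption.
ANALYSIS_CRITERIA = [
    "handling of negative feedback",
    "public versus private replies",
    "tone",
    "helpfulness",
]


def build_analysis_prompt(company, conversations, criteria=ANALYSIS_CRITERIA):
    """Build one prompt string: role, criteria, steps, then the conversations."""
    criteria_block = "\n".join(f"- {c}" for c in criteria)
    conversation_block = "\n\n".join(
        f"Conversation {i}:\n{text}"
        for i, text in enumerate(conversations, start=1)
    )
    return (
        f"You are analyzing how {company} interacts with customers online.\n\n"
        "Assess the conversations against these criteria:\n"
        f"{criteria_block}\n\n"
        "Work in three steps:\n"
        "1. Extract the relevant information from each conversation.\n"
        "2. Analyze it against the criteria.\n"
        "3. Write a report and quote examples from the conversations.\n\n"
        f"{conversation_block}\n"
    )


prompt = build_analysis_prompt(
    "AcmeSupport",  # hypothetical company name
    ["customer_1: My order is late!\nAcmeSupport: Sorry about that."],
)
```

Keeping the criteria in a list makes it easy to add or drop a criterion while iterating on the prompt.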

Competitor Finder Agent

  • Prompted to search for top competitors in a given industry and summarize their strengths.
  • Decides when to use external search tools.
  • Presents results in a clear, structured way.
  • Iteratively tested and refined prompts for clarity and usefulness.

User Interface: Streamlit Dashboard

To make everything user-friendly, I built a dashboard using Streamlit:

  • Sidebar: Manage API keys.
  • Tabs:
    • Analyze customer interactions
    • Find competitors
  • Users can enter their own parameters and see results instantly.

Figure: Streamlit Dashboard (competitor and customer views)


Engineering Best Practices

Throughout the project, I followed best practices in software engineering:

  • Docker: Ensures consistent app deployment
  • UV: Dependency management
  • .env files: Secure secrets management
  • Git: Version control
  • Logging: For debugging and monitoring
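As a small illustration of the secrets and logging items, the sketch below reads a key from the environment and sets up a logger using only the standard library. It assumes the values from the .env file have already been loaded into the process environment; the variable name `EXAMPLE_API_KEY` and the logger name are hypothetical.

```python
import logging
import os


def load_api_key(name, env=os.environ):
    """Read a required secret from the environment and fail loudly if missing."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required setting: {name}")
    return value


def get_logger(name="competitor_intel"):
    """Return a logger that writes timestamped messages to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = get_logger()
# A plain dict stands in for the real environment in this example.
key = load_api_key("EXAMPLE_API_KEY", env={"EXAMPLE_API_KEY": "not-a-real-key"})
logger.info("API key loaded (%d characters)", len(key))
```

The log message reports only the key's length, so the secret itself never reaches the logs.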

Figures: Docker, Logs


Evaluation & Learnings

Once everything was working, I evaluated the system:

  • Customer Interaction Analysis Agent: Delivered accurate, valuable insights.
  • Competitor Finder Agent: Worked, but results were sometimes inconsistent.
  • RAG Evaluation: Used the RAGAS library, showing strong context precision and recall.
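To make the two retrieval metrics concrete, here is a simplified, hand-labelled version of each. This is not the RAGAS API: as I understand it, RAGAS uses an LLM to make the relevance judgments, whereas this sketch takes them as ready-made flags. The example labels are invented.

```python
def context_precision(relevance):
    """Rank-weighted precision: rewards relevant chunks that appear early.

    `relevance` is a list of 0/1 flags for the retrieved chunks, in rank order.
    """
    hits = 0
    weighted = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / rank  # precision at this rank
    return weighted / hits if hits else 0.0


def context_recall(claim_supported):
    """Share of reference-answer claims that the retrieved context supports."""
    if not claim_supported:
        return 0.0
    return sum(1 for supported in claim_supported if supported) / len(claim_supported)


# Hypothetical labels: chunks 1 and 3 are relevant; 3 of 4 claims are supported.
precision = context_precision([1, 0, 1])
recall = context_recall([True, True, False, True])
```

With these labels, precision is about 0.83 (the mean of 1/1 and 2/3) and recall is 0.75.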

Key Learnings:

  • Building modular AI systems
  • Handling big data
  • Designing effective prompts

Limitations:

  • Only uses historical Twitter data
  • Relies on certain APIs

The design is flexible enough to support future features (e.g., live data, richer analysis tools).


View Full Report Here

Contact Me


Explore My Other Projects

  • SLT: Social Listening Tool

    Web Scraping, BigQuery, Data Pipeline, Topic Modelling, Looker, K-Means, GCP, Vertex AI, Gemini


    A powerful tool built for Sense Worldwide, an innovation consulting company, that collects and analyzes social media conversations to identify trends and patterns, presenting key findings through easy-to-use interactive charts and reports.


    View Project

  • Zaika: A Food Ordering App

    React, Vite, Firebase, NoSQL, GCP


    A food ordering app that served 800+ orders and onboarded 600+ users in a single day, featuring real-time order tracking for our college festival.


    View Project

  • Food Fiesta: Landing Website

    HTML, CSS, JavaScript, Parcel, Bootstrap


    A vibrant website promoting our college's Food Fiesta event and our new food ordering app, with details about the festival, featured food items, and easy ways to order through the app.


    View Project

  • Instacart Reorder Prediction

    XGBoost, EDA, Python, Data Visualization, Machine Learning


    Leveraged XGBoost and customer purchase history to predict product reorder probability with 70% accuracy, analyzing 3 million orders and 50,000 products to help stores manage inventory better and improve the shopping experience.


    View Project

  • LearnSBAR: Training Platform

    React, Vite, TypeScript, AWS, DynamoDB, Voice Transcription


    A training platform that helps nurses practice and improve their patient handoff communication skills through practice scenarios, instant feedback, and progress tracking. Features voice recording capabilities that automatically convert speech to text for easier review.


    View Project

  • Udemy Enrollment Prediction

    Web Scraping, Machine Learning, Python, Pandas, Regression, Random Forest, Hyperparameter Tuning


    Built a predictive model analyzing 9000+ Udemy courses to forecast enrollment numbers using features like course pricing, content length, and instructor ratings. Used Random Forest regression to help course creators optimize their offerings.


    View Project

  • AI Competitor Intelligence Tool

    RAG, Gen AI, LLM, MCP, Streamlit, RAG Evaluation, Spark


    Designed an AI RAG system to analyze ~3 million tweets and understand customer support on social media. Optimized the Python pipeline by converting it to Spark, reducing processing time from 2 hours to 5 minutes. Built a user-friendly web interface for the tool using Streamlit.


    View Project

  • Real Time Object Detection

    OpenCV, YOLOv8, Deep Learning, Python, Data Augmentation, Dataset Generation


    Built a real-time object detection system at BARC Robotics using YOLOv8 and OpenCV. Calibrated cameras for position measurement and improved accuracy by training on real and synthetic images.


    View Project

  • Azure Data Lake + ETL Pipeline

    Azure, Databricks, ETL, ADLS Gen2, Data Lake, Spark


    A modern data platform on Azure cloud that processes e-commerce data through automated pipelines. Azure Data Factory and Databricks transform raw data into clean, organized layers. Data marts are implemented with DBT.


    View Project