sherozshaikh/README.md

Hey there, I'm Sheroz 👋

Machine Learning Engineer & Data Scientist
Building production ML systems, from LLM-powered automation to healthcare AI

LinkedIn Email GitHub


🚀 About Me

  • 🎓 M.S. Data Science @ Worcester Polytechnic Institute (WPI) | GPA: 3.9/4.0
  • 🏆 Best Data Science Project award winner (1st place out of 20+ teams) for a healthcare ML project
  • 🏥 5+ years building production ML systems across healthcare, fintech, and IoT
  • 🤖 Passionate about LLMs, semantic search, and ML pipeline automation
  • 📦 Open-source contributor: published 4 Python packages on PyPI
  • 📍 Boston, MA

🏗️ What I've Built

  • LLM-Powered Ticket Routing: Claude API-based system automating 40% of classification workflows, saving ~$700/month in operational costs
  • ICD-10 Medical Coding System: Production LLM serving 10+ enterprise healthcare clients, processing 100K+ monthly requests
  • Semantic Search Platform: Vector embeddings over 940K healthcare documents, delivering ~$80K/month in operational savings
  • ML Document Classifier: Production classifier automating 80% of daily document triage (900+ docs) with 99%+ uptime
  • Time-Series Forecasting: PyTorch pipeline predicting equipment failures 30 days in advance
  • LoRA Fine-Tuning Pipeline: End-to-end text classification with parameter-efficient fine-tuning and reproducible benchmarking

🧰 Tech Stack

AI & ML Frameworks

PyTorch scikit-learn HuggingFace LangChain XGBoost

LLMs & Vector Search

Claude OpenAI FAISS Pinecone Chroma

Data Engineering & ETL

PySpark Airflow Polars SQL

Production & MLOps

FastAPI Docker AWS MLflow GitHub Actions Prometheus

Languages

Python SQL Linux


📈 Highlights

  • 🏥 Deployed production LLM for ICD-10 medical coding serving 10+ enterprise healthcare clients
  • 🔍 Built semantic search over 940K documents, saving ~$80K/month in operational costs
  • ⚡ Automated 80% of daily document triage with ML classifier (900+ docs/day)
  • 📊 Optimized PySpark ETL for 15M+ Medicare records: 75% fewer data scans, 58% faster queries
  • 📦 Published 4 open-source Python packages on PyPI for ML pipeline tooling
  • 🏆 1st place: WPI Best Data Science Project (Winter 2024)


💬 Let's connect: always happy to chat about ML engineering, LLMs, healthcare AI, or open source!

Pinned

  1. mini-rainbow-dqn (Public)

    End-to-end deep reinforcement learning platform for Atari Breakout. Implements DQN, Double DQN, Dueling Networks, and Prioritized Experience Replay with side-by-side agent comparison, live inferenc…

    Python

  2. agentic-rag-eval (Public)

    Component ablation study of an agentic RAG pipeline for multi-hop QA on 5K HotpotQA distractor questions. Evaluates query decomposition, iterative retrieval, Qdrant dense/sparse/RRF retrieval, cros…

    Python

  3. pageclassifier (Public)

    Gemini-powered page classifier that decides whether a document page image contains invoice line items. Designed to sit between paperflight and any downstream extractor to reduce expensive LLM calls…

    Python

  4. predictive-maintenance-platform (Public)

    End-to-end ML platform for turbofan engine RUL forecasting, failure classification, and anomaly detection using NASA CMAPSS FD001 dataset

    Python

  5. retail-demand-allocator (Public)

    End-to-end ML platform for retail demand forecasting and marketing budget optimization using UCI Online Retail dataset

    Jupyter Notebook

  6. spam-email-classification-lora (Public)

    Spam Email Classification using LoRA Fine-tuned Transformers: High-performance spam email classification using LoRA-adapted transformer models (ELECTRA, RoBERTa). Achieves 99.4%+ accuracy with para…

    Jupyter Notebook