karanmrn/README.md

Hi hi, I'm Karan

Senior Data Engineer · AI Systems

I build data platforms that reason. Six years turning messy source systems into
foundations people trust — now working where data engineering meets AI.


The belief underneath everything: elegant simplicity beats complex perfection. A pipeline you can explain in two sentences will outlive a clever one you can't. I'd rather understand why something works from first principles than memorise that it does.


How I think about the work

  • 🪶 Simplicity is the hard part. Anyone can add. The skill is knowing what to remove.
  • Action teaches what planning can't. I learn more shipping a rough thing on Monday than designing the perfect one for a month.
  • 🧱 First principles, always. When I hit a gap, I learn it from the foundation — not the tutorial.
  • 🤝 Build for the next person. Readable beats clever. The system should explain itself.

What I do

I'm a Senior Data Engineer at Pets Choice, where I built a greenfield Snowflake lakehouse — medallion architecture, dbt, Apache Iceberg, and Azure Data Factory. Lately I've been pushing into the layer above the warehouse: making data accessible through natural language and agents — Cortex semantic layers, Document AI, RAG, and MCP-compatible access for LLM tooling.

What I know

Data Platform & Warehousing

Snowflake (Iceberg, Dynamic Tables, Snowpipe Streaming, Cortex) · Databricks · dbt · Delta Lake · medallion architecture · Kimball dimensional modelling

Pipelines & Orchestration

Airflow · Azure Data Factory · dbt · Great Expectations · CI/CD for data (GitHub Actions, Azure DevOps)

Streaming & Big Data

Kafka · Spark (batch & Structured Streaming) · Azure Event Hubs · PySpark optimisation (partitioning, broadcast joins)

AI & LLM Systems

Python · FastAPI · Snowflake Cortex · RAG pipelines · Document AI / intelligent extraction · MCP (Model Context Protocol) servers · vector search · prompt & context engineering · agentic tooling with Claude Code

Cloud & Infra

Azure (ADLS Gen2, Synapse, Event Hubs) · Docker · Terraform · cost & performance optimisation

Languages & Analytics

SQL (deep) · Python · Bash · Power BI

What I'm building

  • 🏠 london.rent — a PropTech rental platform built on that infrastructure, with neighbourhood intelligence.

Currently going deeper on

Agent architectures and evals · MLOps & deployment (Kubernetes, Terraform) · system design · the boundary where the semantic layer meets LLMs.

Background

  • Pets Choice: Senior Data Engineer · greenfield Snowflake lakehouse, Cortex + Document AI + MCP
  • Tenacium DC: Data Engineer
  • Everest: Data Engineer (consumer appliances) · cut annual pipeline compute costs by ~$10k via Spark optimisation
  • Orkash: Data Analyst · Airflow, Kimball star schemas, ~100GB daily volumes
  • MSc Data Science & Analytics, Cardiff University



Off the clock: Formula 1 (Hamilton's the GOAT, Forza Ferrari 🏎️), basketball, festivals and raves, and long walks where most of my actual thinking happens. The Weeknd on repeat.

Pinned

  1. StockMarketAnalysis (Public)

    Jupyter Notebook

  2. Youtube (Public)

    Jupyter Notebook

  3. Machine-Learning-Image-Classificartion (Public)

    Machine learning project

    Jupyter Notebook

  4. CreditRiskAnalysis (Public)

    Jupyter Notebook

  5. HousePricePrediction (Public)

    Machine learning project predicting house prices from a Kaggle dataset

    Jupyter Notebook

  6. CreditFraudDetection (Public)

    Jupyter Notebook