vivek12345/README.md

👋 Hi, I'm Vivek Nayyar

🚀 Engineering Leader | 🧠 AI Builder

I'm an engineering leader and hands-on AI/ML builder, focused on understanding and implementing the nuts and bolts of modern LLMs. From training tokenizers and building BPE from scratch to implementing transformers line by line, I love demystifying AI, one project and one workshop at a time.

🧠 Projects & Experiments

| Project | What I Did |
| --- | --- |
| 🧩 Byte Pair Encoding (BPE) from Scratch | Wrote a custom BPE tokenizer in Python with support for special tokens, regex splitting, and vocab merging. |
| 🧠 LLM from Scratch | Implemented a transformer-based LLM (embedding → attention → MLP → logits) using only PyTorch. Includes training loop, sampling, and inference. |
| 🦙 Agentic RAG Pipeline | Built end-to-end Retrieval-Augmented Generation workflows using LangChain, DuckDB, FAISS, and streaming token-by-token inference. |
| 📊 Text-to-SQL for CSVs | Built a system to parse natural language queries into SQL and run them over uploaded CSVs. Added Vespa-style search for LIKE queries. |
| 🇮🇳 Hindi Tokenizer (WIP) | Training a BPE tokenizer from scratch on Hindi corpora to enable better subword tokenization for Indian languages. |
| 🔐 Secure LLM Workflows | Integrated Cloudflare Zero Trust, IP whitelisting, and API key validation in LangChain-based pipelines. |
| 📦 SmartInvestReturns | A personal finance site to calculate SIP, retirement corpus, and mutual fund strategies. Built with Next.js & TypeScript. |
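
To give a flavor of the BPE project, here is a minimal sketch of the core idea: repeatedly replace the most frequent adjacent pair of token ids with a new id. This is an illustration only, not the code from the repository. The function names (`get_pair_counts`, `merge`, `train_bpe`, `decode`) are hypothetical, and the sketch leaves out the special tokens and regex splitting mentioned above.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = {}  # (left_id, right_id) -> new_id, in the order learned
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step  # new ids start after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Rebuild the text by expanding merged ids back into bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "aaabdaaabac"
    token_ids, learned = train_bpe(sample, num_merges=3)
    print(token_ids, decode(token_ids, learned))
```

Starting from bytes rather than characters keeps the base vocabulary fixed at 256 entries, so any input can be encoded, including Hindi text.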

🎓 Workshops & Knowledge Sharing

  • 🎥 YouTube Channel → @locallobaat
    I create short explainers and tutorials on AI topics like tokenization, transformers, and building your own RAG pipeline.
    Recent videos include “No code whatsapp bot” and “Chat with any CSV using langchain”.

  • 🧠 RAG Beyond Basics Workshop
    Covers advanced topics like agentic workflows, text-to-SQL, streaming outputs, observability, and PII-safe deployments.
    Delivered at internal events, React Summit 2024, and community meetups.


🔗 Let's Connect


Pinned

  1. bpe-tokenizer (Public)

    A pure Python implementation of Byte Pair Encoding (BPE) tokenization, inspired by GPT-4's tokenization approach

    Python 1

  2. gpt2-from-scratch (Public)

    A clean, educational implementation of GPT-2 built from scratch using PyTorch. This project demonstrates the architecture and training of transformer-based language models.

    Python

  3. moe-with-gqa (Public)

    Implementation of a mixture-of-experts (MoE) model with grouped-query attention (GQA)

    Python

  4. llama-with-gqa-and-rope (Public)

    Implementation of Llama models with grouped-query attention (GQA) and rotary position embeddings (RoPE)

    Python

  5. fast-api-with-next-ai-sdk (Public)

    FastAPI server with AI SDK v15 from Next.js

    JavaScript 2 1

  6. mini-rag (Public)

    Lightweight RAG library with Milvus vector store

    Python 2