God is the source code
t-timms/README.md

Tremayne Timms

ML & AI Engineer — Fine-Tuning · Agentic Systems · Edge Deployment · Production LLM Ops



About Me

I build production LLM systems from the metal up — from quantized models running on Jetson edge hardware to multi-agent cloud deployments with tool-use, permission gating, and audit trails. Currently focused on MoE fine-tuning, Blackwell-native FP4 quantization (NVFP4), and SOTA agentic coding benchmarks.

Dallas-Fort Worth, TX · ttimmsinternational@gmail.com

Python · Rust · TypeScript · PyTorch · CUDA · Docker · GitHub Actions · PostgreSQL


Current Focus (May 2026)

| Project | What | Why It Matters |
|---|---|---|
| llama.cpp NVFP4 | Blackwell-native FP4 quantization with MSE-optimal scales | First consumer NVFP4 tooling on RTX 5070 Ti; PR #22897 is awaiting upstream review |
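
To illustrate what "MSE-optimal scales" can mean for 4-bit floats, here is a minimal sketch, not the code from PR #22897. It assumes FP4 elements in E2M1 format (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and grid-searches a per-block scale that minimizes mean squared error, instead of always mapping the largest value to 6. The function names, the 0.5x to 1.5x search window, and the step count are hypothetical. Real NVFP4 also stores block scales in FP8 and adds a per-tensor scale, which this sketch ignores.

```python
# Magnitudes an FP4 E2M1 element can represent (the sign bit is separate).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block, scale):
    """Snap each value to the nearest scaled E2M1 grid point, then dequantize."""
    if scale <= 0.0:
        return [0.0] * len(block)
    out = []
    for x in block:
        target = abs(x) / scale
        mag = min(E2M1_GRID, key=lambda g: abs(target - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mse_optimal_scale(block, steps=32):
    """Grid-search a block scale; the naive absmax scale is always a candidate."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0
    base = amax / E2M1_GRID[-1]  # naive scale: largest magnitude maps to 6.0
    # Hypothetical search window: 0.5x to 1.5x the naive scale.
    candidates = [base] + [base * (0.5 + i / steps) for i in range(steps + 1)]
    return min(candidates, key=lambda s: mse(block, quantize_block(block, s)))
```

Because the naive scale is among the candidates, the selected scale can never produce a higher error than absmax scaling on the same block.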

What I'm Building

godspeed-coding-agent: Security-first open-source coding agent. Hand-rolled async ReAct loop with a 4-tier deny-first permission engine, a SHA-256 hash-chained audit trail, and 200+ LLM providers via LiteLLM. 854 tests.

  • 30+ built-in tools with JSON Schema validation, MCP server + client
  • Parallel + speculative tool dispatch, cost budget enforcement
  • Self-evolution via LLM-guided mutations, multi-language verify gate with retry
  • Training data export (openai/chatml/sharegpt), per-step reward annotations for GRPO
  • SWE-bench Lite: 34.8% single-shot · 52.2% oracle best-of-5
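
A SHA-256 hash-chained audit trail, as mentioned above, can be sketched roughly as follows. This is an illustration of the general technique, not the agent's actual implementation: the entry layout, the all-zero genesis hash, and the function names are assumptions. Each entry's hash covers the previous entry's hash, so editing any earlier record invalidates every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "payload": payload,
        "prev": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

In practice the chain head would also be stored somewhere the agent cannot rewrite. Without that, someone could rebuild the whole chain after tampering.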

sovereign-edge: Autonomous multi-agent personal intelligence system on NVIDIA Jetson Orin Nano. 5 LangGraph expert agents, LiteLLM gateway (4 providers + Ollama), 3-tier ONNX intent router. 393 tests. Fully on-device, with zero cloud dependencies.

manna-trading: Multi-agent algorithmic trading pipeline with DeepSeek R1 reasoning at every stage. 4-agent pipeline (TA → Chief → Risk → Execution), Kelly Criterion position sizing, Monte Carlo risk simulation, real-time WebSocket market data.
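
For reference, Kelly Criterion sizing for a simple win/lose bet uses f* = p - (1 - p) / b, where p is the win probability and b is the ratio of average win to average loss. The sketch below is a generic illustration, not the pipeline's code. The function names, the clamp to [0, max_fraction], and the half-Kelly multiplier are assumptions.

```python
def kelly_fraction(win_prob, win_loss_ratio, max_fraction=1.0):
    """Kelly fraction f* = p - (1 - p) / b, clamped to [0, max_fraction]."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    if win_loss_ratio <= 0.0:
        raise ValueError("win_loss_ratio must be positive")
    raw = win_prob - (1.0 - win_prob) / win_loss_ratio
    # A negative edge means no position; the cap limits concentration.
    return min(max(raw, 0.0), max_fraction)


def position_size(equity, win_prob, win_loss_ratio, kelly_multiplier=0.5):
    """Capital to allocate, scaled by a fractional-Kelly multiplier."""
    return equity * kelly_multiplier * kelly_fraction(win_prob, win_loss_ratio)
```

With p = 0.6 and b = 1, full Kelly is 20% of equity, and the assumed half-Kelly multiplier allocates 10%. Fractional Kelly is a common way to reduce sensitivity to errors in the estimated win probability.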

bible-ai-assistant: Qwen3.5-4B fine-tuned with ORPO for biblical Q&A. Hybrid RAG (ChromaDB + BM25 + cross-encoder reranking), constitutional AI guardrails, voice pipeline (Whisper + Kokoro TTS), Gradio UI. 183 tests, 34 W&B runs, 5,925 training steps.
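
The description above does not say how the dense (ChromaDB) and sparse (BM25) result lists are merged before cross-encoder reranking. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's method. The function name and the constant k = 60 are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A typical use would pass the top results from each retriever, for example `reciprocal_rank_fusion([dense_ids, bm25_ids])`, and hand the fused list to the reranker.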

gpu-server-test-suite: Comprehensive GPU fleet validation modeled on NVIDIA DCGM. 16 diagnostic modules, Prometheus + Grafana, fault injection, JUnit XML for CI. 188 tests.

ML research control plane — experiment lifecycle management, model registry, cloud training launcher. Orchestrates gpu-server-test-suite (preflight checks) and llm-wiki (knowledge persistence). 28 tests, v0.1.0.

llm-wiki: Git-backed knowledge base following Karpathy's LLM Wiki pattern. LangGraph ingest/query pipelines, instructor + Pydantic structured output, BM25 search, Groq → Gemini → Ollama fallback via LiteLLM. 117 tests, 40 wiki pages.
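
As a rough picture of what BM25 search computes, here is a small self-contained Okapi BM25 scorer. It is a sketch under assumptions, not the wiki's code: tokenization is plain lowercase whitespace splitting, and k1 = 1.5 and b = 0.75 are conventional defaults rather than values taken from the project.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1.0 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1.0 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores
```

Documents that share no terms with the query score exactly zero. That is why BM25 is often paired with a dense retriever or an LLM fallback for paraphrased questions.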

SQL + Python ETL pipeline for semiconductor quality analysis — supplier performance scoring, defect Pareto distributions, yield trend analysis.
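
A defect Pareto distribution ranks defect categories by count and tracks the cumulative share, which shows the few categories that account for most failures. The sketch below is illustrative only. The function names, the 80% threshold, and the example categories are hypothetical, and the actual pipeline may compute this in SQL instead.

```python
def pareto_table(defect_counts):
    """Return (category, count, cumulative_percent) rows, largest count first."""
    total = sum(defect_counts.values())
    if total == 0:
        return []
    rows = []
    running = 0
    ordered = sorted(defect_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for category, count in ordered:
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows


def vital_few(defect_counts, threshold=80.0):
    """Categories needed to reach the cumulative threshold (default 80%)."""
    selected = []
    for category, _count, cumulative in pareto_table(defect_counts):
        selected.append(category)
        if cumulative >= threshold:
            break
    return selected
```

For example, `vital_few({"scratch": 50, "void": 30, "misalignment": 15, "other": 5})` returns the two categories that together cover 80% of defects.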

tesla-tire-wear-ml: Multi-model ML pipeline for Tesla tire wear prediction. An ensemble of Random Forest, XGBoost, and Neural Network models with Claude AI integration.


Open Source Contributions

  • llama.cpp #22897 — NVFP4 default type mapping + per-tensor scale tensors + MSE-optimal correction
  • llama.cpp #22858 — Missing LLAMA_FTYPE_MOSTLY_NVFP4 case fix (closed, replaced by #22897)

Skills

| Area | Technologies |
|---|---|
| LLMs & Agents | LiteLLM, 200+ providers, Ollama, llama.cpp, multi-agent orchestration, ReAct loops |
| Fine-Tuning | Unsloth, TRL (SFT/DPO/GRPO/ORPO), QLoRA, PEFT, MoE architectures, RLHF/RLAIF |
| Inference | vLLM (custom forks), speculative decoding (750 tok/s), TensorRT-LLM, EXL2 |
| Quantization | NVFP4 (Blackwell-native), GGUF, EXL2, FP8, NF4, GPTQ, AWQ |
| ML Infrastructure | PyTorch, CUDA 12.8, torch.compile, DeepSpeed, lm-eval, W&B, MLflow |
| Systems | Python, Rust, TypeScript, Docker, GitHub Actions CI/CD, systemd |
| Edge / Hardware | NVIDIA Jetson Orin Nano, RTX 5070 Ti (Blackwell sm_120), 16 GB VRAM optimization |
| Data | PostgreSQL, SQL, pandas, SQLAlchemy, ChromaDB, LanceDB, BM25 |

Tremayne Timms · GitHub · LinkedIn · Email

Pinned

  1. bible-ai-assistant (Public)

    Bible Q&A — Qwen3.5-4B fine-tuned with ORPO, hybrid RAG, constitutional AI guardrails, voice pipeline

    Python

  2. godspeed-coding-agent (Public)

    Alpha — Security-first open-source coding agent. 4-tier permissions, hash-chained audit trails, 200+ LLM providers. Seeking testers.

    Python

  3. sovereign-edge (Public)

    Sovereign Edge Personal Intelligence System — Jetson Nano Super

    Python

  4. manna-trading (Public)

    Multi-agent AI crypto trading — 4-agent pipeline (TA → Chief → Risk → Execution), DeepSeek R1 reasoning, Kelly Criterion, Monte Carlo, WebSocket

    TypeScript

  5. gpu-server-test-suite (Public)

    GPU Server Diagnostic Test Suite — modeled on NVIDIA DCGM architecture

    Python

  6. tesla-tire-wear-ml (Public)

    Tesla tire wear prediction — ML models (Random Forest, XGBoost, Ensemble) with Claude AI analysis for tire longevity insights

    Jupyter Notebook