Pinned

  1. llcuda/llcuda (Public)

    CUDA 12-first backend inference for Unsloth on Kaggle — Optimized for small GGUF models (1B-5B) on dual Tesla T4 GPUs (15GB each, SM 7.5)

    Jupyter Notebook · 8 stars

  2. Ubuntu-Cuda-Llama.cpp-Executable (Public)

    Pre-built llama.cpp CUDA binary for Ubuntu 22.04. No compilation required: download, extract, and run (see the download-and-run sketch after this list). Works with the llcuda Python package for JupyterLab integration. Tested on GPUs from the GeForce 940M to the RTX 4090.

    Python · 1 star

  3. cuda-nvidia-systems-engg (Public)

    Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…

    C++

  4. cuda-tcp-llama.cpp (Public)

    High-performance TCP inference gateway with epoll async I/O for CUDA-accelerated LLM serving: binary protocol, connection pooling, and streaming responses, with zero dependencies beyond POSIX and CUDA (see the event-loop sketch after this list).

    C++

  5. cuda-llm-storage-pipeline (Public)

    Content-addressed LLM model distribution with SHA256 verification and SeaweedFS integration: distributed storage, manifest management, LRU caching, and integrity checking for GGUF models (see the verification sketch after this list).

    C++

  6. llcuda/llcuda.github.io (Public)

    GitHub Pages website for the llcuda Python SDK project.

    Python
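
The prebuilt binary in repo 2 is meant to be fetched and launched directly. Below is a minimal download-and-run sketch in Python; the release asset name and URL are assumptions (check the repo's Releases page for the real ones), while the binary name and flags follow stock llama.cpp conventions (`llama-server`, `-m`, `--n-gpu-layers`):

```python
import subprocess
import tarfile
import urllib.request
from pathlib import Path

# Hypothetical release URL and archive name; the actual asset names on the
# repo's Releases page may differ.
RELEASE_URL = (
    "https://github.com/waqasm86/Ubuntu-Cuda-Llama.cpp-Executable"
    "/releases/latest/download/llama-cpp-cuda-ubuntu2204.tar.gz"
)
ARCHIVE = Path("llama-cpp-cuda-ubuntu2204.tar.gz")
DEST = Path("llama-cpp-cuda")

def fetch_and_run(model_path: str) -> None:
    """Download the prebuilt archive, extract it, and launch llama-server."""
    if not ARCHIVE.exists():
        urllib.request.urlretrieve(RELEASE_URL, ARCHIVE)
    DEST.mkdir(exist_ok=True)
    with tarfile.open(ARCHIVE) as tar:
        tar.extractall(DEST)
    # Binary name and flags assumed to follow stock llama.cpp conventions.
    subprocess.run(
        [str(DEST / "llama-server"), "-m", model_path, "--n-gpu-layers", "99"],
        check=True,
    )

if __name__ == "__main__":
    fetch_and_run("models/qwen2.5-1.5b-instruct-q4_k_m.gguf")
```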
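Repo 4 describes an epoll-based async I/O gateway written in C++. As a rough illustration of the same event-loop pattern (not the project's actual code), here is a Python sketch using the standard selectors module, which is backed by epoll on Linux; the echo handler stands in for the real binary protocol and inference step:

```python
import selectors
import socket

# Minimal event-loop sketch: one selector drives both the listening socket
# and all client connections, so a single thread multiplexes many clients.
sel = selectors.DefaultSelector()

def accept(server: socket.socket) -> None:
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn: socket.socket) -> None:
    data = conn.recv(4096)
    if not data:  # client closed the connection
        sel.unregister(conn)
        conn.close()
        return
    # Placeholder for parsing the binary protocol and running inference:
    # here we simply echo the request back.
    conn.sendall(data)

def serve(host: str = "0.0.0.0", port: int = 8090) -> None:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)
    while True:
        for key, _mask in sel.select():
            key.data(key.fileobj)  # dispatch to accept() or read()

if __name__ == "__main__":
    serve()
```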
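Repo 5's content-addressed distribution boils down to identifying each GGUF file by its SHA256 digest and checking it against a manifest before use. A sketch under that assumption follows; the JSON manifest layout is hypothetical and the repo's actual schema may differ:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB GGUF files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify_against_manifest(model: Path, manifest: Path) -> bool:
    """Return True if the model file's digest matches its manifest entry."""
    # Hypothetical manifest layout: {"<filename>": {"sha256": "<hex digest>"}}
    entries = json.loads(manifest.read_text())
    expected = entries[model.name]["sha256"]
    return sha256_of(model) == expected

if __name__ == "__main__":
    ok = verify_against_manifest(
        Path("models/llama-3.2-1b-q4_k_m.gguf"),
        Path("manifest.json"),
    )
    print("integrity OK" if ok else "digest mismatch")
```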