koaning/README.md
🙂 Vincent D. Warmerdam
┣━━ 📦 Open Source Packages
┃   ┣━━ scikit-lego       - lego bricks for sklearn
┃   ┣━━ drawdata          - draw datasets in jupyter
┃   ┣━━ embetter          - embeddings ready for sklearn
┃   ┣━━ uvtrick           - run functions in external venvs via uv
┃   ┣━━ mktestdocs        - turn markdown files into pytest tests
┃   ┣━━ wigglystuff       - extra notebook widgets
┃   ┣━━ mohtml            - Pythonic HTML (for Marimo)
┃   ┣━━ smartfunc         - turns docstrings into LLM-functions
┃   ┣━━ dicekit           - domain specific interface for dice
┃   ┣━━ taskhut           - basic task routing for annotation
┃   ┣━━ diskdantic        - a mini ORM for files on disk
┃   ┣━━ pbt               - domain specific interface for dice
┃   ┣━━ human-learn       - rule-based components for sklearn
┃   ┣━━ doubtlab          - suite of tools to help find bad labels
┃   ┣━━ simsity           - dead simple vector 'database'
┃   ┣━━ lazylines         - lightweight utils for .jsonl wrangling
┃   ┣━━ fh-matplotlib     - matplotlib for FastHTML
┃   ┣━━ fh-altair         - altair for FastHTML
┃   ┣━━ durations         - pytest duration insights
┃   ┣━━ tuilwindcss       - tailwindcss for textual tui apps
┃   ┣━━ sentence-models   - a different take on textcat
┃   ┣━━ memo              - saves a whole log of time
┃   ┣━━ scikit-partial    - partial_fit() pipelines for sklearn
┃   ┗━━ scikit-bloom      - bloom transformers for sklearn
┣━━ Larger Project Contributions
┃   ┣━━ fairlearn         - contributed the CorrelationFilter
┃   ┣━━ polars            - contributed the .pipe() method
┃   ┗━━ BERTopic          - added lightweight sklearn pipeline support
┣━━ ⭐ Online Projects
┃   ┣━━ calmcode.io       - intermediate developer education
┃   ┗━━ koaning.io        - personal blog
┣━━ 🎙️ Popular Talks
┃   ┣━━ Natural Intelligence is All You Need
┃   ┣━━ Group-by statements that save the day
┃   ┣━━ Tools to Improve Training Data
┃   ┣━━ Optimal on Paper, Broken in Reality
┃   ┣━━ Playing by the Rules-Based-Systems
┃   ┣━━ How to Constrain Artificial Stupidity
┃   ┣━━ The Profession of Solving the Wrong Problem
┃   ┣━━ Winning with Simple, even Linear, Models
┃   ┗━━ Untitled12.ipynb
┣━━ 🔬 Random Experiments
┃   ┣━━ narlogs        - logs all dataframe pipelines
┃   ┣━━ scikit-prune   - prune scikit-learn pipelines
┃   ┣━━ gitlit         - tracking github action times across open source
┃   ┣━━ sentimany      - many sentiment models, one repo
┃   ┣━━ tokenwiser     - sklearn token tricks
┃   ┣━━ clumper        - functional API for lists of dicts
┃   ┣━━ whatlies       - exploration tools for word embeddings
┃   ┣━━ skedulord      - makes cron a bit more fun
┃   ┣━━ icepickle      - cool and safe storage for linear models
┃   ┣━━ bulk           - simple bulk labelling interface
┃   ┣━━ evol           - grammar for genetic heuristics
┃   ┗━━ flowshow       - over the top logging decorator
┗━━ 👨‍💻 Employer
    ┣━━ marimo      - better Python notebooks
    ┃   ┣━━ mofresh           - Refresh marimo cells remotely
    ┃   ┣━━ mopaint           - MS paint notebook widget
    ┃   ┣━━ moterm            - Chainable terminal notebook widget
    ┃   ┣━━ mobuild           - Build Python pkgs from marimo notebook
    ┃   ┣━━ mopad             - Gamepad support for Python notebooks
    ┃   ┣━━ motalk            - Webspeechkit for Python notebooks
    ┃   ┗━━ datasette-marimo  - datasette plugin for marimo
    ┣━━ 🎲 :probabl.   - scikit-learn and friends
    ┃   ┣━━ scikit-churn      - safety rails for churn work
    ┃   ┣━━ scikit-playtime   - rethinking pipelines
    ┃   ┗━━ scikit-mdn        - mixture density networks
    ┣━━ 💥 Explosion   - developer tools for nlp
    ┃   ┣━━ prodigy-hf        - Prodigy integration for the HuggingFace stack
    ┃   ┣━━ prodigy-pdf       - Annotate PDFs via Prodigy
    ┃   ┣━━ prodigy-ann       - ANN techniques to find relevant subsets
    ┃   ┣━━ prodigy-segment   - Prodigy integration for Segment Anything
    ┃   ┣━━ prodigy-lunr      - Search techniques to find relevant subsets
    ┃   ┣━━ prodigy-whisper   - Transcribe audio with OpenAI's whisper models
    ┃   ┣━━ prodigy-tui       - Prodigy from the terminal
    ┃   ┗━━ cluestar          - inspiration for your first text labels
    ┗━━ 🤖 Rasa        - conversational software provider
        ┣━━ nlu examples      - custom nlu components for Rasa
        ┣━━ taipo             - data augmentation tools
        ┗━━ algo whiteboard   - nlp education

Follow me on twitter @fishnets88

Pinned

  1. scikit-lego (Public)

    Extra blocks for scikit-learn pipelines.

    Python, 1.4k stars, 124 forks

  2. embetter (Public)

    Just a bunch of useful embeddings for scikit-learn pipelines.

    Python, 520 stars, 16 forks

  3. human-learn (Public)

    Natural Intelligence is still a pretty good idea.

    Jupyter Notebook, 823 stars, 57 forks

  4. drawdata (Public)

    Draw datasets from within Python notebooks.

    JavaScript, 1.6k stars, 147 forks

  5. wigglystuff (Public)

    A collection of creative AnyWidgets for Python notebook environments.

    JavaScript, 166 stars, 14 forks

  6. mktestdocs (Public)

    Run pytest against markdown files/docstrings.

    Python, 157 stars, 10 forks