spock74/Foundations

Foundations Core: Information Theory & Neurosymbolic NLP

Status: Early-stage personal research project. This repository hosts experimental code for research on applications of information theory to Natural Language Processing (NLP). It attempts to bridge classical statistical methods (Shannon, Markov) and modern neurosymbolic AI.

Project & Vision

For now, this is still simple scaffolding. The goal is to build a high-performance backend in Common Lisp that provides foundational metrics for text analysis. The vision is to use "old-fashioned" (so to speak) constructs, such as classical statistical measures (precise entropy, divergence metrics, and several other NLP tools) together with knowledge from causality (SCM, among others), graph theory, and formal linguistic theories, as pipes connecting small LLMs (sLLMs) that run locally.

By integrating these approaches, we hope to steer LLM generation, for example through hallucination detection, and to extract knowledge from texts.


Research Roadmap

This is a living document of the research trajectory.

Phase 1: Statistical Foundations (Current)

  • Shannon Entropy: Measure raw information content of text sequences.
  • Markov Entropy: Measure predictability of text given order-$N$ history.
  • Kullback-Leibler Divergence: Quantify "surprise" or information loss when approximating one text distribution with another.
  • High-Performance Backend: Implement core logic in Common Lisp (SBCL) with an HTTP API.
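As a rough illustration of the three Phase 1 metrics, here is a minimal character-level sketch. It is written in Python only for readability and is not the project's Common Lisp implementation; the function names, the `alpha` smoothing parameter, and the choice of bits (log base 2) are all assumptions. Smoothing is included because plain KL divergence is infinite whenever a symbol occurs in the first text but not the second (for instance, "R" appears in "PROBABILIDADE" but not in "POSSIBILIDADE").

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(text: str) -> float:
    """Entropy in bits of the empirical symbol distribution of `text`."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def markov_entropy(text: str, order: int = 1) -> float:
    """Conditional entropy (bits) of a symbol given the previous `order` symbols."""
    if order < 1:
        return shannon_entropy(text)
    # transitions[context][next_symbol] = number of occurrences
    transitions = defaultdict(Counter)
    for i in range(len(text) - order):
        transitions[text[i:i + order]][text[i + order]] += 1
    total = sum(sum(nxt.values()) for nxt in transitions.values())
    if total == 0:
        return 0.0
    h = 0.0
    for nxt in transitions.values():
        ctx_total = sum(nxt.values())
        for c in nxt.values():
            # joint probability times log of the conditional probability
            h -= (c / total) * math.log2(c / ctx_total)
    return h


def kl_divergence(text_p: str, text_q: str, alpha: float = 1.0) -> float:
    """D_KL(P || Q) in bits over the union alphabet, with add-alpha smoothing.

    `alpha` must be positive, otherwise unseen symbols get zero probability.
    """
    p_counts, q_counts = Counter(text_p), Counter(text_q)
    alphabet = set(p_counts) | set(q_counts)
    p_total = len(text_p) + alpha * len(alphabet)
    q_total = len(text_q) + alpha * len(alphabet)
    d = 0.0
    for s in alphabet:
        p = (p_counts[s] + alpha) / p_total
        q = (q_counts[s] + alpha) / q_total
        d += p * math.log2(p / q)
    return d
```

With these definitions, "BANANA" has a Shannon entropy of about 1.459 bits per character, while its order-1 Markov entropy is 0 because every character is fully determined by the one before it (B is always followed by A, A by N, N by A).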

Phase 2: Visual Workbench (Frontend) (Next)

  • Web Interface: Build a modern dashboard using Next.js + ShadcnUI.
  • Interactive Playground: Real-time entropy calculation as the user types.
  • Visualizations:
    • Heatmaps: Color-code words based on their "surprise" (entropy contribution).
    • Transition Graphs: Visualize Markov chains using generic-graph-adapter/Mermaid.
    • Divergence Plot: Compare two texts side-by-side visually (KL-Divergence).
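The heatmap and transition-graph ideas above could be fed by data like the following. This is a minimal Python sketch under stated assumptions: surprisal is computed from a simple unigram model of the text itself, words are split on whitespace, and the helper names `word_surprisal` and `mermaid_transitions` are hypothetical. The Mermaid output uses the standard flowchart syntax (`graph LR`, `a -->|label| b`); words containing double quotes would need escaping, which is not handled here.

```python
import math
from collections import Counter, defaultdict


def word_surprisal(text):
    """Return (word, surprisal in bits) pairs under a unigram model of `text`."""
    words = text.split()
    counts = Counter(words)
    total = len(words)
    return [(w, -math.log2(counts[w] / total)) for w in words]


def mermaid_transitions(text):
    """Render first-order word transitions as a Mermaid flowchart string."""
    words = text.split()
    transitions = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        transitions[a][b] += 1
    # Stable node ids (n0, n1, ...) in order of first appearance.
    ids = {w: f"n{i}" for i, w in enumerate(dict.fromkeys(words))}
    lines = ["graph LR"]
    for w, node in ids.items():
        lines.append(f'    {node}["{w}"]')
    for a, nxt in transitions.items():
        total = sum(nxt.values())
        for b, c in nxt.items():
            # Edge label is the estimated transition probability.
            lines.append(f"    {ids[a]} -->|{c / total:.2f}| {ids[b]}")
    return "\n".join(lines)
```

A frontend could map each surprisal value to a color scale for the heatmap and pass the returned string to a Mermaid renderer for the graph view.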

Phase 3: Advanced NLP Experiments

Already in full development in a separate project, to be ported here.

  • Dynamic N-gram Analysis: Detection of phrase boundaries using entropy spikes (Perplexity).
  • Syntactic Entropy: Measuring entropy over Part-of-Speech tags rather than raw tokens.
  • Text Classification: Zero-shot classification using compression-based distances (NCD) or KL-Divergence.
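To make the compression-based classification idea concrete, here is a minimal Python sketch of the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. The use of `zlib` as the compressor, the helper names, and the nearest-example classification rule are assumptions for illustration; the project may choose a different compressor or decision rule.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(text: str, examples: dict) -> str:
    """Return the label whose example text is closest to `text` under NCD.

    `examples` maps each label to one representative text.
    """
    return min(examples, key=lambda label: ncd(text, examples[label]))
```

No training step is involved: a text is assigned to whichever labelled example it compresses best with. Very short inputs tend to give noisy distances because compressor overhead dominates.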

Phase 4: Local Small Language Models (sLLM)

  • Local Inference: Integrate bindings for local inference (llama.cpp or similar) to run models like Phi-3, Gemma-2B, or TinyLlama locally.
  • Neurosymbolic Grounding: Use the calculated Entropy/KL metrics to "vet" or "rank" generations from the sLLM.
  • Constrained Decoding: Use grammar-constrained decoding (GBNF) driven by Lisp logic.
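One possible reading of "rank" above is to order candidate generations by how far their word distribution drifts from a trusted reference text. The following Python sketch assumes exactly that; the function names, the whitespace tokenization, and the add-alpha smoothing are hypothetical choices, and no model inference is performed here (the candidates are plain strings).

```python
import math
from collections import Counter


def word_kl(p_text: str, q_text: str, alpha: float = 1.0) -> float:
    """Smoothed D_KL(P || Q) in bits over lowercase whitespace-split words."""
    p_counts = Counter(p_text.lower().split())
    q_counts = Counter(q_text.lower().split())
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    d = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        d += p * math.log2(p / q)
    return d


def rank_generations(reference: str, candidates: list) -> list:
    """Sort candidates by divergence from the reference, closest first."""
    return sorted(candidates, key=lambda c: word_kl(reference, c))
```

A low divergence only indicates similar vocabulary usage, not factual correctness, so a score like this would be one signal among several rather than a hallucination detector on its own.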

Phase 5: Neurosymbolic Lisp Middleware (Orchestrator)

  • Proactive Semantic Engine: Middleware layer combining classical semantic embeddings with symbolic Lisp logic.
  • Macro-Driven Inference: Utilize Lisp's homoiconicity to automatically expand captured knowledge into executable inference rules via macros.
  • Deterministic Grounding: Provide sLLMs with rigid tool calls, persistent memory, and boolean logic validation to prevent hallucinations.
  • Knowledge Extraction: Automated conversion of unstructured text into symbolic knowledge graphs (S-expressions).

Phase 6: Optimization & DSPy

  • Prompt Engineering as Code: Adopt DSPy concepts to optimize prompts programmatically.
  • Teleprompter Implementation: Build a Lisp-based optimizer that attempts to "compile" vague intents into optimal prompts by measuring metric improvements.
  • Fine-tuning: Fine-tune sLLMs on "high-entropy" synthetic data generated to maximize reasoning capabilities.

Usage

Dependencies

  • SBCL (Steel Bank Common Lisp)
  • Quicklisp
  • libev (via Homebrew/apt)

Running the Server

Start the interactive REPL and load the system:

;; In SBCL REPL
(ql:quickload :foundations-core)
(foundations:start :port 8080)

Testing the API

Entropy:

curl -X POST http://127.0.0.1:8080/api/science/entropy \
     -H "Content-Type: application/json" \
     -d '{"text": "BANANA", "order": 1}'

KL-Divergence (Information Loss):

curl -X POST http://127.0.0.1:8080/api/science/divergence \
     -H "Content-Type: application/json" \
     -d '{"text_p": "PROBABILIDADE", "text_q": "POSSIBILIDADE"}'
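The same two calls can be made from a script. This is a minimal Python sketch using only the standard library; the endpoint paths and JSON fields are taken from the curl examples above, while the helper names are hypothetical. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import json
import urllib.request

# Base URL taken from the curl examples; adjust if the server runs elsewhere.
BASE_URL = "http://127.0.0.1:8080/api/science"


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the API endpoints."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(endpoint: str, payload: dict, timeout: float = 5.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `post("entropy", {"text": "BANANA", "order": 1})` and `post("divergence", {"text_p": "PROBABILIDADE", "text_q": "POSSIBILIDADE"})` mirror the two curl commands.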

License: Dual Licensing

This project is licensed under a Dual License model:

  1. Personal & Non-Commercial Use: Licensed under the MIT License. You are free to use, modify, and distribute this software for personal queries, research, or open-source projects.
  2. Commercial Use: Once production-ready, this project is intended for commercial use in addition to research. Any commercial application, proprietary software integration, or deployed service generating revenue requires a separate Commercial License.

Please contact the author for commercial licensing inquiries.
