Foundations Core: Information Theory & Neurosymbolic NLP

**Status: Incipient Personal Research Project.** This repository hosts experimental code for research into applications of information theory in Natural Language Processing (NLP). It attempts to bridge the gap between classical statistical methods (Shannon/Markov) and modern neurosymbolic AI.

Project & Vision

For now, the project is only simple scaffolding. The goal is to build a high-performance backend in Common Lisp that provides foundational metrics for text analysis. The vision is to use "old-fashioned" constructs (so to speak), such as classical statistical measures (exact entropy, divergence metrics, and several other NLP tools), together with knowledge from causality (structural causal models, among others), graph theory, and formal linguistic theories, as the pipes that connect locally running small language models (sLLMs).

By integrating these approaches, we hope to steer LLM generation, detect hallucinations, and extract knowledge from text.

Research Roadmap

This is a living document of the research trajectory.

Phase 1: Statistical Foundations (Current)

Shannon Entropy: Measure raw information content of text sequences.
Markov Entropy: Measure predictability of text given order-$N$ history.
Kullback-Leibler Divergence: Quantify "surprise" or information loss when approximating one text distribution with another.
High-Performance Backend: Implement core logic in Common Lisp (SBCL) with an HTTP API.
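
As a rough illustration of the three metrics listed above, here is a minimal Python sketch over character sequences. It is not the project's Common Lisp implementation: the function names, the character-level tokenization, the base-2 logarithm, and the add-alpha smoothing in `kl_divergence` are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Zeroth-order Shannon entropy of the character distribution, in bits."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def markov_entropy(text: str, order: int = 1) -> float:
    """Conditional entropy of the next character given the previous `order` characters."""
    if order < 1 or len(text) <= order:
        return 0.0
    context_counts = Counter()
    pair_counts = Counter()
    for i in range(len(text) - order):
        context = text[i:i + order]
        context_counts[context] += 1
        pair_counts[(context, text[i + order])] += 1
    total = sum(pair_counts.values())
    h = 0.0
    for (context, _), n in pair_counts.items():
        p_joint = n / total                  # p(context, next)
        p_cond = n / context_counts[context]  # p(next | context)
        h += p_joint * math.log2(1.0 / p_cond)
    return h


def kl_divergence(text_p: str, text_q: str, alpha: float = 1.0) -> float:
    """KL(P || Q) in bits between smoothed character distributions of two texts.

    Assumption: add-alpha smoothing over the union alphabet, so that symbols
    missing from Q do not make the divergence infinite.
    """
    alphabet = set(text_p) | set(text_q)
    counts_p, counts_q = Counter(text_p), Counter(text_q)
    norm_p = len(text_p) + alpha * len(alphabet)
    norm_q = len(text_q) + alpha * len(alphabet)
    total = 0.0
    for symbol in alphabet:
        p = (counts_p[symbol] + alpha) / norm_p
        q = (counts_q[symbol] + alpha) / norm_q
        total += p * math.log2(p / q)
    return total
```

With these definitions, `shannon_entropy("BANANA")` is about 1.459 bits, while `markov_entropy("BANANA", 1)` is 0 because every character in that string is fully determined by its predecessor. Whether the backend's `order` parameter follows the same convention is not stated here.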

Phase 2: Visual Workbench (Frontend) (Next)

Web Interface: Build a modern dashboard using Next.js + ShadcnUI.
Interactive Playground: Real-time entropy calculation as the user types.
Visualizations:
- Heatmaps: Color-code words based on their "surprise" (entropy contribution).
- Transition Graphs: Visualize Markov chains using generic-graph-adapter/Mermaid.
- Divergence Plot: Compare two texts side-by-side visually (KL-Divergence).
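
To make the heatmap and transition-graph ideas concrete, the following Python sketch computes per-token surprisal under a simple unigram model and renders first-order transitions as Mermaid flowchart text. The function names, the unigram model, and the `n0`, `n1`, ... node identifiers are assumptions for illustration; the planned frontend may compute and draw these differently.

```python
import math
from collections import Counter, defaultdict


def word_surprisal(tokens):
    """Return (token, surprisal in bits) pairs under a unigram model of `tokens`."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(tok, math.log2(total / counts[tok])) for tok in tokens]


def markov_to_mermaid(tokens):
    """Render first-order transition probabilities as Mermaid flowchart source."""
    transitions = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        transitions[current][following] += 1

    # Stable node ids, because raw tokens may not be valid Mermaid identifiers.
    ids = {}
    for tok in tokens:
        ids.setdefault(tok, f"n{len(ids)}")

    lines = ["graph LR"]
    for tok, node_id in ids.items():
        # Note: tokens containing double quotes would need escaping first.
        lines.append(f'    {node_id}["{tok}"]')
    for current, nexts in transitions.items():
        total = sum(nexts.values())
        for following, n in nexts.items():
            lines.append(f"    {ids[current]} -->|{n / total:.2f}| {ids[following]}")
    return "\n".join(lines)
```

A heatmap could then map each surprisal value to a color, and the string returned by `markov_to_mermaid(list("BANANA"))` can be pasted into any Mermaid renderer.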

Phase 3: Advanced NLP Experiments

Already under active development in another project, to be ported here.

Dynamic N-gram Analysis: Detection of phrase boundaries using entropy spikes (Perplexity).
Syntactic Entropy: Measuring entropy over Part-of-Speech tags rather than raw tokens.
Text Classification: Zero-shot classification using compression-based distances (NCD) or KL-Divergence.
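
For the compression-based distance mentioned above, a minimal Python sketch of the Normalized Compression Distance (NCD) might look like the following. The choice of `zlib` as the compressor, the function names, and the nearest-reference classification rule are assumptions for this example, not the method used in the other project.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(text: str, references: dict) -> str:
    """Return the label whose reference text is closest to `text` under NCD."""
    return min(references, key=lambda label: ncd(text, references[label]))
```

Values near 0 suggest the two texts share most of their structure, and values near 1 suggest little overlap. Short inputs give noisy results because compressor overhead dominates.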

Phase 4: Local Small Language Models (sLLM)

Local Inference: Integrate bindings for local inference (llama.cpp or similar) to run models like Phi-3, Gemma-2B, or TinyLlama locally.
Neurosymbolic Grounding: Use the calculated Entropy/KL metrics to "vet" or "rank" generations from the sLLM.
Constrained Decoding: Use grammar-constrained decoding (GBNF) driven by Lisp logic.

Phase 5: Neurosymbolic Lisp Middleware (Orchestrator)

Proactive Semantic Engine: Middleware layer combining classical semantic embeddings with symbolic Lisp logic.
Macro-Driven Inference: Utilize Lisp's homoiconicity to automatically expand captured knowledge into executable inference rules via macros.
Deterministic Grounding: Provide sLLMs with rigid tool calls, persistent memory, and boolean logic validation to prevent hallucinations.
Knowledge Extraction: Automated conversion of unstructured text into symbolic knowledge graphs (S-expressions).
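
As one possible shape for the symbolic output described above, this Python sketch serializes subject/predicate/object triples as S-expressions that a Lisp reader could consume. The `fact` and `knowledge-graph` forms and the keyword names are hypothetical, chosen only for illustration, and the step that extracts triples from raw text is not shown.

```python
def lisp_string(text: str) -> str:
    """Quote `text` as a Common Lisp string literal (escape backslash and double quote)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def triple_to_sexpr(subject: str, predicate: str, obj: str) -> str:
    """Render one triple as a (fact ...) form; the form name is hypothetical."""
    return (
        f"(fact :subject {lisp_string(subject)}"
        f" :predicate {lisp_string(predicate)}"
        f" :object {lisp_string(obj)})"
    )


def graph_to_sexpr(triples) -> str:
    """Wrap all triples in a single (knowledge-graph ...) form, one fact per line."""
    lines = ["(knowledge-graph"]
    lines += ["  " + triple_to_sexpr(*triple) for triple in triples]
    return "\n".join(lines) + ")"
```

For example, `triple_to_sexpr("Shannon", "defined", "entropy")` yields `(fact :subject "Shannon" :predicate "defined" :object "entropy")`.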

Phase 6: Optimization & DSPy

Prompt Engineering as Code: Adopt DSPy concepts to optimize prompts programmatically.
Teleprompter Implementation: Build a Lisp-based optimizer that attempts to "compile" vague intents into optimal prompts by measuring metric improvements.
Fine-tuning: Fine-tune sLLMs on "high-entropy" synthetic data generated to maximize reasoning capabilities.

Usage

Dependencies

SBCL (Steel Bank Common Lisp)
Quicklisp
libev (via Homebrew/apt)

Running the Server

Start the interactive REPL and load the system:

;; In SBCL REPL
(ql:quickload :foundations-core)
(foundations:start :port 8080)

Testing the API

Entropy:

curl -X POST http://127.0.0.1:8080/api/science/entropy \
     -H "Content-Type: application/json" \
     -d '{"text": "BANANA", "order": 1}'

KL-Divergence (Information Loss):

curl -X POST http://127.0.0.1:8080/api/science/divergence \
     -H "Content-Type: application/json" \
     -d '{"text_p": "PROBABILIDADE", "text_q": "POSSIBILIDADE"}'
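
The same two requests can be issued from a script. The sketch below uses only the Python standard library and reuses the host, port, paths, and JSON fields from the curl commands above. The helper names are hypothetical, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # host and port taken from the curl examples


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=BASE_URL, timeout=10.0):
    """Send the request and decode the reply.

    Assumption: the server answers with a JSON document; its schema is not
    documented here, so the parsed value is returned as-is.
    """
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_json("/api/science/entropy", {"text": "BANANA", "order": 1})` mirrors the first curl command, and `post_json("/api/science/divergence", {"text_p": "PROBABILIDADE", "text_q": "POSSIBILIDADE"})` mirrors the second.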

License: Dual Licensing

This project is licensed under a Dual License model:

Personal & Non-Commercial Use: Licensed under the MIT License. You are free to use, modify, and distribute this software for personal queries, research, or open-source projects.
Commercial Use: Once production-ready, this project is intended for commercial use as well as research. For any commercial application, proprietary software integration, or deployed service that generates revenue, a separate Commercial License is required.

Please contact the author for commercial licensing inquiries.
