IdeaFlowCo/graph-fs

GraphFS — Graph Database-Backed Filesystem

Replace the macOS filesystem with a graph database, so every file and directory is a node, containment is a relationship, and AI agents can query the filesystem semantically.

The Idea

Instead of hierarchical POSIX paths, store files as nodes in a graph database. Apps still see a normal filesystem (via FUSE or File Provider), but the underlying storage is a graph — enabling semantic search, relationship queries, and AI-native file operations.

Traditional:  /projects/noos/src/index.ts
Graph:        (:Directory {name:'projects'})-[:CONTAINS]->(:Directory {name:'noos'})-[:CONTAINS]->(:File {name:'index.ts'})
              Plus: (:File {name:'index.ts'})-[:IMPORTS]->(:File {name:'db.ts'})
              Plus: (:File {name:'index.ts'})-[:ABOUT]->(:Topic {name:'API server'})

How macOS File Access Works (Why This Is Hard)

Every app on macOS — GUI or CLI — accesses files through the same stack:

GUI Apps (Finder, Preview, VS Code...)
    ↓
Frameworks (NSFileManager, Swift FileManager, libc fopen/fread)
    ↓
System Calls (open, read, write, stat, readdir - POSIX)
    ↓
VFS (Virtual File System layer in the kernel)
    ↓
Actual filesystem driver (APFS, HFS+, NFS, FUSE...)

You can't just alias ls and cat — that only intercepts shell commands. GUI apps call open()/read() directly via system calls. To intercept everything, you need to plug in at the VFS layer (FUSE) or use Apple's File Provider API.

Why Shell Aliases Don't Work for Apps

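| Approach              | Works for shell? | Works for GUI apps? | Works for Python/Node/etc.? |
|-----------------------|------------------|---------------------|-----------------------------|
| Alias ls/cat          | Yes              | No                  | No                          |
| FUSE mount            | Yes              | Yes                 | Yes                         |
| File Provider         | Yes (via Finder) | Yes                 | Yes                         |
| DYLD_INSERT_LIBRARIES | Unsigned only    | Blocked by SIP      | Unsigned only               |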

Recommended Approach: Hybrid with Normal Filesystem

For daily use alongside the regular macOS filesystem, an Apple File Provider extension is the best path:

  • Appears in Finder sidebar alongside regular folders (like iCloud Drive, Dropbox)
  • No kernel extension needed
  • Works with all apps via Finder, Open/Save dialogs
  • Apple-maintained API

For prototyping, FUSE (via macFUSE) is fastest to iterate on. The core graph logic transfers directly to File Provider later.

Quick Start

cd ~/code/graph-fs
python3 -m venv venv
source venv/bin/activate
pip install fusepy

# Create files and directories in the graph
python graphfs_cli.py add /projects --dir
python graphfs_cli.py add /projects/myapp --dir
python graphfs_cli.py add /projects/myapp/main.py -c "print('hello')" -t "python,code"
python graphfs_cli.py add /projects/myapp/readme.md -c "# My App" -t "markdown,docs"

# List contents
python graphfs_cli.py ls /
python graphfs_cli.py ls /projects/myapp

# The graph part — create relationships between files
python graphfs_cli.py link /projects/myapp/main.py /projects/myapp/readme.md DOCUMENTS
python graphfs_cli.py related /projects/myapp/main.py

# Tag and search
python graphfs_cli.py tag /projects/myapp/main.py "backend,api"
python graphfs_cli.py search --tag python
python graphfs_cli.py search "main"

# Import a real directory from disk into the graph
python graphfs_cli.py import ~/code/some-project /code/some-project

# Show node details with content and relationships
python graphfs_cli.py show /projects/myapp/main.py --content

# Raw SQL for power queries
python graphfs_cli.py query "SELECT path, tags FROM nodes WHERE tags LIKE '%python%'"

# Stats
python graphfs_cli.py stats

FUSE Mount (optional — requires macFUSE)

# One-time setup:
brew install macfuse
# → Approve kernel extension in System Settings > Privacy & Security
# → Reboot

# Mount the graph filesystem
python graphfs_fuse.py /Volumes/GraphFS

# Now ALL apps see it:
ls /Volumes/GraphFS
cat /Volumes/GraphFS/projects/myapp/main.py
open /Volumes/GraphFS  # Opens in Finder

# Unmount
umount /Volumes/GraphFS

Research & Prior Art

AgentFS (Turso Database)

  • URL: https://github.com/tursodatabase/agentfs
  • Blog: https://turso.tech/blog/agentfs
  • What: Agent-specific filesystem on top of SQLite. Each agent gets a .db file containing a virtual filesystem, key-value store, and tool call audit trail.
  • Storage: SQLite with dentry (paths), inode (content/metadata), KV, and toolcall tables
  • Mount: FUSE on Linux, NFS on macOS
  • SDKs: TypeScript, Python, Rust, CLI
  • Key insight: "Each filesystem is a SQLite file, allowing you to store billions on any media" — avoids the problem of network filesystems needing one mount per agent
  • Limitation: Not a graph database. Traditional hierarchical paths stored in SQLite. No relationship queries.

GDBFS (USTC Academic Project)

  • URL: https://github.com/USTC-OS-group33/GDBFS
  • What: FUSE filesystem using Neo4j (graph) + MongoDB (content storage)
  • Language: Python (fusepy)
  • Architecture: Neo4j handles directory tree relationships, MongoDB stores file content/metadata
  • Status: Academic proof-of-concept, no performance data published
  • Key insight: Splitting graph structure (Neo4j) from blob storage (MongoDB) is practical — graph DBs aren't great at storing large binary blobs

Relevant Graph Databases

| Database  | Type              | Speed vs Neo4j | Embedded?              | Language | Notes |
|-----------|-------------------|----------------|------------------------|----------|-------|
| Kuzu      | Property graph    | 18-188x faster | Yes (in-process)       | C++      | "DuckDB for graphs". Implements Cypher. Vector search built in. Being archived; the team is working on something new. |
| Neo4j     | Property graph    | Baseline       | No (server)            | Java     | Most popular, mature ecosystem. Server overhead = latency per syscall. |
| Memgraph  | Property graph    | ~10x faster    | No (server, in-memory) | C++      | Neo4j-compatible Cypher. All data in RAM = fast but memory-limited. |
| SQLite    | Relational        | N/A            | Yes                    | C        | Not a graph DB but proven for filesystem metadata. What AgentFS chose. Can model graphs with adjacency tables. |
| SurrealDB | Multi-model       | Varies         | Embeddable             | Rust     | Graph + document + relational. Can run embedded or as a server. |
| Dgraph    | Distributed graph | Fast at scale  | No (server)            | Go       | GraphQL native. Overkill for a local filesystem. |

Key Performance Insight

For a FUSE filesystem, embedded databases dominate because:

  • A single ls -la issues at least one stat()-family call per directory entry, so dozens for a typical directory
  • Without caching, each stat() is one round-trip to the DB
  • Network round-trip to Neo4j (~1-5ms) vs embedded Kuzu/SQLite (~0.01ms) = 100-500x difference
  • At filesystem scale (thousands of ops/sec), this is the difference between usable and unusable
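To make the arithmetic above concrete, here is a back-of-the-envelope model. It assumes every listed entry costs exactly one uncached stat() round-trip and that calls are issued one after another; the 50-entry directory and the helper names are illustrative, and real numbers depend on kernel attribute caching and on how many calls ls actually makes.

```python
"""Back-of-the-envelope latency model for uncached stat() calls (illustrative only)."""


def listing_latency_ms(entries: int, per_call_ms: float) -> float:
    """Time for `ls -la` if each entry costs one serial, uncached stat() round-trip."""
    return entries * per_call_ms


def max_serial_ops_per_sec(per_call_ms: float) -> float:
    """Upper bound on back-to-back metadata ops per second at a given per-call latency."""
    return 1000.0 / per_call_ms


ENTRIES = 50  # assumed directory size
for label, per_call_ms in [
    ("Neo4j, low end", 1.0),         # ~1 ms round-trip (lower bound quoted above)
    ("Neo4j, high end", 5.0),        # ~5 ms round-trip (upper bound quoted above)
    ("embedded Kuzu/SQLite", 0.01),  # ~0.01 ms in-process call
]:
    print(f"{label:22s} ls -la ~ {listing_latency_ms(ENTRIES, per_call_ms):7.1f} ms, "
          f"ceiling ~ {max_serial_ops_per_sec(per_call_ms):9,.0f} ops/s")
```

With these inputs a 50-entry listing costs about 50 to 250 ms against a server but about 0.5 ms embedded, and a server at 1 to 5 ms per call cannot exceed roughly 200 to 1,000 serial ops/s, which is why "thousands of ops/sec" pushes toward an embedded store.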

Architecture Options

Option A: Pure Graph (Kuzu/Neo4j)

App → FUSE → Kuzu (embedded) → graph storage
  • Every POSIX op maps to a Cypher query
  • Clean model but graph DBs aren't optimized for blob storage
  • Best for: metadata-heavy, small files, AI-native queries
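One way to picture "every POSIX op maps to a Cypher query" is a table of parameterized queries keyed by FUSE callback. The sketch below uses the labels and properties from the Data Model section; the function name, the chosen return columns, and the default mode 0o755 are assumptions. It only builds query text, so it needs no Kuzu or Neo4j driver to run.

```python
import posixpath

# Hypothetical mapping from FUSE/POSIX callbacks to parameterized Cypher.
# Labels and properties follow this README's Data Model; the exact query
# shapes are illustrative, not taken from GraphFS's code.


def cypher_for(op: str, path: str) -> tuple:
    """Return (query, params) for one POSIX operation on a normalized absolute path."""
    if op == "getattr":  # stat(): find the node by its stored full path
        return ("MATCH (n {path: $path}) "
                "RETURN n.mode AS mode, n.size AS size, n.mtime AS mtime",
                {"path": path})
    if op == "readdir":  # list a directory by following CONTAINS edges
        return ("MATCH (:Directory {path: $path})-[:CONTAINS]->(c) "
                "RETURN c.name AS name",
                {"path": path})
    if op == "mkdir":  # create a Directory node and attach it to its parent
        parent, name = posixpath.split(path)
        return ("MATCH (p:Directory {path: $parent}) "
                "CREATE (p)-[:CONTAINS]->(:Directory {name: $name, path: $path, mode: $mode})",
                {"parent": parent, "name": name, "path": path, "mode": 0o755})
    if op == "unlink":  # remove a file node together with all of its relationships
        return ("MATCH (f:File {path: $path}) DETACH DELETE f",
                {"path": path})
    raise ValueError(f"unsupported operation: {op}")


query, params = cypher_for("mkdir", "/projects/myapp")
print(query)
print(params)
```

Passing the path as a parameter rather than formatting it into the query string avoids quoting bugs with file names that contain quotes, and lets a driver prepare each statement once and reuse it for every stat().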

Option B: Hybrid (Graph + Blob Store) — Recommended

App → FUSE → GraphFS daemon → Kuzu (metadata/relationships) + local files (content)
  • Graph stores: paths, relationships, semantic tags, permissions, timestamps
  • Local disk/SQLite stores: actual file content (binary blobs)
  • Best of both worlds: graph queries for structure, fast I/O for content
  • This is essentially what GDBFS did (Neo4j + MongoDB)
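For the hybrid layout, one common way to implement content_ref is content addressing: store each blob under the SHA-256 of its bytes and keep only that hash on the :File node. This is a minimal sketch under that assumption; the BlobStore class and its two-level directory fan-out are hypothetical, not GraphFS's actual storage layout.

```python
import hashlib
import tempfile
from pathlib import Path


class BlobStore:
    """Hypothetical content-addressed blob store on local disk.

    File bytes live here; the graph keeps only the returned content_ref.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, ref: str) -> Path:
        # Fan out by the first two hex digits so no single directory grows huge.
        return self.root / ref[:2] / ref[2:]

    def put(self, data: bytes) -> str:
        """Store bytes and return their SHA-256 hex digest as the content_ref."""
        ref = hashlib.sha256(data).hexdigest()
        dest = self._path_for(ref)
        if not dest.exists():  # identical content is stored only once
            dest.parent.mkdir(exist_ok=True)
            tmp = dest.with_suffix(".tmp")
            tmp.write_bytes(data)
            tmp.replace(dest)  # rename so readers never see a half-written blob
        return ref

    def get(self, ref: str, offset: int = 0, size: int = -1) -> bytes:
        """Read `size` bytes from `offset` (size=-1 reads to the end), like a FUSE read."""
        with open(self._path_for(ref), "rb") as f:
            f.seek(offset)
            return f.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = BlobStore(d)
        data = b"print('hello')"
        ref = store.put(data)
        # Only this small record would go into the graph as :File properties.
        node = {"name": "main.py", "content_ref": ref, "size": len(data)}
        print(node)
        print(store.get(ref, offset=6, size=7))
```

Because the ref is derived from the content, identical files share one blob, and every write produces a new ref that the graph can record (for example as a VERSION_OF chain). The trade-off is that garbage collection has to find blobs that no node references anymore.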

Option C: SQLite with Graph Schema (AgentFS approach)

App → FUSE → SQLite (adjacency list tables)
  • Model graph relationships in relational tables
  • Proven, fast, zero-dependency
  • Loses native graph query expressiveness (no Cypher; traversals need manual JOINs or recursive CTEs)
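A minimal sketch of the adjacency-list approach with Python's built-in sqlite3: one table of nodes, one table of typed edges, and a recursive CTE for multi-hop traversals such as "everything main.py imports, directly or indirectly". The Quick Start's raw SQL suggests the real backend has a nodes table with path and tags columns; every other table, column, and helper name here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('file', 'dir')),
    tags TEXT NOT NULL DEFAULT ''
);
CREATE TABLE edges (
    src INTEGER NOT NULL REFERENCES nodes(id),
    dst INTEGER NOT NULL REFERENCES nodes(id),
    rel TEXT NOT NULL,              -- e.g. 'CONTAINS', 'IMPORTS', 'DOCUMENTS'
    PRIMARY KEY (src, dst, rel)
);
CREATE INDEX edges_by_dst ON edges(dst, rel);  -- reverse lookups: "who imports X?"
""")


def add_node(path, kind="file"):
    conn.execute("INSERT INTO nodes(path, kind) VALUES (?, ?)", (path, kind))


def link(src_path, dst_path, rel):
    # Resolve both paths to ids in SQL; inserts nothing if either path is unknown.
    conn.execute(
        "INSERT INTO edges(src, dst, rel) "
        "SELECT s.id, d.id, ? FROM nodes s, nodes d WHERE s.path = ? AND d.path = ?",
        (rel, src_path, dst_path),
    )


def reachable(path, rel):
    """Paths reachable from `path` by following `rel` edges, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT id FROM nodes WHERE path = ?
            UNION
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id WHERE e.rel = ?
        )
        SELECT n.path FROM reach JOIN nodes n ON n.id = reach.id
        WHERE n.path != ?
        ORDER BY n.path
        """,
        (path, rel, path),
    ).fetchall()
    return [p for (p,) in rows]


for p in ["/app/main.py", "/app/db.py", "/app/models.py", "/app/readme.md"]:
    add_node(p)
link("/app/main.py", "/app/db.py", "IMPORTS")
link("/app/db.py", "/app/models.py", "IMPORTS")
link("/app/main.py", "/app/readme.md", "DOCUMENTS")
print(reachable("/app/main.py", "IMPORTS"))
```

Using UNION rather than UNION ALL in the recursive CTE makes SQLite skip rows it has already produced, so import cycles terminate. The same query with a different rel (e.g. CONTAINS for a subtree) answers other traversals; in Cypher each of these is a one-line variable-length pattern such as -[:IMPORTS*]->.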

macOS Integration Paths

1. FUSE via macFUSE

  • How: Install macFUSE, write a FUSE driver
  • Pros: ALL apps see it as a real filesystem
  • Cons: Requires a kernel extension, Apple keeps tightening security, and on Apple Silicon third-party kernel extensions require switching to Reduced Security in Recovery
  • Mount point: /Volumes/GraphFS or similar
  • Libraries: Go (bazil.org/fuse, hanwen/go-fuse), Rust (fuser), Python (fusepy)
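To give a sense of what "write a FUSE driver" involves in Python, here is a minimal read-only driver built on fusepy's Operations class, serving an in-memory {path: bytes} dict instead of the graph. It is a hypothetical demo, not the repository's graphfs_fuse.py; the fallback import branch exists only so the class can be exercised without macFUSE installed. Mounting needs macFUSE and an existing, empty mount-point directory.

```python
import errno
import os
import posixpath
import stat
import sys
import time

try:
    from fuse import FUSE, FuseOSError, Operations  # fusepy
except (ImportError, OSError):  # fusepy not installed, or libfuse/macFUSE not found
    FUSE = None
    Operations = object

    class FuseOSError(OSError):
        """Stand-in shaped like fusepy's error, so the class is testable without a mount."""

        def __init__(self, code):
            super().__init__(code, os.strerror(code))


class DictFS(Operations):
    """Hypothetical read-only FUSE driver serving a {path: bytes} dict."""

    def __init__(self, files):
        self.files = dict(files)
        self.now = time.time()
        self.dirs = {"/"}
        for path in self.files:  # every ancestor of a file is a directory
            parent = posixpath.dirname(path)
            while parent not in self.dirs:
                self.dirs.add(parent)
                parent = posixpath.dirname(parent)

    def getattr(self, path, fh=None):
        if path in self.dirs:
            mode, nlink, size = stat.S_IFDIR | 0o755, 2, 0
        elif path in self.files:
            mode, nlink, size = stat.S_IFREG | 0o444, 1, len(self.files[path])
        else:
            raise FuseOSError(errno.ENOENT)
        return {"st_mode": mode, "st_nlink": nlink, "st_size": size,
                "st_atime": self.now, "st_mtime": self.now, "st_ctime": self.now}

    def readdir(self, path, fh):
        children = {posixpath.basename(p) for p in self.dirs | set(self.files)
                    if p != "/" and posixpath.dirname(p) == path}
        return [".", ".."] + sorted(children)

    def read(self, path, size, offset, fh):
        if path not in self.files:
            raise FuseOSError(errno.ENOENT)
        return self.files[path][offset:offset + size]


if __name__ == "__main__" and FUSE is not None and len(sys.argv) > 1:
    demo = DictFS({"/projects/myapp/main.py": b"print('hello')\n"})
    FUSE(demo, sys.argv[1], foreground=True, ro=True)  # blocks until unmounted
```

Run it with a mount-point path as the only argument, then ls that directory from another terminal. Replacing the dict lookups in getattr, readdir, and read with graph queries is essentially Phase 1 of the MVP plan.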

2. Apple File Provider Extension

  • How: NSFileProviderExtension / FileProvider framework
  • Pros: Apple-blessed, no kext needed, Finder integration
  • Cons: Scoped to a "domain" (sidebar item), not full filesystem replacement
  • Used by: iCloud Drive, Dropbox, OneDrive
  • Best for: Making GraphFS appear as a Finder location without FUSE

3. NFS Loopback (AgentFS approach on macOS)

  • How: Run local NFS server, mount via mount_nfs
  • Pros: No kernel extension needed, well-supported
  • Cons: NFS overhead, less flexible than FUSE

Data Model

(:Directory {
  name: String,
  path: String,        // full POSIX path for fast lookup
  mode: Int,           // unix permissions
  uid: Int, gid: Int,
  atime: DateTime,
  mtime: DateTime,
  ctime: DateTime
})

(:File {
  name: String,
  path: String,
  content_ref: String, // pointer to blob storage
  size: Int,
  mode: Int,
  uid: Int, gid: Int,
  atime: DateTime,
  mtime: DateTime,
  ctime: DateTime
})

// Core relationships
(:Directory)-[:CONTAINS]->(:File|:Directory)
(:File)-[:LINKS_TO]->(:File)              // symlinks

// AI-native relationships (the killer feature)
(:File)-[:IMPORTS]->(:File)               // code dependencies
(:File)-[:ABOUT]->(:Topic)                // semantic tags
(:File)-[:RELATED_TO]->(:File)            // AI-discovered similarity
(:File)-[:CREATED_BY]->(:Agent)           // provenance tracking
(:File)-[:VERSION_OF]->(:File)            // version history as graph
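The node properties above are close to what a FUSE getattr callback must return. As an illustration, this hypothetical dataclass mirrors the :File and :Directory properties (holding timestamps as Unix epoch floats instead of DateTime, an assumption) and translates them into the st_* dictionary that fusepy's getattr returns. Because mode holds only permission bits, the file-type bits come from the node's label.

```python
import stat
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical in-memory mirror of the :File / :Directory properties above."""
    name: str
    path: str
    is_dir: bool                       # stands in for the :Directory vs :File label
    mode: int = 0o644                  # permission bits only, like the `mode` property
    uid: int = 0
    gid: int = 0
    size: int = 0                      # files only; left at 0 for directories
    content_ref: Optional[str] = None  # files only: key into the blob store
    atime: float = field(default_factory=time.time)  # Unix epoch seconds, not DateTime
    mtime: float = field(default_factory=time.time)
    ctime: float = field(default_factory=time.time)

    def to_stat(self) -> dict:
        """Translate node properties into the st_* dict a fusepy getattr returns."""
        file_type = stat.S_IFDIR if self.is_dir else stat.S_IFREG
        return {
            "st_mode": file_type | self.mode,  # the label supplies the type bits
            "st_nlink": 2 if self.is_dir else 1,
            "st_uid": self.uid,
            "st_gid": self.gid,
            "st_size": self.size,
            "st_atime": self.atime,
            "st_mtime": self.mtime,
            "st_ctime": self.ctime,
        }


src = Node(name="src", path="/projects/noos/src", is_dir=True, mode=0o755)
index = Node(name="index.ts", path="/projects/noos/src/index.ts", is_dir=False)
print(oct(src.to_stat()["st_mode"]), oct(index.to_stat()["st_mode"]))
```

An st_nlink of 2 for directories is the conventional minimum ("." plus the entry in the parent); a stricter implementation would add one per child directory, which the CONTAINS edges make easy to count.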

MVP Plan

  1. Phase 1: FUSE mount with Kuzu (or SQLite fallback) for basic POSIX ops (open, read, write, stat, readdir, mkdir, unlink)
  2. Phase 2: Add semantic relationships (imports, topics, similarity)
  3. Phase 3: AI agent integration — agents can query the filesystem with Cypher instead of find/grep
  4. Phase 4: macOS File Provider extension for native Finder integration

Open Questions

  • Should file content live in the graph or in a separate blob store?
  • How to handle filesystem events (fswatch equivalent) for real-time graph updates?
  • Can we intercept Spotlight queries and route them through the graph?
  • What's the right caching strategy to make FUSE performance acceptable?
  • How to handle the Kuzu archival — fork it, or use SQLite with graph extensions?


About

Graph database-backed filesystem — FUSE + SQLite graph backend
