A parser turns its input (often text in the form of a file) into a more useful representation (usually a data structure in memory) so that a specific task can be performed on it.
Common examples include:
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
Repair malformed JSON from LLMs, APIs, logs, and user input in Python.
oletools - python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
Type-safe YAML parser and validator.
Wiktionary dump file parser and multilingual data extractor
A simple resume parser used for extracting information from resumes
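To make the definition of a parser concrete, here is a minimal sketch in Python that turns text input into an in-memory data structure. The `key = value` line format, the function name `parse_key_values`, and the `SAMPLE` text are all hypothetical, chosen only for illustration; none of the projects listed above is claimed to work this way.

```python
def parse_key_values(text: str) -> dict[str, str]:
    """Turn 'key = value' lines into a dictionary (hypothetical toy format)."""
    result: dict[str, str] = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(
                f"line {line_number}: expected 'key = value', got {raw_line!r}"
            )
        result[key.strip()] = value.strip()
    return result


# Assumed sample input, invented for this sketch.
SAMPLE = """
# toy configuration
name = demo
retries = 3
"""

config = parse_key_values(SAMPLE)
print(config)  # {'name': 'demo', 'retries': '3'}
```

The input is plain text and the output is a dictionary, which is easier to query than the raw string. Values stay as strings here; converting `retries` to an integer would be a separate validation step layered on top of parsing.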