PullMdLoader
Loader for converting URLs into Markdown using the pull.md service.
This package implements a document loader for web content. Unlike traditional web scrapers, PullMdLoader can handle web pages built with dynamic JavaScript frameworks like React, Angular, or Vue.js, converting them into Markdown without local rendering.
Overview
Integration details
| Class | Package | Local | Serializable | JS Support |
|---|---|---|---|---|
| PullMdLoader | langchain-pull-md | ✅ | ✅ | ❌ |
Setup
Installation
pip install langchain-pull-md
Initialization
from langchain_pull_md.markdown_loader import PullMdLoader
# Instantiate the loader with a URL
loader = PullMdLoader(url="https://example.com")
Load
documents = loader.load()
documents[0].metadata
{'source': 'https://example.com',
'page_content': '# Example Domain\nThis domain is used for illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.'}
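To check what a call to load returned, a small helper can be handy. The following is a minimal sketch, assuming the returned objects expose page_content and metadata attributes in the way standard LangChain Document objects do; the name summarize_documents is hypothetical and not part of langchain-pull-md.

```python
from typing import Any, Dict, Iterable, List


def summarize_documents(documents: Iterable[Any]) -> List[Dict[str, Any]]:
    """Return one small summary dict per loaded document.

    Hypothetical helper (not part of langchain-pull-md). It assumes each
    item has a `page_content` string and a `metadata` dict.
    """
    summaries = []
    for doc in documents:
        text = doc.page_content
        # First non-empty line of the Markdown, usually the top-level heading.
        first_line = next((ln for ln in text.splitlines() if ln.strip()), "")
        summaries.append(
            {
                "source": doc.metadata.get("source"),
                "characters": len(text),
                "first_line": first_line,
            }
        )
    return summaries


# Example wiring (requires langchain-pull-md and network access):
# from langchain_pull_md.markdown_loader import PullMdLoader
# docs = PullMdLoader(url="https://example.com").load()
# print(summarize_documents(docs))
```

The helper only reads attributes, so it can be tried with any stand-in object before the loader is wired in.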
Lazy Load
No lazy loading is implemented. PullMdLoader performs a real-time conversion of the provided URL into Markdown format whenever the load method is called.
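When several URLs need converting, the work can still be deferred on the caller's side. The sketch below is one possible approach, not a feature of the package: iter_documents and make_loader are hypothetical names, and the only assumption about the loader is that it has a load method returning a list.

```python
from typing import Any, Callable, Iterable, Iterator


def iter_documents(
    urls: Iterable[str],
    make_loader: Callable[[str], Any],
) -> Iterator[Any]:
    """Yield documents one URL at a time instead of loading all up front.

    `make_loader` is a hypothetical factory; each call must return an
    object with a `load()` method that returns a list of documents.
    """
    for url in urls:
        loader = make_loader(url)
        # load() runs only when the consumer asks for the next document.
        yield from loader.load()


# Example wiring (requires langchain-pull-md and network access):
# from langchain_pull_md.markdown_loader import PullMdLoader
# urls = ["https://example.com"]
# for doc in iter_documents(urls, lambda u: PullMdLoader(url=u)):
#     print(doc.metadata)
```

Because this is a generator, a URL is converted only when iteration reaches it, so stopping early skips the remaining requests.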
Related
- Document loader conceptual guide
- Document loader how-to guides