@Deepaksaini00

Summary

This PR improves the PyMuPDF4LLM documentation around OCR support. It clarifies how OCR actually works, what dependencies are required, and highlights an important import-order requirement that isn’t currently obvious from the docs.

Key points

  • Documented that pymupdf.layout must be imported before pymupdf4llm for OCR heuristics to run
  • Added an explanation of when and how OCR is triggered (image-only pages, unreadable text, partial OCR)
  • Listed the full set of requirements (layout import, use_ocr, Tesseract, OpenCV)
  • Added a small decision-flow diagram to make the behavior easier to understand
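
As a rough illustration of the requirements listed above, the sketch below checks whether each piece appears to be present and shows the import order this PR documents. It is not taken from the PyMuPDF4LLM code base. The helper names (`module_available`, `check_ocr_requirements`, `convert_with_ocr`) are hypothetical, and passing `use_ocr=True` to `pymupdf4llm.to_markdown` is an assumption based on the requirement list in this PR.

```python
import importlib.util
import shutil


def module_available(name: str) -> bool:
    """Return True if module `name` can be located.

    find_spec imports parent packages (e.g. `pymupdf`) but not the
    target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing parent package.
        return False


def check_ocr_requirements() -> dict[str, bool]:
    """Hypothetical helper: report which requirements named in this PR are found."""
    return {
        "pymupdf.layout": module_available("pymupdf.layout"),
        "pymupdf4llm": module_available("pymupdf4llm"),
        # Tesseract is looked up as an executable on PATH (an assumption).
        "tesseract": shutil.which("tesseract") is not None,
        # OpenCV is listed in this PR; it may not be needed in every version.
        "opencv (cv2)": module_available("cv2"),
    }


def convert_with_ocr(path: str) -> str:
    """Hypothetical wrapper showing the import order described in this PR."""
    import pymupdf.layout  # noqa: F401  imported before pymupdf4llm
    import pymupdf4llm

    # `use_ocr=True` is assumed from the requirement list above.
    return pymupdf4llm.to_markdown(path, use_ocr=True)


if __name__ == "__main__":
    for name, found in check_ocr_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Running the script prints one line per requirement. `convert_with_ocr` is only meaningful once every entry reports `found`.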

This is a documentation-only update; no code or runtime behavior is changed.

Fixes #4833

@github-actions
Contributor


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@Deepaksaini00
Author

I have read the CLA Document and I hereby sign the CLA

@Deepaksaini00
Author

recheck

@Deepaksaini00
Author

@JorjMcKie Could you please review this PR?

@JorjMcKie
Collaborator

Some general comments:

  1. The OpenCV dependency has been removed. The corresponding checks are now done using the batteries already included.
  2. We are in the process of re-configuring the components across our products: pymupdf4llm, pymupdf and the layout plugin.

These activities will of course entail major documentation adjustments. Therefore, I'm afraid that your PR will become obsolete within a few weeks.
