Add GVision OCR engine support [NOT MERGEABLE due to unclear licensing] #1529

aka964 · 2025-05-17T16:05:55Z

Purpose

This pull request adds support for a new OCR engine using Google Cloud Vision API (GCV) to OCRmyPDF. It provides an alternative to Tesseract for cloud-based, high-accuracy OCR, especially useful for multilingual documents and complex image layouts.

Implementation Details

Added GVisionOcrEngine in ocrmypdf.builtin_plugins.gvision
Checks availability using the GOOGLE_APPLICATION_CREDENTIALS environment variable
Supports HOCR and plain text output
Includes unit tests for:
- Engine initialization
- HOCR generation
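
For illustration, an availability check of the kind described above could look roughly like this. It is a minimal sketch, not the code from this PR: the helper name `gcv_credentials_available` is hypothetical, and it assumes the engine only needs the variable to point at an existing file.

```python
import os
from pathlib import Path


def gcv_credentials_available(environ=None) -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file.

    Hypothetical helper for illustration; the PR's actual check may differ.
    """
    env = os.environ if environ is None else environ
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    # An unset or empty variable means the engine cannot authenticate.
    if not cred_path:
        return False
    # The variable must point at an existing credentials file.
    return Path(cred_path).is_file()
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to unit test.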

Requirements

Added google-cloud-vision dependency in pyproject.toml
Google Cloud credentials JSON file

Set environment variable:

On Windows (PowerShell):

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\your\credentials.json"

On Linux/macOS (bash):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials.json

jbarlow83 · 2025-05-27T08:13:38Z

Thank you for this. The code looks good at a glance.

There is one serious problem you will need to address.

It seems that you copied code from this repository
https://github.com/grantbarrett/son-of-ocrmypdf_plugin_GoogleVision
which in turn forked this repository
https://github.com/kkrell2016/ocrmypdf_plugin_GoogleVision

Neither of these repositories contains a license file, so the code must be considered "all rights reserved" by the creator and cannot be copied. We will need these two people, kkrell2016 and grantbarrett, to license their code under an open source license that is compatible with OCRmyPDF's MPL 2.0 license. MIT, MPL 2.0, Apache 2.0, or BSD would all be fine. It would be best if they can create a license file for their repositories indicating this.

Otherwise, the code you are contributing is not something you own, so it cannot become part of OCRmyPDF or any other open source project. I will also need you to license your changes to their work under a compatible license.

I do look forward to being able to accept this code, but I cannot accept it until the licensing is clarified.

On a more minor note, I would prefer to make gvision an optional dependency so that the user is not required to install all of Google Cloud if they are not going to use it. That's fairly minor compared to the licensing problem.
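
One common way to keep such a dependency optional is to guard the import and treat the engine as unavailable when the package is missing. The following is a minimal sketch under that assumption; the helper name `load_vision_module` is hypothetical and not part of OCRmyPDF, and the module path `google.cloud.vision` is assumed from the `google-cloud-vision` package.

```python
import importlib


def load_vision_module(module_name: str = "google.cloud.vision"):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Package not installed: the engine should report itself unavailable.
        return None


vision = load_vision_module()
if vision is None:
    print("google-cloud-vision is not installed; GVision engine disabled")
```

In pyproject.toml, the package would then move from the required dependencies to an extra under `[project.optional-dependencies]`.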

glorat · 2025-12-30T02:40:38Z

src/ocrmypdf/builtin_plugins/gcv2hocr2.py

Comparing this plugin to my home grown one... I'm simply using

from google.cloud import documentai
from google.cloud.documentai_toolbox import document
....
wrapped_document = document.Document.from_documentai_document(doc)
hocr_string = wrapped_document.export_hocr_str(title='brieftech ocr')

Is there any difference between this custom gcv2hocr2 and what the Google team maintains themselves?
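
For context, a converter of this kind essentially maps each recognized word and its pixel bounding box to an hOCR element. The sketch below illustrates that idea only; it is not the gcv2hocr2 code or Google's exporter, the function name `words_to_hocr` is hypothetical, and it puts all words on a single line for brevity.

```python
from html import escape


def words_to_hocr(words, page_width, page_height):
    """Build a minimal hOCR page from (text, (x0, y0, x1, y1)) tuples.

    Coordinates are pixels with the origin at the top-left of the image.
    Hypothetical helper for illustration; all words share one line.
    """
    spans = [
        f"<span class='ocrx_word' title='bbox {x0} {y0} {x1} {y1}'>"
        f"{escape(text)}</span>"
        for text, (x0, y0, x1, y1) in words
    ]
    line = ""
    if words:
        # The line box is the union of its word boxes.
        lx0 = min(box[0] for _, box in words)
        ly0 = min(box[1] for _, box in words)
        lx1 = max(box[2] for _, box in words)
        ly1 = max(box[3] for _, box in words)
        line = (
            f"<span class='ocr_line' title='bbox {lx0} {ly0} {lx1} {ly1}'>"
            + " ".join(spans)
            + "</span>"
        )
    return (
        f"<div class='ocr_page' title='bbox 0 0 {page_width} {page_height}'>"
        f"{line}</div>"
    )
```

A real converter also has to group words into lines and paragraphs and handle rotated text, which is where implementations tend to differ.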

glorat · 2026-01-05T12:48:14Z

Just to add some additional commentary: firstly, thank you for this PR, as it helped identify an important bug in my own plug-in (DPI detection specifically).

Licensing aside, the other issue is that this is using the Google Cloud Vision API, whereas the Google Document AI API is considered to be superior for OCR purposes.

Also, if I'm reading the code right, this plugin is trying to compose the target PDF itself, rather than using ocrmypdf's hocrtransform.HocrTransform.

I do have my own gvision.py plug-in that I've been using in production for a long time on thousands of documents. I'm happy to share it, but it doesn't come with tests and it has some hardcoding to GCP resources that I would need to clean up. If there is interest, I'm willing to tidy it up and contribute it back either as a gist or a single-file PR. Just ping me.

jbarlow83 · 2026-01-05T19:50:31Z

@glorat
I do have a better solution that will allow alternate OCR engines to generate some simple Python dataclasses that express the OCR to render, rather than having to directly compose hOCR, but it's not published yet because other dependencies are not ready. (Essentially, hocrtransform is going to be refactored into "hocr -> element tree" and then "element tree -> PDF".)

When those changes are merged I'll make a note to ping you because it should simplify integration.
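
To make the dataclass idea above more concrete, here is a purely illustrative sketch of what such an element tree might look like. Every class and field name below is invented for this example; it is not OCRmyPDF's API, which was unpublished at the time of the comment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundingBox:
    """Pixel coordinates, origin at the top-left (hypothetical)."""
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class OcrWord:
    text: str
    bbox: BoundingBox


@dataclass
class OcrLine:
    bbox: BoundingBox
    words: list[OcrWord] = field(default_factory=list)


@dataclass
class OcrPage:
    width: float
    height: float
    dpi: float
    lines: list[OcrLine] = field(default_factory=list)

    def text(self) -> str:
        """Join words with spaces and lines with newlines."""
        return "\n".join(
            " ".join(word.text for word in line.words) for line in self.lines
        )
```

An OCR engine plugin would fill in a structure like this from its API response, and a separate renderer would turn it into the PDF text layer.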

Add GVision OCR engine support

b45a1d3

jbarlow83 added the "third party issue" (Problem with a third party dependency) label May 27, 2025

jbarlow83 changed the title ~~Add GVision OCR engine support~~ Add GVision OCR engine support [NOT MERGEABLE due to unclear licensing] May 27, 2025

clach04 mentioned this pull request Sep 6, 2025

License? kkrell2016/ocrmypdf_plugin_GoogleVision#1

Open

glorat reviewed Dec 30, 2025
