Add GVision OCR engine support [NOT MERGEABLE due to unclear licensing] #1529
base: main
Thank you for this. The code looks good at a glance, but there is one serious problem you will need to address.

It seems that you copied code from this repository. Neither of these repositories contains a license file, so the code must be considered "all rights reserved" by its creator and cannot be copied. We will need these two people, kkrell2016 and grantbarrett, to license their code under an open source license that is compatible with OCRmyPDF's MPL 2.0 license; MIT, MPL 2.0, Apache 2.0, or BSD would all be fine. Ideally, they would add a license file to their repositories indicating this. Otherwise, the code you are contributing is not something you own, so it cannot become part of OCRmyPDF or any other open source project. I will also need you to license your changes to their work under a compatible license.

I do look forward to being able to accept this code, but I cannot accept it until the licensing is clarified.

On a more minor note, I would prefer to make gvision an optional dependency so that users are not required to install all of Google Cloud if they are not going to use it. That's fairly minor compared to the licensing problem.
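One common way to keep an engine like this optional is to list `google-cloud-vision` under `[project.optional-dependencies]` in `pyproject.toml` rather than in the core dependencies, and to import the client lazily, failing with an actionable message when it is missing. The sketch below assumes a hypothetical extra named `gvision` and a hypothetical helper `require_optional`; neither exists in OCRmyPDF.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency only when it is first needed.

    Hypothetical helper: `extra` is the name of a pip extra (for example a
    `gvision` extra declared under [project.optional-dependencies]) and is
    used only to build the error message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Replace the bare ImportError with install instructions.
        raise RuntimeError(
            f"The '{module_name}' package is required for this OCR engine. "
            f"Install it with: pip install 'ocrmypdf[{extra}]'"
        ) from exc


# Inside the engine, import only when OCR is actually requested, e.g.:
# vision = require_optional("google.cloud.vision", extra="gvision")
```

With this arrangement, users who never select the engine never import, or need to install, the Google Cloud client.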
Comparing this plugin to my home-grown one, I'm simply using:
```python
from google.cloud import documentai
from google.cloud.documentai_toolbox import document

# ... (Document AI processing that produces `doc`, a documentai.Document, omitted)

wrapped_document = document.Document.from_documentai_document(doc)
hocr_string = wrapped_document.export_hocr_str(title='brieftech ocr')
```
Is there any difference between this custom gcv2hocr2 and what the Google team maintains?
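For comparison, converters in the spirit of gcv2hocr walk the Cloud Vision `fullTextAnnotation` hierarchy (pages, blocks, paragraphs, words, symbols) and emit hOCR `ocr_line` and `ocrx_word` elements whose `bbox` is the axis-aligned box around each element's vertices. The sketch below works on the JSON (REST) form of the response, where a zero `x` or `y` may be omitted from a vertex, and splits lines using each word's final `detectedBreak`. The function `gcv_json_to_hocr` is hypothetical; this is not the gcv2hocr or Document AI Toolbox implementation.

```python
from html import escape

# DetectedBreak types that end a visual line in the Vision API response.
_LINE_ENDS = {"EOL_SURE_SPACE", "LINE_BREAK", "HYPHEN"}


def _bbox(bounding_box: dict) -> tuple:
    """Axis-aligned (x0, y0, x1, y1) around GCV vertices; JSON omits zero x/y."""
    vertices = bounding_box.get("vertices", [])
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def _union(boxes: list) -> tuple:
    """Smallest box containing every box in `boxes`."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def _title(box: tuple) -> str:
    return "bbox %d %d %d %d" % box


def _split_lines(words: list) -> list:
    """Group a paragraph's words into lines using each word's last detectedBreak."""
    lines, current = [], []
    for word in words:
        symbols = word.get("symbols", [])
        if not symbols:
            continue
        current.append(word)
        brk = symbols[-1].get("property", {}).get("detectedBreak", {}).get("type")
        if brk in _LINE_ENDS:
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


def gcv_json_to_hocr(response: dict, page_index: int = 0) -> str:
    """Convert one page of a GCV fullTextAnnotation (JSON form) to minimal hOCR."""
    page = response["fullTextAnnotation"]["pages"][page_index]
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        '<head><title>gcv hOCR</title>',
        '<meta name="ocr-capabilities" content="ocr_page ocr_par ocr_line ocrx_word"/>',
        "</head><body>",
        f'<div class="ocr_page" title="bbox 0 0 {page["width"]} {page["height"]}">',
    ]
    for block in page.get("blocks", []):
        for para in block.get("paragraphs", []):
            lines = _split_lines(para.get("words", []))
            if not lines:
                continue
            line_boxes = [_union([_bbox(w["boundingBox"]) for w in line]) for line in lines]
            out.append(f'<p class="ocr_par" title="{_title(_union(line_boxes))}">')
            for line, line_box in zip(lines, line_boxes):
                out.append(f'<span class="ocr_line" title="{_title(line_box)}">')
                for word in line:
                    text = "".join(s.get("text", "") for s in word["symbols"])
                    box = _title(_bbox(word["boundingBox"]))
                    out.append(f'<span class="ocrx_word" title="{box}">{escape(text)}</span> ')
                out.append("</span>")
            out.append("</p>")
    out.append("</div></body></html>")
    return "\n".join(out)
```

hOCR produced this way could then be handed to a renderer such as ocrmypdf's HocrTransform instead of composing the PDF directly.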
Just to add some additional commentary: first, thank you for this PR, as it helped identify an important bug in my own plugin (DPI detection, specifically). Licensing aside, the other issue is that this uses the Google Cloud Vision API, whereas the Google Document AI API is considered superior for OCR purposes. Also, if I'm reading the code right, this plugin tries to compose the target PDF itself rather than using ocrmypdf's hocrtransform.HocrTransform. I have my own gvision.py plugin that I've been using in production for a long time on thousands of documents. I'm happy to share it, but it doesn't come with tests, and it has some hardcoded GCP resources that I would need to clean up. If there is interest, I'm willing to tidy it up and contribute it back, either as a gist or as a single-file PR. Just ping me.
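DPI detection matters because hOCR `bbox` values are in image pixels, while PDF coordinates are in points (1/72 inch), so the text layer only lines up with the image when the pixel-to-point scale is right. A minimal sketch of both conversions, assuming the page size in points is known (for example from the PDF page's MediaBox); the helper names are hypothetical and not part of ocrmypdf's API.

```python
POINTS_PER_INCH = 72.0  # PDF default user space unit is 1/72 inch


def effective_dpi(image_px: int, page_pt: float) -> float:
    """Resolution implied by rendering `page_pt` points of page into `image_px` pixels."""
    if page_pt <= 0:
        raise ValueError("page size must be positive")
    return image_px * POINTS_PER_INCH / page_pt


def px_to_pt(px: float, dpi: float) -> float:
    """Convert an hOCR pixel coordinate to PDF points at the given resolution."""
    return px * POINTS_PER_INCH / dpi


# A US Letter page is 612 x 792 pt (8.5 x 11 in), scanned at 2550 x 3300 px.
dpi_x = effective_dpi(2550, 612)  # 300.0
dpi_y = effective_dpi(3300, 792)  # 300.0

# A word starting at x = 1500 px lands at 360 pt with the right DPI,
# but at 1500 pt (far off a 612 pt page) if 72 DPI is wrongly assumed.
right = px_to_pt(1500, dpi_x)  # 360.0
wrong = px_to_pt(1500, 72.0)   # 1500.0
```

Computing the resolution separately per axis also covers images whose horizontal and vertical DPI differ.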
@glorat When those changes are merged, I'll make a note to ping you, because they should simplify integration.
Purpose
This pull request adds a new OCR engine to OCRmyPDF that uses the Google Cloud Vision (GCV) API. It provides a cloud-based, high-accuracy alternative to Tesseract, which is especially useful for multilingual documents and complex image layouts.
Implementation Details
- `GVisionOcrEngine` in `ocrmypdf.builtin_plugins.gvision`
- Authentication through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
Requirements
- `google-cloud-vision` (added as a dependency in `pyproject.toml`)
- Google Cloud credentials JSON file
Set the environment variable:
On Windows (PowerShell):
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\path\to\your\credentials.json"
On Linux/macOS (bash):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/credentials.json
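Because Google's client libraries locate the service-account key through `GOOGLE_APPLICATION_CREDENTIALS`, a missing or mistyped path is otherwise reported only when the engine creates its client, possibly deep inside an OCR run. A small preflight check can fail earlier with a clearer message. The sketch below is a hypothetical helper, not part of this PR; it treats an unset variable as an error to match the requirements above, even though Application Default Credentials can fall back to other sources, and it does not verify that the key itself is valid.

```python
import json
import os
from pathlib import Path


def check_gcv_credentials(env=None) -> Path:
    """Return the credentials path named by GOOGLE_APPLICATION_CREDENTIALS.

    Raises RuntimeError if the variable is unset, the file does not exist,
    or the file is not valid JSON. `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"credentials file not found: {path}")
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"credentials file is not valid JSON: {path}") from exc
    return path
```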