Conversation

Member

@PeterStaar-IBM PeterStaar-IBM commented Jan 6, 2026

Summary

  • Add a new chart extraction enrichment model powered by https://huggingface.co/ibm-granite/granite-vision-3.3-2b-chart2csv-preview that converts bar, pie, and line charts into structured tabular (CSV) data
  • Expose chart extraction via a new --enrich-chart-extraction CLI flag and do_chart_extraction pipeline option
  • Fix a type-check bug in DocumentPictureClassifier where item.meta was accessed without verifying it was a PictureMeta instance
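As a rough illustration of the first bullet, CSV text of the kind a chart-to-CSV model might emit can be turned into a rectangular grid of cells using only the standard library. The sample string and the `parse_chart_csv` helper are hypothetical; the PR's actual parsing into TableData may differ.

```python
import csv
import io


def parse_chart_csv(text: str) -> list[list[str]]:
    """Parse CSV text into a rectangular grid, padding short rows with ''."""
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(text.strip()))
        if row  # skip blank lines
    ]
    width = max((len(row) for row in rows), default=0)
    return [row + [""] * (width - len(row)) for row in rows]


# Hypothetical model output for a small bar chart (last row is incomplete).
sample = "Year,Revenue\n2023,10.5\n2024,12\n2025\n"
grid = parse_chart_csv(sample)
```

Padding short rows keeps the grid rectangular, which is convenient when the cells are later mapped onto a table structure with fixed row and column counts.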

Details

The new ChartExtractionModelGraniteVision enrichment model:

  • Runs after picture classification and processes pictures classified as bar_chart, pie_chart, or line_chart
  • Uses the Granite Vision model to generate CSV output from chart images, then parses it into TableData stored in PictureMeta.tabular_chart
  • Supports batched inference with left-padded inputs
  • Automatically enables picture classification when chart extraction is turned on, since classification results are needed to identify chart types
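The filtering and batching bullets above can be sketched in plain Python. Left padding places pad tokens before each sequence so that every prompt ends at the same position, which is what batched generation with decoder-only models generally expects. The `is_chart` and `left_pad` helpers, the pad id of 0, and the token ids are illustrative assumptions, not code from this PR; only the three class labels come from the description.

```python
CHART_LABELS = {"bar_chart", "pie_chart", "line_chart"}  # labels named in the PR


def is_chart(predicted_label: str) -> bool:
    """Return True if a classifier label is one of the supported chart types."""
    return predicted_label in CHART_LABELS


def left_pad(batch: list[list[int]], pad_id: int = 0):
    """Left-pad token id sequences to equal length and build attention masks."""
    max_len = max((len(seq) for seq in batch), default=0)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Pad tokens go in front; the mask marks them as ignorable.
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Hypothetical token ids for two prompts of different length.
ids, mask = left_pad([[5, 6, 7], [8]])
```

The attention mask lets the model ignore the padded positions, so the shorter prompt is processed as if it had been run on its own.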

Test plan

  • Run conversion with --enrich-chart-extraction on documents containing bar, pie, and line charts
  • Verify that tabular_chart metadata is populated on chart PictureItems
  • Verify non-chart pictures are unaffected
  • Confirm the picture classifier type-check fix doesn't regress existing classification behavior
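The last check concerns the type-check fix. A minimal sketch of that kind of guard is shown here with stand-in classes: `PictureMeta`, `OtherMeta`, `Item`, and `get_classification` are simplified placeholders defined for this example only, not docling's real types or the PR's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PictureMeta:  # stand-in, not docling's class
    classification: Optional[str] = None


@dataclass
class OtherMeta:  # stand-in for any non-picture metadata type
    note: str = ""


@dataclass
class Item:  # stand-in for a document item
    meta: object = None


def get_classification(item: Item) -> Optional[str]:
    """Read the classification only when meta really is a PictureMeta."""
    if isinstance(item.meta, PictureMeta):
        return item.meta.classification
    # Missing or differently typed metadata is skipped instead of raising.
    return None
```

With the guard in place, items carrying the expected metadata behave as before, while items with missing or unrelated metadata are skipped rather than causing an attribute error.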

Examples

This page:

Screenshot 2026-01-30 at 13 51 02

is transformed into:

Screenshot 2026-01-30 at 13 52 27
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

mergify bot commented Jan 6, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
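To try the rule locally, the pattern above can be applied with Python's `re` module. This is only a sketch: the `is_conventional` helper and the example titles are hypothetical, and Mergify's own matching may differ in details.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_RE.match(title) is not None


# Hypothetical titles, used only to exercise the pattern.
examples = [
    "feat: add chart extraction",
    "fix(models)!: guard meta access",
    "Add chart extraction",
]
results = [is_conventional(t) for t in examples]
```

The optional `(scope)` and `!` groups mean that both plain and scoped or breaking-change prefixes are accepted, while a title without a recognized type followed by a colon is rejected.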
Contributor

github-actions bot commented Jan 6, 2026

DCO Check Passed

Thanks @PeterStaar-IBM, all your commits are properly signed off. 🎉


codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 28.46715% with 98 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| ...g/models/stages/chart_extraction/granite_vision.py | 26.51% | 97 Missing ⚠️ |
| docling/pipeline/base_pipeline.py | 66.66% | 1 Missing ⚠️ |

@PeterStaar-IBM PeterStaar-IBM marked this pull request as ready for review January 6, 2026 11:23
@PeterStaar-IBM PeterStaar-IBM requested a review from cau-git January 6, 2026 11:23

dosubot bot commented Jan 6, 2026

Related Documentation

Checked 10 published document(s) in 1 knowledge base(s). No updates required.

Member

@dolfim-ibm dolfim-ibm left a comment

All good here; we only have to clean up now.


Ryzhtus commented Jan 16, 2026

Hi! Is any help needed? I like this feature and believe it could significantly improve my project's data processing pipeline.

Member

@dolfim-ibm dolfim-ibm left a comment

lgtm

4 participants