Multi-modal example: ChartQA #379
Conversation
totoluo commented Dec 7, 2025
- Added an example of a ChartQA agent
- Added position ID conversion for M-RoPE
- Added an image/path message format for OpenAI-compatible MLLM APIs
@microsoft-github-policy-service agree
ultmaster left a comment
Thanks for your contribution to this project! I have several minor comments that I hope can be addressed before we land this on main.
Generally, I want this PR to be merged quickly; we can collaborate to add more tests and refine the documentation later.
    @@ -0,0 +1,229 @@
    """Utilities for multimodal model support in VERL training.
Are these functions not importable directly from VERL?
get_rope_index and the processor are imported directly from VERL; the current compute_mrope_position_ids and get_image_grid_thw are just thin wrappers made compatible with agl's AgentModeDaemon. As for get_image_grid_thw, VERL processes images inside the dataset's __getitem__ and does not return it explicitly. agl builds get_train_data_batch from rollout IDs in AgentModeDaemon, so we need to load the images and compute it from the original sample. We could alternatively override the dataset's __getitem__, but that seems like a bigger change.
However, the descriptions in this file are redundant; I'll refactor to make them simpler and move them into daemon.py, since they are only used by the daemon.
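For context, here is a minimal sketch of what such a get_image_grid_thw wrapper could look like, assuming a Hugging Face Qwen2-VL style processor whose image processor returns an image_grid_thw tensor. The function name and arguments are illustrative and not necessarily the exact code in this PR.

```python
# Hypothetical sketch only; the actual wrapper in this PR may differ.
from typing import List, Optional

import torch
from PIL import Image


def get_image_grid_thw(processor, image_paths: List[str]) -> Optional[torch.Tensor]:
    """Load images from disk and recover their (t, h, w) patch grids.

    Assumes `processor` is a Qwen2-VL style HF processor whose image
    processor returns an "image_grid_thw" tensor of shape (num_images, 3).
    """
    if not image_paths:
        return None
    images = [Image.open(p).convert("RGB") for p in image_paths]
    features = processor.image_processor(images=images, return_tensors="pt")
    return features["image_grid_thw"]
```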
Hey @ultmaster, I have updated the branch based on all the comments, appreciate any reviews!
examples/chartqa/README.md
Outdated
    ```bash
    uv pip install datasets pillow pandas pyarrow nest_asyncio
    uv pip install "langgraph<1.0" "langchain[openai]<1.0" "langchain-community"
We have upgraded langchain to 1.0+ on main. Please upgrade your langchain accordingly.
examples/chartqa/README.md
Outdated
    @@ -0,0 +1,100 @@
    # ChartQA Example

    [](https://github.com/microsoft/agent-lightning/actions/workflows/examples-chartqa.yml)
Let's remove this until we really have a CI for this.
pyproject.toml
Outdated
    apo = [
        "poml",
    ]
    multimodal = [
Please upload a new uv.lock.
examples/chartqa/download_chartqa.sh
Outdated
    @@ -0,0 +1,6 @@
    #!/bin/bash
Black-box one-click scripts are opaque and hard to debug. Please remove this script and add clear instructions to the README instead.
@totoluo The code looks generally good now. I think I'll test it myself before I get it merged. Could you tell me the minimum resource requirements to run this example?
Hi @ultmaster, I used 2 GPUs with 40 GB memory. Additionally, I used vllm==0.10.2 because I hit the flash-attention issue mentioned in pyproject.toml with vllm 0.11.0. That was the only glitch I ran into when testing locally.
@totoluo I'm testing with vllm 0.11.0 (I know 0.11.1+ won't work), but I encountered the following error when running this command:

    uv run --no-sync vllm serve Qwen/Qwen2-VL-2B-Instruct \
        --gpu-memory-utilization 0.6 \
        --max-model-len 4096 \
        --allowed-local-media-path /home/xxx/Projects/agl-second-bench/examples/chartqa/data \
        --enable-prefix-caching \
        --port 8088

Have you tried vllm 0.11.0? Have you seen this before? It seems to be related to this issue: vllm-project/vllm#27340

I don't want to downgrade to vllm 0.10.2, because it previously did not work well with verl 0.6.0.
I also want to let you know that I've made some minor changes to your code to fix lint issues and prepare for the CI. I haven't managed to run the example code yet, though.
Hello @ultmaster, I ran into this exact same issue. It's related to vllm-project/vllm#25926, and the fix was merged in vllm-project/vllm#26219. However, the commit has only been included in
Hi @totoluo, thanks for the update. I tested the code on one A100 and can confirm that it passes the sanity check and runs successfully: https://github.com/microsoft/agent-lightning/actions/runs/20140706916/job/57808847154 I have one question though: why are the batch size, train batch size, logprob batch size, and mini batch size all set to 1? Have you tried a more realistic setting? Is training successful in terms of accuracy on the ChartQA dataset?
/ci
🚀 CI Watcher for correlation id-3645291658-mj2k48tj triggered by comment 3645291658
✅ All runs completed.
Pull request overview
This PR adds comprehensive multimodal support to AgentLightning by introducing a ChartQA example demonstrating visual reasoning with LangGraph. The implementation includes M-RoPE position embedding support for Qwen2-VL models and image handling utilities for OpenAI-compatible MLLM APIs.
Key changes:
- New ChartQA example with multi-step reasoning workflow (observe → extract → calculate → check → refine)
- Core framework enhancements for M-RoPE position IDs computation in VERL training pipeline
- Image processing utilities supporting both base64 encoding (cloud APIs) and file:// URLs (local vLLM)
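As a sketch of the last point above: a minimal, hypothetical helper for building an OpenAI-style image message that switches between base64 data URLs (for cloud APIs) and file:// URLs (for a local vLLM server started with --allowed-local-media-path). The function name and flag are illustrative and may differ from the PR's multimodal_utils.py.

```python
# Hypothetical sketch; the PR's actual utilities may differ in names and details.
import base64
import os


def build_image_message(image_path: str, question: str, use_base64_images: bool = True) -> dict:
    """Return an OpenAI-compatible user message containing one image and a question."""
    if use_base64_images:
        # Cloud APIs: inline the image as a base64 data URL.
        with open(image_path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        image_url = f"data:image/png;base64,{encoded}"
    else:
        # Local vLLM: pass an absolute file:// URL pointing at the allowed media directory.
        image_url = f"file://{os.path.realpath(image_path)}"
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ],
    }
```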
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| uv.lock | Added image-related dependencies (av, qwen-vl-utils, datasets, pandas, pillow, pyarrow) |
| pyproject.toml | Defined new "image" dependency group for multimodal support |
| examples/chartqa/train_chartqa_agent.py | Training orchestration with VERL algorithm and configurable hyperparameters |
| examples/chartqa/prompts.py | LangChain prompt templates for the multi-step reasoning workflow |
| examples/chartqa/prepare_data.py | Dataset download and preprocessing script for HuggingFace ChartQA |
| examples/chartqa/multimodal_utils.py | Image encoding utilities for base64 conversion and message formatting |
| examples/chartqa/env_var.py | Environment variable configuration for ChartQA paths and API settings |
| examples/chartqa/debug_chartqa_agent.py | Debugging utilities for testing with cloud APIs or local vLLM proxy |
| examples/chartqa/chartqa_agent.py | Main agent implementation with LangGraph state machine and evaluation logic |
| examples/chartqa/README.md | Comprehensive documentation with setup and usage instructions |
| agentlightning/verl/trainer.py | Passes processor and image_base_dir to daemon for M-RoPE support |
| agentlightning/verl/daemon.py | Implements M-RoPE position_ids computation and image grid handling for Qwen2-VL |
| .github/workflows/examples-rag.yml | Fixed artifact naming consistency |
| .github/workflows/examples-chartqa.yml | New CI workflow for ChartQA example testing and training validation |
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "token-abc123")

    OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1-mini")
Copilot AI commented Dec 12, 2025
The model name "gpt-4.1-mini" appears to be incorrect. OpenAI's current models are named "gpt-4o-mini", "gpt-4o", etc. The ".1" versioning pattern is not standard for OpenAI models. This should likely be "gpt-4o-mini" or another valid model identifier.
Suggested change:

    OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
| if "THE ANSWER IS CORRECT" in last_message.content: # type: ignore | ||
| if "THE ANSWER IS INCORRECT" in last_message.content: # type: ignore | ||
| correct_index = last_message.content.rfind("THE ANSWER IS CORRECT") # type: ignore | ||
| incorrect_index = last_message.content.rfind("THE ANSWER IS INCORRECT") # type: ignore | ||
| if correct_index > incorrect_index: | ||
| return END | ||
| else: | ||
| return END |
Copilot AI commented Dec 12, 2025
The condition logic appears incorrect. When both "CORRECT" and "INCORRECT" are present in the message, the code checks if "CORRECT" appears after "INCORRECT" to decide whether to end. However, this doesn't handle the edge case where neither substring appears, which would still fall through to checking turn limits. More critically, if only "INCORRECT" appears without "CORRECT", the code would incorrectly continue to check turn limits rather than explicitly routing to refinement. Consider restructuring to check for "INCORRECT" first, then "CORRECT", with explicit handling for each case.
| if "THE ANSWER IS CORRECT" in last_message.content: # type: ignore | |
| if "THE ANSWER IS INCORRECT" in last_message.content: # type: ignore | |
| correct_index = last_message.content.rfind("THE ANSWER IS CORRECT") # type: ignore | |
| incorrect_index = last_message.content.rfind("THE ANSWER IS INCORRECT") # type: ignore | |
| if correct_index > incorrect_index: | |
| return END | |
| else: | |
| return END | |
| content = last_message.content # type: ignore | |
| has_correct = "THE ANSWER IS CORRECT" in content | |
| has_incorrect = "THE ANSWER IS INCORRECT" in content | |
| if has_incorrect and has_correct: | |
| # If both are present, decide based on which comes last | |
| correct_index = content.rfind("THE ANSWER IS CORRECT") | |
| incorrect_index = content.rfind("THE ANSWER IS INCORRECT") | |
| if correct_index > incorrect_index: | |
| return END | |
| else: | |
| return "refine_answer" | |
| elif has_incorrect: | |
| return "refine_answer" | |
| elif has_correct: | |
| return END |
examples/chartqa/README.md
Outdated
    USE_LLM_PROXY=1 \
    OPENAI_API_BASE=http://localhost:8088/v1 \
    OPENAI_MODEL=Qwen/Qwen2-VL-2B-Instruct \
    python chartqa_agent.py
Copilot AI commented Dec 12, 2025
The comment in line 87 says to run "python chartqa_agent.py" but the correct command based on the file structure should be "python debug_chartqa_agent.py". This inconsistency could confuse users following the instructions.
Suggested change:

    python debug_chartqa_agent.py
examples/chartqa/README.md
Outdated
    Then run the training script with the external store address:

    ```bash
    AGL_MANAGED_STORE=0 python train_chartqa_agent.py fast --external-store-address http://localhost:4747
Copilot AI commented Dec 12, 2025
The comment mentions "fast" as a config option but the code only defines three configs: "debug", "qwen", and "ci". Using "fast" in the command line example will cause an error. This should be updated to use one of the actual available config options like "qwen" or "debug".
Suggested change:

    AGL_MANAGED_STORE=0 python train_chartqa_agent.py qwen --external-store-address http://localhost:4747
    # Compute image_grid_thw for this triplet using image_urls from prompt
    if self._use_mrope:
        image_urls = trace.get("image_urls", [])
        image_grid_thw_list.append(self._get_image_grid_thw(image_urls))
Copilot AI commented Dec 12, 2025
When multiple triplets are processed for a single rollout, all of them share the same image_urls list from the first triplet. This could lead to incorrect image_grid_thw computations for subsequent triplets if they have different images. The code should iterate through triplets and compute image_grid_thw for each triplet individually, matching each to its corresponding input sequence.
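To make the suggestion concrete, here is a hypothetical sketch of per-triplet handling, assuming each triplet dict carries its own image_urls. Names are illustrative; the grid computation is passed in as a callable and this does not reflect the daemon's actual internals.

```python
# Hypothetical sketch; not the actual AgentModeDaemon code.
from typing import Any, Callable, Dict, List, Optional

import torch


def collect_image_grid_thw(
    triplets: List[Dict[str, Any]],
    get_image_grid_thw: Callable[[List[str]], Optional[torch.Tensor]],
    use_mrope: bool,
) -> List[Optional[torch.Tensor]]:
    """Compute image_grid_thw for each triplet from its own image_urls."""
    grids: List[Optional[torch.Tensor]] = []
    for triplet in triplets:
        if not use_mrope:
            grids.append(None)
            continue
        # Use the image URLs attached to this specific triplet, not the first one's.
        image_urls = triplet.get("image_urls", [])
        grids.append(get_image_grid_thw(image_urls))
    return grids
```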
    # For mrope (3D position_ids), use the first dimension (text position_ids) for eos calculation
    if self._use_mrope:
        # position_ids is (batch_size, 4, seq_length), use first dim for text positions
        text_position_ids = position_ids[:, 0, :]  # (batch_size, seq_length)
        eos_mask_idx = torch.argmax(text_position_ids * attention_mask, dim=-1)  # (bsz,)
Copilot AI commented Dec 12, 2025
The variable name "text_position_ids" is misleading because it extracts the first dimension of position_ids, which for M-RoPE includes text positions but the comment suggests this is specifically for text. According to M-RoPE documentation, dimension 0 contains temporal/sequential positions that work for both text and vision tokens. Consider renaming to "sequential_position_ids" or "temporal_position_ids" to better reflect its actual purpose.
Suggested change:

    # For mrope (3D position_ids), use the first dimension (sequential/temporal position_ids) for eos calculation
    if self._use_mrope:
        # position_ids is (batch_size, 4, seq_length), use first dim for sequential/temporal positions (per M-RoPE docs)
        sequential_position_ids = position_ids[:, 0, :]  # (batch_size, seq_length)
        eos_mask_idx = torch.argmax(sequential_position_ids * attention_mask, dim=-1)  # (bsz,)
    text_pos = torch.zeros((1, len(input_ids)), dtype=torch.long, device=input_ids.device)
    text_pos[0, valid_mask] = torch.arange(valid_mask.sum().item(), device=input_ids.device)

    return torch.cat([text_pos, vision_pos], dim=0)
Copilot AI commented Dec 12, 2025
The position_ids computation for M-RoPE creates a zero tensor for text_pos and then fills valid positions, but the logic appears to have an issue. The code creates text_pos with shape (1, len(input_ids)), but input_ids is already a 1D tensor, so len(input_ids) gives the sequence length. However, the indexing text_pos[0, valid_mask] assumes valid_mask has been properly broadcast or is 1D. Since valid_mask comes from attention_mask.bool() and attention_mask is likely 1D, this should work, but it would be clearer to explicitly show the dimensions being used or add shape assertions for debugging.
Suggested change:

    # Assert input shapes for clarity and debugging
    assert input_ids.dim() == 1, f"input_ids should be 1D, got shape {input_ids.shape}"
    assert attention_mask.shape == input_ids.shape, f"attention_mask shape {attention_mask.shape} does not match input_ids shape {input_ids.shape}"
    text_pos = torch.zeros(len(input_ids), dtype=torch.long, device=input_ids.device)
    text_pos[valid_mask] = torch.arange(valid_mask.sum().item(), device=input_ids.device)
    return torch.cat([text_pos.unsqueeze(0), vision_pos], dim=0)
    # Local vLLM supports file:// URLs
    if not image_path.startswith("file://"):
        image_path = f"file://{os.path.realpath(image_path)}"
    image_url = image_path
Copilot AI commented Dec 12, 2025
The comment references "file:// URLs" but the code in the else branch doesn't handle all cases correctly. If image_path is already an absolute path starting with "/" but not with "file://", the code will prepend "file://" correctly. However, if image_path is a relative path (which shouldn't happen based on the logic flow), it would create an invalid file:// URL. Consider adding validation to ensure the path is absolute before constructing the file:// URL, or document the assumption that image_path must be absolute when use_base64_images=False.
Suggested change:

    # Local vLLM supports file:// URLs.
    # Always use os.path.realpath to ensure the path is absolute.
    abs_path = os.path.realpath(image_path)
    image_url = f"file://{abs_path}"
        gt_num = float(gt.replace(",", ""))
        if abs(pred_num - gt_num) / max(abs(gt_num), 1e-9) < 0.02:
            return 1.0
    except (ValueError, AttributeError):
Copilot AI commented Dec 12, 2025
The 'except' clause does nothing but pass, and there is no explanatory comment.
Suggested change:

    except (ValueError, AttributeError):
        # If conversion to float fails, fall back to substring/partial match below.
Let's leave the comments for now and resolve them in another PR.