Starred repositories

OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful re…

Python 1,070 92 Updated Jan 31, 2026

VCode: SVG as Symbolic Visual Representation

Python 122 7 Updated Dec 19, 2025

(ICLR 2026) An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

Python 182 6 Updated Jan 26, 2026

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 32,212 3,319 Updated Jan 30, 2026

A Scientific Multimodal Foundation Model

629 31 Updated Sep 30, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 7,102 667 Updated Dec 27, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,289 40 Updated Dec 23, 2025

Let your Claude be able to think

TypeScript 16,748 1,976 Updated Nov 4, 2025

This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.

78 2 Updated Jun 14, 2025
Python 233 4 Updated Dec 17, 2025

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 71 4 Updated Aug 8, 2025
Python 17 1 Updated Jul 24, 2025

The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"

Python 175 12 Updated Nov 16, 2025

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

Python 122 6 Updated Nov 25, 2024

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 8,764 741 Updated Dec 17, 2025

A grumpy AI that unexpectedly tells the truth, produced by DPO fine-tuning of Qwen-2.5-1.5B

Jupyter Notebook 69 7 Updated Jan 18, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,533 60 Updated Jun 14, 2025

[CVPR 2025] This is the official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/

Python 299 39 Updated Apr 5, 2025

Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations

Python 120 6 Updated Sep 28, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,823 376 Updated Oct 21, 2025

Fully open reproduction of DeepSeek-R1

Python 25,848 2,411 Updated Nov 24, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 19,194 1,312 Updated Jan 30, 2026

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,622 734 Updated Jan 22, 2026

Apache ECharts is a powerful, interactive charting and data visualization library for the browser

TypeScript 65,578 19,840 Updated Jan 22, 2026

A generative world for general-purpose robotics & embodied AI learning.

Python 28,068 2,601 Updated Jan 31, 2026

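🤖 A WeChat bot built on WeChaty and AI services such as ChatGPT / Claude / Kimi / DeepSeek / Ollama. It can automatically reply to WeChat messages, manage WeChat groups and friends, detect zombie followers, and more.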

JavaScript 9,720 1,143 Updated Jan 8, 2026

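Stock: fetches stock data, computes stock indicators and chip distribution, recognizes stock patterns, and supports comprehensive stock screening, screening strategies, strategy validation and backtesting, and automated trading, on both PC and mobile devices.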

Python 11,391 2,335 Updated Jan 30, 2026

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

TypeScript 71,553 8,839 Updated Jan 31, 2026

SDG is a specialized framework designed to generate high-quality structured tabular data.

Python 2,405 385 Updated Jan 19, 2026

ASCII generator (image to text, image to image, video to video)

Python 8,130 630 Updated Nov 22, 2024