- Huazhong University of Science and Technology
- Wuhan, Hubei
- https://blog.csdn.net/qq_25737169
Starred repositories
OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful re…
(ICLR 2026) An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
Multilingual Document Layout Parsing in a Single Vision-Language Model
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Let your Claude think
This repository open-sources CreatiPoster, an AI-driven graphic design generation system for multi-layer and editable compositions with strong visual appeal.
XiaomiMiMo / lmms-eval
Forked from EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
The official repo of the paper "MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly"
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A hot-tempered AI "grumpy guy" that unexpectedly tells the truth after DPO fine-tuning on Qwen-2.5-1.5B
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[CVPR 2025] This is the official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
Solve Visual Understanding with Reinforced VLMs
Fully open reproduction of DeepSeek-R1
OCR, layout analysis, reading order, table recognition in 90+ languages
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Apache ECharts is a powerful, interactive charting and data visualization library for browser
A generative world for general-purpose robotics & embodied AI learning.
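🤖 A WeChat bot built on WeChaty combined with AI services such as ChatGPT / Claude / Kimi / DeepSeek / Ollama. It can automatically reply to WeChat messages, manage WeChat groups and friends, detect zombie followers, and more...
Stock: fetch stock data, compute stock indicators and chip distribution, recognize stock patterns, comprehensive stock screening, stock-selection strategies, backtesting and validation, and automated trading; supports PC and mobile devices.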
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
SDG is a specialized framework designed to generate high-quality structured tabular data.
ASCII generator (image to text, image to image, video to video)





