TheShadow29/README.md

Arka Sadhu

Senior Research Scientist on the Surreal team at Meta Reality Labs, working on multimodal agents for smart glasses, egocentric and long-video understanding, and vision-language research.

My work focuses on first-person video, long-context understanding, multimodal post-training, and research systems for real-time contextual assistance. More at theshadow29.github.io.

What I'm Doing

  • Researching multimodal agents for smart glasses, with a focus on egocentric and long-video understanding.
  • Building multimodal models and post-training setups for instruction following, reasoning, and real-time contextual assistance.
  • Working across the full research stack: benchmark design, distributed training, evaluation, low-latency inference, and deployment-facing integration.
  • Reading the latest papers and articles on LLMs, VLMs, instruction following, reasoning, and agentic workflows.

Connect

Website · Google Scholar · CV · LinkedIn · Twitter/X · Email

Pinned

  1. awesome-grounding Public

    awesome grounding: A curated list of research papers in visual grounding

    1.1k stars · 103 forks

  2. research-advice-list Public

    A compilation of research advice.

    227 stars · 16 forks

  3. VidSitu Public

    [CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

    Python · 60 stars · 8 forks

  4. vognet-pytorch Public

    [CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

    Python · 69 stars · 7 forks

  5. zsgnet-pytorch Public

    Official implementation of the ICCV19 oral paper Zero-Shot Grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)

    Python · 71 stars · 12 forks

  6. Video-QAP Public

    Repository for the paper Video Question Answering with Phrases via Semantic Roles

    Python 4