I am a final-year Ph.D. student in Computer Science at KAUST, advised by Prof. Bernard Ghanem. Prior to that, I obtained my master's degree from Shanghai Jiao Tong University and my bachelor's degree from Xi’an Jiaotong University.

My research centers on Long-Form Video Understanding, with expertise in:

  • Video-Language Models, such as video-language pre-training and video-language foundation models.
  • Long-Form Video Understanding, such as temporal action detection, action recognition, and video grounding.

📢 I am actively seeking full-time research positions. Feel free to reach out if you have any opportunities: shuming.liu[at]kaust.edu.sa

🔥 News

  • 2025.06: I started my internship at Meta as a Research Scientist Intern.
  • 2025.05: I was awarded the Dean’s List Award of KAUST for 2025.
  • 2025.04: OpenTAD is accepted to CVPR Workshop 2025.
  • 2025.02: BOLT is accepted by CVPR 2025.
  • 2025.01: I will join Meta as a Research Scientist Intern in Summer 2025!
  • 2024.07: One co-authored paper is accepted by ECCV 2024.
  • 2024.06: Using OpenTAD, we ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tasks of the EPIC-KITCHENS-100 2024 Challenge, and 1st in the Moment Queries task of the Ego4D 2024 Challenge!
  • 2024.06: I was awarded the Dean’s List Award of KAUST for 2024.
  • 2024.05: We release OpenTAD, currently the largest TAD codebase.
  • 2024.02: Two papers are accepted by CVPR 2024.
  • 2024.01: One paper is accepted by ICLR 2024.
  • 2023.02: One paper is accepted by CVPR 2023 and another by CVPRW 2023.

📝 Publications

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Shuming Liu, Bernard Ghanem, Vikas Chandra, Yunyang Xiong, et al.

arXiv 2026, [Project Page] [Code]

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem

CVPR 2025, [Code]

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Shuming Liu, Chen Zhao, Bernard Ghanem, et al.

CVPRW 2025, [Code]

Harnessing Temporal Causality for Advanced Temporal Action Detection

Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem

Technical Report, [Code]

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem

CVPR 2024, [Code]

ETAD: Training Action Detection End to End on a Laptop

Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem

CVPRW 2023, [Code]

📖 Education

  • 2021.09 - present, Ph.D., King Abdullah University of Science and Technology (KAUST), Saudi Arabia.
  • 2018.09 - 2021.04, Master, Shanghai Jiao Tong University (SJTU), China.
  • 2014.09 - 2018.06, Bachelor, Xi’an Jiaotong University (XJTU), China.

🎖 Honors and Awards

  • 2025.05 Dean’s List Award of KAUST (20%)
  • 2024.06 Dean’s List Award of KAUST (20%)
  • 2021.03 Outstanding Graduate of SJTU
  • 2019.12 Scholarship of SJTU (5%)
  • 2018.06 Outstanding Undergraduate of XJTU
  • 2017.12 Scholarship of XJTU (5%)

💻 Service

Conference Reviewer: CVPR, ICCV, ECCV, ICLR, ICML, NeurIPS, AAAI, WACV, BMVC

Journal Reviewer: TPAMI, IJCV, TIP, TMM, Neurocomputing

Teaching Assistant: Introduction to Computer Vision (KAUST), Computer Vision (SJTU)