Shape of Motion reconstructs a 4D scene from a single monocular video.
Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. We tackle the under-constrained nature of the problem with two key insights: First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases. Each point's motion is expressed as a linear combination of these bases, facilitating soft decomposition of the scene into multiple rigidly-moving groups. Second, we utilize a comprehensive set of data-driven priors, including monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals, resulting in a globally consistent representation of the dynamic scene. Experiments show that our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
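To make the motion-basis idea concrete, below is a minimal sketch (not the authors' implementation) of expressing one point's motion at a given frame as a soft combination of K SE(3) motion bases. It assumes the bases are stored as rotation quaternions plus translations and that per-point weights are softmax-normalized; the paper's actual parameterization and blending scheme may differ.

```python
# Hypothetical sketch of per-point motion as a weighted combination of
# SE(3) motion bases; names and the blending details are illustrative only.
import numpy as np


def blend_se3(quats, trans, logits):
    """Blend K SE(3) bases (quats: K x 4 in wxyz, trans: K x 3) with per-point logits (K,)."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()                       # softmax so the combination is convex
    # Align quaternion signs to the dominant basis before averaging.
    ref = quats[np.argmax(w)]
    signs = np.where(quats @ ref < 0, -1.0, 1.0)[:, None]
    q = (w[:, None] * signs * quats).sum(axis=0)
    q = q / np.linalg.norm(q)             # renormalize the blended rotation
    t = (w[:, None] * trans).sum(axis=0)  # translations blend linearly
    return q, t


def apply_se3(q, t, x):
    """Rotate point x by unit quaternion q (wxyz), then translate by t."""
    s, v = q[0], q[1:]
    x_rot = x + 2.0 * np.cross(v, np.cross(v, x) + s * x)
    return x_rot + t


# Example: a point softly assigned to 3 rigid groups, transformed to frame t.
K = 3
quats_t = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (K, 1))    # basis rotations at frame t
trans_t = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.0]])
logits = np.array([2.0, 0.5, -1.0])                           # per-point basis weights
q_t, t_t = blend_se3(quats_t, trans_t, logits)
print(apply_se3(q_t, t_t, np.array([0.0, 0.0, 1.0])))
```

Because the weights are shared across all frames for a given point, points that move together end up dominated by the same few bases, which yields the soft decomposition into rigidly-moving groups described above.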
For each method, we render the video from a novel viewpoint and overlay its predicted 3D tracks onto the novel views. TAPIR + Depth Anything does not produce novel views, so we instead overlay its tracks onto our renderings.
Fast motion and occlusions are challenging for our method.
Our method relies on off-the-shelf components, e.g., monocular depth estimation, whose predictions can be incorrect.
@article{som2024,
  title   = {Shape of Motion: 4D Reconstruction from a Single Video},
  author  = {Wang, Qianqian and Ye, Vickie and Gao, Hang and Austin, Jake and Li, Zhengqi and Kanazawa, Angjoo},
  journal = {arXiv preprint arXiv:2407.13764},
  year    = {2024}
}