We introduce 3DGSim, a fully end-to-end 3D physics simulator trained on multi-view videos to ensure both spatial and temporal consistency. It relies on neither inductive biases nor ground-truth 3D information during training, both of which can impede scalability and generalization.
3DGSim encodes images into a 3D Gaussian particle representation, propagates dynamics with a transformer, and renders frames using 3D Gaussian splatting. By jointly training inverse rendering and the dynamics transformer through a temporal encoding and merging layer, it embeds physical properties into point-wise latent vectors without enforcing explicit connectivity constraints.
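To make the described pipeline concrete, below is a minimal PyTorch sketch of such an architecture: an encoder that lifts multi-view images to Gaussian particles with point-wise latents, a dynamics transformer that merges a short latent history via learned temporal embeddings and propagates it with self-attention over particles, and a stub standing in for the 3D Gaussian splatting renderer. All module names, dimensions, and the temporal-merging strategy are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GaussianEncoder(nn.Module):
    """Lift multi-view images to N Gaussian particles with point-wise latents."""

    def __init__(self, feat_dim=64, latent_dim=64, num_particles=256):
        super().__init__()
        self.num_particles = num_particles
        self.backbone = nn.Sequential(          # stand-in for a real image backbone
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(4),
        )
        # Per particle: position (3) + scale (3) + rotation (4) + opacity (1) +
        # color (3) = 14 Gaussian parameters, plus a latent carrying physical properties.
        self.head = nn.Linear(feat_dim * 4 * 4, num_particles * (14 + latent_dim))

    def forward(self, images):                  # images: (B, V, 3, H, W)
        B, V = images.shape[:2]
        feats = self.backbone(images.flatten(0, 1)).flatten(1)  # (B*V, F)
        feats = feats.view(B, V, -1).mean(dim=1)                # naive multi-view fusion
        out = self.head(feats).view(B, self.num_particles, -1)
        return out[..., :14], out[..., 14:]     # gaussians (B,N,14), latents (B,N,D)


class DynamicsTransformer(nn.Module):
    """Propagate point-wise latents with attention, no explicit connectivity."""

    def __init__(self, latent_dim=64, history=3, heads=4, layers=2):
        super().__init__()
        # Assumed temporal encoding/merging: learned per-step embeddings, summed
        # over the history window before particle-wise self-attention.
        self.time_embed = nn.Embedding(history, latent_dim)
        block = nn.TransformerEncoderLayer(latent_dim, heads, batch_first=True)
        self.attn = nn.TransformerEncoder(block, layers)
        self.delta = nn.Linear(latent_dim, 14)  # predicted update to Gaussian params

    def forward(self, latent_history):          # (B, T, N, D)
        B, T, N, D = latent_history.shape
        t = self.time_embed(torch.arange(T, device=latent_history.device))
        merged = (latent_history + t.view(1, T, 1, D)).sum(dim=1)   # (B, N, D)
        next_latents = self.attn(merged)        # attention across particles
        return next_latents, self.delta(next_latents)


def splat_stub(gaussians, image_size=(64, 64)):
    """Placeholder for a differentiable 3D Gaussian splatting rasterizer."""
    return torch.zeros(gaussians.shape[0], 3, *image_size)


if __name__ == "__main__":
    enc, dyn = GaussianEncoder(), DynamicsTransformer()
    imgs = torch.randn(2, 4, 3, 128, 128)       # 2 scenes, 4 views each
    gaussians, latents = enc(imgs)
    history = latents.unsqueeze(1).repeat(1, 3, 1, 1)  # dummy 3-step latent history
    latents_next, delta = dyn(history)
    frame = splat_stub(gaussians + delta)       # render the predicted next state
    print(frame.shape)                          # torch.Size([2, 3, 64, 64])
```

In a full system the stub would be replaced by a differentiable Gaussian splatting rasterizer so that rendering losses on the predicted frames can supervise both the encoder and the dynamics transformer end to end.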
This enables the model to capture diverse physical behaviors, from rigid to elastic and cloth-like interactions, along with realistic lighting effects, and to generalize to unseen multi-body interactions and novel scene edits.
A striking consequence of removing explicit physics biases is that 3DGSim not only captures physics but also learns to reason about broader scene properties, such as shadows.
@article{zhobro20253dgsim,
  author  = {Mikel Zhobro and Andreas René Geist and Georg Martius},
  title   = {3DGSim: Learning 3D-Gaussian Simulators from RGB Videos},
  journal = {arXiv},
  year    = {2025},
  eprint  = {2503.24009},
  url     = {https://arxiv.org/abs/2503.24009}
}