3DGSIM: Learning 3D-Gaussian Simulators from RGB Videos

University of Tuebingen, Max Planck Institute for Intelligent Systems

Our 3DGSim model simulates complex dynamics using only multi-view RGB videos. It represents scenes with 3D Gaussian particles, each carrying its own latent feature vector, and relies solely on our TEM-PTV3 transformer, with no hand-crafted priors.

Abstract

We introduce 3DGSim, a fully end-to-end 3D physics simulator. It is trained on multi-view videos to ensure both spatial and temporal consistency, without relying on the inductive biases or ground-truth 3D supervision that can impede scalability and generalization.

It encodes images into a 3D Gaussian particle representation, utilizes a transformer to propagate dynamics, and renders frames using 3D Gaussian splatting. By jointly training inverse rendering with a dynamics transformer using a temporal encoding and merging layer, 3DGSim embeds physical properties into point-wise latent vectors without enforcing explicit connectivity constraints.
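At a high level, one simulator step operates on a set of latent-augmented particles. The following minimal NumPy sketch illustrates that data flow only: `dynamics_step` and its weight matrix `W` are toy stand-ins for the actual TEM-PTV3 transformer, and the encoder and splatting renderer are omitted entirely.

```python
import numpy as np

def dynamics_step(positions, latents, W):
    """Toy stand-in for the dynamics transformer: each particle's
    displacement is predicted from its own latent feature vector."""
    delta = np.tanh(latents @ W)          # (N, 3) predicted displacements
    return positions + delta, latents

# Scene state: N Gaussian particles, each with a 3D mean and a latent vector.
N, D = 256, 16
rng = np.random.default_rng(0)
positions = rng.normal(size=(N, 3))       # Gaussian means
latents   = rng.normal(size=(N, D))       # per-particle latent features
W         = rng.normal(size=(D, 3)) * 0.01  # toy "learned" weights

# Roll the state forward; a splatting renderer would consume
# (positions, latents) at every step to produce the frame.
trajectory = [positions]
for _ in range(3):
    positions, latents = dynamics_step(positions, latents, W)
    trajectory.append(positions)
```

The key point the sketch captures is that no connectivity structure (mesh, graph edges) is maintained: each step maps one unordered particle set to the next.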

This enables the model to capture diverse physical behaviors, from rigid to elastic and cloth-like interactions, along with realistic lighting effects that also generalize to unseen multi-body interactions and novel scene edits.

Video

Datasets

Alongside 3DGSim, we introduce three challenging datasets, each addressing distinct physical interactions and deformation characteristics.


In the Cloth dataset, the cloth is anchored at four corners, challenging the model to infer the implicit constraints and to model the dynamic deformations characteristic of cloth-like materials.

Results

Quantitative Results

Quantitative evaluation on our three datasets comparing 3DGSim with baseline methods.

Dataset   Method          PSNR (↑)        SSIM (↑)      LPIPS (↓)
Elastic   3DGSim (4-12)   33.15 ± 3.51    0.97 ± 0.02   0.02 ± 0.01
          CosmosFT        26.50 ± 5.21    0.82 ± 0.02   0.067 ± 0.030
          Cosmos          18.87 ± 3.99    0.79 ± 0.08   0.23 ± 0.08
Rigid     3DGSim (4-12)   28.28 ± 2.52    0.90 ± 0.03   0.09 ± 0.03
          CosmosFT        26.44 ± 2.26    0.68 ± 0.05   0.104 ± 0.028
          Cosmos          22.35 ± 3.82    0.83 ± 0.08   0.24 ± 0.08
Cloth     3DGSim (4-8)    26.98 ± 2.63    0.89 ± 0.03   0.08 ± 0.03
          CosmosFT        22.49 ± 0.99    0.73 ± 0.03   0.141 ± 0.038
          Cosmos          21.10 ± 3.56    0.86 ± 0.06   0.19 ± 0.06

*CosmosFT refers to LoRA fine-tuning of the Cosmos-Predict2 2B model on the respective dataset.
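The PSNR figures above follow the standard definition; a minimal NumPy sketch for images scaled to [0, 1] (SSIM and LPIPS require their own reference implementations and are not shown):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return np.inf                       # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform pixel error of 0.1 gives MSE = 0.01, i.e. PSNR ≈ 20 dB.
target = np.zeros((8, 8))
score = psnr(target + 0.1, target)
```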

Generalizations

Editing Scenes

A key advantage of 3DGSim is its 3D representation of the simulator’s state, enabling direct scene editing for modular construction, counterfactual reasoning, and scenario exploration.
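Because the state is an explicit particle set, such edits reduce to array operations. The sketch below is purely illustrative (the `(N, 3)` position / `(N, D)` latent layout and the `duplicate_object` helper are assumptions, not the released API): it duplicates one object's particles and shifts the copy.

```python
import numpy as np

def duplicate_object(positions, latents, mask, offset):
    """Copy the particles selected by `mask`, shift the copy by `offset`,
    and append the new position and latent rows to the scene state."""
    new_pos = positions[mask] + offset
    new_lat = latents[mask]            # latents travel with their particles
    return (np.concatenate([positions, new_pos]),
            np.concatenate([latents, new_lat]))

# Example: duplicate the first 100 particles, shifted 0.5 units along x.
rng = np.random.default_rng(1)
positions = rng.normal(size=(256, 3))
latents   = rng.normal(size=(256, 16))
mask = np.zeros(256, dtype=bool)
mask[:100] = True
positions, latents = duplicate_object(
    positions, latents, mask, np.array([0.5, 0.0, 0.0]))
```

Since the dynamics transformer operates on the particle set with no fixed connectivity, the edited scene can be rolled forward directly.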

Generalization to Multi-Object Simulations

Despite being trained only on object-ground collisions, 3DGSim correctly captures realistic multi-body dynamics. Instead of collapsing into chaotic interactions, individual objects retain structural integrity and move cohesively.

Learning Shadows as Part of Dynamics

A striking consequence of removing explicit physics biases is that 3DGSim not only captures physics but also learns to reason about broader scene properties, such as shadows.

Supplementary Material

Additional comparisons and analysis with Cosmos baseline

Cosmos Results

* Cosmos results are produced by running the LoRA fine-tuned Cosmos-Predict2 2B model on each view separately.

Cosmos Generalizations

For counterfactual probing, we edit the 2D images by either removing the ground or duplicating objects. Since this process is time-intensive, only a limited number of examples are provided.

BibTeX

@article{zhobro20253dgsim,
  author  = {Mikel Zhobro and Andreas René Geist and Georg Martius},
  title   = {3DGSim: Learning 3D-Gaussian Simulators from RGB Videos},
  journal = {arXiv},
  year    = {2025},
  eprint  = {2503.24009},
  url     = {https://arxiv.org/abs/2503.24009}
}