Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements

Hanyang University
ICLR 2026
SPIN-4DGS Teaser

SPIN-4DGS enables faithful 4D Gaussian splatting reconstruction under fast motions with large inter-frame displacements.

Overview

Abstract

Recent 4D Gaussian Splatting (4DGS) methods often fail under fast motion with large inter-frame displacements, where Gaussian attributes are poorly learned during training and fast-moving objects are lost from the reconstruction. In this work, we introduce the Spatiotemporal Position Implicit Network for 4DGS, coined SPIN-4DGS, which learns Gaussian attributes from explicitly collected spatiotemporal positions rather than modeling temporal displacements, thereby enabling more faithful splatting under fast motions with large inter-frame displacements. To avoid the heavy memory overhead of explicitly optimizing attributes across all spatiotemporal positions, we instead predict them with a lightweight feed-forward network trained under a rasterization-based reconstruction loss. Consequently, SPIN-4DGS learns shared representations across Gaussians, effectively capturing spatiotemporal consistency and enabling stable, high-quality Gaussian splatting even under challenging motions. Across extensive experiments, SPIN-4DGS consistently achieves higher fidelity under large displacements, with clear improvements in PSNR and SSIM on challenging sports scenes from the CMU Panoptic dataset. For example, SPIN-4DGS notably outperforms the strongest baseline, D3DGS, by +1.83 dB PSNR on the Basketball scene.
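To make the memory argument concrete, here is a back-of-the-envelope sketch in Python. The Gaussian count, frame count, attribute layout, and network size are illustrative assumptions, not values reported in the paper; the point is only the scaling of explicit per-frame storage versus a shared implicit network.

```python
# Rough memory comparison; all numbers below are illustrative assumptions.
NUM_GAUSSIANS = 200_000      # assumed number of Gaussians in a scene
NUM_FRAMES = 150             # assumed number of time steps
ATTR_DIMS = 3 + 4 + 3 + 1    # scale, rotation (quaternion), RGB color, opacity
BYTES_PER_FLOAT = 4

# Explicit route: a separate attribute vector per Gaussian per frame.
explicit_bytes = NUM_GAUSSIANS * NUM_FRAMES * ATTR_DIMS * BYTES_PER_FLOAT

# Implicit route: one shared feed-forward network predicts attributes from
# spatiotemporal positions; its footprint is roughly its parameter count.
NETWORK_PARAMS = 2_000_000   # assumed hash table + decoder parameters
implicit_bytes = NETWORK_PARAMS * BYTES_PER_FLOAT

print(f"explicit: {explicit_bytes / 1e9:.2f} GB")   # ~1.32 GB
print(f"implicit: {implicit_bytes / 1e6:.2f} MB")   # ~8 MB
```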

Interactive Qualitative Results

Visual Comparisons

Visual comparisons between our SPIN-4DGS method and strong baselines on challenging sports scenes.

Choose a scene, then drag the divider to inspect each reconstruction pair.

Failure Modes of Prior 4DGS

The Challenge in Fast Motion

Large inter-frame displacement exposes two recurring failure modes in existing 4DGS pipelines: unstable attributes in explicit formulations and broken canonical assumptions in deformable ones.

Motion Regime: Large inter-frame displacement
Explicit 4DGS: Cross-frame attribute interference
Deformable 4DGS: Static canonical mismatch
Explicit 4DGS

Attribute Collapse

Shared Gaussian attributes become inconsistent across fast-moving frames.

Attribute Collapse in Explicit 4DGS

Failure Mode

Cross-frame interference accumulates during training, so one Gaussian is pushed to represent incompatible temporal states.

Observed Effect

The reconstruction becomes blurry and temporally unstable as attributes collapse over time.

Deformable 4DGS

Canonical Miss

A static canonical space cannot keep up with rapid non-rigid motion.

Canonical Miss in Deformable 4DGS

Failure Mode

The canonical representation no longer matches the true object position once the inter-frame displacement becomes too large.

Observed Effect

Fast-moving objects disappear or fragment because the model cannot recover them in canonical space.

Key Insight
Decouple position modeling from attribute learning

This observation motivates our core idea: collect reliable spatiotemporal positions first, then predict Gaussian attributes from them implicitly.

Approach

Method Overview


Illustration of the overall framework. SPIN-4DGS consists of two stages: (a) Spatiotemporal Position Estimation and (b) Implicit Network for 4DGS. Specifically, (a) we slice Gaussians along the temporal axis to obtain spatiotemporal position sets and refine them with a rasterization loss. Then, (b) the refined positions are normalized and passed through a 4D hash encoder and multibranch decoders to predict Gaussian attributes (scale, rotation, color, and opacity).
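The following is a minimal PyTorch-style sketch of stage (b), the implicit attribute network. The Fourier-feature encoder below is only a simple stand-in for the paper's 4D hash encoder, and the layer widths, activations, and head parameterizations are illustrative assumptions rather than the released architecture.

```python
import torch
import torch.nn as nn


class FourierEncoder4D(nn.Module):
    """Sin/cos features for normalized (x, y, z, t) positions.

    A lightweight stand-in for the 4D hash encoder used in the paper.
    """

    def __init__(self, num_freqs: int = 6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs).float())

    def forward(self, pos):                        # pos: (N, 4) in [0, 1]
        ang = pos[..., None] * self.freqs          # (N, 4, F)
        feat = torch.cat([ang.sin(), ang.cos()], dim=-1)
        return feat.flatten(start_dim=-2)          # (N, 8 * F)


class SpinAttributeNet(nn.Module):
    """Predicts per-Gaussian attributes from spatiotemporal positions."""

    def __init__(self, num_freqs: int = 6, hidden: int = 128):
        super().__init__()
        self.encoder = FourierEncoder4D(num_freqs)
        in_dim = 8 * num_freqs
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Multibranch decoders: one small head per attribute group.
        self.scale_head = nn.Linear(hidden, 3)
        self.rotation_head = nn.Linear(hidden, 4)   # quaternion
        self.color_head = nn.Linear(hidden, 3)      # RGB; SH coeffs would be larger
        self.opacity_head = nn.Linear(hidden, 1)

    def forward(self, pos):                         # pos: (N, 4) = (x, y, z, t)
        h = self.trunk(self.encoder(pos))
        return {
            "scale": torch.exp(self.scale_head(h)),                        # > 0
            "rotation": nn.functional.normalize(self.rotation_head(h), dim=-1),
            "color": torch.sigmoid(self.color_head(h)),                    # [0, 1]
            "opacity": torch.sigmoid(self.opacity_head(h)),                # [0, 1]
        }
```

In this sketch, `pos` would be the refined, normalized spatiotemporal positions from stage (a); the predicted attributes are then passed to the rasterizer and supervised with the reconstruction loss.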

Benchmarks

Experimental Results

SPIN-4DGS consistently improves reconstruction fidelity across sports, free-view, and indoor dynamic scenes while remaining stable under challenging motion.

CMU Panoptic Sports

CMU Panoptic Sports Results

CMU Panoptic Sports. SPIN-4DGS achieves the best PSNR on all six scenes with an average of 30.11 dB, outperforming Realtime-4DGS (28.38 dB) by +1.73 dB and D3DGS (28.70 dB) by +1.41 dB. SSIM remains consistently high at 0.93, demonstrating strong robustness under extremely fast motion and large inter-frame displacements.

Neu3DV Benchmark

Neu3DV Benchmark Results

Neu3DV Benchmark. Using the default SPIN-4DGS training setup without refinement, our method achieves the highest average PSNR of 32.19 dB, surpassing Realtime-4DGS (32.01 dB) by +0.18 dB and clearly outperforming deformable baselines such as Grid4D (31.49 dB) and 4DGaussian (31.01 dB). SSIM remains consistently high at 0.95, confirming stable and sharp reconstructions even in complex scenes.

MeetRoom Benchmark

MeetRoom Benchmark Results

MeetRoom Benchmark. Evaluated on three indoor scenes (Discussion, Trimming, VR Headset), SPIN-4DGS consistently achieves the highest PSNR across all baselines. It outperforms the strongest explicit method, Realtime-4DGS (30.47 dB), by an average margin of +1.6 dB. These results demonstrate strong robustness in cluttered indoor environments with relatively small motions.

Ablation

Effect of Spatiotemporal Position Slicing

Without slicing, Gaussians are optimized jointly across time, causing cross-frame interference and blurred fast-motion reconstructions. Our spatiotemporal slicing decouples Gaussians per frame, improving fidelity while reducing training cost.

Qualitative comparison of spatiotemporal slicing. Without slicing, Gaussians interfere across frames, causing blur and artifacts. Our slicing decouples Gaussians per time step, yielding sharper and temporally consistent reconstructions.
Slicing        PSNR ↑   SSIM ↑   Train Time ↓   Train Memory ↓
Without        27.48    0.89     1h 20m         18 GB
With (ours)    28.96    0.92     25m            9 GB

Evaluated on the Football sequence (batch size fixed to 1). Slicing improves fidelity and reduces training cost.
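For intuition, the sketch below shows one way the per-frame decoupling could be expressed in code. The `(N, T, 3)` trajectory layout and the function name are hypothetical and only illustrate the idea of slicing positions along the temporal axis; they are not taken from the released implementation.

```python
import torch


def slice_positions(trajectories: torch.Tensor, timestamps: torch.Tensor):
    """Split Gaussian trajectories into per-frame (x, y, z, t) position sets.

    trajectories: (N, T, 3) Gaussian centers at each of T time steps.
    timestamps:   (T,) normalized times in [0, 1].
    Returns a list of T tensors of shape (N, 4). Each set is refined
    independently against that frame's rasterization loss, so Gaussians
    at different time steps no longer interfere with one another.
    """
    sliced = []
    for t_idx, t in enumerate(timestamps):
        xyz = trajectories[:, t_idx, :]                    # (N, 3)
        t_col = torch.full((xyz.shape[0], 1), float(t))    # (N, 1) time column
        sliced.append(torch.cat([xyz, t_col], dim=-1))     # (N, 4)
    return sliced
```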

Ablation

Effect of Spatiotemporal Position Refinement

The videos below visualize the refinement ablation in Table 2. Even with a small refinement budget, SPIN-4DGS already preserves stable object structures, while additional iterations mainly sharpen fine details.

Refinement Iterations

0.5K Iterations

Stable object structures are already recovered without collapse, showing that SPIN-4DGS remains effective even under a minimal refinement budget.


1K Iterations

Additional refinement improves local geometry and appearance, making fine structures such as ball edges and racket nets more consistent.


2K Iterations

More iterations further sharpen fine details, especially around motion-sensitive regions, while preserving the same stable overall structure.

Qualitative results on spatiotemporal position refinement. Figure 6 and the videos above show that SPIN-4DGS already captures stable object structures with only 0.5K refinement iterations. Increasing the budget to 1K and 2K mainly improves fine details such as ball edges and racket nets.

Reference

BibTeX

@inproceedings{kim2026implicit,
  title={Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements},
  author={Seung-gyeom Kim and Areum Kim and Yongjae Yoo and Sukmin Yun},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=MWtXs60n38}
}