Introduction

Reconstructing 3D scenes from a handful of photos usually means minutes of per-scene optimization and known camera poses — impractical for casual capture.

WhiskerSplat predicts a metric-scale 3D field in a single forward pass from ≤3 unposed views, recovering scale and geometry without test-time optimization.

Contributions
  1. Feed-forward reconstruction. A single encoder maps sparse, unposed views to a metric 3D field — no per-scene fitting.
  2. Cross-view attention resolves scale ambiguity across views, replacing pose supervision.
  3. State of the art on CT-Scenes-Hard at 0.9 s/scene — ~40× faster than optimization baselines.
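Contribution 2 can be illustrated with a minimal sketch of cross-view attention. It assumes the simplest form: each token of one view attends, via scaled dot-product attention, to every token of the *other* views, and the attended vector is added back residually. The poster does not specify WhiskerSplat's attention layout. Learned query/key/value projections, multiple heads and normalization are omitted here, and the function name `cross_view_attention` is hypothetical.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_view_attention(views: List[List[Vector]]) -> List[List[Vector]]:
    """Hypothetical single-head cross-view attention with identity projections.

    views[i] holds the token vectors of view i (all of equal dimension d).
    Each query token in view i attends only to tokens from the other views,
    and the attended vector is added to the query as a residual update.
    Requires at least two views.
    """
    d = len(views[0][0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for i, tokens in enumerate(views):
        # Keys and values: every token from every *other* view.
        others = [t for j, v in enumerate(views) if j != i for t in v]
        updated = []
        for q in tokens:
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in others]
            weights = softmax(scores)
            attended = [sum(w * k[c] for w, k in zip(weights, others)) for c in range(d)]
            updated.append([qc + ac for qc, ac in zip(q, attended)])
        fused.append(updated)
    return fused

if __name__ == "__main__":
    # Three toy "views", two 2-D tokens each.
    views = [
        [[1.0, 0.0], [0.5, 0.5]],
        [[0.0, 1.0], [0.2, 0.8]],
        [[1.0, 1.0], [0.0, 0.0]],
    ]
    for i, toks in enumerate(cross_view_attention(views)):
        print(f"view {i}:", [[round(x, 3) for x in t] for t in toks])
```

Restricting keys to other views is what lets each view borrow evidence from the rest. In a real model the learned projections decide *which* cross-view correspondences matter, which is where scale information could be exchanged.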
Headline results
PSNR ↑: 28.41 dB
LPIPS ↓: 0.087
Time per scene ↓: 0.9 s
Input views: 3
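PSNR follows the standard definition, 10·log10(MAX²/MSE). This minimal sketch assumes pixel values normalized to [0, 1], so MAX = 1, although the poster does not state the peak value. It shows how PSNR is computed and what mean squared error the headline 28.41 dB corresponds to. The helper names `psnr` and `mse_from_psnr` are illustrative.

```python
import math
from typing import Sequence

def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def mse_from_psnr(db: float, max_val: float = 1.0) -> float:
    """Invert the PSNR formula: MSE = MAX^2 / 10^(PSNR / 10)."""
    return max_val ** 2 / 10 ** (db / 10.0)

if __name__ == "__main__":
    mse = mse_from_psnr(28.41)
    print(f"28.41 dB -> MSE {mse:.6f}, RMSE {math.sqrt(mse):.4f}")
```

With MAX = 1, 28.41 dB works out to an MSE of about 0.00144 (RMSE ≈ 0.038, roughly 3.8% of the intensity range). Because PSNR is logarithmic, each additional 1 dB corresponds to about a 21% drop in MSE, since 10^(-0.1) ≈ 0.79.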
Method — feed-forward reconstruction pipeline
Pipeline: 3 unposed views → View Encoder (ViT-B, shared; per-view tokens) → Cross-View Attention (resolves metric scale) → 3D Field (anisotropic splats, ~60k primitives) → Differentiable Renderer (novel views + depth). Photometric + depth loss (training only).
Figure 1. Sparse views are encoded independently, fused by cross-view attention to fix metric scale, decoded into an anisotropic-splat 3D field, and rendered differentiably. The whole pipeline runs feed-forward at test time.
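The poster describes the 3D field as ~60k anisotropic splats but does not give their parameterization. A common choice, used by 3D Gaussian Splatting, is a Gaussian with covariance Σ = R S Sᵀ Rᵀ. Here R is a rotation built from a unit quaternion and S is a diagonal matrix of per-axis scales. The sketch below assumes that parameterization purely for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def quat_to_rotation(w: float, x: float, y: float, z: float) -> Matrix:
    """Rotation matrix from a quaternion (normalized here to unit length)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]

def splat_covariance(scales: Tuple[float, float, float],
                     quat: Tuple[float, float, float, float]) -> Matrix:
    """Sigma = R S S^T R^T for one anisotropic Gaussian splat (assumed parameterization)."""
    r = quat_to_rotation(*quat)
    s = [[scales[0], 0.0, 0.0], [0.0, scales[1], 0.0], [0.0, 0.0, scales[2]]]
    m = matmul(r, s)                 # M = R S
    return matmul(m, transpose(m))   # Sigma = M M^T = R S S^T R^T

if __name__ == "__main__":
    # Identity rotation: the covariance is diagonal with the squared scales.
    print(splat_covariance((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0)))
```

Because Σ is built as M Mᵀ, it is symmetric positive semi-definite by construction, and positive definite for nonzero scales. This is why the factorization is popular: a network can predict unconstrained scales and quaternions without ever producing an invalid covariance.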
Qualitative results — CT-Scenes-Hard (3 views)
[Figure: two example scenes, each shown as Reference · Ours · FelineGS · PawSplat]
Quantitative — CT-Scenes-Hard
Method            PSNR (dB) ↑   LPIPS ↓   Time / scene ↓
Meowtrics-NeRF    24.10         0.171     38 min
PawSplat          26.30         0.124     4.2 s
Yarn-3R           26.78         0.115     1.1 s
FelineGS          27.05         0.108     1.6 s
TabbyFormer       27.42         0.097     1.0 s
WhiskerSplat      28.41         0.087     0.9 s
Ablation — input view count
Input views (PSNR, dB ↑)
2 views: 26.90
3 views: 28.41
5 views: 29.80
9 views: 30.60
Components, 3 views (PSNR, dB ↑)
w/o cross-view attention: 27.10
w/o depth supervision: 27.60
full model: 28.41
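Figure 1 lists a photometric + depth loss that is used only during training. The ablation shows that dropping depth supervision costs 0.81 dB (27.60 vs. 28.41), while removing cross-view attention costs 1.31 dB. The poster does not give the exact loss. The minimal sketch below assumes an L1 photometric term plus a λ-weighted L1 depth term over pixels that have ground-truth depth. The weight `lambda_depth`, the masking and the function names are hypothetical.

```python
from typing import Optional, Sequence

def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equally sized sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def training_loss(pred_rgb: Sequence[float], gt_rgb: Sequence[float],
                  pred_depth: Sequence[float], gt_depth: Sequence[Optional[float]],
                  lambda_depth: float = 0.1) -> float:
    """Hypothetical photometric + depth objective (training only).

    gt_depth entries may be None where no depth measurement exists; those
    pixels are excluded from the depth term. Setting lambda_depth = 0 drops
    depth supervision, analogous to the 'w/o depth sup.' ablation row.
    """
    photometric = l1(pred_rgb, gt_rgb)
    valid = [(p, g) for p, g in zip(pred_depth, gt_depth) if g is not None]
    depth = l1([p for p, _ in valid], [g for _, g in valid]) if valid else 0.0
    return photometric + lambda_depth * depth

if __name__ == "__main__":
    loss = training_loss([0.5, 0.5], [0.5, 0.7], [1.0, 2.0, 3.0], [1.5, None, 3.0])
    print(f"loss = {loss:.4f}")
```

At test time none of this runs. The renderer simply produces novel views and depth from the predicted 3D field, consistent with the feed-forward claim.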
Conclusion

Cross-view attention makes feed-forward, optimization-free 3D reconstruction from sparse unposed views practical and accurate.

Future work: dynamic scenes, in-the-wild lighting, and on-device inference.

References (selected)
  1. K. Park, Y. Kim, C. Lee, M. Choi. WhiskerSplat: Feed-Forward Neural 3D Reconstruction from Sparse Views. PURRCV 2026. arXiv:2608.04217
  2. Y. Kim et al. Meowtrics-NeRF: Optimization-Based Neural Fields for Indoor Scenes. PURRCV 2025.
  3. C. Lee, S. Han. FelineGS: Real-Time Gaussian Splatting for Sparse Capture. Whiskr Tech Report, 2025.
  4. T. Seo et al. CT-Scenes: A Benchmark for Sparse-View 3D Reconstruction. 2025.