Reconstructing 3D scenes from a handful of photos usually means minutes of per-scene optimization and known camera poses — impractical for casual capture.
WhiskerSplat predicts a metric-scale 3D field in a single forward pass from ≤3 unposed views, recovering scale and geometry without test-time optimization.
Contributions
Feed-forward reconstruction. A single encoder maps sparse, unposed views to a metric 3D field — no per-scene fitting.
Cross-view attention resolves scale ambiguity across views, replacing pose supervision.
State of the art on CT-Scenes-Hard (28.41 dB PSNR, 0.087 LPIPS) at 0.9 s/scene, ~40× faster than optimization baselines.
Headline results
PSNR (dB) ↑: 28.41
LPIPS ↓: 0.087
Time per scene ↓: 0.9 s
Input views: 3
Method — feed-forward reconstruction pipeline
Figure 1. Sparse views are encoded independently, fused by cross-view attention to fix metric scale, decoded into an anisotropic-splat 3D field, and rendered differentiably. The whole pipeline runs feed-forward at test time.
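The fusion step in Figure 1 can be illustrated with a minimal sketch of scaled dot-product attention over the tokens of all views. This is not the WhiskerSplat implementation: the function name `cross_view_attention`, the token layout, the choice to let each token attend to every view (its own included), and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(views):
    """Fuse per-view tokens with scaled dot-product attention.

    views: list of views, each a list of token vectors (lists of floats).
    Every token attends to the tokens of all views, so each output mixes
    information across views. Learned Q/K/V projections are omitted
    (identity), which is a simplification for illustration only.
    """
    all_tokens = [tok for view in views for tok in view]
    dim = len(all_tokens[0])
    scale = 1.0 / math.sqrt(dim)

    fused = []
    for view in views:
        fused_view = []
        for query in view:
            # Similarity of this token to every token in every view.
            scores = [
                scale * sum(q * k for q, k in zip(query, key))
                for key in all_tokens
            ]
            weights = softmax(scores)
            # Output is the attention-weighted average of all tokens.
            out = [
                sum(w * val[d] for w, val in zip(weights, all_tokens))
                for d in range(dim)
            ]
            fused_view.append(out)
        fused.append(fused_view)
    return fused


# Hypothetical toy input: two tokens from view 0, one token from view 1.
views = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
fused = cross_view_attention(views)
```

The output keeps the per-view token layout of the input, so a downstream decoder (here, the splat decoder in Figure 1) could consume it view by view.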
Qualitative results — CT-Scenes-Hard (3 views)
[Image grid, two rows. Columns: Reference, Ours, FelineGS, PawSplat.]
Quantitative — CT-Scenes-Hard
Method            PSNR (dB) ↑    LPIPS ↓    Time ↓
Meowtrics-NeRF    24.10          0.171      38 m
PawSplat          26.30          0.124      4.2 s
Yarn-3R           26.78          0.115      1.1 s
FelineGS          27.05          0.108      1.6 s
TabbyFormer       27.42          0.097      1.0 s
WhiskerSplat      28.41          0.087      0.9 s
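For readers unfamiliar with the PSNR column, the conventional definition is PSNR = 10 · log10(MAX² / MSE). The sketch below assumes pixel intensities in [0, 1] (so MAX = 1) and flat lists of values; the poster does not state its exact evaluation protocol, so treat this as the standard formula rather than the authors' evaluation code.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel intensities (assumed to lie in [0, max_value])."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.1, so MSE = 0.01.
example = psnr([0.0, 1.0], [0.1, 0.9])
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.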
Ablation — input view count
2 views: 26.90
3 views: 28.41
5 views: 29.80
9 views: 30.60

w/o cross-attn: 27.10
w/o depth sup.: 27.60
full model: 28.41
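The ablation numbers can be turned into per-component drops and per-view gains with a few lines of arithmetic. The sketch below assumes the values are PSNR in dB, which the poster does not state explicitly but which matches the 28.41 headline figure; the helper names are hypothetical.

```python
# Values copied from the ablation above; reading them as PSNR (dB)
# is an assumption based on the match with the headline result.
FULL_MODEL = 28.41
COMPONENT_ABLATIONS = {"w/o cross-attn": 27.10, "w/o depth sup.": 27.60}
VIEW_SWEEP = [(2, 26.90), (3, 28.41), (5, 29.80), (9, 30.60)]


def component_drops(full, ablations):
    """Drop relative to the full model when each component is removed."""
    return {name: round(full - value, 2) for name, value in ablations.items()}


def gain_per_added_view(sweep):
    """Average gain per extra input view between consecutive settings."""
    gains = []
    for (n_prev, p_prev), (n_next, p_next) in zip(sweep, sweep[1:]):
        per_view = (p_next - p_prev) / (n_next - n_prev)
        gains.append(((n_prev, n_next), round(per_view, 3)))
    return gains


drops = component_drops(FULL_MODEL, COMPONENT_ABLATIONS)
gains = gain_per_added_view(VIEW_SWEEP)
print(drops)
print(gains)
```

On these numbers, removing cross-view attention costs 1.31 and removing depth supervision costs 0.81, while the gain per added view falls from 1.51 (2 to 3 views) to about 0.70 (3 to 5) and 0.20 (5 to 9).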
Conclusion
Cross-view attention makes feed-forward, optimization-free 3D reconstruction from sparse unposed views practical and accurate.
Future work: dynamic scenes, in-the-wild lighting, and on-device inference.
References (selected)
K. Park, Y. Kim, C. Lee, M. Choi. WhiskerSplat: Feed-Forward Neural 3D Reconstruction from Sparse Views. PURRCV 2026. arXiv:2608.04217
Y. Kim et al. Meowtrics-NeRF: Optimization-Based Neural Fields for Indoor Scenes. PURRCV 2025.
C. Lee, S. Han. FelineGS: Real-Time Gaussian Splatting for Sparse Capture. Whiskr Tech Report, 2025.
T. Seo et al. CT-Scenes: A Benchmark for Sparse-View 3D Reconstruction. 2025.