PURRCV 2026 · Oral · Catnip Bay

WhiskerSplat: Feed-Forward Neural 3D Reconstruction from Sparse Views

Metric-scale 3D from as few as 3 unposed RGB views in ~0.9 s — no per-scene optimization.

Kitty Park¹, Ya-ong Kim¹,², Calico Lee¹, Mittens Choi¹

¹ CatTower Vision Lab · ² Whiskr Inc · ✉ kitty.park@cattower.example

github.com/cattower/whiskersplat

Introduction

Reconstructing 3D scenes from a handful of photos usually means minutes of per-scene optimization and known camera poses — impractical for casual capture.

WhiskerSplat predicts a metric-scale 3D field in a single forward pass from as few as 3 unposed views, recovering scale and geometry without test-time optimization.

Contributions

Feed-forward reconstruction. A single encoder maps sparse, unposed views to a metric 3D field — no per-scene fitting.
Cross-view attention resolves scale ambiguity across views, replacing pose supervision.
State of the art on CT-Scenes-Hard at 0.9 s/scene, roughly 2,500× faster than the optimization-based Meowtrics-NeRF baseline (38 min per scene).

Headline results

28.41 dB PSNR ↑
0.087 LPIPS ↓
0.9 s per scene ↓
3 input views

Method — feed-forward reconstruction pipeline

Figure 1. Sparse views are encoded independently, fused by cross-view attention to fix metric scale, decoded into an anisotropic-splat 3D field, and rendered differentiably. The whole pipeline runs feed-forward at test time.
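The fusion step in Figure 1 can be illustrated with a toy example. The poster does not specify the architecture, so the following is only a minimal sketch of one common form of cross-view attention, in which tokens from every view are pooled and each token attends to all of them. The function name `cross_view_attention`, the single-head design, the random weights, and the tensor sizes are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(tokens, w_q, w_k, w_v):
    """Single-head attention over tokens pooled from all views (hypothetical).

    tokens: array of shape (V, N, D) holding N feature tokens for each of V views.
    Returns the fused tokens, shape (V, N, D_out), and the attention matrix,
    shape (V*N, V*N).
    """
    num_views, num_tokens, dim = tokens.shape
    flat = tokens.reshape(num_views * num_tokens, dim)  # pool tokens across views
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))      # each row sums to 1
    fused = attn @ v
    return fused.reshape(num_views, num_tokens, -1), attn

# Toy sizes: 3 views, 4 tokens per view, 8 feature channels (all assumed).
rng = np.random.default_rng(0)
V, N, D = 3, 4, 8
tokens = rng.normal(size=(V, N, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

fused, attn = cross_view_attention(tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Because every token can draw on features from the other views, a layer like this gives the network a path for agreeing on a shared scale without explicit camera poses. Whether WhiskerSplat uses joint attention of this kind or a pairwise variant is not stated on the poster.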

Qualitative results — CT-Scenes-Hard (3 views)

[Figure: two example scenes, each shown in four panels: Reference, Ours, FelineGS, PawSplat.]

Quantitative — CT-Scenes-Hard

Method	PSNR (dB)↑	LPIPS↓	Time/scene↓
Meowtrics-NeRF	24.10	0.171	38 min
PawSplat	26.30	0.124	4.2 s
Yarn-3R	26.78	0.115	1.1 s
FelineGS	27.05	0.108	1.6 s
TabbyFormer	27.42	0.097	1.0 s
WhiskerSplat	28.41	0.087	0.9 s
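To make the comparison concrete, the short script below recomputes the runtime ratios and PSNR gaps of each baseline relative to WhiskerSplat. All numbers are copied from the table. The only assumption is that the Meowtrics-NeRF time is 38 minutes, which is converted to seconds.

```python
# Per-scene runtimes from the table, in seconds.
times_s = {
    "Meowtrics-NeRF": 38 * 60,  # assumption: the table entry means 38 minutes
    "PawSplat": 4.2,
    "Yarn-3R": 1.1,
    "FelineGS": 1.6,
    "TabbyFormer": 1.0,
    "WhiskerSplat": 0.9,
}

# PSNR in dB from the same table.
psnr_db = {
    "Meowtrics-NeRF": 24.10,
    "PawSplat": 26.30,
    "Yarn-3R": 26.78,
    "FelineGS": 27.05,
    "TabbyFormer": 27.42,
    "WhiskerSplat": 28.41,
}

ours = "WhiskerSplat"

# How many times slower each baseline is than WhiskerSplat.
speedup = {m: t / times_s[ours] for m, t in times_s.items() if m != ours}

# PSNR improvement of WhiskerSplat over each baseline, in dB.
psnr_gain = {m: round(psnr_db[ours] - p, 2) for m, p in psnr_db.items() if m != ours}

for method in speedup:
    print(f"{method:15s} {speedup[method]:8.1f}x slower, {psnr_gain[method]:+.2f} dB")
```

Under that reading, the gap to the optimization-based baseline is three orders of magnitude, while the gap to the fastest feed-forward baseline (TabbyFormer) is about 1.1× in time and 0.99 dB in PSNR.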

Ablation — input view count

Views	PSNR (dB)↑
2	26.90
3	28.41
5	29.80
9	30.60

Components

Variant	PSNR (dB)↑
w/o cross-attn	27.10
w/o depth sup.	27.60
full model	28.41
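The ablation numbers can be summarized with a little arithmetic. The sketch below treats the values as PSNR in dB, which is an assumption (the full-model value matches the headline PSNR), and computes the average gain per additional input view together with the drop caused by removing each component.

```python
# Ablation values copied from the poster; assumed to be PSNR in dB.
views = [2, 3, 5, 9]
psnr = [26.90, 28.41, 29.80, 30.60]

# Average PSNR gain per extra view between consecutive settings.
gain_per_view = [
    (psnr[i + 1] - psnr[i]) / (views[i + 1] - views[i])
    for i in range(len(views) - 1)
]

# Drop in PSNR when a component is removed from the full model.
full = 28.41
drops = {
    "w/o cross-attn": full - 27.10,
    "w/o depth sup.": full - 27.60,
}

for (a, b), g in zip(zip(views, views[1:]), gain_per_view):
    print(f"{a} -> {b} views: {g:+.3f} dB per added view")
for name, d in drops.items():
    print(f"{name}: -{d:.2f} dB")
```

On these numbers the per-view gain shrinks as views are added (about 1.51, 0.70, and 0.20 dB), and removing cross-view attention costs more (1.31 dB) than removing depth supervision (0.81 dB).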

Conclusion

Cross-view attention makes feed-forward, optimization-free 3D reconstruction from sparse unposed views practical and accurate.

Future work: dynamic scenes, in-the-wild lighting, and on-device inference.

References (selected)

K. Park, Y. Kim, C. Lee, M. Choi. WhiskerSplat: Feed-Forward Neural 3D Reconstruction from Sparse Views. PURRCV 2026. arXiv:2608.04217
Y. Kim et al. Meowtrics-NeRF: Optimization-Based Neural Fields for Indoor Scenes. PURRCV 2025.
C. Lee, S. Han. FelineGS: Real-Time Gaussian Splatting for Sparse Capture. Whiskr Tech Report, 2025.
T. Seo et al. CT-Scenes: A Benchmark for Sparse-View 3D Reconstruction. 2025.