Automated extraction of plant morphological traits is crucial for supporting crop breeding and agricultural management through high-throughput field phenotyping (HTFP). Solutions based on multi-view RGB images are attractive due to their scalability and affordability, enabling volumetric measurements that 2D approaches cannot directly capture. While advanced methods like Neural Radiance Fields (NeRFs) have shown promise, their application has been limited to counting or extracting traits from only a few plants or organs. Furthermore, accurately measuring complex structures like individual wheat heads—essential for studying crop yields—remains particularly challenging due to occlusions and the dense arrangement of crop canopies in field conditions. The recent development of 3D Gaussian Splatting (3DGS) offers a promising alternative for HTFP due to its high-quality reconstructions and explicit point-based representation. In this paper, we present Wheat3DGS, a novel approach that leverages 3DGS and the Segment Anything Model (SAM) for the automatic, precise 3D instance segmentation and morphological measurement of hundreds of wheat heads, representing the first application of 3DGS to HTFP. We validate the accuracy of wheat head extraction against high-resolution laser scan data, obtaining per-instance mean absolute percentage errors of 15.1%, 18.3%, and 40.2% for length, width, and volume, respectively. We provide additional comparisons to NeRF-based approaches and traditional Multi-View Stereo (MVS). Our approach enables rapid, non-destructive measurement of key yield-related traits at scale, with significant implications for accelerating crop breeding and improving our understanding of wheat development.
Our Wheat3DGS approach combines 3D Gaussian Splatting with instance segmentation to enable accurate 3D reconstruction and measurement of wheat heads in field conditions. The initial 2D wheat head masks are obtained using a pretrained wheat head detector and the Segment Anything Model (SAM).
Figure 2: Method overview: Given RGB images of wheat field plots, we first extract 2D segmentation masks of wheat heads and reconstruct the 3D field using Gaussian Splatting. Our novel match-and-fine-tune strategy then iteratively associates 2D masks across views, lifting masks to 3D and projecting 3D segmentations back to other views to refine each wheat head representation. This enables robust 3D wheat head segmentation despite occlusions and perspective changes, followed by trait extraction for phenotyping applications.
Building upon FlashSplat's framework for lifting 2D masks to 3D, we developed a novel match-and-fine-tune strategy to associate wheat heads across multiple views. The process works by first lifting an initial 2D mask to 3D by annotating corresponding Gaussians, then rendering this 3D segmentation in other views to find matching 2D masks with high overlap, which we use to refine the 3D segmentation. This iterative approach gradually builds confidence in Gaussian assignments despite inconsistent views, enabling us to detect and segment hundreds of individual wheat head instances per plot.
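The loop above can be sketched with toy stand-ins for the renderer and the per-view mask sets. The function names, the pixel-projection representation, and the IoU threshold of 0.5 are our illustrative assumptions, not the paper's actual API or implementation:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_and_fine_tune(seed_mask, masks_per_view, proj_per_view, iou_thr=0.5):
    """Iteratively grow one 3D instance, represented as a set of Gaussian indices.

    masks_per_view: list (per view) of lists of 2D boolean candidate masks.
    proj_per_view:  list (per view) of (n_gaussians, 2) integer (x, y) pixel
                    coordinates, a toy stand-in for splatting Gaussian centers.
    """
    # Lift the seed 2D mask to 3D: keep Gaussians projecting inside it (view 0).
    xy = proj_per_view[0]
    assigned = set(np.flatnonzero(seed_mask[xy[:, 1], xy[:, 0]]))
    for view in range(1, len(masks_per_view)):
        xy = proj_per_view[view]
        # "Render" the current 3D segment into this view (toy splat: mark pixels).
        rendered = np.zeros_like(masks_per_view[view][0])
        idx = np.array(sorted(assigned))
        rendered[xy[idx, 1], xy[idx, 0]] = True
        # Match: pick the candidate 2D mask with the highest overlap.
        best = max(masks_per_view[view], key=lambda m: iou(rendered, m))
        if iou(rendered, best) >= iou_thr:
            # Fine-tune: absorb Gaussians that project into the matched mask.
            assigned |= set(np.flatnonzero(best[xy[:, 1], xy[:, 0]]))
    return assigned
```

In this toy form, a Gaussian occluded in the seed view (and therefore missed by the initial lift) is recovered in a later view once its head's 2D mask matches the rendered segment, mirroring how the iterative strategy builds confidence in Gaussian assignments across inconsistent views.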
We evaluated Wheat3DGS on seven real wheat plots at the Field Phenotyping Platform (FIP), each containing six rows of various wheat varieties (genotypes), capturing 36 images per plot (30 for training and 6 for evaluation). Our approach achieves high-quality 3D reconstruction as well as accurate 3D segmentation and phenotypic measurement of wheat heads even in challenging field conditions.
Our approach achieves detailed reconstructions that capture the fine structure of wheat heads, including awns and individual spikelets. This level of detail is critical for accurate phenotypic measurements.
Figure 3: Comparison between Nerfacto (left) and our 3DGS approach (right).
Method | SSIM ↑ | PSNR ↑ | LPIPS ↓ |
---|---|---|---|
Instant-NGP | 0.662 | 20.891 | 0.506 |
Nerfacto | 0.769 | 25.387 | 0.384 |
FruitNeRF | 0.752 | 23.382 | 0.422 |
3DGS* | 0.843 | 25.447 | 0.226 |
Table 1: Comparison of novel view synthesis quality. Best results in red, second best in orange, third best in yellow.
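As a reference for reading Table 1, PSNR is a direct function of the per-pixel MSE between a rendered and a held-out image. A minimal NumPy sketch of the standard definition (assuming images scaled to [0, 1]; this is not code from the paper):

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val**2 / mse)
```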
Wheat3DGS successfully identifies and segments individual wheat heads with high precision and recall, even in cases of occlusion and varying density. This enables per-instance phenotypic measurements.
(a) Pretrained YOLO
(b) YOLO + SAM
(c) DINO + SAHI
(d) GT vs. FruitNeRF
(e) GT vs. Ours
Figure 4: Left (a-c): 2D detection and segmentation of wheat heads using different approaches. Right (d-e): Comparison of projected 3D segmentation with ground truth. Green indicates correct segmentation, orange shows false positives, and red represents missed wheat heads.
Our approach significantly outperforms previous methods on most segmentation metrics when compared against ground truth 2D masks.
Method | IoU ↑ | Precision ↑ | Recall ↑ | F1 ↑ | MSE ↓ | SSIM ↑ |
---|---|---|---|---|---|---|
FruitNeRF | 0.34 | 0.95 | 0.35 | 0.50 | 0.05 | 0.70 |
Ours | 0.50 | 0.81 | 0.57 | 0.67 | 0.06 | 0.90 |
Table 2: Quantitative results against ground truth 2D segmentation masks. Best results per metric are highlighted in red.
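The mask-level metrics in Table 2 follow the standard confusion-count definitions. A minimal sketch for a single pair of binary masks (standard formulas, not the paper's evaluation code):

```python
import numpy as np

def mask_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """IoU, precision, recall and F1 for boolean masks of equal shape."""
    tp = np.logical_and(pred, gt).sum()     # predicted and true
    fp = np.logical_and(pred, ~gt).sum()    # predicted but false
    fn = np.logical_and(~pred, gt).sum()    # missed
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"iou": iou, "precision": prec, "recall": rec, "f1": f1}
```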
We validated the geometric accuracy of our extracted wheat heads by comparing them to aligned Terrestrial Laser Scanning (TLS) data of individual wheat head instances in terms of their 3D morphological traits: length (L), width (W), and volume (V). We compared the Gaussian centers (i.e., point clouds) of our 3DGS results to TLS and Multi-View Stereo (MVS) point clouds. Note that automatically finding wheat head instances in TLS and MVS data relies on the predictions of our method.
Metric | Method | per-instance L | per-instance W | per-instance V | per-row-average L | per-row-average W | per-row-average V
---|---|---|---|---|---|---|---
ρ | MVS | 0.51 | 0.35 | 0.40 | 0.74 | 0.53 | 0.32
ρ | 3DGS | 0.51 | 0.27 | 0.32 | 0.69 | 0.43 | 0.05
MAE | MVS | 1.51 | 0.35 | 12.57 | 0.58 | 0.19 | 9.64
MAE | 3DGS | 1.48 | 0.25 | 10.72 | 0.79 | 0.13 | 6.12
MAPE (%) | MVS | 16.0 | 26.0 | 47.2 | 5.9 | 15.0 | 39.9
MAPE (%) | 3DGS | 15.1 | 18.3 | 40.2 | 8.1 | 9.9 | 24.4
Table 3: Per-instance and per-row-average agreement: TLS (reference) vs. 3DGS and MVS. Best results per trait and metric are highlighted in red.
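The three agreement metrics in Table 3 (Spearman's ρ, MAE, MAPE) can each be computed per trait from paired predicted and reference measurements. A minimal sketch using the standard definitions (not the authors' evaluation code; the simple `rank` helper assumes no exact ties, which is reasonable for continuous traits):

```python
import numpy as np

def rank(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 of the values in x (no tie averaging)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def agreement(pred, ref):
    """MAE, MAPE (%) and Spearman's rho between predicted and reference traits."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    mae = np.mean(np.abs(pred - ref))
    mape = 100.0 * np.mean(np.abs(pred - ref) / np.abs(ref))
    rho = np.corrcoef(rank(pred), rank(ref))[0, 1]  # Pearson on ranks
    return mae, mape, rho
```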
Method | L | W | V
---|---|---|---
TLS | 15.2 | 5.2 | 6.9
3DGS | 11.2 | 35.0 | 10.8
MVS | 10.1 | 8.5 | 7.1
Table 4: One-way ANOVA F-statistics for length (L), width (W), and volume (V) based on 2389 samples of 42 populations (P-value << 0.01 in each case). Best results per trait are highlighted in red.
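The F-statistics in Table 4 compare between-genotype to within-genotype variance of each trait. A minimal sketch of the standard one-way ANOVA computation (equivalent in result to `scipy.stats.f_oneway`; not the authors' code):

```python
import numpy as np

def anova_f(groups) -> float:
    """One-way ANOVA F-statistic: between-group vs. within-group variance."""
    groups = [np.asarray(g, float) for g in groups]
    n = sum(len(g) for g in groups)          # total sample count
    k = len(groups)                          # number of groups (genotypes)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A larger F means the trait separates the genotypes more cleanly, which is why it serves here as a proxy for discriminative power.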
Our results show that 3DGS achieves comparable or even superior performance to MVS in terms of 3D reconstruction quality, a difference that becomes even more pronounced when common failure cases are removed (see supplementary material).
Additionally, our results show strong discriminative power between genotypes (i.e., distinct per-row averages), sometimes even outperforming TLS data, highlighting the potential of our method for phenotyping applications.
We thank Nicola Storni, Olivia Zumsteg, and Mathilda Pohier for their help and advice with data collection, and Michele Volpi for proofreading the final version of the paper.
@article{Zhang2025wheat3dgs,
title={Wheat{3DGS}: {I}n-field {3D} {R}econstruction, {I}nstance {S}egmentation and {P}henotyping of {W}heat {H}eads with {G}aussian {S}platting},
author={Daiwei Zhang and Joaquin Gajardo and Tomislav Medic and Isinsu Katircioglu and Mike Boss and Norbert Kirchgessner and Achim Walter and Lukas Roth},
year={2025},
eprint={2504.06978},
journal = {arXiv preprint arXiv:2504.06978 [cs.CV]},
url={https://arxiv.org/abs/2504.06978},
note = {Comment: accepted at CVPRW 2025; Daiwei Zhang and Joaquin Gajardo contributed equally to this work.},
}