
From pixels to points: An AI framework with weaker-and-fewer-labels for lightweight 3D phenotyping using 2D-3D coordinate mapping and VLMs

  • Lei Zhou
  • Yingjun Xu
  • Chu Zhang
  • Xiya Zhang
  • Qian Wu
  • Liming Bian
  • Osama Elsherbiny
  • Huichun Zhang*

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

3D phenotyping of seedlings is crucial to tomato cultivation in greenhouse facilities. Current studies focus on high-quality point cloud reconstruction and artificial intelligence (AI) 3D segmentation to derive phenotypic traits such as plant height and crown width; these approaches rely heavily on manual annotation and are complex to deploy. This study proposes a novel AI framework, from pixels to points, for efficient 3D phenotyping of tomato seedlings. By integrating 2D-3D coordinate mapping with vision language models (VLMs), the proposed method enables accurate reconstruction and analysis of 3D phenotypic traits from single-view data. Top-down RGB images and spatially aligned point clouds are captured using a binocular camera. VLMs are prompted with the text “plant” to automatically generate bounding boxes and masks, thereby minimizing manual annotation. These outputs are then transferred to a lightweight YOLO11-segment model. The core innovation is the 2D-3D mapping strategy, through which plant-specific 3D points are efficiently extracted using only 2D masks. Non-plant points within the initial masks are repurposed to determine ground height for improved plant height estimation, while the masks are refined using the Excess Green Index to enhance crown width measurement. The YOLO11-segment model achieves an mAP₅₀ of 96.0%. For sparse canopy, the phenotyping approach yields RMSE values of 1.7 cm for plant height and 1.0 cm for crown width, with R² values of 0.93 and 0.95, respectively, against manual measurements. For dense canopy, using a reference chessboard improves performance (RMSE is reduced from 9.57 cm to 2.07 cm). The presented method significantly reduces annotation dependency, decreases computational complexity, supports edge deployment, and enables efficient technology transfer. It offers considerable potential for high-throughput screening of elite tomato varieties with desirable agronomic traits.
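
The 2D-3D mapping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the point cloud is "organized" (one XYZ point per pixel, shape H x W x 3), the z axis is the distance from a top-down camera, the Excess Green Index is computed as ExG = 2g - r - b on chromaticity-normalised channels, and crown width is taken as the larger axis-aligned extent of the plant points in the XY plane. The function names, the threshold of 0.1, and the percentile used for the plant top are hypothetical choices for illustration.

```python
import numpy as np


def excess_green(rgb):
    """Excess Green Index, ExG = 2g - r - b, on chromaticity-normalised channels."""
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2)
    total[total == 0] = 1.0  # avoid division by zero on pure-black pixels
    r, g, b = (rgb[..., i] / total for i in range(3))
    return 2 * g - r - b


def phenotype_from_mask(rgb, points, mask, exg_threshold=0.1, top_percentile=1.0):
    """Estimate plant height and crown width from a 2D mask (illustrative sketch).

    rgb    : (H, W, 3) image aligned pixel-for-pixel with `points`
    points : (H, W, 3) organized point cloud; z = distance from a top-down camera (assumed)
    mask   : (H, W) boolean instance mask from the 2D segmentation model
    Returns (height, crown_width) in the units of `points`.
    """
    valid = np.isfinite(points).all(axis=2)      # drop pixels without depth
    green = excess_green(rgb) > exg_threshold    # ExG-based mask refinement
    plant = mask & valid & green                 # refined plant pixels
    ground = mask & valid & ~green               # non-plant pixels inside the mask

    plant_pts = points[plant]                    # (N, 3): 2D mask indexes 3D points directly
    ground_pts = points[ground]
    if len(plant_pts) == 0 or len(ground_pts) == 0:
        raise ValueError("mask contains no usable plant or ground points")

    # Ground is farther from the camera than the plant, so it has the larger z.
    ground_z = np.median(ground_pts[:, 2])
    top_z = np.percentile(plant_pts[:, 2], top_percentile)  # robust nearest plant point
    height = ground_z - top_z

    # Assumed crown-width definition: larger axis-aligned extent in the XY plane.
    extent = plant_pts[:, :2].max(axis=0) - plant_pts[:, :2].min(axis=0)
    return float(height), float(extent.max())
```

Because the mask indexes the organized point cloud directly, no 3D segmentation network is needed in this sketch; the only 3D operations are a median, a percentile, and a bounding extent, which is consistent with the lightweight, edge-oriented framing of the abstract.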

Original language: English
Journal: Artificial Intelligence in Agriculture
State: Accepted/In press - 2026
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2026 The Authors

Keywords

  • Binocular vision
  • Crown width
  • Deep learning
  • Plant height
  • Plant phenotyping
  • Smart agriculture

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Engineering (miscellaneous)
  • General Agricultural and Biological Sciences
  • Computer Science Applications
  • Artificial Intelligence
