May 5, 2026

The real prize here is not a single diagnostic tool but a replicable framework for pairing multimodal clinical data with machine learning, a template that could reshape how complex chronic diseases are identified and managed worldwide.

Chinese scientists have conducted a rigorous head-to-head evaluation of five leading multimodal large language models (M-LLMs), namely GPT-5, Gemini-2.5 Pro, Gemini-2.5 Flash, GLM-4.5V, and Claude-Sonnet-4.5, for the diagnosis of ocular surface diseases (OSDs). Using a retrospective dataset of 259 representative cases from Aksu, China, the study combined anterior segment photographs with structured clinical data to test each model’s accuracy, safety, and equity in a real-world, resource-limited setting. Published in The American Journal of Pathology, the work addresses a critical gap: although M-LLMs are often promoted as clinical decision-support tools, their performance in low-access regions, where OSDs impose a heavy burden, had not previously been systematically benchmarked.

The findings reveal substantial variability in diagnostic reliability across models, offering both promise and grounds for caution. The researchers found that certain M-LLMs can match or approach specialist-level accuracy in classifying ocular surface diseases when given high-quality multimodal inputs. However, the study also flagged equity concerns: models with less exposure to diverse, non-Western clinical data performed worse on the Aksu cohort, highlighting the risk of deploying AI tools trained on homogeneous datasets across globally diverse populations.

This research marks an important step for the field. By evaluating not just raw accuracy but also safety and fairness, the team has provided a blueprint for how M-LLMs should be validated before clinical deployment, particularly in underserved regions. For China, where urban-rural disparities in ophthalmic care are pronounced, these findings could accelerate the development of affordable, AI-driven diagnostic triage systems, potentially easing the burden on overstretched specialists while improving outcomes for the millions of people living with chronic eye disease.

Why it matters:
As multimodal AI moves from research labs into clinical workflows, the question is no longer whether models can perform well in controlled settings, but whether they can do so equitably across different geographies and populations. This study offers a replicable evaluation framework that medical AI developers can adapt, making it a strategic reference point for investors and suppliers building or deploying diagnostic solutions for emerging markets.


ScientificChina — tracking what’s happening in Chinese science, technology, research, and industrial innovation in a way global professionals can actually use.

