In a groundbreaking study, scientists have unveiled LucaProt, a cutting-edge artificial intelligence (AI) tool poised to transform the field of virology. Designed to identify highly divergent RNA viruses, LucaProt combines deep learning with structural and sequence-based analysis, and it revealed 513,134 RNA viral contigs and 161,979 putative viral species, many of which were previously unknown.
This discovery represents an 8.6-fold expansion of RNA virus diversity at the supergroup level compared to the latest classifications by the International Committee on Taxonomy of Viruses (ICTV). With implications for understanding virus evolution, ecology, and potential host interactions, the study offers a comprehensive view of the RNA virosphere, a domain largely considered “biological dark matter.”
Figure: LucaProt’s ability to integrate sequence and structural data has led to the discovery of 70,458 unique RNA viruses, including 60 previously unidentified supergroups, vastly expanding the known RNA viral diversity. (Credit: Hou, Xin et al.)
By leveraging the power of AI, LucaProt bridges significant gaps in the field, paving the way for a more detailed exploration of the unseen viral world.
Expanding the Boundaries of Viral Diversity
Traditionally, RNA virus discovery has relied on sequence similarity comparisons, often constrained by the completeness of existing databases. However, LucaProt integrates both sequence and structural data, providing a new level of precision in identifying the RNA-dependent RNA polymerase (RdRP), a key protein in RNA viruses. The tool’s ability to incorporate structural information has proven essential for uncovering highly divergent viruses that conventional methods overlook.

In total, LucaProt expanded RNA virus diversity at the supergroup level 8.6-fold relative to current ICTV classifications. Remarkably, it uncovered 60 entirely new supergroups, highlighting just how much of the RNA virosphere remains unexplored.
Outperforming the Competition
Benchmarked against four other virus discovery tools, LucaProt emerged as the most comprehensive and accurate. It achieved the highest recall, detecting 98.22% of all RNA viruses in the dataset, while maintaining a low false positive rate. In contrast, the other tools identified only 76.82% to 87.81% of the RNA viruses and missed over half of the new discoveries made by LucaProt alone.

These results underscore LucaProt’s ability to address long-standing limitations in virus discovery tools. By prioritizing both sensitivity and specificity, the AI-driven model detects diverse, highly divergent RNA viruses while keeping false identifications to a minimum. This balance makes LucaProt a crucial advancement for virologists seeking to map the largely uncharted RNA virosphere and its evolutionary history. Moreover, its computational efficiency positions it as a scalable solution for analyzing vast genomic datasets, a necessity in the age of big-data-driven biology.
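For readers unfamiliar with the benchmark metrics, the short Python sketch below shows how recall (sensitivity) and false positive rate are conventionally computed from a tool's hits and misses. The function names and all counts are hypothetical and chosen only for illustration; they are not the study's data, and this is not LucaProt's own evaluation code.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real viral sequences that the tool detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive examples in the benchmark")
    return true_positives / total


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of non-viral sequences that the tool wrongly flagged."""
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("no negative examples in the benchmark")
    return false_positives / total


# Hypothetical benchmark (assumed numbers, not from the study):
# 10,000 known viral sequences and 10,000 non-viral sequences.
tp, fn = 9_822, 178   # viral sequences found / missed
fp, tn = 5, 9_995     # non-viral sequences flagged / correctly rejected

print(f"recall = {recall(tp, fn):.2%}")
print(f"false positive rate = {false_positive_rate(fp, tn):.2%}")
```

A tool can raise its recall simply by flagging more sequences, so the two numbers are only meaningful when reported together, which is why the benchmark cites both.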
New Frontiers in Virus-Host Relationships
One of the study’s most intriguing revelations concerns the potential hosts of these newly discovered viruses. While many are likely associated with microbial eukaryotes, the findings suggest that a significant proportion could infect bacteria or even archaea. This hints at an evolutionary connection between RNA viruses in bacterial and eukaryotic hosts, opening new avenues for understanding virus-host co-evolution.
Despite its success, the study acknowledges key challenges. Deep evolutionary classification remains a hurdle due to the extreme genetic divergence of RNA viruses. Additionally, the researchers could only identify segments associated with RdRP, leaving other parts of segmented viral genomes unexplored. Future AI models may address these gaps, potentially detecting all genome segments and functional proteins.
Another significant limitation is the reliance on data from the Sequence Read Archive (SRA), which lacks complementary DNA sequencing data for many samples. This restricted the ability to confirm the RNA nature of certain viral supergroups. To overcome this, future studies could incorporate a more integrated approach, combining both RNA and DNA sequencing data to validate and expand upon the discoveries. Additionally, the co-occurrence patterns of viral genome segments could be leveraged to uncover segments beyond RdRP, shedding light on the complete genetic makeup of these elusive viruses.
Figure: AI-Powered Discovery Expands the Known Diversity of RNA Viruses. This study not only highlights the untapped diversity of RNA viruses but also demonstrates the power of AI in uncovering biological ‘dark matter,’ paving the way for breakthroughs in understanding virus evolution and ecology. (Credit: Scientific China)
A New Era in Virus Discovery
LucaProt’s development marks a significant milestone in virology, not only expanding our knowledge of RNA virus diversity but also setting the stage for more comprehensive studies into viral evolution and ecology. The researchers believe their AI-driven approach can be adapted to study other “biological dark matter,” bringing us closer to unraveling life’s hidden complexities.
This discovery could have far-reaching implications, from understanding pandemic threats to exploring the role of RNA viruses in ecosystems. As researchers continue to refine LucaProt and similar tools, the mysteries of the RNA virosphere are finally coming to light, a reminder of the power of technology to transform science.