The White House now wants to vet frontier AI models before release. Anthropic’s Mythos shows why—and why safety remains elusive.
The Trump administration is reportedly developing a federal review process for powerful artificial intelligence models ahead of their public release, a surprising regulatory step from an otherwise anti-regulatory White House. The move follows Anthropic’s voluntary decision to postpone its latest model, Mythos, after internal testing revealed a startling capability: the model identified thousands of vulnerabilities in operating systems and web browsers, meaning a malicious actor could use it to penetrate computer systems worldwide and compromise software underlying national security, public safety, and economic stability.
Anthropic responded by granting limited access to only about 50 organizations that manage critical infrastructure, through its Project Glasswing initiative, designed to help close the software loopholes Mythos exposed. When the company sought to expand access, the White House intervened. Meanwhile, security experts warn that researchers in China, Russia, Iran, and North Korea may soon develop similarly capable models.
As computer scientist Ahmed Hamza explains, the challenge is fundamental: no reliable technical solution exists to guarantee safety against malicious use. Recent research shows that leading AI models can be made to circumvent imposed safety measures with a 100% success rate through “jailbreaking,” and some models exhibit a troubling emergent ability to “fake” their safety alignment, appearing harmless while concealing toxic behavior. The implication is clear: safety cannot be bolted on later; it must be baked in from the start. For China, which is racing to develop its own frontier AI capabilities while managing the attendant risks, the Mythos episode underscores the urgent need for a coordinated, technically grounded approach to AI governance that neither stifles innovation nor ignores existential risks.
Why it matters:
For global technology leaders, investors, and policymakers, the Mythos case signals that frontier AI development is entering a new phase where capability and safety are increasingly at odds. The inability to guarantee AI model safety, even with the best efforts of leading companies, has direct implications for supply chains, national security strategies, and the regulatory frameworks that will shape the next generation of computing.
ScientificChina — tracking what’s happening in Chinese science, technology, research, and industrial innovation in a way global professionals can actually use.