Pre-deployment model evaluations are now part of U.S. AI policy. The Trump administration’s latest agreements with Google DeepMind, Microsoft and xAI mark a striking turn for a government that promised light-touch AI regulation and is now moving toward early review of frontier models before they reach the public.

Pre-Deployment Evaluation Moves From Theory To Policy

On May 5, the Center for AI Standards and Innovation (CAISI), housed at the Department of Commerce’s National Institute of Standards and Technology (NIST), announced agreements to conduct pre-deployment evaluations and targeted research on frontier AI systems from Google DeepMind, Microsoft and xAI. The agreements allow for government evaluation before models are publicly available, as well as post-deployment assessment. The center has already completed more than 40 such evaluations, including on unreleased models. CAISI Director Chris Fall highlighted the initiative’s importance, stating that “Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications.”

The core argument is defensible. Frontier models can now assist with coding, cyber operations, biological research and decision support in sensitive settings. U.S. researchers and industry practitioners are focused on demonstrable AI risks, including cyberattacks against American infrastructure, misuse in sensitive domains and threats to model integrity. That focus is more pragmatic than the prevailing existential-threats discourse.

The process is harder to defend. The administration spent much of the past year dismantling the Biden-era AI safety architecture, rebranding the U.S. AI Safety Institute as CAISI and arguing for fewer barriers to American AI leadership. Now it is rebuilding a version of early model testing under national security pressure and amid election-year politics. That makes the policy look less like a stable doctrine and more like a reactive course correction.

Pre-Deployment Evaluation: The China Parallel

Washington’s push to test powerful AI models closely resembles a tool already embedded in China’s AI governance system. China’s 2023 Interim Measures for the Management of Generative Artificial Intelligence Services require providers of generative AI services with “public opinion properties” or “capacity for social mobilization” to carry out security assessments and complete algorithm filing formalities, with public-facing GenAI services registered in the algorithm and model filings managed by the Cyberspace Administration of China.

That does not mean the U.S. and China are adopting the same regime. China’s system is deeply tied to information control, content restrictions and state supervision. The U.S. proposal is framed around national security, cybersecurity and safety. Still, the institutional pattern is similar. In both cases, the state is seeking earlier visibility into powerful AI models because post-release enforcement may arrive too late.

U.S. policymakers should be careful. In China, pre-release review is part of a broader system that blends safety, industrial strategy and political control. In the U.S., the constitutional, market and innovation context is different. A U.S. approach to pre-deployment evaluation should be targeted, transparent and limited to clearly defined high-risk capabilities. Otherwise, a safety review can drift into a licensing regime.

Pre-Deployment Evaluation Is Being Driven By Cybersecurity

The immediate backdrop is cybersecurity. Anthropic’s recent decision to keep a cybersecurity capability invite-only reflected a concern in AI policy circles that powerful tools can be misused when released too broadly or too quickly, especially when they lower barriers for cyber operations that once required more specialized expertise. As models become more capable, the line between defensive research and offensive use becomes harder to manage.

That concern appears to be shaping federal policy. CAISI’s reviews include red-teaming, shared datasets and workflows, proprietary model access and tests of how safety systems can be bypassed. CAISI has also said that its prior work with Anthropic and OpenAI uncovered vulnerabilities that the companies later patched.

This is the strongest case for pre-deployment evaluation. The public does not benefit when a model’s dangerous cyber capabilities are discovered only after launch. Companies also benefit from credible external testing, especially when those tests help them identify vulnerabilities before their products become critical infrastructure.

AI governance is starting to move from abstract debates to enforcement as systems enter health, education, finance, defense and public services. The more they behave like social infrastructure, the harder it becomes for policymakers to defend purely voluntary oversight.

Policy Needs Stability, Not Ideology

The political timing also matters. AI is moving from boardrooms into politics. Voters, lawmakers and regulators are now confronting its effects on work, children, information and public trust. The CAISI agreements fit that shift, placing frontier model testing at the intersection of kitchen-table concerns and national security.

Critics see the move as a reversal, arguing that the Trump administration denounced Biden-era AI safety efforts only to recreate parts of them in a haphazard way. Daniel Castro, who leads the non-partisan think tank Information Technology and Innovation Foundation, warned on X that pre-release oversight could make innovation “move at the speed of Washington, not Silicon Valley.” On the other side, Janet Vestal Kelly of Alliance for a Better Future, an advocacy group for conservative causes, called model vetting “welcome news” and argued it could protect children, jobs and the nation.

Both perspectives deserve attention. Safety testing is not anti-innovation when it is narrow, expert-led and fast. It can help companies build trust, protect users and avoid catastrophic mistakes. But pre-deployment evaluation becomes risky when it lacks statutory limits, clear standards, appeal rights, timelines and protection from political interference. IBM CEO Arvind Krishna put it plainly. Speaking to Fox Business, he called for a “Goldilocks middle” on U.S. AI regulation and warned that reviews should happen quickly, within “a few days or a few weeks.” He added that if it becomes “a bloated bureaucracy,” it would hurt America’s ability to win the AI race.

The answer is a less ideological and more durable AI governance framework, one that is less dependent on executive orders. Congress should define the narrow class of frontier systems subject to pre-deployment evaluation, specify the risks covered, protect trade secrets, set review timelines and require public reporting at an appropriate level of abstraction. Agencies should focus on measurement science, cyber and biosecurity risks, model integrity and high-impact deployment contexts.

The U.S. does not need to copy China’s model or pretend that frontier AI can be governed through voluntary commitments alone. The better path is bipartisan legislation from Congress that sets clear rules, assigns agency responsibilities and survives changes in administration. Industry and society would both benefit from a stable AI policy framework that treats safety, innovation and public trust as national priorities rather than partisan projects.