President Trump’s new executive order establishes a national policy for pre-deployment AI evaluations. That is a step in the right direction and a departure from the strictly deregulatory approach taken by his administration. Whether a classified, voluntary process can earn public trust at a time when the government and leading AI companies already appear too close for comfort is a question the order leaves open.

In an article last month, I wrote that Washington was moving toward a version of pre-deployment AI evaluation, previously seen in Chinese AI policy. China uses pre-release controls as part of a state-centered governance model. The United States has resisted that path, preferring innovation, market competition and lighter-touch oversight. The June 2 executive order shows that even Washington’s deregulatory posture now has limits. When frontier AI models may materially affect cybersecurity, national security and critical infrastructure, “release first and respond later” is no longer an acceptable policy.

The executive order is narrow, and this is important. Sections 1 and 2 set the objective and then focus mostly on internal government capacity, cyber defense and coordination across federal systems and critical infrastructure. Those are legitimate national priorities. The United States should modernize its cyber defenses, help rural hospitals, local utilities and community banks harden their systems and build stronger AI-enabled defensive capabilities.

Still, the same technical framework that makes cyber policy effective can also make it politically sensitive. That possibility is not theoretical. The Pentagon’s supply-chain risk designation involving Anthropic already showed how judgments about AI firms can become contested, especially when the process is opaque. Supply-chain risk designations, procurement exclusions and informal government pressure may be valid or even necessary in some circumstances. They also require discipline, clear standards and accountability. The machinery built to protect national security can begin to look like a way to favor some players, pressure others or blur the line between risk assessment and political discretion. This makes the governance details essential.

AI Evaluations And National Security

Section 3 is the heart of the order. It directs Treasury, the Department of War and Homeland Security, in consultation with the White House Chief of Staff and several other agencies and government officials, to develop a classified benchmarking process to assess the advanced cyber capabilities of AI models and determine when an advanced AI model is powerful enough to trigger government review. It then creates a voluntary pathway for companies to provide government access to such models for up to 30 days before release to other trusted partners. The order is unclear about the timeline for public release.

This is not a licensing regime. The order says it does not authorize a mandatory “licensing, pre-clearance, or permitting requirement” for new AI models. This language preserves a cooperative framework rather than creating an executive-branch approval system for frontier AI.

I support that basic structure. For cyber capabilities, a fully public benchmark would be self-defeating. If the government published the precise test for whether a model can discover or exploit software vulnerabilities, adversaries would learn from the test. Labs would also train toward the benchmark. In AI, public benchmarks have a short half-life. They become targets, then marketing claims, then artifacts of yesterday’s model cycle.

A classified benchmark can preserve surprise. It can allow the government to test for dangerous capabilities without revealing the vulnerabilities, methods or thresholds that matter most. It can also help align the federal government and frontier labs around the question of what level of AI-enabled cyber capability requires additional scrutiny before broad deployment.

That is a material improvement over vague speeches about AI safety.

Pre-Deployment AI Evaluations Require Trust

The issue is how to establish legitimacy for a necessarily low-transparency process in a political system where trust is already thin. Participation is voluntary, the process is likely to involve only a small number of frontier AI companies, national security agencies will play a central role, and the results may affect market timing, reputation and access to trusted partners, even as key decisions remain hidden from public view.

That combination creates three risks.

The first is regulatory capture without formal regulation. A company that participates in the process may gain privileged access to government officials, early insight into national security priorities and a reputational advantage over competitors. Even if no one calls it a seal of approval, the market may treat it as one.

The second is selective designation. The benchmarking process may need to remain classified, but the criteria for deciding when a model becomes a “covered frontier model” should be visible. If outsiders cannot understand the basis for designation, they will struggle to know whether similar models are being treated similarly.

The third is mission creep. A cyber benchmark created for frontier model review could begin to shape procurement decisions, export controls, security designations and agency relationships with private firms. Some of that may be appropriate. Some of it may go well beyond the original purpose of the order.

The administration’s close relationship with parts of the technology industry makes this harder. A voluntary process can be sensible. A voluntary process among political allies and economic incumbents can look like club governance. That is not a reason to reject cooperation between government and industry. It is a reason to build guardrails around it.

AI Evaluations Require Public Accountability

The government should publish an unclassified governance framework around the classified benchmark. That framework should explain who participates, how conflicts of interest are handled, which agencies have decision rights, how companies are selected, what general categories of risk are assessed and what happens when a model raises serious concerns.

The government should also report aggregate information. It can disclose the number of models reviewed, general classes of findings, categories of remediation and the extent to which model releases were modified after testing. That can be done without revealing classified methods or cyber vulnerabilities.

Congress should also play a role. Executive orders are unstable. They shift with administrations, personnel and political incentives. If pre-deployment AI evaluation is becoming a recurring feature of American AI governance, Congress should provide a bipartisan statutory foundation.

The order gets the policy problem right. Frontier AI models with advanced cyber capabilities should not be released into the world without serious testing. It also leaves the legitimacy problem unresolved. Secrecy, voluntary participation and industry proximity are a fragile combination.