
A pilot programme using frontier AI models to scan public-sector code repositories has identified 407 security findings across nine government organisations, including a critical vulnerability that could have allowed an external attacker to execute arbitrary code on a key digital service, the Department for Science, Innovation and Technology (DSIT) and the National Cyber Security Centre (NCSC) have revealed.


The Government Cyber Coordination Centre (GC3), a joint NCSC/DSIT body, ran a series of weekly hackathons over a month in which teams used frontier AI systems to scan open-source government code for previously unidentified weaknesses. All critical vulnerabilities identified during the exercise have now been remediated, and the departments involved said no evidence of exploitation had been found for any of the findings.

Teams were given access to frontier models, including Claude Mythos and GPT-5.5, and allowed to design their own tooling rather than follow a mandated methodology. Across the nine participating organisations, the pilot generated 407 findings in total, spanning authentication bypass, data exposure and remote code execution risks. Some had already been identified and mitigated through existing compensating controls; others were previously unknown. The total cost of the exercise, measured in AI model token usage, was reported as £13,000.

The department said AI models were able to trace vulnerabilities across service boundaries, connecting business logic with technical detail in ways traditional static-analysis scanners cannot, though all findings were subject to human validation before entering departmental remediation pipelines.

The most significant finding affected legacy GitHub Actions workflows in a repository supporting a major government digital service. The vulnerability allowed an external user to trigger a chain of automated workflows simply by posting a specially crafted comment on an open pull request. This bypassed the safeguards normally applied to contributions from unverified users, because the trigger was the comment itself rather than the pull request. The department said this level of access could have supported wider repository compromise, including manipulating pull requests, approving workflow activity and altering trusted contributor permissions.
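The affected workflow has not been published, so the following is only an illustrative sketch of how a team might screen for this class of issue. It assumes the workflow file has already been parsed into a Python dictionary, and it uses a deliberately crude heuristic: flag any job in a workflow triggered by GitHub's `issue_comment` event whose `if` condition never mentions `author_association`. The function names and the sample workflow are hypothetical, and a real review would still need a human to confirm each result.

```python
from typing import Any


def comment_triggers(workflow: dict[str, Any]) -> bool:
    """Return True if the workflow is triggered by the issue_comment event."""
    # Some YAML parsers (e.g. PyYAML) read a bare `on:` key as boolean True,
    # so look under both keys.
    on = workflow.get("on", workflow.get(True, {})) or {}
    if isinstance(on, str):
        return on == "issue_comment"
    # `on` may be a list of event names or a mapping keyed by event name.
    return "issue_comment" in on


def unguarded_jobs(workflow: dict[str, Any]) -> list[str]:
    """List jobs in a comment-triggered workflow that lack an author check.

    Heuristic only (an assumption of this sketch): a job counts as guarded
    if its `if` condition mentions `author_association` at all.
    """
    if not comment_triggers(workflow):
        return []
    flagged = []
    for name, job in workflow.get("jobs", {}).items():
        condition = str(job.get("if", ""))
        if "author_association" not in condition:
            flagged.append(name)
    return flagged


# Hypothetical workflow, not taken from any government repository.
example = {
    "on": {"issue_comment": {"types": ["created"]}},
    "jobs": {
        "deploy-preview": {"runs-on": "ubuntu-latest", "steps": []},
        "label": {
            "if": "github.event.comment.author_association == 'MEMBER'",
            "runs-on": "ubuntu-latest",
            "steps": [],
        },
    },
}

print(unguarded_jobs(example))  # ['deploy-preview']
```

A check like this would only produce candidate findings. Whether a flagged job is actually exploitable depends on what its steps do and which permissions and secrets it can reach, which is where the human validation described by the department comes in.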

The department said the exercise demonstrated that the architecture surrounding an AI model mattered more than the choice of model itself, with AI Security Institute research cited as showing that near-frontier and frontier models perform comparably when given the right task structure. Effective triage was identified as essential, given that AI agents generate candidate findings far faster than human reviewers can validate them.

GC3 said a second phase of the pilot would extend the approach to additional departments and models, and broaden the scope from public code repositories to closed-source government IT estates, as part of implementation of the Government Cyber Action Plan.

Further details of the exercise can be found in "When AI Leaves the Lab: Testing Frontier Models in Government Cyber Defence".
