When AI Leaves the Lab: Testing Frontier Models in Government Cyber Defence

UK Government News
OGL v3.0
This outlet's text is shown in full under a public or free licence.
The Government Cyber Action Plan aims to boost cyber resilience across the UK public sector by using emerging technologies to manage risk. The Government Cyber Coordination Centre (GC3) - a partnership between the NCSC and the Department for Science, Innovation and Technology - is leading this work, exploring how frontier AI can be applied safely to cyber defence across government.

From frontier models to front-line impact

We know AI is disrupting the cyber threat landscape. Recently released frontier AI systems such as Claude Mythos and GPT-5.5 brought a step-change in cyber capabilities, and evaluations by the UK AI Security Institute (AISI) show these models getting better at cyber tasks very quickly. However, evaluation in synthetic environments gives a limited understanding of real-world use. A high score on a benchmark does not necessarily translate into finding and fixing real vulnerabilities.

What we did

The Government Cyber Coordination Centre led a weekly, in-person series of hackathons which used frontier AI to scan public code repositories across government. Working closely with specialists from the AISI and NCSC, our goal was to find and mitigate previously unidentified vulnerabilities before they could be exploited. Rather than mandate a single approach, we gave teams model access and let them build their own tooling, noticing what worked each week and building on the best approaches.

The UK Government encourages new source code to be open by default, with specific and justified exceptions. In practice, that creates a degree of shared visibility that attackers can also exploit. However, this openness also limits duplication and leads to cleaner, more easily maintained code. Code published in the open has also already passed extensive pre-publication scrutiny, meaning it can be shared with frontier model providers with minimal additional review. This means that government departments can deploy new capabilities quickly and with confidence.

An adversarial chain that challenges itself. One team ran each public repo through a six-stage AI agent pipeline: triage, validator, auditor, tracer, judge, summary. Each stage reads and challenges the last. In one case, the agent downgraded a finding once it established that a backup mechanism was in place. The pipeline was agentic, but the escalation was manual. This means a member of the team checked every line, re-verified exposure, and handled false positives.

Deterministic scanners feeding a model. Another team ran traditional scanning tools first (including Gitleaks, Trivy, Semgrep and Hadolint) to generate a ranked findings document. Three model stages were then layered on top: a discovery stage that treated the scanner output as leads and read the source against OWASP and CWE frameworks, a chain-investigation stage that composed individual findings into attack paths via per-chain sub-agents, and a triage stage that confirmed each finding's viability.

Codifying a multi-service audit into reusable skills. Another department developed five domain-specific Claude Skills. The Skills distil an organisation-wide audit across hundreds of services into something repeatable. Skills enabled a reusable, scoped, and consistent approach across every repository and operator.

What we found

Participants identified 407 findings in total, including critical weaknesses exposing services to authentication bypass, data exposure and remote code execution. Some were already understood and mitigated by compensating controls while others were previously unknown. All critical weaknesses have been remediated, and no evidence of exploitation was identified for any finding. AI models traced vulnerabilities across service boundaries, which traditional scanners can’t do, and linked business logic with technical detail.

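
The six-stage chain described under "What we did" can be sketched roughly as follows. This is a minimal illustration, not the team's tooling: the stage names come from the article, but the `Finding` fields, the severity ladder, the `passthrough` and `auditor` functions, and the choice of which stage performs the downgrade are all assumptions made for the example. In a real pipeline each stage would be a model call that reads and challenges the previous stage's output.

```python
from dataclasses import dataclass, field
from typing import Callable

# Severity ladder used only for this illustration.
SEVERITIES = ["low", "medium", "high", "critical"]


@dataclass
class Finding:
    title: str
    severity: str
    # Hypothetical flag, e.g. a backup mechanism was found to be in place.
    has_compensating_control: bool = False
    # Names of the stages that have reviewed this finding so far.
    trail: list[str] = field(default_factory=list)


def downgrade(severity: str) -> str:
    """Step one rung down the ladder, never below 'low'."""
    return SEVERITIES[max(SEVERITIES.index(severity) - 1, 0)]


def passthrough(finding: Finding) -> Finding:
    # Placeholder for a model call that reviews the previous stage's output.
    return finding


def auditor(finding: Finding) -> Finding:
    # Challenge the earlier rating: a compensating control lowers severity.
    # Assigning this check to the auditor stage is an assumption.
    if finding.has_compensating_control:
        finding.severity = downgrade(finding.severity)
    return finding


# Stage names are from the article; the functions are stand-ins.
STAGES: list[tuple[str, Callable[[Finding], Finding]]] = [
    ("triage", passthrough),
    ("validator", passthrough),
    ("auditor", auditor),
    ("tracer", passthrough),
    ("judge", passthrough),
    ("summary", passthrough),
]


def run_chain(finding: Finding) -> Finding:
    """Pass one finding through every stage in order, recording the trail."""
    for name, stage in STAGES:
        finding = stage(finding)
        finding.trail.append(name)
    return finding


example = run_chain(Finding("example finding", "high", has_compensating_control=True))
print(example.severity, example.trail)
```

The output of such a chain would still go to a person: as the article notes, escalation stayed manual, with a team member re-verifying exposure and handling false positives.
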
Departments prioritised validation and remediation through existing frameworks, patching critical and high-risk issues assessed as exploitable. It cost us £13,000 in tokens to find these weaknesses, working across nine government organisations for the month.

Identifying critical vulnerabilities

One notable finding affected legacy GitHub Actions in a repository supporting a key government digital service. The issue allowed an external user to trigger a workflow chain by posting a specially structured comment on an open pull request. This bypassed the usual protections for pull requests from unknown contributors because the workflow was triggered by a comment, not by the pull request itself.

The impact was arbitrary remote code execution on the GitHub Actions runner. The workflow took content from the comment, passed it into deployment parameters, and used it in an environment substitution step that executed during the workflow. By placing executable content in the comment field, an external user could cause their input to run on the GitHub runner.

This created a route for malicious actors to potentially extract secrets and tokens available to the workflow, including the GitHub token used by the automation. With that level of access, the issue could support wider repository compromise, including manipulating pull requests, approving workflow activity, altering trusted contributor status, and exploiting further secrets available to the automation environment.

What we learnt

Across teams, the common thread was structure. Models were used as components, drawing on Skills and running in parallel across repositories, with a human expert kept in the loop on anything that mattered. We learnt that:

- Architecture matters the most. The strongest results came from using frontier models as tightly scoped components inside a structured pipeline. Breaking traditional vulnerability management workflows into discrete, task-specific harnesses let teams scale while controlling false positives and hallucination.
- The model matters less than how it’s used. AISI’s research, borne out here, shows that with the right architecture and task design many near-frontier and frontier models perform comparably at scanning code. The best findings still lean heavily on human expertise in breaking the problem down and identifying wider context.
- Triage is essential. Agents generate candidate findings far faster than humans can validate them. Poorly scoped runs burn tokens on low-value targets; weak review dumps the load onto stretched security teams. Careful upfront scoping and structured internal filtering of low-confidence findings kept human review focused. As in traditional vulnerability management, it’s not how many issues are found, but whether triage points limited resource where it matters.
- Finding isn’t the same as fixing. Findings still had to enter the patch pipeline for remediation. AI shows promise here too, but today prioritisation, review and patch-generation all must integrate without overwhelming human-centred processes.

What next

GC3 will kick off a second phase of this pilot, with more departments, additional models, and an extension from public code to closed-source estates. Identifying vulnerabilities early on, raising the consistency of defensive practice, and helping departments share proven techniques is how we put the Government Cyber Action Plan into practice. AISI and NCSC’s involvement will also deepen as we continue to evaluate AI as a tool for cyber defence in applied settings, closing the gap between a theoretical benchmark and a real reduction in risk. This pilot was a test of how government can adopt new capabilities responsibly, learn quickly, and share what works.
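
The class of bug behind the comment-triggered workflow finding can be illustrated with a small, generic sketch. This is not the affected workflow: the function names, the `TARGET` variable, and the harmless payload are all hypothetical, and the example assumes a POSIX `sh` is available. It shows why interpolating untrusted text into script source lets that text execute, while passing the same text as data does not.

```python
import os
import subprocess


def run_unsafe(comment_body: str) -> str:
    # UNSAFE: the untrusted text becomes part of the script itself,
    # so the shell evaluates any $(...) it contains.
    script = f'echo "deploy target: {comment_body}"'
    done = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return done.stdout.strip()


def run_safer(comment_body: str) -> str:
    # Safer: the script text is fixed. The untrusted value travels as data
    # in an environment variable; the shell expands it but does not
    # re-evaluate the result as code.
    script = 'echo "deploy target: $TARGET"'
    env = {**os.environ, "TARGET": comment_body}
    done = subprocess.run(
        ["sh", "-c", script], env=env, capture_output=True, text=True, check=True
    )
    return done.stdout.strip()


# Harmless stand-in for attacker-controlled comment text.
payload = "$(echo INJECTED)"

print(run_unsafe(payload))  # deploy target: INJECTED
print(run_safer(payload))   # deploy target: $(echo INJECTED)
```

In GitHub Actions terms, the commonly recommended analogue of `run_safer` is to pass untrusted context values to a step through `env:` rather than interpolating them directly into the script text, alongside limiting the permissions and secrets available to comment-triggered workflows.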