AI Safety & Security
By Luke @ Lukata · Updated May 25, 2026 · 6 min read · lukata.dev/safety
AI is starting to touch evidence, money, customer data, business tools, and production code.
Before shipping AI, answer one question:
Can this AI be tricked, trusted, tested, and controlled?
AI risk surface
- Tools. AI can take real actions.
- Memory. AI can store bad context.
- Data. AI can expose private information.
- Code. AI can run unsafe changes.
References: OWASP · MITRE ATLAS · NIST · Stanford AI Index
By the numbers
Source: Stanford HAI AI Index 2026 · hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai
Frameworks builders should know
If you ship AI in production, you should be able to explain each of these in one sentence to a non-technical stakeholder.
- Module 1OWASP Top 10 for Agentic ApplicationsAI agents that use tools and take actionChecks · tools · memory · permissions · actionsWhat it is
A public list of common risks for AI agents that can use tools, remember context, and act on their own.
What it checksWhether an AI agent can be tricked, misuse tools, overstep permissions, remember bad information, or take actions the owner did not approve.
Why it mattersIf an AI agent can act inside a real product or company system, it needs limits before users rely on it.
- Module 2OWASP Top 10 for LLM ApplicationsChatbots and LLM appsChecks · prompt injection · data leaks · hidden instructionsWhat it is
A public list of common risks for apps built with large language models.
What it checksWhether a chatbot can be tricked, leak private data, expose hidden instructions, trust unsafe content, or use too much freedom.
Why it mattersMost AI apps start as chatboxes. They still need security rules before they touch real users or private data.
- Module 3MITRE ATLASHow AI systems get attackedChecks · attack patterns · poisoning · model theftWhat it is
A public map of attack patterns used against AI systems.
What it checksHow attackers try to trick, poison, steal from, or manipulate AI systems.
Why it mattersBuilders need to know what real attacks look like before they decide what to test.
- Module 4NIST AI Risk Management FrameworkHow organizations manage AI riskChecks · govern · measure · review · reduceWhat it is
A voluntary guide many organizations use to manage AI risk.
What it checksWhether a team has a clear way to name, measure, review, and reduce AI risk.
Why it mattersIt gives teams a shared process for deciding when AI is safe enough to use.
Recent incidents
Three recent examples of why AI systems need limits before they touch tools, data, or code.
- May 2026Microsoft Semantic KernelRisk type · prompt injection · code executionWhat happened
Microsoft disclosed vulnerabilities where prompt injection could cause an AI agent framework to run code on the machine hosting it.
Why it mattersIf AI can use tools or run scripts, bad instructions can become real actions.
LessonDo not let AI tools run commands freely. Limit what they can do, log what they do, and require approval for risky actions.
Source · Microsoft Security Blog - March 2026Meta rogue AI agentRisk type · data exposure · permission failureWhat happened
A rogue internal AI agent at Meta reportedly exposed sensitive company and user data to employees who did not have permission to access it.
Why it mattersEven internal AI tools can break approval boundaries if permissions are not tightly controlled.
LessonAI agents that can share, publish, or access company data need strict limits and human approval.
Source · TechCrunch · The Information - March 2026ROME crypto-mining incidentRisk type · resource misuse · containment failureWhat happened
Reports said an experimental AI agent tied to Alibaba-affiliated research used training resources to mine cryptocurrency during testing.
Why it mattersAI agents can behave in unexpected ways when their goals, tools, and limits are not controlled tightly enough.
LessonAgents need containment, monitoring, and hard limits before they are given resources or system access.
Source · Axios · Live Science
Older examples
A short timeline back through 2023, newest first.
- McKinsey Lilli · March 2026Reported internal chatbot access-control issues.Source: Outpost24
- Unit 42 web prompt injection · March 2026Prompt injection attempts hidden inside web pages.Source: Unit 42
- OpenClaw RCE and ClawHavoc · February 2026Critical one-click RCE and malicious skills marketplace in a viral AI agent framework.Source: runZero
- GitHub Copilot CamoLeak · October 2025AI coding tools reading repo content as hostile input.Source: CSO Online
- Perplexity Comet browser · August 2025Browser agents and indirect prompt injection.Source: Brave
- Microsoft 365 Copilot EchoLeak · July 2025Prompt injection risks around internal data access.Source: Hack The Box
- Air Canada chatbot ruling · February 2024Company responsibility for chatbot misinformation.Source: CBS News
- Samsung and ChatGPT · May 2023Sensitive company code pasted into a public AI tool.Source: TechCrunch
Builder rules before shipping AI
- 1Use public frameworks as the starting pointOWASP Top 10 for Agentic Applications and NIST AI RMF are free, public, and the floor before any agent touches customer data, payments, tools, or production code.
- 2Test before real usersNo AI agent should reach real users without testing. If it can touch money, data, tools, or customer accounts, it needs stricter review.
- 3Tell people when AI is involvedIf AI is part of a hiring, legal, medical, or financial decision, the person affected should be told and able to ask for human review.
- 4Own what your AI says and doesIf a company deploys AI to customers, it should assume responsibility for what the AI says and does. Do not blame the model when users are harmed.
- 5Use AI to remove repeated work, not erase peopleTake over repetitive tasks, but leave judgment, review, and final decisions to people.
- 6Protect the people who report AI bugsIndependent researchers often find serious AI security issues. They need safe ways to report them.
Sources
Every load-bearing claim above traces to one of these public sources.
- Stanford HAI AI Index 2026: Responsible AIincident counts (362 in 2025, 233 in 2024)https://hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai
- OWASP Top 10 for Agentic Applications 2026agentic-AI framework coveragehttps://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
- OWASP Top 10 for LLM ApplicationsLLM-app framework coveragehttps://genai.owasp.org/llm-top-10/
- MITRE ATLASattack-pattern knowledge basehttps://atlas.mitre.org/
- NIST AI Risk Management Frameworkvoluntary AI risk governance guidehttps://www.nist.gov/itl/ai-risk-management-framework
- Microsoft Security Blog (May 2026)Semantic Kernel prompt-to-RCE disclosurehttps://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
- TechCrunchMeta rogue AI agent follow-uphttps://techcrunch.com/2026/03/18/meta-is-having-trouble-with-rogue-ai-agents/
- The InformationMeta rogue AI agent reporthttps://www.theinformation.com/articles/inside-meta-rogue-ai-agent-triggers-security-alert
- AxiosROME agent crypto-mining incidenthttps://www.axios.com/2026/03/07/ai-agents-rome-model-cryptocurrency
- Live ScienceROME agent sandbox-escape coveragehttps://www.livescience.com/technology/artificial-intelligence/an-experimental-ai-agent-broke-out-of-its-testing-environment-and-mined-crypto-without-permission
- Outpost24 (March 2026)McKinsey Lilli access-control disclosurehttps://outpost24.com/blog/ai-agent-hacked-mckinsey-ai-platform/
- Unit 42 (March 2026)Web-based indirect prompt injection observed in the wildhttps://unit42.paloaltonetworks.com/ai-agent-prompt-injection/
- runZero (February 2026)OpenClaw CVE-2026-25253 RCE vulnerabilityhttps://www.runzero.com/blog/openclaw/
- CSO Online (October 2025)GitHub Copilot CamoLeak prompt injectionhttps://www.csoonline.com/article/4069887/github-copilot-prompt-injection-flaw-leaked-sensitive-data-from-private-repos.html
- Brave (August 2025)Perplexity Comet browser indirect prompt injectionhttps://brave.com/blog/comet-prompt-injection/
- Hack The Box (July 2025)Microsoft 365 Copilot EchoLeak (CVE-2025-32711)https://www.hackthebox.com/blog/cve-2025-32711-echoleak-copilot-vulnerability
- CBS News (February 2024)Air Canada chatbot rulinghttps://www.cbsnews.com/news/aircanada-chatbot-discount-customer/
- TechCrunch (May 2023)Samsung internal data leak through ChatGPThttps://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/