AI Safety & Security

By Luke @ Lukata · Updated May 25, 2026 · 6 min read · lukata.dev/safety

AI is starting to touch evidence, money, customer data, business tools, and production code.

Before shipping AI, answer one question:

Can this AI be tricked, trusted, tested, and controlled?

AI risk surface

References: OWASP · MITRE ATLAS · NIST · Stanford AI Index

By the numbers

362
AI incidents documented in 2025
233
AI incidents documented in 2024
+55%
Year-over-year increase, 2024 to 2025

Source: Stanford HAI AI Index 2026 · hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai

Frameworks builders should know

If you ship AI in production, you should be able to explain each of these in one sentence to a non-technical stakeholder.

  1. Module 1
    OWASP Top 10 for Agentic Applications
    AI agents that use tools and take action
    Checks · tools · memory · permissions · actions
    What it is

    A public list of common risks for AI agents that can use tools, remember context, and act on their own.

    What it checks

    Whether an AI agent can be tricked, misuse tools, overstep permissions, remember bad information, or take actions the owner did not approve.

    Why it matters

    If an AI agent can act inside a real product or company system, it needs limits before users rely on it.

  2. Module 2
    OWASP Top 10 for LLM Applications
    Chatbots and LLM apps
    Checks · prompt injection · data leaks · hidden instructions
    What it is

    A public list of common risks for apps built with large language models.

    What it checks

    Whether a chatbot can be tricked, leak private data, expose hidden instructions, trust unsafe content, or use too much freedom.

    Why it matters

    Most AI apps start as chatboxes. They still need security rules before they touch real users or private data.

  3. Module 3
    MITRE ATLAS
    How AI systems get attacked
    Checks · attack patterns · poisoning · model theft
    What it is

    A public map of attack patterns used against AI systems.

    What it checks

    How attackers try to trick, poison, steal from, or manipulate AI systems.

    Why it matters

    Builders need to know what real attacks look like before they decide what to test.

  4. Module 4
    NIST AI Risk Management Framework
    How organizations manage AI risk
    Checks · govern · measure · review · reduce
    What it is

    A voluntary guide many organizations use to manage AI risk.

    What it checks

    Whether a team has a clear way to name, measure, review, and reduce AI risk.

    Why it matters

    It gives teams a shared process for deciding when AI is safe enough to use.

Recent incidents

Three recent examples of why AI systems need limits before they touch tools, data, or code.

  1. May 2026
    Microsoft Semantic Kernel
    Risk type · prompt injection · code execution
    What happened

    Microsoft disclosed vulnerabilities where prompt injection could cause an AI agent framework to run code on the machine hosting it.

    Why it matters

    If AI can use tools or run scripts, bad instructions can become real actions.

    Lesson

    Do not let AI tools run commands freely. Limit what they can do, log what they do, and require approval for risky actions.

    Source · Microsoft Security Blog
  2. March 2026
    Meta rogue AI agent
    Risk type · data exposure · permission failure
    What happened

    A rogue internal AI agent at Meta reportedly exposed sensitive company and user data to employees who did not have permission to access it.

    Why it matters

    Even internal AI tools can break approval boundaries if permissions are not tightly controlled.

    Lesson

    AI agents that can share, publish, or access company data need strict limits and human approval.

    Source · TechCrunch · The Information
  3. March 2026
    ROME crypto-mining incident
    Risk type · resource misuse · containment failure
    What happened

    Reports said an experimental AI agent tied to Alibaba-affiliated research used training resources to mine cryptocurrency during testing.

    Why it matters

    AI agents can behave in unexpected ways when their goals, tools, and limits are not controlled tightly enough.

    Lesson

    Agents need containment, monitoring, and hard limits before they are given resources or system access.

    Source · Axios · Live Science

Older examples

A short timeline back through 2023, newest first.

  1. McKinsey Lilli · March 2026
    Reported internal chatbot access-control issues.
    Source: Outpost24
  2. Unit 42 web prompt injection · March 2026
    Prompt injection attempts hidden inside web pages.
    Source: Unit 42
  3. OpenClaw RCE and ClawHavoc · February 2026
    Critical one-click RCE and malicious skills marketplace in a viral AI agent framework.
    Source: runZero
  4. GitHub Copilot CamoLeak · October 2025
    AI coding tools reading repo content as hostile input.
    Source: CSO Online
  5. Perplexity Comet browser · August 2025
    Browser agents and indirect prompt injection.
    Source: Brave
  6. Microsoft 365 Copilot EchoLeak · July 2025
    Prompt injection risks around internal data access.
    Source: Hack The Box
  7. Air Canada chatbot ruling · February 2024
    Company responsibility for chatbot misinformation.
    Source: CBS News
  8. Samsung and ChatGPT · May 2023
    Sensitive company code pasted into a public AI tool.
    Source: TechCrunch

Builder rules before shipping AI

  1. 1
    Use public frameworks as the starting point
    OWASP Top 10 for Agentic Applications and NIST AI RMF are free, public, and the floor before any agent touches customer data, payments, tools, or production code.
  2. 2
    Test before real users
    No AI agent should reach real users without testing. If it can touch money, data, tools, or customer accounts, it needs stricter review.
  3. 3
    Tell people when AI is involved
    If AI is part of a hiring, legal, medical, or financial decision, the person affected should be told and able to ask for human review.
  4. 4
    Own what your AI says and does
    If a company deploys AI to customers, it should assume responsibility for what the AI says and does. Do not blame the model when users are harmed.
  5. 5
    Use AI to remove repeated work, not erase people
    Take over repetitive tasks, but leave judgment, review, and final decisions to people.
  6. 6
    Protect the people who report AI bugs
    Independent researchers often find serious AI security issues. They need safe ways to report them.

Sources

Every load-bearing claim above traces to one of these public sources.

  1. Stanford HAI AI Index 2026: Responsible AI
    incident counts (362 in 2025, 233 in 2024)
    https://hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai
  2. OWASP Top 10 for Agentic Applications 2026
    agentic-AI framework coverage
    https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
  3. OWASP Top 10 for LLM Applications
    LLM-app framework coverage
    https://genai.owasp.org/llm-top-10/
  4. MITRE ATLAS
    attack-pattern knowledge base
    https://atlas.mitre.org/
  5. NIST AI Risk Management Framework
    voluntary AI risk governance guide
    https://www.nist.gov/itl/ai-risk-management-framework
  6. Microsoft Security Blog (May 2026)
    Semantic Kernel prompt-to-RCE disclosure
    https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
  7. TechCrunch
    Meta rogue AI agent follow-up
    https://techcrunch.com/2026/03/18/meta-is-having-trouble-with-rogue-ai-agents/
  8. The Information
    Meta rogue AI agent report
    https://www.theinformation.com/articles/inside-meta-rogue-ai-agent-triggers-security-alert
  9. Axios
    ROME agent crypto-mining incident
    https://www.axios.com/2026/03/07/ai-agents-rome-model-cryptocurrency
  10. Live Science
    ROME agent sandbox-escape coverage
    https://www.livescience.com/technology/artificial-intelligence/an-experimental-ai-agent-broke-out-of-its-testing-environment-and-mined-crypto-without-permission
  11. Outpost24 (March 2026)
    McKinsey Lilli access-control disclosure
    https://outpost24.com/blog/ai-agent-hacked-mckinsey-ai-platform/
  12. Unit 42 (March 2026)
    Web-based indirect prompt injection observed in the wild
    https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/
  13. runZero (February 2026)
    OpenClaw CVE-2026-25253 RCE vulnerability
    https://www.runzero.com/blog/openclaw/
  14. CSO Online (October 2025)
    GitHub Copilot CamoLeak prompt injection
    https://www.csoonline.com/article/4069887/github-copilot-prompt-injection-flaw-leaked-sensitive-data-from-private-repos.html
  15. Brave (August 2025)
    Perplexity Comet browser indirect prompt injection
    https://brave.com/blog/comet-prompt-injection/
  16. Hack The Box (July 2025)
    Microsoft 365 Copilot EchoLeak (CVE-2025-32711)
    https://www.hackthebox.com/blog/cve-2025-32711-echoleak-copilot-vulnerability
  17. CBS News (February 2024)
    Air Canada chatbot ruling
    https://www.cbsnews.com/news/aircanada-chatbot-discount-customer/
  18. TechCrunch (May 2023)
    Samsung internal data leak through ChatGPT
    https://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/