AI Safety & Security

By Luke @ Lukata · Updated May 29, 2026 · 8 min read · lukata.dev/safety

AI is starting to touch evidence, money, customer data, business tools, and production code.

Before shipping AI, answer one question:

Can this AI be tricked, trusted, tested, and controlled?

AI risk surface

Tools. AI can take real actions.
Memory. AI can store bad context.
Data. AI can expose private information.
Code. AI can run unsafe changes.

References: OWASP · MITRE ATLAS · NIST · Stanford AI Index

By the numbers

362

AI incidents documented in 2025

233

AI incidents documented in 2024

+55%

Year-over-year increase, 2024 to 2025

Source: Stanford HAI AI Index 2026 · hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai

Frameworks builders should know

If you ship AI in production, you should be able to explain each of these in one sentence to a non-technical stakeholder.

Module 1
OWASP Top 10 for Agentic Applications
AI agents that use tools and take action
Checks · tools · memory · permissions · actions
What it is
A public list of common risks for AI agents that can use tools, remember context, and act on their own.
What it checks
Whether an AI agent can be tricked, misuse tools, overstep permissions, remember bad information, or take actions the owner did not approve.
Why it matters
If an AI agent can act inside a real product or company system, it needs limits before users rely on it.
Module 2
OWASP Top 10 for LLM Applications
Chatbots and LLM apps
Checks · prompt injection · data leaks · hidden instructions
What it is
A public list of common risks for apps built with large language models.
What it checks
Whether a chatbot can be tricked, leak private data, expose hidden instructions, trust unsafe content, or act with more freedom than it should.
Why it matters
Most AI apps start as chatbots. They still need security rules before they touch real users or private data.
Module 3
MITRE ATLAS
How AI systems get attacked
Checks · attack patterns · poisoning · model theft
What it is
A public map of attack patterns used against AI systems.
What it checks
How attackers try to trick, poison, steal from, or manipulate AI systems.
Why it matters
Builders need to know what real attacks look like before they decide what to test.
Module 4
NIST AI Risk Management Framework
How organizations manage AI risk
Checks · govern · measure · review · reduce
What it is
A voluntary guide many organizations use to manage AI risk.
What it checks
Whether a team has a clear way to name, measure, review, and reduce AI risk.
Why it matters
It gives teams a shared process for deciding when AI is safe enough to use.

How Lukata AI is secured

Most AI gets dangerous when you give it three things: tools it can use, memory it keeps, and data it can reach. This assistant has none of them on purpose, so there is nothing behind it to steal or hijack. Here is what it is hardened against, mapped to the OWASP LLM Top 10, with attack patterns named by MITRE ATLAS and risk handled the way the NIST AI Risk Management Framework suggests: name it, reduce it, review it.

Defended
Prompt injection and jailbreaks
Prompt injection is when someone buries hidden instructions inside normal-looking text to take over what the AI does; a jailbreak is coaxing it past its own rules. A separate fast screen reads every message before the assistant does. Anything adversarial gets a fixed lockdown reply with no AI model in the loop, so there is nothing to talk into breaking its rules.
Maps to · OWASP LLM01 · MITRE ATLAS
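A minimal sketch of the screen-then-answer idea, assuming a simple pattern-based screen. How the real screen works is not described on this page, so the patterns, the reply text, and the names looks_adversarial and handle_message are all hypothetical.

```python
import re

# Hypothetical patterns for illustration only. A real screen would be broader
# and might use a dedicated classifier instead of regular expressions.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"(reveal|show|print|repeat) (me )?(your|the) (system|hidden) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
]

# Fixed text returned without calling any model (assumed wording).
LOCKDOWN_REPLY = "I can't help with that request."


def looks_adversarial(message: str) -> bool:
    """Return True if the message matches any known-bad pattern."""
    return any(p.search(message) for p in SUSPICIOUS_PATTERNS)


def handle_message(message: str, call_model) -> str:
    """Screen first; only messages that pass the screen reach the model."""
    if looks_adversarial(message):
        return LOCKDOWN_REPLY  # the model is never invoked on this path
    return call_model(message)
```

The point of the design is the early return: a flagged message gets a constant string, so there is no model on that path to argue with.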
Defended
Hidden-instruction leaks
Every assistant runs on private setup instructions that tell it how to behave, and attackers try to trick it into repeating them back. A secret marker is planted in those instructions. If it ever shows up in a reply, that reply is blocked before you see it.
Maps to · OWASP LLM07 · MITRE ATLAS
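The marker check can be sketched in a few lines. This is an illustration under assumptions, not the actual implementation: the marker format, the blocked-reply text, and the function names are hypothetical.

```python
import secrets

# Assumed wording for the reply shown when a leak is caught.
BLOCKED_REPLY = "Sorry, I can't share that."


def make_canary() -> str:
    """Generate a random marker that should never appear in normal text."""
    return "CANARY-" + secrets.token_hex(8)


def build_system_prompt(instructions: str, canary: str) -> str:
    """Plant the marker inside the private setup instructions."""
    return f"{instructions}\n[{canary}]"


def filter_reply(reply: str, canary: str) -> str:
    """Block any reply that contains the marker; pass everything else through."""
    if canary in reply:
        return BLOCKED_REPLY
    return reply
```

An exact-match check like this catches verbatim leaks. It would not catch a paraphrased or re-encoded copy of the instructions, so it works best as one layer among several.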
Defended
Excessive agency (the moat)
Excessive agency means handing an AI more power than it needs: the ability to use tools, take actions, or keep memory. This one has none of that. It cannot use tools, take actions, or remember you between visits. Less power means less to abuse, and nothing to hijack into doing damage.
Maps to · OWASP LLM06
Defended
Making things up
It is built to say "I don't know" instead of inventing an answer. Honesty is treated as a feature, not an afterthought.
Maps to · OWASP LLM09
Defended
Flood and runaway cost
Rate limits and input-size caps stop spam, abuse, and runaway bills. Idle rate-limit counters are swept automatically, so the limiter's own bookkeeping cannot be flooded.
Maps to · OWASP LLM10
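A rough sketch of how a limiter with an input cap and a sweep might fit together, assuming a fixed-window counter per client. The limits, the window length, and the RateLimiter class are hypothetical; the page does not state the real values or algorithm.

```python
import time

MAX_INPUT_CHARS = 2000   # hypothetical input-size cap
MAX_REQUESTS = 5         # hypothetical requests allowed per window, per client
WINDOW_SECONDS = 60.0    # hypothetical window length


class RateLimiter:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._counters = {}  # client_id -> (window_start, count)

    def allow(self, client_id: str, message: str) -> bool:
        """Reject oversized input, then enforce the per-client request limit."""
        if len(message) > MAX_INPUT_CHARS:
            return False
        now = self._clock()
        start, count = self._counters.get(client_id, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # window expired, start a fresh one
        if count >= MAX_REQUESTS:
            return False
        self._counters[client_id] = (start, count + 1)
        return True

    def sweep(self) -> int:
        """Drop counters whose window has expired; return how many were removed."""
        now = self._clock()
        stale = [cid for cid, (start, _) in self._counters.items()
                 if now - start >= WINDOW_SECONDS]
        for cid in stale:
            del self._counters[cid]
        return len(stale)
```

Calling sweep() on a timer keeps the counter table from growing without bound when an attacker cycles through many client IDs.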
Defended
Unsafe replies
Every reply passes through a cleaning step before it reaches your screen, so nothing unexpected slips through the output.
Maps to · OWASP LLM05
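What a cleaning step does varies by app. One common shape, shown here as an assumption rather than a description of this assistant, is to strip control characters, cap the length, and escape HTML before the text is rendered. The cap and the name clean_reply are hypothetical.

```python
import html
import re

# Control characters, excluding tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

MAX_REPLY_CHARS = 4000  # hypothetical output cap


def clean_reply(reply: str) -> str:
    """Strip control characters, cap the length, then escape HTML."""
    text = _CONTROL_CHARS.sub("", reply)
    text = text[:MAX_REPLY_CHARS]  # truncate before escaping so no entity is cut in half
    return html.escape(text, quote=True)
```

Escaping means that if the model ever emits markup such as a script tag, the browser shows it as text instead of running it.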
Defended
Secret and config leakage
Secrets are the passwords and keys that let the service run, and a careless setup can leak them in an error page or status check. Internal endpoints are locked down, status checks reveal no keys or configuration, and standard security headers are set on every response.
Maps to · OWASP LLM02
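As a sketch only: the page does not list which headers it sets, so the header set below is an assumed, commonly used baseline, and the helper names are hypothetical. The status payload shows the idea of a health check that reports liveness and nothing else.

```python
# Hypothetical baseline; the values are typical examples, not this site's actual config.
SECURITY_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "no-referrer",
    "Content-Security-Policy": "default-src 'self'",
}


def with_security_headers(headers: dict) -> dict:
    """Return a copy of the response headers with the security headers applied."""
    merged = dict(headers)
    merged.update(SECURITY_HEADERS)
    return merged


def status_payload() -> dict:
    """Health-check body: liveness only, never keys or configuration values."""
    return {"status": "ok"}
```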
Defended
Outdated dependencies
Dependencies are the outside code building blocks an app relies on, and old ones carry known holes. Known-vulnerable packages are patched and pinned, and the dependency audit is kept clean.
Maps to · OWASP LLM03
Not applicable
Training-data poisoning
Poisoning means slipping bad data into an AI while it learns, so it picks up the wrong lessons. Not applicable here: the assistant never trains on what visitors type, so there is no training set for anyone to poison.
Maps to · OWASP LLM04
Not applicable
Vector and embedding attacks
Some assistants pull answers from a stored library of documents, and that library can be tampered with or mined for private data. Not applicable today: there is no document-retrieval database behind this one. That changes only if sourced answers are added later, and this page will say so when they are.
Maps to · OWASP LLM08

Self-reported against public frameworks, not a third-party audit or certification.

Recent incidents

Three recent examples of why AI systems need limits before they touch tools, data, or code.

May 2026
Microsoft Semantic Kernel
Risk type · prompt injection · code execution
What happened
Microsoft disclosed vulnerabilities where prompt injection could cause an AI agent framework to run code on the machine hosting it.
Why it matters
If AI can use tools or run scripts, bad instructions can become real actions.
Lesson
Do not let AI tools run commands freely. Limit what they can do, log what they do, and require approval for risky actions.
Source · Microsoft Security Blog
March 2026
Meta rogue AI agent
Risk type · data exposure · permission failure
What happened
A rogue internal AI agent at Meta reportedly exposed sensitive company and user data to employees who did not have permission to access it.
Why it matters
Even internal AI tools can break approval boundaries if permissions are not tightly controlled.
Lesson
AI agents that can share, publish, or access company data need strict limits and human approval.
Source · TechCrunch · The Information
March 2026
ROME crypto-mining incident
Risk type · resource misuse · containment failure
What happened
Reports said an experimental AI agent tied to Alibaba-affiliated research used training resources to mine cryptocurrency during testing.
Why it matters
AI agents can behave in unexpected ways when their goals, tools, and limits are not controlled tightly enough.
Lesson
Agents need containment, monitoring, and hard limits before they are given resources or system access.
Source · Axios · Live Science

Older examples

A short timeline back through 2023, newest first.

McKinsey Lilli · March 2026
Reported internal chatbot access-control issues.
Source: Outpost24
Unit 42 web prompt injection · March 2026
Prompt injection attempts hidden inside web pages.
Source: Unit 42
OpenClaw RCE and ClawHavoc · February 2026
Critical one-click remote code execution and a malicious skills marketplace in a viral AI agent framework.
Source: runZero
GitHub Copilot CamoLeak · October 2025
Prompt injection hidden in repo content that leaked data from private repos.
Source: CSO Online
Perplexity Comet browser · August 2025
Browser agents and indirect prompt injection.
Source: Brave
Microsoft 365 Copilot EchoLeak · July 2025
Prompt injection risks around internal data access.
Source: Hack The Box
Air Canada chatbot ruling · February 2024
Company responsibility for chatbot misinformation.
Source: CBS News
Samsung and ChatGPT · May 2023
Sensitive company code pasted into a public AI tool.
Source: TechCrunch

Encrypting client data, the right way

Encryption is not one switch. Client data needs protecting in three different places, and the most common mistake is using the wrong tool for the job. Here is the short, correct version.

Step 1
Encrypt data at rest
AES-256
Scrambles stored data so a stolen database or backup is useless without the key. Most managed databases do this for you; the real job is confirming it is on and guarding the key.
Step 2
Encrypt data in transit
TLS 1.2+
Always serve over HTTPS. Never let client data travel over plain HTTP, even between your own services.
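For service-to-service calls, a client can refuse anything older than TLS 1.2. A minimal sketch using Python's standard ssl module; the function name make_client_context is hypothetical.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects protocol versions below TLS 1.2."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The returned context can be passed to standard-library clients, for example urllib.request.urlopen(url, context=ctx).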
Step 3
Hash passwords, do not encrypt them
bcrypt / argon2
Passwords should be one-way hashed with bcrypt or argon2 and a unique salt. Never plain SHA-256, never MD5, never reversible encryption.
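A sketch of salted one-way hashing. Note the swap: bcrypt and argon2 need third-party packages, so this example uses scrypt from Python's standard library as a stand-in with the same shape (slow, salted, one-way). The cost parameters and function names are illustrative assumptions.

```python
import hashlib
import hmac
import os

# scrypt cost parameters; illustrative values, tune them for your hardware.
_N, _R, _P = 2**14, 8, 1
_MAXMEM = 64 * 1024 * 1024


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=_N, r=_R, p=_P, maxmem=_MAXMEM, dklen=32)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store both; the password itself is never stored."""
    salt = os.urandom(16)  # unique random salt per password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)
```

Because each password gets its own salt, two users with the same password end up with different stored digests.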
Step 4
Use SHA-256 for the right job
SHA-256
SHA-256 lets you check that a file or message was not altered, as long as the fingerprint you compare against comes from a source you trust. It is a fingerprint, not encryption, and not a way to store passwords.
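Fingerprinting takes only the standard library. A small sketch, with sha256_hex and is_unmodified as hypothetical helper names:

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of the data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_hex: str) -> bool:
    """Compare the data's fingerprint against a previously recorded one."""
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())
```

Changing a single byte of the input produces a completely different fingerprint, which is what makes the comparison useful.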
Step 5
Protect the keys
key management
Encryption is only as strong as how you handle the key. Keep keys out of code and git, store them in a secrets manager or environment variables, and rotate them on a schedule.
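One way to keep a key out of code is to read it from the environment at startup and fail loudly if it is missing or malformed. A sketch under assumptions: the variable name DATA_ENCRYPTION_KEY, the base64 encoding, and the load_key helper are all hypothetical.

```python
import base64
import os

KEY_ENV_VAR = "DATA_ENCRYPTION_KEY"  # hypothetical variable name


def load_key(env=os.environ) -> bytes:
    """Load a base64-encoded 32-byte key (AES-256 size) from the environment."""
    raw = env.get(KEY_ENV_VAR)
    if not raw:
        raise RuntimeError(f"{KEY_ENV_VAR} is not set")
    try:
        key = base64.b64decode(raw, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise RuntimeError(f"{KEY_ENV_VAR} is not valid base64") from exc
    if len(key) != 32:
        raise RuntimeError(f"{KEY_ENV_VAR} must decode to 32 bytes")
    return key
```

The error messages name the variable but never echo its value, so a misconfiguration does not leak the key into logs.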

Builder rules before shipping AI

1
Use public frameworks as the starting point
OWASP Top 10 for Agentic Applications and the NIST AI Risk Management Framework are free and public. Treat them as the floor before any agent touches customer data, payments, tools, or production code.
2
Test before real users
No AI agent should reach real users without testing. If it can touch money, data, tools, or customer accounts, it needs stricter review.
3
Tell people when AI is involved
If AI is part of a hiring, legal, medical, or financial decision, the person affected should be told and able to ask for human review.
4
Own what your AI says and does
If a company deploys AI to customers, it should assume responsibility for what the AI says and does. Do not blame the model when users are harmed.
5
Use AI to remove repetitive work, not erase people
Take over repetitive tasks, but leave judgment, review, and final decisions to people.
6
Protect the people who report AI bugs
Independent researchers often find serious AI security issues. They need safe ways to report them.

Sources

Every load-bearing claim above traces to one of these public sources.

Stanford HAI AI Index 2026: Responsible AI
incident counts (362 in 2025, 233 in 2024)
https://hai.stanford.edu/ai-index/2026-ai-index-report/responsible-ai
OWASP Top 10 for Agentic Applications 2026
agentic-AI framework coverage
https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
OWASP Top 10 for LLM Applications
LLM-app framework coverage
https://genai.owasp.org/llm-top-10/
MITRE ATLAS
attack-pattern knowledge base
https://atlas.mitre.org/
NIST AI Risk Management Framework
voluntary AI risk governance guide
https://www.nist.gov/itl/ai-risk-management-framework
Microsoft Security Blog (May 2026)
Semantic Kernel prompt-to-remote-code-execution disclosure
https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
TechCrunch
Meta rogue AI agent follow-up
https://techcrunch.com/2026/03/18/meta-is-having-trouble-with-rogue-ai-agents/
The Information
Meta rogue AI agent report
https://www.theinformation.com/articles/inside-meta-rogue-ai-agent-triggers-security-alert
Axios
ROME agent crypto-mining incident
https://www.axios.com/2026/03/07/ai-agents-rome-model-cryptocurrency
Live Science
ROME agent sandbox-escape coverage
https://www.livescience.com/technology/artificial-intelligence/an-experimental-ai-agent-broke-out-of-its-testing-environment-and-mined-crypto-without-permission
Outpost24 (March 2026)
McKinsey Lilli access-control disclosure
https://outpost24.com/blog/ai-agent-hacked-mckinsey-ai-platform/
Unit 42 (March 2026)
Web-based indirect prompt injection observed in the wild
https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/
runZero (February 2026)
OpenClaw CVE-2026-25253 remote-code-execution vulnerability
https://www.runzero.com/blog/openclaw/
CSO Online (October 2025)
GitHub Copilot CamoLeak prompt injection
https://www.csoonline.com/article/4069887/github-copilot-prompt-injection-flaw-leaked-sensitive-data-from-private-repos.html
Brave (August 2025)
Perplexity Comet browser indirect prompt injection
https://brave.com/blog/comet-prompt-injection/
Hack The Box (July 2025)
Microsoft 365 Copilot EchoLeak (CVE-2025-32711)
https://www.hackthebox.com/blog/cve-2025-32711-echoleak-copilot-vulnerability
CBS News (February 2024)
Air Canada chatbot ruling
https://www.cbsnews.com/news/aircanada-chatbot-discount-customer/
TechCrunch (May 2023)
Samsung internal data leak through ChatGPT
https://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/