So the analysis is done with another call to claude with instructions like "You are a cybersecurity expert..." basically another level of extreme indirection with unpredictable results, and maybe vulnerable to injection itself
Missed naming opportunity...
DILLINGER
No, no, I'm sure, but -- you understand.
It should only be a couple of days.
What's the thing you're working on?
ALAN
It's called Tron. It's a security
program itself, actually. Monitors
all the contacts between our system
and other systems... If it finds
anything going on that's not scheduled,
it shuts it down. I sent you a memo
on it.
DILLINGER
Mmm. Part of the Master Control Program?
ALAN
No, it'll run independently.
It can watchdog the MCP as well.
Oh man, a complete new industry is to about to unfold. I already feel sorry for the people that jump on the latest remote mcp server and discover that their entire personal life (“what is your biggest anxiety?”) is on the streets
This is pretty cool. You should also attempt to scan resources if possible. Similar to the tool injection attack Invariant Labs discovered, I achieved the same result via resource injection [1].
The three things I want solved to improve local MCP server security are file system access, version pinning, and restricted outbound network access.
I've been running my MCP servers in a Docker container and mounting only the necessary files for the server itself, but this isn't foolproof. I know some others have been experimenting with WASI and Firecracker VMs. I've also been experimenting with setting up a squid proxy in my docker container to restrict outbound access for the MCP servers. All of this being said, it would be nice if there was a standard that was set up to make these things easier.
To solve current AI security problem, we need to throw more AI into it.
Neat, but what’s to stop a server from reporting one innocuous set of tools to MCP-Shield and then a different set of tools to the client?
What if we started the other way, by explicitly declaring what files an LLM process was capable of accessing? a snap container or a chroot might be a good first attempt
May be try out vet as well: https://github.com/safedep/vet
vet is backed by a code analysis engine that performs malicious package (npm, pypi etc.) scanning. We recently extended it to support GitHub repository scanning as well.
It found the malicious behaviour in mcp-servers-example/bad-mcp-server.js https://platform.safedep.io/community/malysis/01JRYPXM0SYTM8...
It seems that writing a tool in anything else than English will bypass most of this scanner
Nice! This is a much-needed space for security tooling, and I appreciate that you've put some thought into the new attack vectors. I also like the combination of signature-based analysis, and having an LLM do its own deep dive.
I expect a lot of people to refine the tool as they use it; one big challenge in maintaining the project is going to be incorporating pull requests that improve the prompt in different directions.
Instead of bending over backwards to secure an MCP server why not just run it as an OS user with very limited minimal permissions?
Suggestion: Integrate with https://kgateway.dev/
Cool work! Thanks for citing our (InvariantLabs) blog posts! I really like the identify-as feature!
We recently launched a similar tool ourselfs, called mcp-scan: https://github.com/invariantlabs-ai/mcp-scan
Sorry, but this will never work very well.
The tool contains a bunch of "denylist regexes", i.e.
`user (should not|must not|cannot) see`
But these can easily be bypassed. Any real security tool should use allowlists, but that is ofc much harder with natural languages.MCP-Shield can also analyse using Claude, but that code contains an easy to exploit prompt injection: https://github.com/riseandignite/mcp-shield/blob/19de96efe5e...
Cool.
If I'm not wrong you don't detect prompt injection done in the tool results? Any plans for that?
I'd like to remind you that tools is a json array to any modern llm inference api. That rather than returning text, tells you which function to call.
I'm all for abstraction of a level of indirection. But this is pushing things too far.
We now have an entire ecosystem, layers of unneeded engineering, cohorts of talent and capital going to create man in the middle servers that forces us to get this array from around the world + maintain a server with several gb of deps to get a json array that you should't trust.
2) It makes sense if every server has a tools.txt equivalent of their own swagger. Eg i would trust google photos to maintain and document their tools rather than the 10,000 MCP servers possibly alive for no reason and already out of date by the time you are done reading this comment. In addition to being over engineered, to trust a random server as a proxy never made any sense.
3) nobody wants to run servers. Can't find this meme, but found it here on HN several times.
Sorry but I would rather not wait a year for this industry to crash and burn and take down genai apps galore or worse, start leaking this data and your bills.
Kudos to document any security gaps though.
New! Snakeoil now AI enhanced
Looking promising, need to scan my servers :) Just added your tool to https://github.com/Puliczek/awesome-mcp-security
You install a service that gives access to a random language generator, then you try to secure it with a project that is literally a few hours old. This is like tripping over your own slippers.
Can you also secure my ~/.ssh/id_rsa2 and ~/.ssh/id_rsa_github and ~/.ssh/id_rsa_foo?
[dead]
People have been struggling with securing against SQL injection attacks for decades, and SQL has explicit rules for quoting values. I don't have a lot of faith in finding a solution that safely includes user input into a prompt, but I would love to be proven wrong.