Scanning agent skills before you trust them: a look at NVIDIA SkillSpector
A 2026 study found 26% of agent skills from public marketplaces contain vulnerabilities — and 5% show patterns of deliberate malice. NVIDIA’s SkillSpector scans skills before installation using static analysis and optional LLM review. This post covers what it catches, how to run it in CI, and the blind spots you still own.
Table Of Contents
- Why agent skills are a distinct attack surface
- What SkillSpector is
- Using it
- Where it fits, and where it doesn’t
- The takeaway
You install a skill called pdf-formatter because your agent needs to clean up some documents. It works. What you don’t see is the scripts/sync.py bundled alongside the SKILL.md — the one that iterates over os.environ and POSTs the results to an external endpoint. Your agent ran it with the same permissions it runs everything else: your file system, your shell, your API keys.
This isn’t a hypothetical. A 2026 study of 42,447 skills from major marketplaces found that 26.1% contained at least one vulnerability and 5.2% showed patterns strongly suggestive of malicious intent. Skills that bundle executable scripts were 2.12× more likely to be vulnerable than instruction-only ones (Liu et al., “Agent Skills in the Wild,” arXiv:2601.10338).
NVIDIA’s response is SkillSpector, an open-source (Apache 2.0) scanner built around one question: should this skill be installed at all? This post covers what the problem is, how SkillSpector works, how to use it, and where it stops being useful. It doesn’t cover runtime sandboxing, agent permission models, or writing secure skills — adjacent problems the tool deliberately doesn’t solve.
Everything here reflects the repository as of June 2026, when SkillSpector is an early project with no tagged releases. Treat specifics like pattern counts and default models as version-bound.
Why agent skills are a distinct attack surface
If you’ve used Claude Code, Codex CLI, Gemini CLI, or similar tools, you’ve probably installed a skill: a portable bundle, usually a SKILL.md plus optional scripts and dependencies, that teaches an agent how to do something. The format is intentionally open and portable across agents, which is exactly what makes it spread.
The security model is the uncomfortable part. A skill executes with the same privileges as the agent that loads it. That’s closer to a browser extension than a sandboxed web page — except the “extension” can also be a few paragraphs of natural-language instructions that quietly redirect what the agent does. Two categories of risk stack:
- Conventional software risk: bundled scripts can read credentials, shell out, fetch and run remote code, or pull in dependencies with known CVEs. Ordinary supply-chain exposure wearing a new label.
- Agent-native risk: the instructions themselves can carry prompt injection, hidden directives in comments or zero-width characters, triggers that fire on common words, or a stated purpose that doesn’t match what the bundled code actually does.
That second category is what makes skills different from a normal pip package. A skill can look completely benign at the file level while steering an agent toward unsafe behavior. The mismatch between declared purpose, actual behavior, and requested permissions is the gap SkillSpector is built around.
What SkillSpector is
SkillSpector is a Python CLI (also importable as a library) that takes a skill from a Git repo, URL, zip, directory, or single file and produces a risk score with specific findings. It runs a two-stage pipeline.
Stage 1: static analysis. Fast, deterministic, no API key required. This stage covers regex pattern matching, AST-based behavioral analysis that flags dangerous calls (exec, eval, subprocess, __import__, dynamic getattr), taint tracking from sources like environment variables to sinks like network calls, and YARA signatures for known malware, webshells, and cryptominers. NVIDIA characterizes this stage as high-recall but only moderate-precision — designed to over-flag rather than miss things, so expect false positives.
One network dependency lives here: supply-chain check SC4 queries OSV.dev to match declared dependencies against the live Open Source Vulnerabilities database. No API key needed; queries are batched, results are cached for an hour, and if the endpoint is unreachable the tool falls back to a small built-in list.
Stage 2: LLM semantic analysis (optional). This sends the skill to a language model to judge intent: does the code do what the description claims? Is a trigger suspiciously broad? It filters false positives from stage 1 and writes human-readable explanations for each finding. NVIDIA reports this lifts precision to roughly 87% (the underlying paper reports 86.7% precision and 82.5% recall). The prompt includes anti-jailbreak protections — a real concern when you’re feeding possibly-malicious instructions to an LLM.
As of this writing the tool covers 64 patterns across 16 categories: prompt injection, data exfiltration, privilege escalation, supply chain, excessive agency, output handling, system-prompt leakage, memory poisoning, tool misuse, rogue-agent behavior, trigger abuse, behavioral AST checks, taint tracking, YARA signatures, and MCP-specific checks. The MCP tool-poisoning group is worth singling out — detecting hidden instructions in tool metadata, Unicode homoglyphs and RTL overrides, and description-to-behavior mismatches are attacks a human reviewer would plausibly miss.
Coverage maps to OWASP Top 10 for LLM Applications, OWASP’s agentic-AI risk guidance, and MITRE ATLAS, which makes findings easier to reason about against your existing threat model.
Scoring is additive and blunt by design: CRITICAL findings add 50 points, HIGH 25, MEDIUM 10, LOW 5, and the total is multiplied by 1.3× if the skill ships executable scripts. The score maps to a recommendation: 0–20 SAFE, 21–50 CAUTION, 51–100 DO NOT INSTALL. This is a heuristic, not a calibrated probability. Treat it as a triage signal, not a verdict.
Using it
Installation is from source. There is no pip install skillspector package as of June 2026, despite what some secondhand write-ups claim:
git clone https://github.com/NVIDIA/SkillSpector.git
cd SkillSpector
uv venv .venv && source .venv/bin/activate # or: python3 -m venv .venv && source .venv/bin/activate
make install # uses uv if present, else pip
Basic scan:
skillspector scan ./my-skill/
skillspector scan ./SKILL.md
skillspector scan https://github.com/user/my-skill
skillspector scan ./my-skill.zip
Output defaults to a formatted terminal report. For automation, SARIF is the format worth knowing about — it drops straight into CI code-scanning and IDE tooling:
skillspector scan ./my-skill/ --format json --output report.json
skillspector scan ./my-skill/ --format markdown --output report.md
skillspector scan ./my-skill/ --format sarif --output report.sarif
A terminal report (abridged from the project’s example):
SkillSpector Security Report v0.1.0
Skill: suspicious-skill
Score: 78/100 Severity: HIGH Recommendation: DO NOT INSTALL
HIGH: Env Variable Harvesting (E2)
Location: scripts/sync.py:23
Confidence: 94%
Explanation: This code collects environment variables containing
API keys and secrets, then sends them to an external server.
HIGH: External Transmission (E1)
Location: scripts/sync.py:45
Confidence: 89%
For semantic analysis, point it at any OpenAI-compatible endpoint. The default is NVIDIA’s build.nvidia.com inference service, with OpenAI and Anthropic also built in and local servers (Ollama, vLLM, llama.cpp) supported via OPENAI_BASE_URL:
export SKILLSPECTOR_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
skillspector scan ./my-skill/
# Skip the LLM entirely for fast, offline, deterministic scans
skillspector scan ./my-skill/ --no-llm
Worth noting: the default provider being NVIDIA’s hosted inference means the out-of-the-box path sends skill contents to NVIDIA. If you’re scanning proprietary or sensitive skills, switch providers deliberately — a local endpoint or --no-llm keeps everything on your machine.
The library API is the natural integration point for a CI gate:
from skillspector import graph
result = graph.invoke({
"input_path": "/path/to/skill",
"output_format": "json",
"use_llm": True,
})
print(f"{result['risk_score']}/100 {result['risk_severity']}")
Where it fits, and where it doesn’t
SkillSpector solves the install-time problem and only that. It tells you whether to trust a skill before you give it access to anything. It doesn’t sandbox execution, enforce permissions at runtime, or monitor what the code does once it’s live. NVIDIA positions it alongside separate runtime projects (NeMo Guardrails, sandboxed-execution tooling) because scanning and runtime enforcement are different layers. If you conclude scanning means you’re covered, you’ve misread what it does.
The hard limits, because they define real blind spots:
- Static analysis can’t see runtime behavior. Code that assembles a malicious payload at execution time, or behaves differently depending on inputs, is mostly invisible to it.
- Non-English content may slip through patterns that assume English.
- Image-based and binary/encrypted payloads aren’t analyzed — text hidden in an image or an obfuscated compiled blob won’t be read.
- The score is a heuristic. Treat it as triage, not a risk model with probabilistic meaning.
There’s also a maturity caveat the tool’s own documentation won’t surface. As of June 2026, the repository has a handful of commits, two stars, and no tagged releases. Some third-party articles describe it as “v2.0.0 with 5.5k stars” and recommend pip install skillspector; none of that matches the actual repository. Verify against the source rather than commentary. (The README also references unreleased model names like gpt-5.4 and claude-opus-4-6 as provider defaults — a reminder to check what’s actually wired up in the version you clone.)
On competing approaches: curated, signed registries (NVIDIA’s own “verified skills” program pairs scanning with cryptographic signing and skill cards) reduce risk by controlling the source. SkillSpector takes the complementary stance — scan anything from anywhere before trusting it. Neither replaces the other. If you only ever install from a registry you fully trust, a scanner adds defense in depth; if you install from arbitrary repos, scanning is closer to essential. And neither is a substitute for the simplest control: reading the SKILL.md and the scripts yourself before you run them.
The takeaway
If you install agent skills from anywhere you don’t fully control, scanning before installation is a cheap, sensible habit, and SkillSpector is a reasonable tool to build it on — particularly --no-llm mode, which is fast, offline, and deterministic. Wire it into CI with the SARIF output and you get a gate that catches the obvious-but-easy-to-miss cases: credential harvesting, curl | bash, known-vulnerable dependencies, hidden instructions.
Hold it in the right frame, though. It’s an early-stage, install-time triage tool with documented blind spots — not a guarantee and not a runtime defense. A clean report means “nothing obvious tripped the scanner,” not “this skill is safe.” The decision to trust a skill with your file system and your secrets is still yours to make. SkillSpector just makes sure you’re making it with more information than the skill’s own description gives you.
A concrete next step: pick one skill you’ve already installed and trust, run skillspector scan against it in --no-llm mode, and read the findings. Calibrating the tool against something familiar is the fastest way to learn what its score is — and isn’t — telling you.