Rushi's

Ctrl+AI+Ship

Day: May 5, 2026

How to evaluate an AI skill

Posted by Rushi

You installed a skill. The README looks fine. The demo in the docs worked on the first try. Should you keep it? You can’t tell from the README. You have to run the skill on your own work and look at what comes out. This post walks through doing that. We start with the simplest […]

tech agent-evaluation, ai-skills, ai-tooling, ai-workflow, anthropic-skills, claude, claude-agent-sdk, claude-cli, developer-tools, evals, llm-as-judge, llm-evaluation, prompt-engineering, skill-evaluation, skill-md, test-fixtures

© 2026  rushis.com. | The content is copyrighted to Rushi and may not be reproduced.