Fluffy Parrot
Turning Claude API tuning into something you can feel.
The question I was chasing
Could tuning a language model feel less like editing a config file and more like working a piece of hardware?
Every playground I'd used treated temperature, top-p and token limits as form fields: type a number, rerun, hope for the best, lose track of what you changed. I wanted to know whether the act of tuning could feel physical: knobs you turn, takes you compare side by side, a diff you can actually read.
Why it exists
I spend a lot of time tuning prompts against the Claude API, and the feedback loop was always the weakest part. You nudge top-p, you rerun, and you're left comparing two walls of text from memory. There was no instrument, just trial, error, and vibes.
The constraints
Solo build, native-feeling on macOS, and one hard rule: it had to be trustworthy enough that a developer would paste in a real API key. That meant no backend, no telemetry, nothing phoning home: the key lives in the macOS Keychain and the app talks directly to Anthropic. Open source, so anyone can verify that claim rather than take my word for it.
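"The app talks directly to Anthropic" can be made concrete with a minimal sketch of a request that goes straight to the public Messages API endpoint, with no intermediate server. The endpoint, the `x-api-key` and `anthropic-version` headers, and the body fields (`model`, `max_tokens`, `system`, `messages`, `temperature`, `top_p`, `top_k`) come from Anthropic's public API. The `KnobSettings` shape and the `buildMessagesRequest` helper are hypothetical and are not taken from the Fluffy Parrot source. Reading the key out of the Keychain is assumed to happen elsewhere, so the sketch receives it as a plain argument.

```typescript
// Hypothetical shape for the values the knobs hold; not the app's actual types.
interface KnobSettings {
  model: string;        // caller supplies a model id; none is assumed here
  maxTokens: number;    // the "token limit" knob
  temperature?: number; // Messages API accepts 0.0 to 1.0
  topP?: number;
  topK?: number;
}

// A plain description of the HTTP call, so it can be inspected before sending.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const MESSAGES_URL = "https://api.anthropic.com/v1/messages";

// Builds a request addressed straight to Anthropic: no proxy, no backend.
function buildMessagesRequest(
  apiKey: string, // assumed to have been read from the Keychain by other code
  prompt: string,
  knobs: KnobSettings,
  system?: string,
): PreparedRequest {
  const body: Record<string, unknown> = {
    model: knobs.model,
    max_tokens: knobs.maxTokens,
    messages: [{ role: "user", content: prompt }],
  };
  if (system) body.system = system;
  // Only send sampling parameters the user actually set; unset knobs fall back to API defaults.
  if (knobs.temperature !== undefined) body.temperature = knobs.temperature;
  if (knobs.topP !== undefined) body.top_p = knobs.topP;
  if (knobs.topK !== undefined) body.top_k = knobs.topK;

  return {
    url: MESSAGES_URL,
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

The returned object can be passed as-is to `fetch(req.url, req)`. Keeping request construction pure, as a function with no I/O, makes it easy to show a user exactly what will leave the machine. That is one way to back up the "nothing phoning home" claim in code anyone can read.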
The decisions that mattered
The first version used sliders. They were precise but lifeless: you never felt the parameter space. Switching to hardware-style knobs changed the whole character of the tool, and tuning became tactile.
The decision that actually mattered, though, was giving every run its own tab with its cost and round-trip time attached. Once two takes sit side by side, you stop trusting your memory and start reading the real diff. That single choice is what turned a playground into an instrument.
Keychain-only key storage was non-negotiable. The moment a tool that handles your API key asks you to trust its server, you've lost the developer.
What it is
A free, open-source macOS app. Enter your prompts and context, hit RUN, then dial in temperature, top-p, top-k and token limits on hardware-style knobs. Every run keeps its own tab with its cost and latency attached, so you compare takes instead of guessing.
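Attaching cost and latency to each take comes down to a little arithmetic. The Messages API response includes a `usage` object with `input_tokens` and `output_tokens`. Cost is those counts times per-million-token prices, and latency is the wall-clock time around the call. This is a minimal sketch under stated assumptions, not the app's implementation. The prices are supplied by the caller because no real rates are assumed here. `summarizeTake`, `runTake` and the `send` callback are hypothetical names, with `send` standing in for whatever performs the request and extracts the text.

```typescript
// Token counts as reported in a Messages API response's `usage` object.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Caller-supplied prices in USD per million tokens (hypothetical; rates vary by model).
interface PricePerMillion {
  input: number;
  output: number;
}

interface TakeSummary {
  costUsd: number;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
}

// Pure calculation: cost from token counts, latency from two timestamps.
function summarizeTake(
  usage: Usage,
  prices: PricePerMillion,
  startedAtMs: number,
  finishedAtMs: number,
): TakeSummary {
  const costUsd =
    (usage.input_tokens * prices.input + usage.output_tokens * prices.output) / 1_000_000;
  return {
    costUsd,
    latencyMs: finishedAtMs - startedAtMs,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
  };
}

// Times one request end to end (round trip, including network) and attaches its summary.
async function runTake(
  send: () => Promise<{ text: string; usage: Usage }>,
  prices: PricePerMillion,
): Promise<{ text: string; summary: TakeSummary }> {
  const started = Date.now();
  const { text, usage } = await send();
  return { text, summary: summarizeTake(usage, prices, started, Date.now()) };
}
```

Keeping `summarizeTake` separate from the timed call means every tab's numbers can be recomputed later, for example if prices change, without rerunning the prompt.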
Built with: Electron, React, TypeScript, the Claude API, Claude Code
Where it landed
It shipped free and open source, and it's the tool I now reach for whenever I need to feel out how a prompt behaves across the parameter space. If I took it further, I'd push the comparison view harder: small multiples of many runs at once, not just two side by side.
Part of the Rolling Waves work archive.