You’re deep in a building sesh. You need a better-structured prompt (Cursor’s or Claude Code’s plan mode is overkill), a quick explanation of some output or logs, or a SQL query based on the types in your code. The old workflow: alt-tab to a browser, open a new chat window, wait for the page to load, type your question, paste whatever you brought over from your IDE or terminal, copy the result, switch back. Repeat 47+ times a day.
Keep it in your terminal instead.
Raypaste is an open-source CLI tool that puts AI text generation where devs currently spend most of their time — their IDE/terminal. A single command, a streamed response, output copied to clipboard. No distractions. It’s built on OpenRouter, which means you can route to virtually any LLM — but by default raypaste uses ultra-fast open-source models running on Cerebras chips, delivering responses in ~100–900ms.
Key Features
Raypaste is fast and convenient, fitting nicely in between heavy-hitting AI IDEs and tools like Cursor, Codex, and Claude Code.
- Responses in milliseconds — ~100–900ms using open-source models running on Cerebras hardware.
- Interactive mode with slash commands — stay in the loop, get instant responses, and use `/model`, `/prompt`, and `/length` to update settings on the fly.
- Project context awareness — automatically loads context from `CLAUDE.md`, `AGENTS.md`, or `.cursor/rules/` files to make generated prompts more relevant to your project.
- Prompt templating — add and use your own reusable system prompts.
- Model aliasing — bring your own key and use any model through OpenRouter.
Installation & Quick Start
Getting started takes less than two minutes.
1. Install raypaste
The easiest way on macOS is Homebrew:
brew install --cask raypaste/tap/raypaste
Or install with Go:
go install github.com/raypaste/raypaste-cli@latest
Or build from source:
git clone https://github.com/raypaste/raypaste-cli.git
cd raypaste-cli
./build
2. Get an OpenRouter API key
Raypaste routes requests through OpenRouter. Grab a key at openrouter.ai/keys.
Cerebras BYOK: For maximum speed, you can bring your own Cerebras API key and add it under OpenRouter > Settings > BYOK > Cerebras API key. As of Feb 2026 you get ~1 million free tokens per day from Cerebras.
3. Set your API key
The recommended approach is the config command:
raypaste config set api-key your_api_key_here
Or export it as an environment variable:
export RAYPASTE_API_KEY=your_api_key_here
To make the environment variable permanent, add it to your shell config:
# zsh (macOS default)
echo 'export RAYPASTE_API_KEY=your_api_key_here' >> ~/.zshrc
source ~/.zshrc
Important: Go programs don’t automatically load `.env` files. You must either export the variable or use the config command above. A `config.yaml` in your project directory also won’t work — the CLI reads from `~/.raypaste/config.yaml`.
4. Run your first prompt
raypaste "help me write a blog post about AI tools for developers"
You’ll see the response stream into your terminal. When it finishes, the full text is copied to your clipboard.
Interactive Mode
For longer sessions — brainstorming, shipping a new feature or change, writing docs — keeping a terminal open with Raypaste’s interactive mode has been helpful for my daily work.
Launch it with:
raypaste interactive
# or
raypaste i
Type your text/question that goes with your selected prompt, press Enter, and get an instant response back. Each request is stateless — Raypaste doesn’t maintain conversation history, which keeps token usage predictable and responses fast.
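Under the hood, a stateless request like this is presumably just a single OpenAI-compatible chat completion call to OpenRouter with no prior messages attached. Here’s a minimal sketch of what building one looks like in Go (the endpoint and payload shape are OpenRouter’s documented API; the function and type names are illustrative, not Raypaste’s actual internals):

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// chatRequest is the OpenAI-compatible payload OpenRouter accepts.
type chatRequest struct {
	Model    string    `json:"model"`
	Stream   bool      `json:"stream"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// newRequest builds a single, self-contained completion request.
// No conversation history is included, which is what makes each
// call stateless and keeps token usage predictable.
func newRequest(apiKey, model, system, user string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:  model,
		Stream: true,
		Messages: []message{
			{Role: "system", Content: system},
			{Role: "user", Content: user},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST",
		"https://openrouter.ai/api/v1/chat/completions",
		bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

Because every request carries only the system prompt and your current input, cost and latency stay flat no matter how long your session runs.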
Slash Commands
| Command | What it does |
|---|---|
| `/model <alias>` | Switch the active model for subsequent requests |
| `/prompt <name>` | Load a prompt template from `~/.raypaste/prompts/` |
| `/length <short\|medium\|long>` | Override the response length |
| `/copy` | Copy the last response to clipboard |
| `/clear` | Clear the screen |
| `/help` | Show available commands |
| `/quit` or `/exit` | Exit interactive mode |
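Dispatching these is a simple split on the first space. A sketch of how a slash command like `/model sonnet-4.6` might be parsed (illustrative only, not Raypaste’s actual parser):

```go
package main

import "strings"

// parseSlash splits an interactive-mode line like "/model sonnet-4.6"
// into a command name and its argument. Lines that don't start with
// "/" are treated as ordinary prompt text, so ok is false.
func parseSlash(line string) (cmd, arg string, ok bool) {
	if !strings.HasPrefix(line, "/") {
		return "", "", false
	}
	parts := strings.SplitN(strings.TrimPrefix(line, "/"), " ", 2)
	cmd = parts[0]
	if len(parts) == 2 {
		arg = strings.TrimSpace(parts[1])
	}
	return cmd, arg, cmd != ""
}
```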
Keyboard Shortcuts
- `Ctrl+C` — cancel the current generation (if you’re fast enough)
- `Ctrl+D` — exit interactive mode
Customization
Out of the box, Raypaste is configured to help write meta-prompts for use in your favorite AI code editor. Here’s how to make it yours.
Custom Prompt Templates
Create and manage reusable prompts via the CLI:
Interactive mode:
raypaste config prompt add ascii-art
Or define inline:
raypaste config prompt add ascii-art \
--description "Convert text to ASCII art" \
--system "You are an ASCII art expert..." \
--medium "400"
Use it:
raypaste "coffee cup" -p ascii-art
Managing Prompts
raypaste config prompt list # List all prompts
raypaste config prompt show <name> # Show details
raypaste config prompt remove <name> # Remove custom prompt
Length Directives
Control output size per mode using either:
- Token counts — plain integers set `max_tokens` directly
- Text directives — injected into `{{.LengthDirective}}` in your system prompt
# Token counts
length_directives:
short: "200"
medium: "600"
# Text directives
length_directives:
short: "Be concise, 2-3 sentences."
medium: "Provide balanced detail."
Omit unwanted lengths to restrict a prompt to specific modes.
Template Variables
- `{{.LengthDirective}}` — Replaced with the active length directive (empty for token counts)
- `{{.Context}}` — Replaced with project context when available
You can also create prompts manually in ~/.raypaste/prompts/<name>.yaml:
name: code-review
description: "Generate a code review prompt"
system: "You are a code review expert... Output length guidance: {{.LengthDirective}}"
length_directives:
short: "200"
medium: "600"
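Those `{{.Variable}}` placeholders are Go `text/template` syntax, so substitution presumably works along these lines (a sketch under that assumption; the struct and function names are mine, not Raypaste’s):

```go
package main

import (
	"bytes"
	"text/template"
)

// vars mirrors the template variables exposed to prompt files.
type vars struct {
	LengthDirective string
	Context         string
}

// render expands {{.LengthDirective}} and {{.Context}} in a
// system prompt before it is sent with the request.
func render(system string, v vars) (string, error) {
	t, err := template.New("prompt").Parse(system)
	if err != nil {
		return "", err
	}
	var b bytes.Buffer
	if err := t.Execute(&b, v); err != nil {
		return "", err
	}
	return b.String(), nil
}
```

So a system prompt of `"Output length guidance: {{.LengthDirective}}"` with the `short` directive active would expand to `"Output length guidance: Be concise, 2-3 sentences."` before the request goes out.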
Model Aliases
Define custom model aliases in ~/.raypaste/config.yaml:
models:
sonnet-4.6:
id: "anthropic/claude-sonnet-4.6"
provider: "anthropic"
tier: "powerful"
Then use the alias with the -m flag:
raypaste "explain this stack trace" -m sonnet-4.6
You can also use any full OpenRouter model ID directly without defining an alias:
raypaste "hello" -m "anthropic/claude-sonnet-4.6"
Built-in Models
| Alias | Model | Provider |
|---|---|---|
| `cerebras-llama-8b` | `meta-llama/llama-3.1-8b-instruct` | Cerebras |
| `cerebras-gpt-oss-120b` | `openai/gpt-oss-120b` | Cerebras |
| `openai-gpt5-nano` | `openai/gpt-5-nano` | OpenAI |
Project Context Awareness
Raypaste automatically loads context from your project to make prompts more relevant. It searches for the following, in order:
- `CLAUDE.md` — Claude-specific guidance
- `AGENTS.md` — Agent configuration
- `.cursor/rules/` — Cursor/Claude AI rules
Found context is injected via {{.Context}} in your prompts. The status message shows which file was loaded.
raypaste "refactor to use interfaces" -p metaprompt
# Context from CLAUDE.md automatically included
Configuration Reference
Configuration is loaded in this order (later sources override earlier ones):
1. Default values
2. `~/.raypaste/config.yaml`
3. Environment variables (`RAYPASTE_*`)
4. CLI flags
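“Later sources override earlier ones” is a plain layered merge; conceptually it reduces to something like this (a sketch of the idea, not Raypaste’s code):

```go
package main

// merge applies configuration layers in order; a key set in a
// later layer wins over the same key in an earlier one. Layers
// would be: defaults, config file, environment, CLI flags.
func merge(layers ...map[string]string) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}
```

So a `-m` flag beats `RAYPASTE_DEFAULT_MODEL`, which beats `default_model` in the config file, which beats the built-in default.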
A full config file looks like:
# ~/.raypaste/config.yaml
api_key: "your_api_key_here"
default_model: cerebras-gpt-oss-120b
default_length: medium
disable_copy: false
temperature: 0.7
models:
my-custom-model:
id: "provider/model-name"
provider: "provider-name"
tier: "fast"
# more custom models...
Conclusion
Raypaste is a tool I made for myself, and so far it has served me well. It’s for developers who don’t want to context-switch for repetitive AI queries. Sure, you could use a Gemini Gem, and I still do sometimes, but after building Raypaste and using it for a week I can say there’s real value in staying in your terminal with your preset prompts a keystroke away. It’s fast, composable, and built to do one thing well while playing nicely with everything else in your workflow.
If you find yourself alt-tabbing to a chat interface more than a few times a day, try out Raypaste and see if it works for you.
The source code, issues, and contribution guidelines are all at github.com/raypaste/raypaste-cli. PRs welcome — especially new built-in prompt templates and model alias ideas.
Keep building and shipping!