Paying for Claude Twice
I’m paying for Claude twice.
Once for my Pro subscription to use Claude Code CLI. And again for API access—to call the exact same model from my code. This isn’t a technical limitation. They could easily use the same infrastructure and just rate-limit based on my membership tier. But they don’t.
So I built a workaround.
The solution seemed obvious once I thought about it. The Claude CLI already has full model access through my subscription. All I needed was something sitting in the middle—accepting OpenAI-formatted requests and piping them through the CLI. A proxy.
I didn’t need streaming for local development. I didn’t need perfect token counts. I just needed it to work.
The idea is straightforward:
- Accept OpenAI-compatible HTTP requests
- Transform them into prompts for Claude CLI
- Pipe the request through `claude` via stdin
- Return the response in OpenAI’s expected format
This lets you use familiar OpenAI SDK code while actually calling your local Claude CLI—no API keys or additional billing required.
Implementation
The proxy is a lightweight Node.js server. Here’s how it works:
```javascript
// server.js
import express from 'express';
import { spawn } from 'child_process';

const app = express();
app.use(express.json());

app.post('/v1/chat/completions', async (req, res) => {
  const { messages, response_format } = req.body;

  // Transform OpenAI format to Claude prompt
  let prompt = messages
    .map(msg => `${msg.role}: ${msg.content}`)
    .join('\n');

  // Add a JSON-only instruction if structured output was requested
  if (response_format?.type === 'json_object') {
    prompt += '\n\nRespond with valid JSON only.';
  }

  // Spawn Claude CLI process
  const claude = spawn('claude', ['--no-color'], {
    stdio: ['pipe', 'pipe', 'pipe']
  });

  // Surface spawn failures (e.g. CLI not on PATH) instead of hanging
  claude.on('error', err => {
    res.status(500).json({ error: { message: err.message } });
  });

  claude.stdin.write(prompt);
  claude.stdin.end();

  // Collect response
  let output = '';
  claude.stdout.on('data', data => {
    output += data.toString();
  });

  claude.on('close', () => {
    if (res.headersSent) return; // already responded with an error

    // Format as OpenAI response
    res.json({
      choices: [{
        message: {
          role: 'assistant',
          content: output.trim()
        }
      }]
    });
  });
});

app.listen(3000, () => {
  console.log('Proxy running on http://localhost:3000');
});
```
The interesting part isn’t the code—it’s how little code you need. The entire proxy is maybe 50 lines. Most of it is just format translation: turning OpenAI’s message format into text Claude understands, then wrapping Claude’s output in the shape the SDK expects.
The spawn call does the heavy lifting. Each request creates a new Claude CLI process, which is wasteful and slow. But for development? Fast enough.
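If the bare spawn ever needs tightening up, a natural first step is wrapping it in a promise so the handler can `await` the result and turn failures into HTTP errors. This is a sketch, not what the repo does — `runCommand` is a hypothetical helper, shown with a generic command so the pattern is clear:

```javascript
import { spawn } from 'child_process';

// Hypothetical helper (not from the repo): run a command, feed it stdin,
// and resolve with stdout — or reject on spawn failure / non-zero exit.
function runCommand(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'pipe'] });
    let output = '';
    let errors = '';
    child.stdout.on('data', d => { output += d.toString(); });
    child.stderr.on('data', d => { errors += d.toString(); });
    child.on('error', reject); // e.g. binary not found
    child.on('close', code => {
      code === 0
        ? resolve(output)
        : reject(new Error(`exit ${code}: ${errors}`));
    });
    child.stdin.on('error', () => {}); // ignore EPIPE if the child never started
    child.stdin.write(input);
    child.stdin.end();
  });
}
```

Inside the route handler this would replace the event-callback dance with `const output = await runCommand('claude', ['--no-color'], prompt);` in a try/catch that maps rejections to a 500 response.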
Using It With OpenAI SDK
Once the proxy is running, you can use the OpenAI SDK as normal, just pointing it to your local server:
```python
from openai import OpenAI

# Point to local proxy instead of OpenAI
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="any-string"  # Ignored, but required by SDK
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
```
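If the SDK call misbehaves, the raw wire format can be checked without it. In the sketch below, a stdlib-only stub stands in for the real proxy so it runs without the claude CLI; `smokeTest` is a hypothetical helper that, against the actual proxy, you would just point at http://localhost:3000:

```javascript
import http from 'http';

// Stand-in for the proxy (hypothetical stub, stdlib only) so the wire
// format can be exercised without the claude CLI installed.
const stub = http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => { body += chunk; });
  req.on('end', () => {
    const { messages } = JSON.parse(body);
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({
      choices: [{
        message: { role: 'assistant', content: `echo: ${messages[0].content}` }
      }]
    }));
  });
});

// Send one OpenAI-shaped request and read the reply, as the SDK would.
async function smokeTest(baseUrl) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'claude-sonnet-4-5-20250929',
      messages: [{ role: 'user', content: 'ping' }]
    })
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```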
Structured JSON Output
One nice bonus: adding JSON schema support is trivial. Just modify the prompt:
```python
response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[
        {"role": "user", "content": "List 3 colors"}
    ],
    response_format={"type": "json_object"}
)
# Returns: {"colors": ["red", "blue", "green"]}
```
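Since the proxy only appends a "valid JSON only" instruction rather than enforcing a schema, the model can still wrap its JSON in a code fence or a sentence of preamble. A defensive parse on the client side might look like this — `extractJson` is a hypothetical helper, not part of the proxy:

```javascript
// Hypothetical helper: pull the first JSON object out of a model reply,
// tolerating markdown fences and surrounding prose.
function extractJson(text) {
  // Strip ```json ... ``` fences if present
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : text;
  // Fall back to the outermost brace pair
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end === -1) throw new Error('no JSON object found');
  return JSON.parse(candidate.slice(start, end + 1));
}
```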
Limitations
This won’t replace the real API. There’s no streaming—each request waits for the entire response. Token counts are estimates. And spawning a new CLI process for every request is absurdly inefficient. But none of that matters for local development.
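The "token counts are estimates" part can be made concrete with a rough chars-per-token heuristic — around four characters per token for English text. This is an approximation, not a real tokenizer, and `estimateUsage` is a hypothetical helper for filling the `usage` field the SDK expects:

```javascript
// Rough heuristic: ~4 characters per token for English text.
// An approximation, not a real tokenizer.
function estimateUsage(prompt, completion) {
  const est = s => Math.ceil(s.length / 4);
  const prompt_tokens = est(prompt);
  const completion_tokens = est(completion);
  return {
    prompt_tokens,
    completion_tokens,
    total_tokens: prompt_tokens + completion_tokens
  };
}
```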
What matters is that I’m not burning API credits while I experiment. What matters is that I can use the same OpenAI SDK patterns I already know. And what matters is that it took an hour to build, not another billing account.
Why This Exists
It’s frustrating that AI providers force this artificial separation. You’re already paying them. They already have the infrastructure. The only thing stopping them from letting you use your subscription for API calls is a business decision to extract more revenue.
Until they change that, workarounds like this will keep popping up. Not because they’re elegant or efficient, but because developers don’t like paying twice for the same thing.
The full implementation is available at github.com/kushsharma/oai-api-proxy. It took an hour to build. And it shouldn’t have been necessary.