
Scaling an Astrology Chatbot from Prototype to 10,000 Users

Technical patterns for building and scaling an LLM-powered astrology chatbot. Endpoint mix, caching strategy, context-window management, cost math at 10k users.


Oleg Kopachovets

CTO & Co-Founder

May 9, 2026
6 min read
[Figure: Astrology chatbot architecture showing LLM and API integration]
Note on this walkthrough. This is an aspirational technical blueprint that mirrors the patterns we see across many chatbot integrations on our platform. Numbers, latencies, and costs are illustrative. Your real production behavior will depend on which LLM you pick, your geography, and your conversation design.

You're building an astrology chatbot. The user types "what's going on with my Mars right now?" and the bot replies with grounded, accurate, personalized text — not generic LLM hallucination. Behind the scenes, an LLM (Claude, GPT, or Gemini) talks to our API to fetch the user's real chart and current transit data, then weaves it into a conversational answer.

Here's how to ship it and scale it to 10,000 users without burning your runway on inference costs.

The naive architecture (and why it fails)

The first instinct most developers have:

```text
User message
  → LLM with system prompt "you are an astrologer"
  → LLM responds
```

This produces confident, fluent text that has no relationship to the user's actual chart. The model invents transits. It places Saturn in the wrong sign. It hallucinates the user's rising sign. Real users notice within three conversations and churn.

The grounded architecture:

```text
User message
  → Backend extracts intent (sometimes the LLM does this)
  → Backend fetches relevant astrology data via our API
  → Backend builds a context-aware prompt with REAL chart data
  → LLM responds, grounded in real data
```

The second pattern works. The first does not.

Endpoint mix

Different conversation intents map to different endpoints. We tune our chat endpoint for this exact pattern.

| User intent | Endpoint | Cache TTL |
| --- | --- | --- |
| "Tell me about my chart" | `/p/natal-api` | Forever (per user) |
| "What's happening today / this week?" | `/p/personalized-horoscopes-api` | 24 hours |
| "Pull me a tarot card" | `/p/tarot-api` | Never (random each draw) |
| "Am I compatible with X?" | `/p/synastry-api` | Per-pair, forever |
| "Talk to me about my chart" (free-form) | `/p/astrology-chat-api` | Per-day, per-user |
| Anything else | LLM with chart context only | N/A |

The /p/astrology-chat-api endpoint is the workhorse for free-form conversation. It returns a compact, LLM-ready summary of the user's chart plus today's transits, already shaped to fit inside a context window. You don't have to manually assemble natal data and transit data and shrink it to fit Claude's prompt — the endpoint does that work and gives you something you can paste straight into the system message.
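For reference, the integration code in the next section consumes exactly three fields from that context payload. A minimal sketch of the shape, with field names inferred from how the handler uses them (the real schema may carry more fields; treat this interface and the sample values as assumptions):

```typescript
// Hypothetical shape of the /p/astrology-chat-api context response.
// Field names mirror how the handler consumes it; the exact schema
// is an assumption, not published documentation.
interface ChatContext {
  natalSummary: string;   // compact summary of the user's placements
  transitSummary: string; // today's transits, pre-shrunk to fit a prompt
  moonPhase: string;      // e.g. "waxing gibbous"
}

// Illustrative sample, not real API output:
const example: ChatContext = {
  natalSummary: "Sun in Leo (10th house), Moon in Pisces (5th), Aries rising",
  transitSummary: "Transiting Saturn square natal Sun (applying)",
  moonPhase: "waxing gibbous",
};
```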

The integration pattern

```typescript
// Backend handler for an incoming user message
async function handleUserMessage(userId: string, message: string) {
  const user = await db.users.findById(userId);

  // 1. Fetch the LLM-shaped chart context (cached daily)
  const ctx = await getCachedOrFetch(`chat-ctx:${userId}:${today()}`, async () => {
    return fetch('https://api.astrology-api.io/v1/astrology-chat/context', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.ASTROLOGY_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        natalChart: user.natalChart,
        date: today(),
        language: 'en',
      }),
    }).then(r => r.json());
  });

  // 2. Compose the LLM call with grounded context
  const llmResponse = await anthropic.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    system: `You are a thoughtful astrologer. Use ONLY the data below to answer the user's question. Do not invent placements.

USER CHART:
${ctx.natalSummary}

TODAY'S TRANSITS:
${ctx.transitSummary}

CURRENT MOON PHASE: ${ctx.moonPhase}
`,
    messages: [
      { role: 'user', content: message },
    ],
  });

  return llmResponse.content[0].text;
}
```

Two important things in this code:

  1. ctx is cached daily per user. One API call per user per day is the baseline. If a user sends 50 messages in a day, you still only made one API call to us.
  2. The system prompt instructs the LLM to use ONLY the provided data. This is the difference between grounded and hallucinated output.
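The `getCachedOrFetch` helper does the heavy lifting in point 1. A minimal in-memory sketch, assuming a single-process server (production deployments would typically reach for Redis so the cache survives restarts and is shared across instances):

```typescript
// Hypothetical in-memory sketch of the getCachedOrFetch helper.
// Because the key includes today's date, daily entries naturally roll
// over; the TTL is a backstop on top of that.
type CacheEntry = { value: unknown; expiresAt: number };
const cache = new Map<string, CacheEntry>();

async function getCachedOrFetch<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlMs: number = 24 * 60 * 60 * 1000, // default TTL: 24 hours
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // cache hit: no API call made
  }
  const value = await fetcher(); // cache miss: one real API call
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

With this in place, the "50 messages, one API call" claim above falls out directly: only the first call of the day runs the fetcher.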

Caching strategy

Three layers:

| Layer | What's cached | TTL |
| --- | --- | --- |
| User natal chart | Birth chart from `/p/natal-api` | Forever (chart never changes) |
| Daily chat context | Output of `/p/astrology-chat-api/context` | 24 hours |
| Conversation transcripts | Last 10 turns per user | 7 days (for LLM context) |

The forever cache for natal data is the biggest cost reducer. A user who chats 1,000 times in a year results in one natal API call, not 1,000.
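The forever layer is trivial but worth spelling out: check storage first, hit `/p/natal-api` only on a miss. A sketch with the storage and the API call injected as parameters (the `ChartStore` interface is an assumption for illustration, not part of our SDK; any KV store or a column on the users table works):

```typescript
// Sketch of the forever-cache for natal data: compute once, persist, reuse.
type ChartStore = {
  get(id: string): unknown | undefined;
  set(id: string, chart: unknown): void;
};

async function getNatalChart(
  userId: string,
  store: ChartStore,
  fetchChart: () => Promise<unknown>, // stands in for the /p/natal-api call
): Promise<unknown> {
  const cached = store.get(userId);
  if (cached !== undefined) return cached; // chart never changes: no refetch
  const chart = await fetchChart(); // first and only API call for this user
  store.set(userId, chart);
  return chart;
}
```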

Cost math at 10,000 users

Assumptions: 10,000 monthly active users, average 8 messages per active user per month, 30% of users active on any given day.

Astrology API calls per month:
  • Natal chart calls: ~500 new users a month = 500 requests.
  • Daily chat context: 3,000 daily active users × 30 days = 90,000 requests. Daily caching caps this at one call per active user per day, so 90,000 is the ceiling regardless of how many messages each user sends; real traffic lands somewhat lower if the same users are active on many days.
  • Tarot draws (estimated 10% of messages): 8 × 10,000 × 10% = 8,000 requests/month.
  • Synastry on demand: ~2,000 requests/month.
Total: ~100,500 requests/month.

This sits between our Professional tier ($37/mo, 55,000 requests) and Business tier ($99/mo, 220,000 requests). At 10k users you want Business — both for the volume headroom and the additional endpoint coverage (Business unlocks all premium endpoints).

LLM inference costs (Claude Opus 4.7 at current pricing, assuming a 2,000-token average prompt with 500-token reply): roughly $0.025 per turn. At 80,000 turns/month that's about $2,000/month in LLM costs. Inference dominates spend at scale, not the astrology API.
See /pricing for the current API tier table.
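A quick sanity check of that arithmetic. The per-million-token prices below are illustrative assumptions chosen to reproduce the ~$0.025/turn figure, not published Claude pricing:

```typescript
// Back-of-envelope check of the inference math above.
const PROMPT_TOKENS = 2000;
const REPLY_TOKENS = 500;
const INPUT_PRICE_PER_MTOK = 5;   // $ per 1M input tokens (assumed)
const OUTPUT_PRICE_PER_MTOK = 30; // $ per 1M output tokens (assumed)

const costPerTurn =
  (PROMPT_TOKENS / 1_000_000) * INPUT_PRICE_PER_MTOK +
  (REPLY_TOKENS / 1_000_000) * OUTPUT_PRICE_PER_MTOK; // ≈ $0.025

const turnsPerMonth = 10_000 * 8; // 10k MAU × 8 messages each = 80,000 turns
const monthlyLlmSpend = costPerTurn * turnsPerMonth; // ≈ $2,000
```

Note how the prompt side dominates token count but the reply side dominates cost under typical input/output price asymmetry, which is why trimming the context (the previous section) pays off twice.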

Where MCP fits in

If you're building a chatbot that runs inside Claude Desktop, ChatGPT custom GPTs, or any MCP-compatible host, you can skip writing the backend integration code entirely. Use /p/mcp-astrology — our Model Context Protocol server exposes every endpoint as an MCP tool. Claude Desktop discovers the tools automatically; the user types "what's going on with my Mars" and Claude calls our API natively.
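For Claude Desktop, the wiring is typically a single entry in `claude_desktop_config.json`. A sketch of what that looks like; the package name and args here are placeholders, not our published install command (check /p/mcp-astrology for the real one):

```json
{
  "mcpServers": {
    "astrology": {
      "command": "npx",
      "args": ["-y", "astrology-api-mcp"],
      "env": { "ASTROLOGY_API_KEY": "your-key-here" }
    }
  }
}
```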

When MCP makes sense:

  • Personal-use chatbots inside Claude Desktop.
  • Custom GPTs distributed through the OpenAI GPT store.
  • Internal team tools where your dev team uses Claude as the UI.

When MCP does not make sense:

  • Standalone consumer mobile app with your own UI.
  • Discord or Slack bot (you still need your own host).
  • Web chat where you need fine-grained control over prompts.

For consumer apps with your own UI, the direct API integration pattern shown above is the right call.

What can go wrong

  1. Context window blow-out. If you naively dump the full natal chart JSON into the prompt, you hit token limits fast and pay extra inference cost per turn. The /p/astrology-chat-api/context endpoint pre-summarizes specifically to avoid this. Use it.
  2. LLM invents placements. Always include an explicit instruction in the system prompt to use only the provided data. Add a fallback: if the user asks about something not in the provided context, the bot should say "I don't have that data on hand" rather than guess.
  3. Conversation drift. Conversation transcripts grow unbounded. Truncate to the last 10 turns or use sliding-window summarization to keep token costs predictable.
  4. Daily transit cache miss spikes. If your cache evicts at 03:00 UTC for all users at once, you get a thundering herd against the API at peak. Stagger cache TTLs by user-hash modulo a few hours to spread the load.
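Point 4 is a one-line fix in code. A sketch of deterministic per-user TTL jitter; the 4-hour window is an arbitrary assumption, tune it to your traffic:

```typescript
// Stagger cache expiry per user so daily contexts don't all expire at
// the same instant and hammer the API in one thundering herd.
function staggeredTtlMs(
  userId: string,
  baseTtlMs: number = 24 * 3600 * 1000, // base TTL: 24 hours
): number {
  // Cheap deterministic hash of the user id
  let h = 0;
  for (const c of userId) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  const jitterMs = (h % (4 * 3600)) * 1000; // deterministic 0–4h offset
  return baseTtlMs + jitterMs;
}
```

Because the jitter is derived from the user id rather than drawn at random, a given user's cache always expires at the same offset, so the load curve stays flat day after day.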

Wrap-up

A 10,000-user chatbot runs comfortably on our Business tier ($99/mo, 220,000 requests) with the right caching. The expensive part is LLM inference, not our API. Cache natal data forever, refresh daily transit summaries once a day, and use the dedicated chat-API endpoint to stay inside LLM token budgets. Skip the integration layer entirely if you're building inside Claude Desktop: MCP makes the bot work out of the box.

The free tier (50 requests/month) is enough to prototype the conversation flow with a single test user before committing to a paid plan.

Oleg Kopachovets, CTO & Co-Founder. Technical founder at Astrology API, specializing in astronomical calculations and AI-powered astrology.