Note on this walkthrough. This is an aspirational technical blueprint that mirrors the patterns we see across many chatbot integrations on our platform. Numbers, latencies, and costs are illustrative. Your real production behavior will depend on which LLM you pick, your geography, and your conversation design.
You're building an astrology chatbot. The user types "what's going on with my Mars right now?" and the bot replies with grounded, accurate, personalized text — not generic LLM hallucination. Behind the scenes, an LLM (Claude, GPT, or Gemini) talks to our API to fetch the user's real chart and current transit data, then weaves it into a conversational answer.
Here's how to ship it and scale it to 10,000 users without burning your runway on inference costs.
The naive architecture (and why it fails)
The first instinct most developers have:
```
User message
  → LLM with system prompt "you are an astrologer"
  → LLM responds
```

This produces confident, fluent text that has no relationship to the user's actual chart. The model invents transits. It places Saturn in the wrong sign. It hallucinates the user's rising sign. Real users notice within three conversations and churn.
The grounded architecture:
```
User message
  → Backend extracts intent (sometimes the LLM does this)
  → Backend fetches relevant astrology data via our API
  → Backend builds a context-aware prompt with REAL chart data
  → LLM responds, grounded in real data
```

The second pattern works. The first does not.
Endpoint mix
Different conversation intents map to different endpoints. We tune our chat endpoint for this exact pattern.
| User intent | Endpoint | Cache TTL |
|---|---|---|
| "Tell me about my chart" | /p/natal-api | Forever (per user) |
| "What's happening today / this week?" | /p/personalized-horoscopes-api | 24 hours |
| "Pull me a tarot card" | /p/tarot-api | Never (random each draw) |
| "Am I compatible with X?" | /p/synastry-api | Per-pair, forever |
| "Talk to me about my chart" (free-form) | /p/astrology-chat-api | Per-day per-user |
| Anything else | LLM with chart context only | N/A |
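In code, that mapping can be a thin routing layer in front of your endpoint calls. A minimal sketch: the `classifyIntent` step (not shown) and all endpoint paths except the chat-context one are assumptions, patterned after the chat-context URL used later in this guide.

```typescript
type Intent = 'natal' | 'horoscope' | 'tarot' | 'synastry' | 'chat' | 'other';

// Maps a classified intent to an API path plus the cache policy from the
// table above. Only the chat-context path appears verbatim in this guide;
// the rest are assumed to follow the same /v1/ pattern.
function routeIntent(intent: Intent): { path: string | null; cache: string } {
  switch (intent) {
    case 'natal':     return { path: '/v1/natal', cache: 'forever-per-user' };
    case 'horoscope': return { path: '/v1/personalized-horoscopes', cache: '24h' };
    case 'tarot':     return { path: '/v1/tarot', cache: 'none' };
    case 'synastry':  return { path: '/v1/synastry', cache: 'forever-per-pair' };
    case 'chat':      return { path: '/v1/astrology-chat/context', cache: 'per-day-per-user' };
    default:          return { path: null, cache: 'n/a' }; // answer from cached chart context only
  }
}
```

A cheap keyword matcher is often enough for `classifyIntent` at launch; you can graduate to a small LLM classification call once intents get fuzzier.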
The /p/astrology-chat-api endpoint is the workhorse for free-form conversation. It returns a compact, LLM-ready summary of the user's chart plus today's transits, already shaped to fit inside a context window. You don't have to manually assemble natal data and transit data and shrink it to fit Claude's prompt; the endpoint does that work and gives you something you can paste straight into the system message.
The integration pattern
```typescript
import Anthropic from '@anthropic-ai/sdk';

// db, getCachedOrFetch, and today() are your app's own helpers
// (see the caching section below for a getCachedOrFetch sketch).
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Backend handler for an incoming user message
async function handleUserMessage(userId: string, message: string) {
  const user = await db.users.findById(userId);

  // 1. Fetch the LLM-shaped chart context (cached daily)
  const ctx = await getCachedOrFetch(`chat-ctx:${userId}:${today()}`, async () => {
    const res = await fetch('https://api.astrology-api.io/v1/astrology-chat/context', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.ASTROLOGY_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        natalChart: user.natalChart,
        date: today(),
        language: 'en',
      }),
    });
    return res.json();
  });

  // 2. Compose the LLM call with grounded context
  const llmResponse = await anthropic.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 1024,
    system: `You are a thoughtful astrologer. Use ONLY the data below to answer the user's question. Do not invent placements.

USER CHART:
${ctx.natalSummary}

TODAY'S TRANSITS:
${ctx.transitSummary}

CURRENT MOON PHASE: ${ctx.moonPhase}
`,
    messages: [
      { role: 'user', content: message },
    ],
  });

  // Narrow the content block to a text block before reading .text
  const first = llmResponse.content[0];
  return first.type === 'text' ? first.text : '';
}
```

Two important things in this code:
- `ctx` is cached daily per user. One API call per user per day is the baseline. If a user sends 50 messages in a day, you still only made one API call to us.
- The system prompt instructs the LLM to use ONLY the provided data. This is the difference between grounded and hallucinated output. (A fallback clause helps too; see the sketch below.)
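The fallback behavior is worth spelling out explicitly in the same system prompt. A minimal sketch of an extra clause appended to the system string above; the exact wording is our assumption, not something the API requires:

```typescript
// Illustrative guardrail clause, appended to the system prompt string above.
const fallbackClause = `
If the user asks about a placement, transit, or topic that does not appear
in USER CHART or TODAY'S TRANSITS above, say you don't have that data on
hand rather than guessing. Never invent astrological data.`;
```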
Caching strategy
Three layers:
| Layer | What's cached | TTL |
|---|---|---|
| User natal chart | Birth chart from /p/natal-api | Forever (chart never changes) |
| Daily chat context | Output of /p/astrology-chat-api/context | 24 hours |
| Conversation transcripts | Last 10 turns per user | 7 days (for LLM context) |
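Here's a minimal Redis-backed sketch of the `getCachedOrFetch` helper the handler above leans on. The helper name comes from that handler; the Redis client, the TTL parameter, and the "TTL 0 means forever" convention are our assumptions:

```typescript
import { createClient } from 'redis';

const redis = await createClient({ url: process.env.REDIS_URL }).connect();

// Cache-aside helper: return the cached value if present, otherwise run
// the fetcher, store the result, and return it. ttlSeconds = 0 caches
// forever (natal charts); the daily chat context uses ~24 hours.
async function getCachedOrFetch<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 24 * 60 * 60,
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T;

  const fresh = await fetcher();
  if (ttlSeconds > 0) {
    await redis.set(key, JSON.stringify(fresh), { EX: ttlSeconds });
  } else {
    await redis.set(key, JSON.stringify(fresh));
  }
  return fresh;
}
```

Note that the daily context key already embeds the date (`chat-ctx:<userId>:<date>`), so yesterday's entry is never read again; the TTL mainly keeps stale keys from piling up.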
Cost math at 10,000 users
Assumptions: 10,000 monthly active users, average 8 messages per active user per month, 30% of users active on any given day.
- Natal chart calls: ~500 new users a month = 500 requests.
- Daily chat context: 3,000 daily active users × 30 days = 90,000 requests/month. Daily caching already puts you at this floor: one request per active user per active day, no matter how many messages each of them sends.
- Tarot draws (estimated 10% of messages): 8 × 10,000 × 10% = 8,000 requests/month.
- Synastry on demand: ~2,000 requests/month.
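Summing those line items, as a quick sanity check under the same assumptions:

```typescript
const natalCalls = 500;                // new sign-ups needing a chart
const chatContextCalls = 3_000 * 30;   // 90,000: one per active user per active day
const tarotCalls = 8 * 10_000 * 0.10;  // 8,000: 10% of all messages
const synastryCalls = 2_000;           // on-demand compatibility checks

const monthlyTotal =
  natalCalls + chatContextCalls + tarotCalls + synastryCalls; // 100,500
```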
That total of roughly 100,500 requests/month sits between our Professional tier ($37/mo, 55,000 requests) and Business tier ($99/mo, 220,000 requests). At 10k users you want Business, both for the volume headroom and for the additional endpoint coverage (Business unlocks all premium endpoints).
Where MCP fits in
When MCP makes sense:
- Personal-use chatbots inside Claude Desktop.
- Custom GPTs distributed through the OpenAI GPT store.
- Internal team tools where your dev team uses Claude as the UI.
When MCP does not make sense:
- Standalone consumer mobile app with your own UI.
- Discord or Slack bot (you still need your own host).
- Web chat where you need fine-grained control over prompts.
For consumer apps with your own UI, the direct API integration pattern shown above is the right call.
What can go wrong
- Context window blow-out. If you naively dump the full natal chart JSON into the prompt, you hit token limits fast and pay extra inference cost per turn. The /p/astrology-chat-api/context endpoint pre-summarizes specifically to avoid this. Use it.
- LLM invents placements. Always include an explicit instruction in the system prompt to use only the provided data. Add a fallback: if the user asks about something not in the provided context, the bot should say "I don't have that data on hand" rather than guess.
- Conversation drift. Conversation transcripts grow unbounded. Truncate to the last 10 turns or use sliding-window summarization to keep token costs predictable (see the truncation sketch after this list).
- Daily transit cache miss spikes. If your cache evicts at 03:00 UTC for all users at once, you get a thundering herd against the API at peak. Stagger cache TTLs by user-hash modulo a few hours to spread the load (a jitter sketch follows below).
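For the conversation-drift item, a minimal truncation sketch; the `Turn` shape is an assumption, and the 10-turn window mirrors the caching table above:

```typescript
type Turn = { role: 'user' | 'assistant'; content: string };

// Keep only the most recent turns before each LLM call so transcript
// token cost stays bounded. Older turns could instead be folded into a
// one-paragraph summary (sliding-window summarization).
function trimTranscript(turns: Turn[], maxTurns = 10): Turn[] {
  return turns.slice(-maxTurns);
}
```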
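And for the thundering-herd item, a sketch of per-user TTL jitter; the 4-hour window is illustrative:

```typescript
import { createHash } from 'node:crypto';

// Derive a stable per-user offset so daily-context cache entries expire
// spread across a window instead of all at the same instant.
function staggeredTtlSeconds(userId: string, baseSeconds = 24 * 60 * 60): number {
  const hash = createHash('sha256').update(userId).digest();
  const jitterSeconds = hash.readUInt32BE(0) % (4 * 60 * 60); // 0..4h, stable per user
  return baseSeconds + jitterSeconds;
}
```

Pass the result as the TTL when writing the daily chat-context key, and the 03:00 UTC spike flattens into a four-hour ramp.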
Internal links
- /p/astrology-chat-api — LLM-shaped chart and transit summary endpoint.
- /p/personalized-horoscopes-api — daily personalized horoscope content.
- /p/natal-api — birth chart foundation, cached per user.
- /p/tarot-api — tarot draws as a chatbot feature.
- /p/synastry-api — compatibility between two users.
- /p/mcp-astrology — Model Context Protocol server for Claude Desktop and custom GPTs.
- /pricing — request tier breakdown.
Wrap-up
A 10,000-user chatbot runs comfortably on our Business tier ($99/mo, 220,000 requests) with the right caching. The expensive part is LLM inference, not our API. Cache natal data forever, cache daily transit summaries daily, and use the dedicated chat-API endpoint to stay inside LLM token budgets. And if you're building inside Claude Desktop, skip the integration layer entirely: MCP makes the bot work out of the box.
The free tier (50 requests/month) is enough to prototype the conversation flow with a single test user before committing to a paid plan.



