Build a Chat App with AI Persona — Streaming, Native, in 2 Weekends

Native AI chat app. React Native + streaming + Supabase. Persona chat with SSE streaming and token budgets. Real build time from a developer who shipped 4 apps in a month.


Stack highlights

  • React Native + Expo
  • Streaming SSE proxy
  • Context window trim
  • Daily quota
  • Content filter

Why an AI persona chat is a clean 2-weekend app

A single-persona AI chat app is the shortest path to a working LLM product on mobile. One screen, one API call per message, a cleanly defined system prompt. The twist that separates shipped apps from demos is streaming — without streaming, the first-word latency is 2-4 seconds and the app feels dead.

I built streaming chat inside one of my four apps. The server-sent events plumbing on React Native is the part that catches most builders off guard.

What you actually need to build

  • Single chat screen: Message list, input box, send button. No group chat, no attachments, no voice.
  • Streaming response: SSE or chunked HTTP from the backend to the app. Target first token in under 500ms so the UI feels alive on tap.
  • System prompt per persona: A hardcoded system prompt that defines the persona. Do not let users edit it — that is a jailbreak vector and a support headache.
  • Context window management: Trim history to the last 10-20 messages before every call. Anthropic and OpenAI will both charge you for context you forgot about.
  • Per-user daily message cap: 50 messages per day as a free tier. Enforced server-side, not in the app.
  • Conversation history: Saved per user so they see their chat on next launch.

No user-created personas, no memory system, no tools/function calling in v1. Ship it simple.

The stack I use

  • React Native + Expo.
  • Supabase — auth, conversations, messages. RLS on everything.
  • NestJS — streaming proxy to the LLM API. You need a backend regardless, because you cannot ship API keys to the client; the proxy is also where streaming, the quota, and the content filter live.
  • eventsource-polyfill or a fetch streaming reader for React Native SSE.
  • Claude Code + 11 AI agents — scaffold chat UI and streaming hook.
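The streaming piece of the stack can be sketched as a pure SSE parser plus a read loop. The endpoint path and the `{ "delta": "..." }` payload shape are assumptions, and on React Native the `ReadableStream` body requires a polyfill as noted below; the parser itself is plain string handling.

```typescript
// Pure parser: feed accumulated text, get complete `data:` payloads
// plus whatever partial line is left over for the next chunk.
function parseSSE(buffer: string): { events: string[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // last piece may be an incomplete line
  const events = lines
    .filter((l) => l.startsWith("data:"))
    .map((l) => l.slice(5).trim())
    .filter((d) => d.length > 0 && d !== "[DONE]");
  return { events, rest };
}

// Read loop over a streaming response body (assumed payload shape).
async function streamChat(url: string, body: unknown, onToken: (t: string) => void) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify(body),
  });
  if (!res.body) throw new Error("No streaming body — check your fetch polyfill");
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += decoder.decode(value, { stream: true });
    const { events, rest } = parseSSE(buf);
    buf = rest;
    for (const e of events) onToken(JSON.parse(e).delta ?? "");
  }
}
```

Keeping the parser separate from the network loop makes the fiddly part (partial lines split across chunks) unit-testable without a server.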

Real build time

With the boilerplate, 2 weekends.

  • Chat UI + message list: ~5 hours.
  • Streaming endpoint + SSE proxy: ~6 hours.
  • Context window trim + persona prompt: ~3 hours.
  • Daily cap + quota enforcement: ~3 hours.
  • Store submission + AI content disclosures: ~4 hours.

About 21 hours.
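The quota line item above is mostly one server-side check. A minimal sketch, assuming a Supabase `messages` table with `user_id` and `created_at` columns (names are assumptions) and the article's 50-message cap:

```typescript
const DAILY_CAP = 50; // free tier, from the article

// Start of the current UTC day, so the count query and the insert
// agree on what "today" means.
function startOfUtcDay(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
}

function underCap(sentToday: number, cap: number = DAILY_CAP): boolean {
  return sentToday < cap;
}

// In the NestJS handler, roughly (supabase = service-role client):
// const { count } = await supabase
//   .from("messages")
//   .select("id", { count: "exact", head: true })
//   .eq("user_id", userId)
//   .gte("created_at", startOfUtcDay(new Date()).toISOString());
// if (!underCap(count ?? 0)) throw new HttpException("Daily limit reached", 429);
```

Returning 429 with a clear message lets the app show "come back tomorrow" instead of a silent failure.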

Where people get stuck

  • React Native fetch does not stream by default: You must use ReadableStream with a polyfill, or a dedicated library. Many builders spend half a day debugging why their stream arrives all at once at the end.
  • Context window overflow: Without trimming, a user who sends 200 messages sends your token count through the roof on message 201. Trim to the last 10-20 messages before every call and log the cut-off for debugging.
  • Apple review asks about content moderation: A free-form AI chat will get reviewed for unsafe output. Add a server-side content filter or at minimum pass through an OpenAI moderation check. Document it in the review notes so the reviewer does not have to guess.
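The moderation pass-through mentioned above can be sketched against the OpenAI moderation endpoint. The stricter per-score threshold is an assumption layered on top; for v1, honoring `flagged` alone is often enough.

```typescript
interface ModerationResult {
  flagged: boolean;
  category_scores: Record<string, number>;
}

// Pure decision: block when OpenAI flags it, or when any category
// score crosses a stricter threshold you pick for your persona
// (the threshold layer is an assumption, not part of the API).
function shouldBlock(result: ModerationResult, threshold = 0.8): boolean {
  if (result.flagged) return true;
  return Object.values(result.category_scores).some((s) => s >= threshold);
}

async function moderate(text: string, apiKey: string): Promise<boolean> {
  const res = await fetch("https://api.openai.com/v1/moderations", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ input: text }),
  });
  const data = await res.json();
  return shouldBlock(data.results[0]);
}
```

Run it server-side on the user's message before the LLM call, so a blocked message never costs you tokens.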

Skip the setup

Streaming proxy, React Native streaming fetch hook, Supabase schema with RLS, daily quota, content filter hook — pre-wired. The 11 AI agents scaffold the chat screen from a prompt so you can spend your time on the persona.

See pricing

Skip the setup. Start shipping.

Every piece of the stack above is pre-configured in Shippen. 11 AI agents scaffold the rest.
