Build a Podcast App with AI — Transcripts + AI Highlights

Native podcast app built with AI help. React Native + react-native-track-player, background audio, cached ASR transcripts, AI highlights. Real build time from a developer who shipped 4 apps in a month.


Stack highlights

React Native + Expo · react-native-track-player · Global ASR cache · Episode-hash invalidation · AI highlights

Why a podcast app needs ASR before it needs an algorithm

The AI feature that matters in a podcast app is not recommendations. It is the transcript. A searchable transcript with AI-generated chapter highlights turns a 60-minute episode into a 2-minute skim, and that is the feature people pay for. Everything else is a standard audio player.

I have already shipped audio playback inside a meditation-adjacent experiment, and the background audio plumbing carries over. ASR is the new piece, and it is the cost center.

What you actually need to build

  • Podcast feed parsing: Import via RSS. Same RSS story as a news reader, different endpoint.
  • Audio player: Play, pause, seek, speed, background mode, lock-screen controls.
  • Transcript panel: Show transcript synced with playback. Tap a line to seek.
  • AI highlights: 5 bullet-point highlights per episode, generated once from the transcript, cached globally per episode.
  • Download for offline: Users listen on planes and the subway. Background download.
  • Subscribe + new episode push: One push per subscribed show when a new episode drops.

Do not build a discovery algorithm in v1. A search bar and a "trending" list derived from episode counts beat a homegrown recommender.
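The transcript panel's sync-and-seek behavior comes down to one lookup: given the current playback position, find the transcript segment that contains it. A minimal sketch, assuming a hypothetical `Segment` shape with start/end times in seconds (the names here are illustrative, not a fixed API):

```typescript
// Hypothetical shape for one line of a cached transcript.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Binary search over time-sorted segments: find the segment containing
// the playback position. The transcript panel highlights this line on
// each progress tick; tap-to-seek is just player.seekTo(segment.start).
function activeSegment(segments: Segment[], positionSec: number): Segment | null {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const s = segments[mid];
    if (positionSec < s.start) hi = mid - 1;
    else if (positionSec >= s.end) lo = mid + 1;
    else return s;
  }
  return null; // position falls outside every segment (e.g. past the end)
}
```

Binary search keeps the per-tick cost negligible even for a 3-hour episode with thousands of segments, so you can run it on every progress event without throttling.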

The stack I use

  • React Native + Expo.
  • react-native-track-player for audio playback with full lock-screen controls.
  • Backend RSS ingest + audio URL extraction.
  • ASR: Whisper API (OpenAI) or a cheaper provider. One call per episode, globally cached. Batch so that any repeat request hits cache, not the ASR.
  • NestJS — transcript endpoint, highlights endpoint. Both cached on episode_hash.
  • Claude Code + 11 AI agents — scaffold the player, transcript, and highlights screens.

Real build time

With the boilerplate, 4 weekends.

  • RSS ingest + episodes table: ~6 hours.
  • Player + background audio + lock screen: ~10 hours.
  • Transcript panel + tap-to-seek: ~6 hours.
  • ASR pipeline + highlights generation + cache: ~8 hours.
  • Download-for-offline: ~4 hours.
  • Store submission: ~4 hours.

About 38 hours.

Where people get stuck

  • ASR cost runaway: Transcribing every episode on demand scales with DAU. Batch ASR calls so the same episode is only transcribed once globally. At Whisper API prices that is roughly $0.36 per hour of audio, paid once per episode, not per listener. The cache hit ratio on popular shows trends toward 99%.
  • Lock-screen controls showing nothing: If you use expo-av alone you get barebones lock-screen controls. react-native-track-player gives you proper artwork, title, and scrubbing.
  • Transcript drift after edits: Podcasters re-upload edited episodes with the same URL. Store a hash of the audio file and invalidate the transcript when the hash changes.
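The hash-based invalidation above can be sketched in a few lines. This assumes you store the audio hash alongside the cached transcript; the function names are illustrative:

```typescript
import { createHash } from "node:crypto";

// Hash the downloaded audio bytes. If a podcaster re-uploads an edited
// file at the same URL, the bytes change, the hash changes, and the
// stale transcript (and its highlights) get regenerated.
function episodeHash(audio: Buffer): string {
  return createHash("sha256").update(audio).digest("hex");
}

// Compare the freshly downloaded audio against the hash stored with
// the cached transcript. No stored hash means no transcript yet,
// which we also treat as stale.
function transcriptIsStale(storedHash: string | null, audio: Buffer): boolean {
  return storedHash === null || storedHash !== episodeHash(audio);
}
```

Hashing the audio bytes rather than trusting the RSS `<pubDate>` or enclosure URL is the point: edited re-uploads routinely keep both unchanged.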

Skip the setup

RSS ingest, player + lock-screen scaffold, ASR pipeline with global cache, transcript and highlights endpoints — pre-wired. The 11 AI agents scaffold the listening and transcript screens.

See pricing

Skip the setup. Start shipping.

Every piece of the stack above is pre-configured in Shippen. 11 AI agents scaffold the rest.
