
Streaming LLM Responses in React Native: The Complete Guide

Implement real-time streaming AI responses in React Native, token by token, like ChatGPT. Covers fetch streaming, SSE, OpenAI + Claude APIs, and smooth word-by-word rendering.


Why Streaming Is Non-Negotiable for AI UX

Without streaming, a 500-token AI response takes 3–8 seconds of blank screen before anything appears. With streaming, the user sees the first word within 200ms and reads along as the response generates. This single change increases user satisfaction scores by 40–60% in every AI app we've shipped.

ChatGPT, Claude, and virtually every production AI product stream their responses, and users now expect it: non-streaming AI feels broken by comparison. The web implementation is well documented, but React Native has specific gotchas that catch most people off guard the first time.

The React Native Streaming Challenge

Web apps handle streaming easily via the browser's native EventSource API or the Fetch API's ReadableStream. React Native has neither by default:

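  • No native EventSource: React Native's JS environment doesn't include the browser's SSE API, so you need a polyfill or a different approach. The standard EventSource API also only issues GET requests, which rules out POSTing a chat payload
  • Fetch streaming support: React Native's built-in fetch is implemented on top of XMLHttpRequest and does not expose response.body as a ReadableStream. Use a streaming-capable implementation such as Expo's expo/fetch (Expo SDK 52+) or a fetch polyfill with streaming support; this is the recommended approach
  • Text decoding: You need TextDecoder to convert byte chunks to strings. Depending on your React Native version and JS engine it may not be available globally, so check your runtime and add a polyfill if it's missing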
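Byte chunks don't respect line or character boundaries: one SSE line can arrive split across two reads, and so can a multi-byte UTF-8 character. Below is a minimal, framework-free sketch of the decoding step. `createSSEDataParser` is a hypothetical helper, not part of any library, and it assumes each event carries its payload on a single `data:` line, which is how the OpenAI and Anthropic streams below are formatted.

```typescript
// Hypothetical helper (not from a library): turns raw byte chunks into
// complete SSE "data:" payloads. It handles two boundary problems:
// a multi-byte UTF-8 character split across chunks (TextDecoder with
// { stream: true } holds the partial bytes) and an SSE line split across
// chunks (the trailing partial line stays in `pending`).
export function createSSEDataParser() {
  const decoder = new TextDecoder('utf-8')
  let pending = ''

  return function push(chunk: Uint8Array): string[] {
    pending += decoder.decode(chunk, { stream: true })
    const lines = pending.split('\n')
    // Last element is an incomplete line, or '' if the chunk ended on '\n'
    pending = lines.pop() ?? ''

    const payloads: string[] = []
    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, '') // Tolerate CRLF line endings
      if (line.startsWith('data:')) {
        // SSE strips one optional space after the colon
        payloads.push(line.slice(5).replace(/^ /, ''))
      }
    }
    return payloads
  }
}
```

Each `value` from `reader.read()` goes into `push()`, and every returned string is a complete `data:` payload; any text after the last newline waits for the next chunk.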

Once a streaming-capable fetch is in place, fetch streaming works well in practice and is the recommended path. Here's how to set it up:

Core Streaming Hook (OpenAI)

This hook handles everything: streaming fetch, token parsing, state management, and cancellation:

typescript
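import { useState, useCallback, useRef } from 'react'
// React Native's built-in fetch doesn't expose response.body as a stream;
// Expo SDK 52+ ships a streaming-capable fetch in 'expo/fetch'
import { fetch } from 'expo/fetch'

// Demo only: in production, point this at your own backend proxy and keep
// the provider key server-side (see the Security section below)
const OPENAI_API_URL = 'https://api.openai.com/v1/chat/completions'
const OPENAI_API_KEY = process.env.EXPO_PUBLIC_OPENAI_API_KEY!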

type Message = { role: 'user' | 'assistant' | 'system'; content: string }

export function useStreamingChat() {
  const [messages, setMessages] = useState<Message[]>([])
  const [streamingContent, setStreamingContent] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef<AbortController | null>(null)

  const sendMessage = useCallback(async (userMessage: string) => {
    const newMessages: Message[] = [
      ...messages,
      { role: 'user', content: userMessage },
    ]
    setMessages(newMessages)
    setStreamingContent('')
    setIsStreaming(true)

    // Allow cancellation
    abortControllerRef.current = new AbortController()

    try {
      const response = await fetch(OPENAI_API_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini',
          messages: newMessages,
          stream: true,
          max_tokens: 1000,
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) throw new Error(`HTTP ${response.status}`)
      if (!response.body) throw new Error('No response body')

      const reader = response.body.getReader()
      const decoder = new TextDecoder('utf-8')
      let fullContent = ''
      // An SSE line can be split across two network chunks: keep the
      // trailing partial line in `buffer` until the rest of it arrives
      let buffer = ''
      let finished = false

      while (!finished) {
        const { done, value } = await reader.read()
        if (done) break

        buffer += decoder.decode(value, { stream: true })
        const lines = buffer.split('\n')
        buffer = lines.pop() ?? ''

        for (const line of lines) {
          const trimmed = line.trim()
          if (!trimmed.startsWith('data: ')) continue
          const data = trimmed.slice(6) // Remove "data: "
          if (data === '[DONE]') {
            finished = true // End of stream: leave both loops
            break
          }

          try {
            const parsed = JSON.parse(data)
            const delta = parsed.choices?.[0]?.delta?.content
            if (delta) {
              fullContent += delta
              setStreamingContent(fullContent)
            }
          } catch {
            // Malformed event, skip it
          }
        }
      }

      // Commit the full assistant message to history
      setMessages(prev => [
        ...prev,
        { role: 'assistant', content: fullContent },
      ])
      setStreamingContent('')
    } catch (err: unknown) {
      if (err instanceof Error && err.name !== 'AbortError') {
        console.error('Streaming error:', err)
      }
    } finally {
      setIsStreaming(false)
    }
  }, [messages])

  const cancelStream = useCallback(() => {
    abortControllerRef.current?.abort()
  }, [])

  return { messages, streamingContent, isStreaming, sendMessage, cancelStream }
}
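The per-line parsing inside the hook can be pulled out into a pure function and unit-tested against recorded payloads, without a network connection or a device. A sketch; `parseOpenAIData` and `OpenAIStreamEvent` are hypothetical names:

```typescript
// Hypothetical result type for one OpenAI Chat Completions "data:" payload
type OpenAIStreamEvent =
  | { kind: 'token'; text: string }
  | { kind: 'done' }
  | { kind: 'skip' }

// Hypothetical helper: interprets a single payload (the text after "data: ")
export function parseOpenAIData(data: string): OpenAIStreamEvent {
  // OpenAI terminates the stream with a literal [DONE] sentinel
  if (data === '[DONE]') return { kind: 'done' }
  try {
    const parsed = JSON.parse(data)
    // Text lives at choices[0].delta.content; role-only and finish_reason
    // chunks carry no text and are skipped
    const text = parsed.choices?.[0]?.delta?.content
    return typeof text === 'string' && text.length > 0
      ? { kind: 'token', text }
      : { kind: 'skip' }
  } catch {
    return { kind: 'skip' } // Malformed payload
  }
}
```

Inside the hook's loop, a `token` result appends to `fullContent` and `done` ends the stream.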

Streaming with Anthropic Claude

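Claude uses a different SSE format: the Messages API sends named events (message_start, content_block_delta, message_stop, and so on) instead of OpenAI's choices[].delta chunks, and there is no [DONE] sentinel. Text arrives in content_block_delta events. Here's a parser for Claude's streaming API: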

typescript
import { fetch } from 'expo/fetch' // Streaming-capable fetch (Expo SDK 52+)

// Demo only: in production, call your own backend instead (see the Security section below)
async function streamClaude(
  userMessage: string,
  onToken: (token: string) => void
) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.EXPO_PUBLIC_ANTHROPIC_KEY!,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 1024,
      stream: true,
      messages: [{ role: 'user', content: userMessage }],
    }),
  })

  if (!response.ok) throw new Error(`HTTP ${response.status}`)
  if (!response.body) throw new Error('No response body')
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = '' // Holds a partial SSE line split across chunks

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''

    for (const line of lines) {
      // Skip "event:" lines and blank separators; the JSON is on "data:" lines
      if (!line.startsWith('data: ')) continue
      try {
        const event = JSON.parse(line.slice(6))
        if (event.type === 'content_block_delta') {
          const token = event.delta?.text
          if (token) onToken(token)
        }
      } catch {
        // Malformed event, skip it
      }
    }
  }
}
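The loop above only looks at content_block_delta. A slightly fuller sketch also recognizes the end of the message and errors sent inside the stream, which can arrive after an HTTP 200 (for example an overloaded_error during high load). `parseClaudeData` and `ClaudeStreamEvent` are hypothetical names; the event types are those of Anthropic's Messages streaming API.

```typescript
// Hypothetical result type for one Anthropic Messages API "data:" payload
type ClaudeStreamEvent =
  | { kind: 'token'; text: string }
  | { kind: 'done' }
  | { kind: 'error'; message: string }
  | { kind: 'skip' }

// Hypothetical helper: interprets a single payload (the text after "data: ")
export function parseClaudeData(data: string): ClaudeStreamEvent {
  let event: any
  try {
    event = JSON.parse(data)
  } catch {
    return { kind: 'skip' } // Malformed payload
  }
  switch (event.type) {
    case 'content_block_delta':
      // Text arrives as text_delta; other delta types (e.g. tool input JSON) are ignored
      return event.delta?.type === 'text_delta' && typeof event.delta.text === 'string'
        ? { kind: 'token', text: event.delta.text }
        : { kind: 'skip' }
    case 'message_stop':
      // No [DONE] sentinel: message_stop marks the end of the response
      return { kind: 'done' }
    case 'error':
      // Errors can be delivered mid-stream, after a 200 response
      return { kind: 'error', message: event.error?.message ?? 'Unknown stream error' }
    default:
      // message_start, content_block_start/stop, message_delta, ping
      return { kind: 'skip' }
  }
}
```

Surfacing the `error` case to the UI avoids a response that silently stops halfway.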

Building the Streaming Chat UI

Here is the UI component that renders the conversation and the in-progress response. The key detail is auto-scrolling the ScrollView so the user always sees the latest tokens:

tsx
import React, { useRef, useState } from 'react'
import { View, Text, TextInput, Pressable, ScrollView, ActivityIndicator } from 'react-native'
import { useStreamingChat } from './useStreamingChat'

export default function StreamingChat() {
  const { messages, streamingContent, isStreaming, sendMessage, cancelStream } = useStreamingChat()
  const [input, setInput] = useState('')
  const scrollRef = useRef<ScrollView>(null)

  const handleSend = () => {
    if (!input.trim() || isStreaming) return
    sendMessage(input.trim())
    setInput('')
  }

  return (
    <View style={{ flex: 1, backgroundColor: '#0f0f0f' }}>
      {/* Auto-scroll whenever the content grows (new message or new tokens).
          onContentSizeChange fires after layout, so no separate useEffect is needed. */}
      <ScrollView
        ref={scrollRef}
        style={{ flex: 1 }}
        contentContainerStyle={{ padding: 16 }}
        onContentSizeChange={() => scrollRef.current?.scrollToEnd({ animated: true })}
      >
        {messages.map((msg, i) => (
          <View key={i} style={{
            alignSelf: msg.role === 'user' ? 'flex-end' : 'flex-start',
            maxWidth: '80%',
            backgroundColor: msg.role === 'user' ? '#6366f1' : '#1e1e2e',
            borderRadius: 16,
            padding: 12,
            marginBottom: 8,
          }}>
            <Text style={{ color: '#fff', lineHeight: 22 }}>{msg.content}</Text>
          </View>
        ))}

        {/* Streaming message bubble */}
        {isStreaming && streamingContent ? (
          <View style={{
            alignSelf: 'flex-start',
            maxWidth: '80%',
            backgroundColor: '#1e1e2e',
            borderRadius: 16,
            padding: 12,
            marginBottom: 8,
          }}>
            <Text style={{ color: '#fff', lineHeight: 22 }}>
              {streamingContent}
              {/* Cursor glyph (static; animate its opacity to make it blink) */}
              <Text style={{ color: '#6366f1' }}>▍</Text>
            </Text>
          </View>
        ) : isStreaming ? (
          <ActivityIndicator color="#6366f1" style={{ alignSelf: 'flex-start', margin: 8 }} />
        ) : null}
      </ScrollView>

      <View style={{ flexDirection: 'row', padding: 16, gap: 8, borderTopWidth: 1, borderTopColor: '#2a2a3e' }}>
        <TextInput
          value={input}
          onChangeText={setInput}
          placeholder="Message..."
          placeholderTextColor="#666"
          style={{ flex: 1, backgroundColor: '#1e1e2e', borderRadius: 24, paddingHorizontal: 16, paddingVertical: 10, color: '#fff' }}
          onSubmitEditing={handleSend}
          returnKeyType="send"
        />
        <Pressable
          onPress={isStreaming ? cancelStream : handleSend}
          accessibilityLabel={isStreaming ? 'Stop generating' : 'Send message'}
          style={{ backgroundColor: '#6366f1', borderRadius: 24, width: 44, alignItems: 'center', justifyContent: 'center' }}
        >
          <Text style={{ color: '#fff', fontSize: 16 }}>{isStreaming ? '■' : '↑'}</Text>
        </Pressable>
      </View>
    </View>
  )
}

Rendering Optimization: Prevent Too Many Re-Renders

Streaming can trigger 10–30 state updates per second. Without optimization, this causes visible jank on mid-range Android devices. Here are three techniques:

1. Batch Tokens with a 16ms Flush Interval

Instead of calling setState for every token, accumulate tokens in a ref and flush them at most once every ~16ms (roughly one frame at 60fps):

typescript
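const tokenBuffer = useRef('')
const flushTimeout = useRef<ReturnType<typeof setTimeout> | null>(null)

const flushBuffer = () => {
  // Read the buffer before clearing it: the updater below may run later,
  // when tokenBuffer.current is already ''
  const pending = tokenBuffer.current
  tokenBuffer.current = ''
  flushTimeout.current = null
  setStreamingContent(prev => prev + pending)
}

// In your token handler:
const onToken = (token: string) => {
  tokenBuffer.current += token
  // Throttle, not debounce: schedule one flush per ~16ms window (~60fps).
  // Resetting the timer on every token would postpone rendering for as
  // long as tokens keep arriving.
  if (flushTimeout.current === null) {
    flushTimeout.current = setTimeout(flushBuffer, 16)
  }
}
// When the stream ends, clear any pending timeout and call flushBuffer()
// so the last tokens are not dropped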
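When tokens come from a callback outside a component, such as streamClaude's `onToken`, the same throttle can live in a small framework-free helper that also exposes an explicit `flush()` for the end of the stream. A sketch; `createTokenBatcher` is a hypothetical name:

```typescript
// Hypothetical helper: collects tokens and calls onFlush with the
// accumulated text at most once per intervalMs (a throttle, not a debounce)
export function createTokenBatcher(
  onFlush: (text: string) => void,
  intervalMs = 16
) {
  let buffer = ''
  let timer: ReturnType<typeof setTimeout> | null = null

  const flush = () => {
    if (timer !== null) {
      clearTimeout(timer) // Harmless if the timer already fired
      timer = null
    }
    if (buffer) {
      const text = buffer
      buffer = ''
      onFlush(text)
    }
  }

  return {
    push(token: string) {
      buffer += token
      if (timer === null) timer = setTimeout(flush, intervalMs)
    },
    // Call when the stream ends or is cancelled so the tail is delivered immediately
    flush,
  }
}
```

For example, `batcher.push` can be passed as streamClaude's `onToken`, with `onFlush` set to `text => setStreamingContent(prev => prev + text)`, and `batcher.flush()` called once streamClaude resolves.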

2. Memoize Previous Messages

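Wrap the message list in a component memoized with React.memo, and render the streaming bubble as a separate component. Because the messages array keeps the same reference while tokens arrive (only streamingContent changes), the memoized list skips re-rendering and only the streaming text updates during generation.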
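A sketch of that split, reusing the bubble styles from StreamingChat. The component names `MessageBubble`, `MessageList`, and `StreamingBubble` are hypothetical:

```tsx
import React, { memo } from 'react'
import { Text, View } from 'react-native'

type Message = { role: 'user' | 'assistant' | 'system'; content: string }

// Hypothetical component: one committed message. memo() skips re-rendering
// while `message` is the same object as on the previous render.
const MessageBubble = memo(function MessageBubble({ message }: { message: Message }) {
  const isUser = message.role === 'user'
  return (
    <View
      style={{
        alignSelf: isUser ? 'flex-end' : 'flex-start',
        maxWidth: '80%',
        backgroundColor: isUser ? '#6366f1' : '#1e1e2e',
        borderRadius: 16,
        padding: 12,
        marginBottom: 8,
      }}
    >
      <Text style={{ color: '#fff', lineHeight: 22 }}>{message.content}</Text>
    </View>
  )
})

// Hypothetical component: the committed history. useStreamingChat only
// replaces `messages` when a message is added, so token updates leave this
// prop unchanged and memo() skips the whole list.
export const MessageList = memo(function MessageList({ messages }: { messages: Message[] }) {
  return (
    <>
      {messages.map((msg, i) => (
        <MessageBubble key={i} message={msg} />
      ))}
    </>
  )
})

// Hypothetical component: the in-progress reply, the only piece that
// should re-render on every token
export function StreamingBubble({ content }: { content: string }) {
  return (
    <View
      style={{
        alignSelf: 'flex-start',
        maxWidth: '80%',
        backgroundColor: '#1e1e2e',
        borderRadius: 16,
        padding: 12,
        marginBottom: 8,
      }}
    >
      <Text style={{ color: '#fff', lineHeight: 22 }}>{content}</Text>
    </View>
  )
}
```

In StreamingChat, `messages.map(...)` would become `<MessageList messages={messages} />` and the streaming bubble `<StreamingBubble content={streamingContent} />`. StreamingChat itself still re-renders on every token, but memo() lets React skip the list subtree.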

3. Use InteractionManager for Long Responses

Defer non-critical work (saving to AsyncStorage, analytics) until the stream completes, and schedule it with InteractionManager.runAfterInteractions so it also waits for any running animations and gestures to finish.
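A sketch of that pattern, assuming the community `@react-native-async-storage/async-storage` package. The storage key and both function names are hypothetical:

```typescript
import { useEffect } from 'react'
import { InteractionManager } from 'react-native'
import AsyncStorage from '@react-native-async-storage/async-storage'

type Message = { role: 'user' | 'assistant' | 'system'; content: string }

// Hypothetical storage key used only in this sketch
const CHAT_HISTORY_KEY = 'chat-history'

// Hypothetical helper: queues the save behind running animations/gestures.
// The returned handle's cancel() drops the task if it hasn't started yet.
export function persistHistoryAfterInteractions(messages: Message[]) {
  return InteractionManager.runAfterInteractions(() => {
    AsyncStorage.setItem(CHAT_HISTORY_KEY, JSON.stringify(messages)).catch(err =>
      console.warn('Failed to save chat history:', err)
    )
  })
}

// Hypothetical hook: saves after each response finishes, never mid-stream
export function usePersistHistory(messages: Message[], isStreaming: boolean) {
  useEffect(() => {
    if (isStreaming || messages.length === 0) return
    const task = persistHistoryAfterInteractions(messages)
    return () => task.cancel() // History changed or screen unmounted first
  }, [messages, isStreaming])
}
```

Calling `usePersistHistory(messages, isStreaming)` inside StreamingChat wires it to the values returned by useStreamingChat.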

Security: Never Expose API Keys in the App

Critical: Always proxy through your backend

Never call OpenAI or Claude directly from your React Native app with a hardcoded or bundled API key. API keys in mobile apps are trivially extractable from the app binary, and EXPO_PUBLIC_ variables are inlined into the JavaScript bundle. Always route requests through your own API endpoint that authenticates the user, enforces rate limits, and injects the API key server-side. The client examples above call the providers directly only to keep them short; in production, point them at your own endpoint, which then calls OpenAI or Claude.
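Here is a minimal sketch of such an endpoint in Node.js (18+, for the global fetch). It relays OpenAI's SSE bytes unchanged so the client-side parsing above keeps working. The `/api/chat` route, the placeholder `authenticate()`, and the in-memory fixed-window limiter (20 requests per user per minute) are all illustrative assumptions. A real deployment needs real authentication and a shared store for rate limits.

```typescript
import { createServer, type IncomingMessage, type ServerResponse } from 'node:http'
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'
import type { ReadableStream as NodeReadableStream } from 'node:stream/web'

// Server-side only: this key never ships inside the app binary
const OPENAI_API_KEY = process.env.OPENAI_API_KEY ?? ''

// Illustrative fixed-window limiter (per process; use a shared store in production)
const WINDOW_MS = 60_000
const windows = new Map<string, { start: number; count: number }>()

export function isRateLimited(userId: string, now = Date.now(), limit = 20): boolean {
  const w = windows.get(userId)
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(userId, { start: now, count: 1 })
    return false
  }
  w.count += 1
  return w.count > limit
}

// Placeholder auth (assumption): replace with real session or JWT verification.
// Here the bearer token itself is treated as the user id.
async function authenticate(req: IncomingMessage): Promise<string | null> {
  const auth = req.headers.authorization
  return auth?.startsWith('Bearer ') ? auth.slice(7) : null
}

async function handleChat(req: IncomingMessage, res: ServerResponse): Promise<void> {
  if (req.method !== 'POST' || req.url !== '/api/chat') {
    res.writeHead(404).end()
    return
  }
  const userId = await authenticate(req)
  if (!userId) {
    res.writeHead(401).end()
    return
  }
  if (isRateLimited(userId)) {
    res.writeHead(429).end()
    return
  }

  const chunks: Buffer[] = []
  for await (const chunk of req) chunks.push(chunk as Buffer)
  let messages: unknown
  try {
    messages = JSON.parse(Buffer.concat(chunks).toString('utf8')).messages
  } catch {
    res.writeHead(400).end()
    return
  }

  // Stop paying for tokens if the app disconnects mid-stream
  const controller = new AbortController()
  res.on('close', () => controller.abort())

  // The server, not the client, chooses the model and token limit
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${OPENAI_API_KEY}` },
    body: JSON.stringify({ model: 'gpt-4o-mini', messages, stream: true, max_tokens: 1000 }),
    signal: controller.signal,
  })
  if (!upstream.ok || !upstream.body) {
    res.writeHead(502).end()
    return
  }

  // Relay the SSE bytes unchanged so the client-side parser keeps working
  res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' })
  try {
    // DOM and node:stream/web typings for ReadableStream differ, hence the cast
    await pipeline(Readable.fromWeb(upstream.body as unknown as NodeReadableStream<Uint8Array>), res)
  } catch {
    // Client disconnected or upstream failed mid-stream; nothing left to send
  }
}

export function startServer(port = 3000) {
  return createServer((req, res) => {
    handleChat(req, res).catch(err => {
      console.error('Proxy error:', err)
      if (!res.headersSent) res.writeHead(500)
      res.end()
    })
  }).listen(port)
}
```

Start it with `startServer(3000)`. On the client, OPENAI_API_URL would then point at this endpoint, and the Authorization header would carry the user's session token instead of a provider key.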

Expert Implementation

Want a streaming AI chat that ships?

CasaInnov ships complete AI chat implementations with streaming, RAG, and real security in 2-week sprints.
