Streaming LLM Responses in React Native: The Complete Guide

Why Streaming Is Non-Negotiable for AI UX

Without streaming, a 500-token AI response leaves the user staring at a blank screen for 3–8 seconds before anything appears. With streaming, the first word shows up within about 200ms and the user reads along as the response generates. In every AI app we've shipped, this single change has raised user satisfaction scores by 40–60%.

ChatGPT, Claude, and every other production AI product streams. Users now expect it, and non-streaming AI feels broken by comparison. The web implementation is well documented, but React Native has specific gotchas that catch most people off guard the first time.

The React Native Streaming Challenge

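Web apps handle streaming easily via the browser's native EventSource API or the Fetch API's ReadableStream. React Native historically shipped with neither, so check what your target version provides:

No native EventSource: React Native's JS environment doesn't include the browser's SSE API, so you need a polyfill or a different approach.
Fetch streaming support: React Native 0.72+ is reported to expose the Fetch API's response.body as a ReadableStream, and this is the approach used below. Support depends on the fetch implementation in your build, so confirm that response.body is defined on your target devices before relying on it.
Text decoding: You need TextDecoder to convert byte chunks to strings. It is reported to be available in React Native 0.70+ without a polyfill; if it is undefined in your runtime, add one.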
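
Because support for these primitives varies between React Native versions and JS engines, it can help to check for them at startup rather than trust a version number. The sketch below is a minimal example under that assumption; detectStreamingSupport is a hypothetical helper name, not part of React Native. Note that a ReadableStream global does not by itself guarantee that fetch populates response.body, so a null check on the response is still needed at request time.

```typescript
// Hypothetical helper: reports which streaming primitives the current runtime exposes.
type StreamingSupport = {
  hasTextDecoder: boolean
  hasReadableStream: boolean
  hasEventSource: boolean
}

function detectStreamingSupport(
  // Defaults to the real global object; a plain object can be passed in tests.
  env: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): StreamingSupport {
  return {
    hasTextDecoder: typeof env.TextDecoder === 'function',
    hasReadableStream: typeof env.ReadableStream === 'function',
    hasEventSource: typeof env.EventSource === 'function',
  }
}
```

Calling detectStreamingSupport() once when the app boots and logging the result on a real device shows quickly whether a polyfill is needed.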

Where response.body is available, React Native's Fetch streaming works well in practice. Here's how to set it up:

Core Streaming Hook (OpenAI)

This hook handles everything: streaming fetch, token parsing, state management, and cancellation:

typescript

import { useState, useCallback, useRef } from 'react'

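// Demo only: EXPO_PUBLIC_ variables are inlined into the app bundle, so this key
// is extractable. In production, point this URL at your own backend proxy
// (see the Security section) and remove the key from the client.
const OPENAI_API_URL = 'https://api.openai.com/v1/chat/completions'
const OPENAI_API_KEY = process.env.EXPO_PUBLIC_OPENAI_API_KEY!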

type Message = { role: 'user' | 'assistant' | 'system'; content: string }

export function useStreamingChat() {
  const [messages, setMessages] = useState<Message[]>([])
  const [streamingContent, setStreamingContent] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef<AbortController | null>(null)

  const sendMessage = useCallback(async (userMessage: string) => {
    const newMessages: Message[] = [
      ...messages,
      { role: 'user', content: userMessage },
    ]
    setMessages(newMessages)
    setStreamingContent('')
    setIsStreaming(true)

    // Allow cancellation
    abortControllerRef.current = new AbortController()

    try {
      const response = await fetch(OPENAI_API_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini',
          messages: newMessages,
          stream: true,
          max_tokens: 1000,
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) throw new Error(`HTTP ${response.status}`)
      if (!response.body) throw new Error('No response body')

      const reader = response.body.getReader()
      const decoder = new TextDecoder('utf-8')
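      let fullContent = ''
      let buffer = '' // Holds a partial SSE line that spans two chunks
      let finished = false

      while (!finished) {
        const { done, value } = await reader.read()
        if (done) break

        // A network chunk can end mid-line, so only parse complete lines
        buffer += decoder.decode(value, { stream: true })
        const lines = buffer.split('\n')
        buffer = lines.pop() ?? '' // Keep the trailing partial line for the next read

        for (const line of lines) {
          const trimmed = line.trim()
          if (!trimmed.startsWith('data:')) continue
          const data = trimmed.slice(5).trim() // Remove the "data:" prefix
          if (data === '[DONE]') {
            finished = true // Stop reading, not just this inner loop
            break
          }

          try {
            const parsed = JSON.parse(data)
            const delta = parsed.choices?.[0]?.delta?.content
            if (delta) {
              fullContent += delta
              setStreamingContent(fullContent)
            }
          } catch {
            // Malformed event, skip it
          }
        }
      }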

      // Commit the full assistant message to history
      setMessages(prev => [
        ...prev,
        { role: 'assistant', content: fullContent },
      ])
      setStreamingContent('')
    } catch (err: unknown) {
      if (err instanceof Error && err.name !== 'AbortError') {
        console.error('Streaming error:', err)
      }
    } finally {
      setIsStreaming(false)
    }
  }, [messages])

  const cancelStream = useCallback(() => {
    abortControllerRef.current?.abort()
  }, [])

  return { messages, streamingContent, isStreaming, sendMessage, cancelStream }
}
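
One detail in the read loop deserves a closer look: network chunks do not have to line up with SSE line boundaries, so a single JSON payload can arrive split across two reads. The sketch below isolates that concern in a small function that can be tested without a network. createSseParser is a hypothetical name, and the sketch only handles single-line data: payloads, which is the shape both providers in this guide send.

```typescript
// Hypothetical helper: turns arbitrary text chunks into complete SSE "data:" payloads.
function createSseParser() {
  let buffer = '' // Text received so far that does not yet end in a newline

  return function push(chunk: string): string[] {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element is either '' (chunk ended on a newline) or a partial line.
    buffer = lines.pop() ?? ''

    const payloads: string[] = []
    for (const line of lines) {
      const trimmed = line.trim()
      if (trimmed.startsWith('data:')) {
        payloads.push(trimmed.slice(5).trim())
      }
    }
    return payloads
  }
}
```

Feeding it 'data: {"a":' and then '1}\n' yields nothing for the first call and the complete payload for the second, which is the case a naive per-chunk split('\n') silently drops.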

Streaming with Anthropic Claude

Claude uses a slightly different SSE format: every event carries a type, and generated text arrives in content_block_delta events as delta.text. Here's a parser for Claude's streaming API:

typescript

async function streamClaude(
  userMessage: string,
  onToken: (token: string) => void
) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Demo only: route through your backend in production (see the Security section)
      'x-api-key': process.env.EXPO_PUBLIC_ANTHROPIC_KEY!,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 1024,
      stream: true,
      messages: [{ role: 'user', content: userMessage }],
    }),
  })

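  if (!response.ok) throw new Error(`HTTP ${response.status}`)
  if (!response.body) throw new Error('No response body')
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = '' // Holds a partial SSE line that spans two chunks

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    // A network chunk can end mid-line, so only parse complete lines
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''

    for (const line of lines) {
      const trimmed = line.trim()
      // Claude also sends "event:" lines; the JSON payload is on "data:" lines
      if (!trimmed.startsWith('data:')) continue
      try {
        const event = JSON.parse(trimmed.slice(5))
        if (event.type === 'content_block_delta') {
          const token = event.delta?.text
          if (token) onToken(token)
        }
      } catch {
        // Malformed event, skip it
      }
    }
  }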
}
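
The parser above only looks at content_block_delta, but Claude's stream also carries events such as ping, message_stop, and error. A small classifier makes those cases explicit. This is a sketch, not Anthropic's SDK: interpretClaudeEvent and the ClaudeStreamResult type are hypothetical names, and it takes the JSON string that follows "data:" on one line.

```typescript
// Hypothetical result type for one parsed Claude stream event.
type ClaudeStreamResult =
  | { kind: 'text'; text: string }
  | { kind: 'stop' }
  | { kind: 'error'; message: string }
  | { kind: 'ignore' }

function interpretClaudeEvent(payload: string): ClaudeStreamResult {
  let event: any
  try {
    event = JSON.parse(payload)
  } catch {
    return { kind: 'ignore' } // Not valid JSON, nothing to render
  }

  switch (event?.type) {
    case 'content_block_delta':
      // Text deltas carry the new characters in delta.text
      return typeof event.delta?.text === 'string'
        ? { kind: 'text', text: event.delta.text }
        : { kind: 'ignore' }
    case 'message_stop':
      return { kind: 'stop' }
    case 'error':
      return { kind: 'error', message: String(event.error?.message ?? 'unknown error') }
    default:
      // message_start, content_block_start, ping, and similar bookkeeping events
      return { kind: 'ignore' }
  }
}
```

Surfacing the error case matters in a chat UI: without it, a mid-stream failure looks like a response that simply stopped early.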

Building the Streaming Chat UI

This component renders the streaming response. Key detail: use ScrollView with auto-scroll so the user always sees the latest tokens:

tsx

import React, { useRef, useEffect } from 'react'
import { View, Text, TextInput, Pressable, ScrollView, ActivityIndicator } from 'react-native'
import { useStreamingChat } from './useStreamingChat'

export default function StreamingChat() {
  const { messages, streamingContent, isStreaming, sendMessage, cancelStream } = useStreamingChat()
  const [input, setInput] = React.useState('')
  const scrollRef = useRef<ScrollView>(null)

  // Auto-scroll as tokens arrive
  useEffect(() => {
    scrollRef.current?.scrollToEnd({ animated: true })
  }, [streamingContent, messages.length])

  const handleSend = () => {
    if (!input.trim() || isStreaming) return
    sendMessage(input.trim())
    setInput('')
  }

  return (
    <View style={{ flex: 1, backgroundColor: '#0f0f0f' }}>
      <ScrollView
        ref={scrollRef}
        style={{ flex: 1, padding: 16 }}
        onContentSizeChange={() => scrollRef.current?.scrollToEnd({ animated: true })}
      >
        {messages.map((msg, i) => (
          <View key={i} style={{
            alignSelf: msg.role === 'user' ? 'flex-end' : 'flex-start',
            maxWidth: '80%',
            backgroundColor: msg.role === 'user' ? '#6366f1' : '#1e1e2e',
            borderRadius: 16,
            padding: 12,
            marginBottom: 8,
          }}>
            <Text style={{ color: '#fff', lineHeight: 22 }}>{msg.content}</Text>
          </View>
        ))}

        {/* Streaming message bubble */}
        {isStreaming && streamingContent ? (
          <View style={{
            alignSelf: 'flex-start',
            maxWidth: '80%',
            backgroundColor: '#1e1e2e',
            borderRadius: 16,
            padding: 12,
            marginBottom: 8,
          }}>
            <Text style={{ color: '#fff', lineHeight: 22 }}>
              {streamingContent}
              {/* Cursor glyph (static here; animate its opacity if you want it to blink) */}
              <Text style={{ color: '#6366f1' }}>▋</Text>
            </Text>
          </View>
        ) : isStreaming ? (
          <ActivityIndicator color="#6366f1" style={{ alignSelf: 'flex-start', margin: 8 }} />
        ) : null}
      </ScrollView>

      <View style={{ flexDirection: 'row', padding: 16, gap: 8, borderTopWidth: 1, borderTopColor: '#2a2a3e' }}>
        <TextInput
          value={input}
          onChangeText={setInput}
          placeholder="Message..."
          placeholderTextColor="#666"
          style={{ flex: 1, backgroundColor: '#1e1e2e', borderRadius: 24, paddingHorizontal: 16, paddingVertical: 10, color: '#fff' }}
          onSubmitEditing={handleSend}
          returnKeyType="send"
        />
        <Pressable
          onPress={isStreaming ? cancelStream : handleSend}
          style={{ backgroundColor: '#6366f1', borderRadius: 24, width: 44, alignItems: 'center', justifyContent: 'center' }}
        >
          <Text style={{ color: '#fff', fontSize: 16 }}>{isStreaming ? '■' : '↑'}</Text>
        </Pressable>
      </View>
    </View>
  )
}

Rendering Optimization: Prevent Too Many Re-Renders

Streaming can trigger 10–30 state updates per second. Without optimization, this causes visible jank on mid-range Android devices. Here are three techniques:

1. Batch Tokens with a 16ms Flush Interval

Instead of calling setState for every token, accumulate them and flush every animation frame:

typescript

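const tokenBuffer = useRef('')
const flushTimeout = useRef<ReturnType<typeof setTimeout> | null>(null)

const flushBuffer = () => {
  flushTimeout.current = null
  // Capture the text before clearing: the state updater runs later,
  // so reading tokenBuffer.current inside it could see an empty string.
  const pending = tokenBuffer.current
  tokenBuffer.current = ''
  if (pending) setStreamingContent(prev => prev + pending)
}

// In your token handler:
const onToken = (token: string) => {
  tokenBuffer.current += token
  // Schedule at most one flush per interval. Resetting the timer on every
  // token would postpone the flush for as long as tokens keep arriving.
  if (flushTimeout.current === null) {
    flushTimeout.current = setTimeout(flushBuffer, 16) // ~60fps
  }
}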
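
The same batching idea can be written without React so it is easy to unit test. The sketch below is one way to do it, assuming a scheduler function can be injected: createTokenBatcher and its parameters are hypothetical names. It flushes at most once per interval and exposes a manual flush for the end of the stream.

```typescript
// Scheduler signature compatible with setTimeout; injectable for tests.
type Schedule = (fn: () => void, ms: number) => unknown

// Hypothetical helper: collects tokens and delivers them in batches.
function createTokenBatcher(
  onFlush: (text: string) => void,
  intervalMs = 16,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms)
) {
  let pending = ''
  let scheduled = false

  const flush = () => {
    scheduled = false
    if (!pending) return
    const text = pending
    pending = ''
    onFlush(text)
  }

  return {
    push(token: string) {
      pending += token
      // Only one timer is outstanding at a time, so a steady token stream
      // still produces a flush every interval.
      if (!scheduled) {
        scheduled = true
        schedule(flush, intervalMs)
      }
    },
    // Call when the stream ends so the last tokens are not left waiting on a timer.
    flush,
  }
}
```

In the hook, onFlush would append to the streaming state, and flush() would run once after the read loop finishes.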

2. Memoize Previous Messages

Wrap the message list component in React.memo and render the streaming bubble as a separate component. This way, only the streaming text re-renders during generation, not the full message list.
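
React.memo compares props shallowly by default, which is enough when the messages array keeps its identity during streaming, as it does in the hook above. If a parent rebuilds the array on each render, React.memo also accepts a comparison function as its second argument. The sketch below shows such a comparator on its own so it runs without React; areMessageListsEqual and MessageListProps are hypothetical names.

```typescript
type Message = { role: 'user' | 'assistant' | 'system'; content: string }
type MessageListProps = { messages: Message[] }

// Returns true when the list can skip re-rendering:
// same length and every element is the same object as before.
function areMessageListsEqual(prev: MessageListProps, next: MessageListProps): boolean {
  if (prev.messages === next.messages) return true
  if (prev.messages.length !== next.messages.length) return false
  return prev.messages.every((message, i) => message === next.messages[i])
}

// Usage in a component file (MessageListInner is a hypothetical component):
//   export const MessageList = React.memo(MessageListInner, areMessageListsEqual)
```

The comparison is by object identity, so it stays cheap even for long conversations, but it assumes committed messages are never mutated in place.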

3. Use InteractionManager for Long Responses

Defer non-critical updates (saving to AsyncStorage, analytics) until after the stream completes using InteractionManager.runAfterInteractions.
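
As a rough illustration, the deferral can be wrapped in a small function that takes the scheduler as a parameter, which keeps it testable outside a device. deferPostStreamWork is a hypothetical name; in the app you would pass InteractionManager from react-native as the scheduler, since it provides a runAfterInteractions method that accepts a task function.

```typescript
// Structural type covering the single InteractionManager method used here.
type AfterInteractions = { runAfterInteractions(task: () => void): unknown }

// Hypothetical helper: runs non-critical tasks once interactions have settled.
function deferPostStreamWork(scheduler: AfterInteractions, tasks: Array<() => void>): void {
  scheduler.runAfterInteractions(() => {
    for (const task of tasks) {
      try {
        task()
      } catch (err) {
        // One failing task (for example analytics) should not block the others.
        console.warn('Deferred task failed:', err)
      }
    }
  })
}
```

A typical call site would be right after the assistant message is committed, passing functions that persist the conversation and record analytics.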

Security: Never Expose API Keys in the App

Critical: Always proxy through your backend

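Never call OpenAI or Claude directly from your React Native app with an API key embedded in the client. Keys in mobile apps are trivially extractable from the app binary, and that includes values read from EXPO_PUBLIC_ environment variables, which are inlined into the JavaScript bundle. Always route through your own API endpoint that authenticates the user, enforces rate limits, and injects the API key server-side. The code examples above call the provider directly to keep them short; in production, point them at your own endpoint, which then calls OpenAI or Claude.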
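
To make the server side concrete, here is a minimal sketch of the part of a proxy that builds the upstream request. It is an illustration under assumptions, not a complete backend: buildUpstreamRequest, MAX_TOKENS_CAP, and the field whitelist are hypothetical, and user authentication, rate limiting, and the HTTP server itself are left out. The point is that the client never chooses the model or the key, and its max_tokens is clamped.

```typescript
type ChatMessage = { role: 'user' | 'assistant' | 'system'; content: string }
type ClientBody = { messages?: unknown; max_tokens?: unknown }

const MAX_TOKENS_CAP = 1000 // Hypothetical server-side limit

function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== 'object' || value === null) return false
  const { role, content } = value as { role?: unknown; content?: unknown }
  return (
    (role === 'user' || role === 'assistant' || role === 'system') &&
    typeof content === 'string'
  )
}

// Builds the request the server sends to OpenAI. The API key comes from
// server-side configuration and is never sent to or read from the client.
function buildUpstreamRequest(clientBody: ClientBody, apiKey: string) {
  if (!Array.isArray(clientBody.messages)) {
    throw new Error('messages must be an array')
  }
  const messages = clientBody.messages.filter(isChatMessage)

  const requested =
    typeof clientBody.max_tokens === 'number' && Number.isFinite(clientBody.max_tokens)
      ? clientBody.max_tokens
      : MAX_TOKENS_CAP
  const maxTokens = Math.max(1, Math.min(Math.floor(requested), MAX_TOKENS_CAP))

  return {
    url: 'https://api.openai.com/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      // Model and stream are fixed by the server, not taken from the client.
      body: JSON.stringify({ model: 'gpt-4o-mini', messages, stream: true, max_tokens: maxTokens }),
    },
  }
}
```

The server would pass url and init to fetch and pipe the streamed response body back to the app, so the hook's parsing code stays the same and only its URL and headers change.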

Hands-on help

Want a streaming AI chat that ships?

CasaInnov ships complete AI chat implementations with streaming, RAG, and real security in 2-week sprints.

Free 30-minute call

A clear plan for your project

No obligation either way

Trusted by 10+ companies | Free first call | Kept confidential
