Why Streaming Is Non-Negotiable for AI UX
Without streaming, a 500-token AI response takes 3–8 seconds of blank screen before anything appears. With streaming, the user sees the first word within 200ms and reads along as the response generates. This single change increases user satisfaction scores by 40–60% in every AI app we've shipped.
ChatGPT, Claude, and virtually every other production AI product stream their responses, and users now expect it. Non-streaming AI feels broken by comparison. The web implementation is well documented, but React Native has specific gotchas that catch most people off guard the first time.
The React Native Streaming Challenge
Web apps handle streaming easily via the browser's native EventSource API or the Fetch API's ReadableStream. React Native has neither by default:
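- No native EventSource: React Native's JS environment doesn't include the browser's SSE API, so you need a polyfill or a different approach
- No streaming fetch body: React Native's built-in fetch is implemented on top of XMLHttpRequest and does not expose response.body as a ReadableStream. In Expo apps (SDK 52 and later), the fetch exported from expo/fetch does, and that is the recommended approach
- Text decoding: You need TextDecoder to convert byte chunks to strings. Whether it is available without a polyfill depends on your React Native version and JS engine, so verify it in your target runtime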
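Because there is no EventSource, the "different approach" in practice means parsing the SSE stream yourself from the raw bytes fetch hands you. Below is a minimal sketch of such a parser. It assumes TextDecoder is available (natively or via a polyfill) and that lines end in \n or \r\n; the class name SseDataParser is hypothetical, and it only extracts data: payloads, ignoring event:, id: and retry: fields and multi-line data, which is enough for the OpenAI and Claude formats used in this article.

```typescript
// Hypothetical SSE "data:" extractor for fetch byte streams.
// Not a full SSE implementation: event:, id:, retry:, comments and
// multi-line data fields are ignored.
class SseDataParser {
  private decoder = new TextDecoder('utf-8')
  private buffer = ''

  // Feed one chunk of bytes; returns every data payload it completed.
  push(bytes: Uint8Array): string[] {
    // stream: true holds back a multi-byte UTF-8 character that is split
    // across two chunks instead of decoding it as U+FFFD
    this.buffer += this.decoder.decode(bytes, { stream: true })
    return this.drain()
  }

  // Call once when the reader reports done, to flush any trailing line.
  end(): string[] {
    this.buffer += this.decoder.decode() + '\n'
    return this.drain()
  }

  private drain(): string[] {
    const payloads: string[] = []
    const lines = this.buffer.split('\n')
    // The last element is '' (chunk ended on a newline) or a partial line:
    // keep it until the rest arrives
    this.buffer = lines.pop() ?? ''
    for (const raw of lines) {
      const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw
      if (line.startsWith('data:')) {
        // The SSE format allows one optional space after the colon
        payloads.push(line.slice(5).replace(/^ /, ''))
      }
    }
    return payloads
  }
}
```

Each value returned by push() is one complete payload (JSON or the [DONE] sentinel), so callers never see a line the network cut in half, and accented letters or emoji survive chunk boundaries.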
Fetch-based streaming works well in practice and is the recommended path. Here's how to set it up:
Core Streaming Hook (OpenAI)
This hook handles everything: streaming fetch, token parsing, state management, and cancellation:
import { useState, useCallback, useRef } from 'react'
// React Native's built-in fetch cannot stream response bodies;
// Expo SDK 52+ ships a streaming-capable fetch in expo/fetch
import { fetch } from 'expo/fetch'

// Demo only: in production, call your own backend instead (see Security below)
const OPENAI_API_URL = 'https://api.openai.com/v1/chat/completions'
const OPENAI_API_KEY = process.env.EXPO_PUBLIC_OPENAI_API_KEY!

type Message = { role: 'user' | 'assistant' | 'system'; content: string }

export function useStreamingChat() {
  const [messages, setMessages] = useState<Message[]>([])
  const [streamingContent, setStreamingContent] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef<AbortController | null>(null)

  const sendMessage = useCallback(async (userMessage: string) => {
    const newMessages: Message[] = [
      ...messages,
      { role: 'user', content: userMessage },
    ]
    setMessages(newMessages)
    setStreamingContent('')
    setIsStreaming(true)

    // Allow cancellation
    const controller = new AbortController()
    abortControllerRef.current = controller

    try {
      const response = await fetch(OPENAI_API_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
          model: 'gpt-4o-mini',
          messages: newMessages,
          stream: true,
          max_tokens: 1000,
        }),
        signal: controller.signal,
      })
      if (!response.ok) throw new Error(`HTTP ${response.status}`)
      if (!response.body) throw new Error('No response body')

      const reader = response.body.getReader()
      const decoder = new TextDecoder('utf-8')
      let fullContent = ''
      let buffered = '' // partial SSE line carried over between reads
      let finished = false

      while (!finished) {
        const { done, value } = await reader.read()
        if (done) break
        buffered += decoder.decode(value, { stream: true })
        // A network chunk can end mid-line: parse only complete lines
        // and keep the remainder for the next read
        const lines = buffered.split('\n')
        buffered = lines.pop() ?? ''

        for (const line of lines) {
          const trimmed = line.trim()
          if (!trimmed.startsWith('data: ')) continue
          const data = trimmed.slice(6) // Remove "data: "
          if (data === '[DONE]') {
            finished = true // End of stream: also exit the outer while loop
            break
          }
          try {
            const parsed = JSON.parse(data)
            const delta = parsed.choices?.[0]?.delta?.content
            if (delta) {
              fullContent += delta
              setStreamingContent(fullContent)
            }
          } catch {
            // Malformed payload, skip it
          }
        }
      }

      // Commit the full assistant message to history
      setMessages(prev => [
        ...prev,
        { role: 'assistant', content: fullContent },
      ])
      setStreamingContent('')
    } catch (err: unknown) {
      if (err instanceof Error && err.name !== 'AbortError') {
        console.error('Streaming error:', err)
      }
    } finally {
      setIsStreaming(false)
    }
  }, [messages])

  const cancelStream = useCallback(() => {
    abortControllerRef.current?.abort()
  }, [])

  return { messages, streamingContent, isStreaming, sendMessage, cancelStream }
}
Streaming with Anthropic Claude
Claude uses a slightly different SSE format. Here's the parser for Claude's streaming API:
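import { fetch } from 'expo/fetch' // streaming-capable fetch (Expo SDK 52+)

// Demo only: in production, call your own backend instead (see Security below)
async function streamClaude(
  userMessage: string,
  onToken: (token: string) => void
) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.EXPO_PUBLIC_ANTHROPIC_KEY!,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 1024,
      stream: true,
      messages: [{ role: 'user', content: userMessage }],
    }),
  })
  if (!response.ok) throw new Error(`HTTP ${response.status}`)
  if (!response.body) throw new Error('No response body')

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffered = '' // partial line carried over between reads

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffered += decoder.decode(value, { stream: true })
    const lines = buffered.split('\n')
    buffered = lines.pop() ?? '' // keep an incomplete trailing line for the next read

    for (const line of lines) {
      // Claude sends "event: <type>" lines followed by "data: {json}" lines;
      // the JSON repeats the type, so only the data lines are parsed
      if (!line.startsWith('data: ')) continue
      try {
        const event = JSON.parse(line.slice(6))
        if (event.type === 'content_block_delta') {
          const token = event.delta?.text
          if (token) onToken(token)
        }
      } catch {
        // Skip malformed payloads
      }
    }
  }
}
Building the Streaming Chat UI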
Here is the UI component that renders the streaming response. The key detail is a ScrollView that auto-scrolls, so the user always sees the latest tokens:
import React, { useRef, useEffect } from 'react'
import { View, Text, TextInput, Pressable, ScrollView, ActivityIndicator } from 'react-native'
import { useStreamingChat } from './useStreamingChat'
export default function StreamingChat() {
const { messages, streamingContent, isStreaming, sendMessage, cancelStream } = useStreamingChat()
const [input, setInput] = React.useState('')
const scrollRef = useRef<ScrollView>(null)
// Auto-scroll as tokens arrive
useEffect(() => {
scrollRef.current?.scrollToEnd({ animated: true })
}, [streamingContent, messages.length])
const handleSend = () => {
if (!input.trim() || isStreaming) return
sendMessage(input.trim())
setInput('')
}
return (
<View style={{ flex: 1, backgroundColor: '#0f0f0f' }}>
<ScrollView
ref={scrollRef}
style={{ flex: 1, padding: 16 }}
onContentSizeChange={() => scrollRef.current?.scrollToEnd({ animated: true })}
>
{messages.map((msg, i) => (
<View key={i} style={{
alignSelf: msg.role === 'user' ? 'flex-end' : 'flex-start',
maxWidth: '80%',
backgroundColor: msg.role === 'user' ? '#6366f1' : '#1e1e2e',
borderRadius: 16,
padding: 12,
marginBottom: 8,
}}>
<Text style={{ color: '#fff', lineHeight: 22 }}>{msg.content}</Text>
</View>
))}
{/* Streaming message bubble */}
{isStreaming && streamingContent ? (
<View style={{
alignSelf: 'flex-start',
maxWidth: '80%',
backgroundColor: '#1e1e2e',
borderRadius: 16,
padding: 12,
marginBottom: 8,
}}>
<Text style={{ color: '#fff', lineHeight: 22 }}>
{streamingContent}
{/* Cursor glyph (static; animate its opacity if you want it to blink) */}
<Text style={{ color: '#6366f1' }}>▍</Text>
</Text>
</View>
) : isStreaming ? (
<ActivityIndicator color="#6366f1" style={{ alignSelf: 'flex-start', margin: 8 }} />
) : null}
</ScrollView>
<View style={{ flexDirection: 'row', padding: 16, gap: 8, borderTopWidth: 1, borderTopColor: '#2a2a3e' }}>
<TextInput
value={input}
onChangeText={setInput}
placeholder="Message..."
placeholderTextColor="#666"
style={{ flex: 1, backgroundColor: '#1e1e2e', borderRadius: 24, paddingHorizontal: 16, paddingVertical: 10, color: '#fff' }}
onSubmitEditing={handleSend}
returnKeyType="send"
/>
<Pressable
onPress={isStreaming ? cancelStream : handleSend}
style={{ backgroundColor: '#6366f1', borderRadius: 24, width: 44, alignItems: 'center', justifyContent: 'center' }}
>
<Text style={{ color: '#fff', fontSize: 16 }}>{isStreaming ? '■' : '↑'}</Text>
</Pressable>
</View>
</View>
)
}
Rendering Optimization: Prevent Too Many Re-Renders
Streaming can trigger 10–30 state updates per second. Without optimization, this causes visible jank on mid-range Android devices. Here are three techniques:
1. Batch Tokens with a 16ms Flush Interval
Instead of calling setState for every token, accumulate tokens in a ref and flush them at most once every 16ms (roughly one frame at 60fps):
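const tokenBuffer = useRef('')
const flushTimeout = useRef<ReturnType<typeof setTimeout> | null>(null)

// Move everything buffered so far into state with a single update
const flushBuffer = () => {
  flushTimeout.current = null
  // Read the ref now: the updater below may run later, after the buffer is reset
  const pending = tokenBuffer.current
  tokenBuffer.current = ''
  if (pending) setStreamingContent(prev => prev + pending)
}

// In your token handler:
const onToken = (token: string) => {
  tokenBuffer.current += token
  // Throttle, don't debounce: schedule a flush only if none is pending,
  // so text keeps appearing even when tokens arrive back to back
  if (flushTimeout.current === null) {
    flushTimeout.current = setTimeout(flushBuffer, 16) // ~60fps
  }
}
// When the stream ends, clear any pending timeout and call flushBuffer() one last time
2. Memoize Previous Messages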
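Move the message list into its own component wrapped in React.memo, and render the streaming bubble as a separate component. Because the messages array only changes when a message is sent or committed, the memoized list skips re-rendering during generation and only the streaming text updates.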
3. Use InteractionManager for Long Responses
Once the stream completes, schedule non-critical work (saving to AsyncStorage, analytics) with InteractionManager.runAfterInteractions, so it runs after any in-progress animations and interactions instead of competing with them.
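A minimal sketch of that pattern, assuming you have several post-stream tasks: work is queued while tokens arrive and handed to a scheduler once the stream ends. createPostStreamQueue is a hypothetical helper, not a React Native API; the scheduler is injected so the logic can be tested outside the app.

```typescript
// Hypothetical helper (not part of React Native): collect non-critical work
// during streaming, then run it all through an injected scheduler afterwards.
type Task = () => void
type Scheduler = (run: () => void) => void

function createPostStreamQueue(schedule: Scheduler) {
  let pending: Task[] = []
  return {
    // Queue work such as persisting the conversation or logging analytics
    defer(task: Task): void {
      pending.push(task)
    },
    // Call once when the stream completes (or is cancelled)
    flush(): void {
      const tasks = pending
      pending = []
      if (tasks.length === 0) return
      schedule(() => {
        for (const task of tasks) {
          try {
            task()
          } catch (err) {
            // One failing task should not prevent the others from running
            console.error('Deferred task failed:', err)
          }
        }
      })
    },
  }
}
```

In the app, you would create the queue with `createPostStreamQueue(run => { InteractionManager.runAfterInteractions(run) })`, call defer() for writes like saving the conversation, and call flush() from the hook's finally block.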
Security: Never Expose API Keys in the App
Critical: Always proxy through your backend
Never call OpenAI or Claude directly from your React Native app with a hardcoded API key. API keys in mobile apps are trivially extractable from the app binary. Always route through your own API endpoint that authenticates the user, enforces rate limits, and injects the API key server-side. For brevity, the code examples above call OpenAI and Anthropic directly; in production, point them at your own API, which then calls the provider.
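Here is a minimal sketch of two of those server-side responsibilities, per-user rate limiting and injecting the key, for a Node backend written in TypeScript. FixedWindowLimiter, buildOpenAIRequest, ALLOWED_MODELS and MAX_TOKENS_CAP are hypothetical names; the in-memory limiter assumes a single server process, and user authentication is assumed to have happened before this code runs.

```typescript
// Server-side sketch; all names here are hypothetical.
type ChatMessage = { role: 'user' | 'assistant' | 'system'; content: string }

// Allows at most `limit` requests per user in each `windowMs` window
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>()
  private limit: number
  private windowMs: number

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  allow(userId: string, now: number = Date.now()): boolean {
    const w = this.windows.get(userId)
    if (!w || now - w.start >= this.windowMs) {
      // First request, or the previous window expired: start a new window
      this.windows.set(userId, { start: now, count: 1 })
      return true
    }
    if (w.count >= this.limit) return false
    w.count++
    return true
  }
}

const ALLOWED_MODELS = new Set(['gpt-4o-mini'])
const MAX_TOKENS_CAP = 1000

// The app sends messages (plus optional hints); the server picks safe values
// and injects the API key it reads from its own environment
function buildOpenAIRequest(
  clientBody: { model?: string; messages: ChatMessage[]; max_tokens?: number },
  apiKey: string
) {
  const model =
    clientBody.model && ALLOWED_MODELS.has(clientBody.model) ? clientBody.model : 'gpt-4o-mini'
  const maxTokens = Math.max(1, Math.min(clientBody.max_tokens ?? MAX_TOKENS_CAP, MAX_TOKENS_CAP))
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: clientBody.messages, stream: true, max_tokens: maxTokens }),
    },
  }
}
```

In your route handler, respond with HTTP 429 when allow() returns false; otherwise call fetch(url, init) and pipe the upstream body back to the app unchanged, so the streaming client code keeps working against your endpoint.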
Want a streaming AI chat that ships?
CasaInnov ships complete AI chat implementations with streaming, RAG, and real security in 2-week sprints.