I was zen-coding my AI project until I hit a wall. I wanted to stream the LLM response from a FastAPI backend to a React frontend. It took me hours to find a proper solution. I’m writing this in case someone else ends up stuck in the same boat—hopefully, this saves you some time.
What is HTTP Streaming?
HTTP streaming involves sending data in small, sequential chunks over a standard HTTP response. No, this is not a WebSocket connection. The connection is one-way: the server sends the data, and the client consumes it.
FastAPI Endpoint for Streaming Text Data
Here’s a FastAPI endpoint that streams text data (mocking the LLM response):
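A minimal sketch of that endpoint, assuming SSE-style `data:` framing and a placeholder `{"value": i}` payload in place of real LLM tokens:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def number_generator():
    """Mock an LLM response: emit numbered JSON chunks with a short delay."""
    for i in range(10):
        # SSE framing: "data: <payload>" followed by a blank line.
        yield f"data: {json.dumps({'value': i})}\n\n"
        await asyncio.sleep(0.5)  # stand-in for token-generation latency

@app.get("/stream")
async def stream():
    return StreamingResponse(number_generator(), media_type="text/event-stream")
```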
This endpoint uses `StreamingResponse` to send data chunks in real time. The `number_generator` function simulates an ongoing stream of JSON data.
Consuming the Stream on the Client Side
The client can consume this stream using an `EventSource` or by directly reading the response body with a `ReadableStream`.
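Here’s a sketch of the `ReadableStream` approach, assuming the `data:`-framed JSON emitted by the server sketch above:

```javascript
async function consumeStream() {
  const response = await fetch("/stream");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode the binary chunk; buffer it, since a chunk can end mid-message.
    buffer += decoder.decode(value, { stream: true });

    // SSE messages are separated by a blank line ("\n\n").
    const messages = buffer.split("\n\n");
    buffer = messages.pop(); // keep any incomplete trailing message

    for (const message of messages) {
      if (message.startsWith("data: ")) {
        const payload = JSON.parse(message.slice("data: ".length));
        console.log("received:", payload); // e.g. hand off to React state here
      }
    }
  }
}
```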
The code above fetches the `/stream` endpoint and reads the response body using a `ReadableStream`. It decodes each chunk, splits it into individual messages, and parses them as JSON.
Streamed event data isn’t inherently structured, but we can follow the Server-Sent Events (SSE) standard to send and parse JSON effectively. This ensures your client-side code can handle the incoming data consistently.
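And since the server speaks SSE, the browser’s built-in `EventSource` can consume the same endpoint with less plumbing. A sketch, with the same assumed payload:

```javascript
const source = new EventSource("/stream");

source.onmessage = (event) => {
  // event.data is the text after "data: ", already unframed.
  const payload = JSON.parse(event.data);
  console.log("received:", payload);
};

// EventSource auto-reconnects when the stream ends, so close on error
// to avoid re-requesting the endpoint once the mock generator finishes.
source.onerror = () => source.close();
```

One trade-off: `EventSource` only supports GET requests without custom headers, which is why the `fetch` + `ReadableStream` approach is often the better fit for authenticated LLM APIs.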