I was zen-coding my AI project until I hit a wall. I wanted to stream the LLM response from a FastAPI backend to a React frontend. It took me hours to find a proper solution. I’m writing this in case someone else ends up stuck in the same boat—hopefully, this saves you some time.
What is HTTP Streaming?
HTTP streaming involves sending data in small, sequential chunks over a standard HTTP response. No, this is not a WebSocket connection. The connection is one-way: the server sends the data, and the client consumes it.
FastAPI Endpoint for Streaming Text Data
Here’s a FastAPI endpoint that streams text data (mocking the LLM response):
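A minimal sketch of that endpoint, assuming SSE-style `data:` framing and a placeholder `{"value": i}` payload in place of real LLM tokens:

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def number_generator():
    """Mock an LLM response: emit numbered JSON chunks with a short delay."""
    for i in range(10):
        # SSE framing: "data: <payload>" followed by a blank line.
        yield f"data: {json.dumps({'value': i})}\n\n"
        await asyncio.sleep(0.5)  # stand-in for token-generation latency

@app.get("/stream")
async def stream():
    return StreamingResponse(number_generator(), media_type="text/event-stream")
```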
This endpoint uses `StreamingResponse` to send data chunks in real time. The `number_generator` function simulates an ongoing stream of JSON data.
Consuming the Stream on the Client Side
The client can consume this stream using an `EventSource` or by directly reading the response body with a `ReadableStream`.
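Here’s a sketch of the `ReadableStream` approach, assuming the `data:`-framed JSON emitted by the server sketch above:

```javascript
async function consumeStream() {
  const response = await fetch("/stream");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode the binary chunk; buffer it, since a chunk can end mid-message.
    buffer += decoder.decode(value, { stream: true });

    // SSE messages are separated by a blank line ("\n\n").
    const messages = buffer.split("\n\n");
    buffer = messages.pop(); // keep any incomplete trailing message

    for (const message of messages) {
      if (message.startsWith("data: ")) {
        const payload = JSON.parse(message.slice("data: ".length));
        console.log("received:", payload); // e.g. hand off to React state here
      }
    }
  }
}
```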
The code above fetches the `/stream` endpoint and reads the response body using a `ReadableStream`. It decodes each chunk, splits it into individual messages, and parses them as JSON.
Streamed event data isn’t inherently structured, but we can follow the Server-Sent Events (SSE) standard to send and parse JSON effectively. This ensures your client-side code can handle the incoming data consistently.
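And since the server speaks SSE, the browser’s built-in `EventSource` can consume the same endpoint with less plumbing. A sketch, with the same assumed payload:

```javascript
const source = new EventSource("/stream");

source.onmessage = (event) => {
  // event.data is the text after "data: ", already unframed.
  const payload = JSON.parse(event.data);
  console.log("received:", payload);
};

// EventSource auto-reconnects when the stream ends, so close on error
// to avoid re-requesting the endpoint once the mock generator finishes.
source.onerror = () => source.close();
```

One trade-off: `EventSource` only supports GET requests without custom headers, which is why the `fetch` + `ReadableStream` approach is often the better fit for authenticated LLM APIs.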