
HTTP/1.1 Streaming Notes

March 19, 2026

Networking

Streaming

1. HTTP Chunked Transfer Encoding

1.1. Mechanism

Chunked transfer encoding is neither SSE nor WebSocket; it is a plain HTTP/1.1 feature (RFC 7230, now RFC 9112).

When the server omits Content-Length and sets Transfer-Encoding: chunked, it can send the response body in pieces:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/x-ndjson

1e\r\n
{"type":"query","data":"..."}\n
\r\n
1d\r\n
{"type":"tags","data":[...]}\n
\r\n
0\r\n        ← zero-length chunk signals end of stream
\r\n

The client processes each chunk as it arrives without waiting for the full body.
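The framing above is mechanical: each chunk is its payload length in hex, CRLF, the payload, CRLF, and a zero-length chunk terminates the body. A minimal sketch (frame_chunk is a made-up helper name, not a library function):

```python
def frame_chunk(payload: bytes) -> bytes:
    # One chunk: hex length, CRLF, payload, CRLF.
    return f"{len(payload):x}\r\n".encode() + payload + b"\r\n"

# A one-chunk body followed by the zero-length terminator chunk.
body = frame_chunk(b'{"type":"token","data":"Web"}\n') + b"0\r\n\r\n"
```

In practice the HTTP library (uvicorn, Node's http module) does this framing for you; the sketch only shows what goes on the wire.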

1.2. Response Format: NDJSON (Newline-Delimited JSON)

Each chunk is one JSON object followed by \n. Simple to produce, simple to parse.

{"type":"query","data":"what is websocket?"}\n
{"type":"tags","data":["networking","web-socket"]}\n
{"type":"titles","data":["Article A","Article B"]}\n
{"type":"token","data":"Web"}\n
{"type":"token","data":"sockets"}\n
{"type":"done"}\n

Alternatively, SSE (text/event-stream) rides on the same chunked mechanism but adds a specific framing (data: ...\n\n) and built-in reconnection via EventSource. Plain NDJSON streaming is simpler when reconnection isn't needed.
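The framing difference is small; a sketch with two hypothetical helpers puts the formats side by side:

```python
import json

def to_ndjson(event: dict) -> str:
    # NDJSON: one JSON object per line.
    return json.dumps(event) + "\n"

def to_sse(event: dict) -> str:
    # SSE: each event is a "data:" line terminated by a blank line.
    return "data: " + json.dumps(event) + "\n\n"
```

Either output can be yielded from the same chunked response; only the client-side parsing changes.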

2. HTTP/1.1 Streaming from Backend

2.1. Python / FastAPI

StreamingResponse sets Transfer-Encoding: chunked automatically. Each yield flushes one chunk immediately.

from fastapi.responses import StreamingResponse
import json

@app.get("/articles/stream")
async def answer_stream(question: str):
    async def generate():
        yield json.dumps({"type": "query", "data": "..."}) + "\n"
        yield json.dumps({"type": "tags",  "data": [...]}) + "\n"

        # stream OpenAI tokens
        stream = client.chat.completions.create(model=..., messages=..., stream=True)
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield json.dumps({"type": "token", "data": delta}) + "\n"

        yield json.dumps({"type": "done"}) + "\n"

    return StreamingResponse(generate(), media_type="application/x-ndjson")
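The generator can be exercised without a server. A simplified synchronous stand-in (tags and tokens are hard-coded in place of real retrieval and LLM calls) shows the NDJSON lines the endpoint would emit:

```python
import json

def generate(question):
    # Same event shapes as the async generator, minus the OpenAI call.
    yield json.dumps({"type": "query", "data": question}) + "\n"
    yield json.dumps({"type": "tags", "data": ["networking"]}) + "\n"
    for token in ["Web", "sockets"]:
        yield json.dumps({"type": "token", "data": token}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"

lines = list(generate("what is websocket?"))
```

Each yielded string becomes one chunk on the wire, which is why every event must end with its own "\n".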

2.2. Node.js / Express

res.write() sends a chunk immediately. res.end() closes the stream. Express uses chunked transfer automatically when res.write() is called before res.end().

app.get("/articles/stream", async (req, res) => {
    res.setHeader("Content-Type", "application/x-ndjson");

    res.write(JSON.stringify({ type: "query", data: query }) + "\n");
    res.write(JSON.stringify({ type: "tags",  data: tags  }) + "\n");

    const stream = await openai.chat.completions.create({ stream: true, ... });
    for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta?.content;
        if (delta) res.write(JSON.stringify({ type: "token", data: delta }) + "\n");
    }

    res.write(JSON.stringify({ type: "done" }) + "\n");
    res.end();
});

3. Receiving the HTTP/1.1 (NDJSON) Stream on the Frontend

3.1. Client: fetch (Browser / Node.js)

fetch exposes the response body as a ReadableStream. Read chunks with .getReader().

const response = await fetch("/articles/stream?question=...");
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

try {
    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true }); // stream: true keeps split multi-byte chars intact
        const lines = buffer.split("\n");
        buffer = lines.pop(); // keep incomplete last line in buffer

        for (const line of lines) {
            if (!line.trim()) continue;
            const event = JSON.parse(line);

            if (event.type === "query")  setQuery(event.data);
            if (event.type === "tags")   setTags(event.data);
            if (event.type === "titles") setTitles(event.data);
            if (event.type === "token")  setAnswer(prev => prev + event.data);
        }
    }
} catch (err) {
    // Node's undici throws UND_ERR_SOCKET when server closes connection after stream ends
    if (err?.cause?.code === "UND_ERR_SOCKET") {
        // not a real error — stream was fully consumed
    } else {
        throw err;
    }
}

Remark (Why buffer splitting matters). A single read() call may contain multiple JSON lines, or a line may be split across two read() calls. Always accumulate into a buffer and split on \n.
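The same buffering logic, sketched in Python with synthetic chunks (parse_ndjson_chunks is a made-up name), makes the split-line case concrete:

```python
import json

def parse_ndjson_chunks(chunks):
    # Accumulate raw text: a chunk may hold several lines or a partial line.
    buffer = ""
    events = []
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split("\n")
        buffer = lines.pop()  # keep the incomplete last line for the next chunk
        for line in lines:
            if line.strip():
                events.append(json.loads(line))
    return events

# A JSON line split across two chunks still parses once the second half arrives.
events = parse_ndjson_chunks(['{"type":"tok', 'en","data":"Web"}\n{"type":"done"}\n'])
```

Parsing each chunk directly with json.loads would fail on the first, partial chunk; the buffer is what makes the stream safe to consume.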

3.2. Client: axios (Browser only)

axios has limited streaming support via onDownloadProgress; note that it provides the cumulative text so far, not just the new chunk.

await axios.get("/articles/stream", {
    params: { question },
    responseType: "text",
    onDownloadProgress: (e) => {
        const fullText = e.event.target.responseText; // cumulative
        // parse lines from fullText
    }
});

In comparison, native fetch gives true chunk-by-chunk control.

4. Lambda Streaming (AWS)

Standard Lambda behind API Gateway always buffers the full response; chunked encoding is stripped at the gateway.

To stream from Lambda we must:

  1. Use a Lambda Function URL with InvokeMode: RESPONSE_STREAM instead of API Gateway.
  2. In Python, remove Mangum (it's a buffered adapter) or use Mangum v0.17+ streaming mode.
  3. Set the Function URL auth type to NONE if the endpoint must be publicly reachable.
  4. Add a resource-based policy that allows public invocation (lambda:InvokeFunctionUrl).

The function URL looks like: https://<id>.lambda-url.<region>.on.aws/

5. Comparison: SSE vs WebSocket vs Chunked HTTP

             Chunked HTTP                  SSE                         WebSocket
Direction    Server → Client only          Server → Client only        Bidirectional
Protocol     Plain HTTP                    HTTP (text/event-stream)    Protocol upgrade
Reconnect    None                          Built-in                    Manual
Format       Any                           data: ...\n\n               Any
Complexity   Lowest                        Low                         Higher
Use case     One-shot stream (RAG answer)  Live feeds, notifications   Chat, real-time collab