JSON River – Parse JSON Incrementally as It Streams In
Posted 3 months ago · Active 3 months ago
Key topics
JSON Parsing
Streaming Data
Large Language Models
JSON River is a JavaScript library that parses JSON incrementally as it streams in, useful for handling large or unbounded JSON data, with discussion around its use cases, design choices, and comparisons to other libraries.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 50m after posting
- Peak period: 84 comments (Day 6)
- Avg / period: 13.7 comments
Based on 96 loaded comments
Key moments
1. Story posted: Oct 8, 2025 at 12:38 PM EDT (3 months ago)
2. First comment: Oct 8, 2025 at 1:28 PM EDT (50m after posting)
3. Peak activity: 84 comments in Day 6, the hottest window of the conversation
4. Latest activity: Oct 22, 2025 at 10:01 PM EDT (3 months ago)
ID: 45518033 · Type: story · Last synced: 11/20/2025, 8:42:02 PM
The benefit with that was that you didn't need to hold the entire deserialized JSON object in memory.
This seems to be more oriented towards interactivity, which is an interesting use-case I hadn't thought about.
I would expect an object JSON stream to be more like a SAX parser though. It's familiar, fast and simple.
Any thoughts on not choosing the SAX approach?
I don't see it as particularly convenient if I want to stream a large array of small independent objects and read each one of them once, then discard it. The incremental parsed array would get bigger and bigger, eventually containing all the objects I wanted to discard. I would also need to move my array pointer to the last element at each increment.
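To make the commenter's point concrete, here is a hedged sketch of the workaround that snapshot-style output would force, assuming jsonriver's `parse`, a top-level array document, and (per the invariants discussed elsewhere in the thread) that every array element but the last is final; `handle` and `stream` are hypothetical:

```ts
import { parse } from 'jsonriver';

declare const stream: AsyncIterable<string>; // e.g. a decoded fetch body
const handle = (item: unknown) => { /* process one element, then discard it */ };

let seen = 0;
for await (const snapshot of parse(stream)) {
  const items = snapshot as unknown[]; // assume the document is one big array
  // All elements except the possibly-partial last one are safe to hand off once.
  for (; seen < items.length - 1; seen++) {
    handle(items[seen]);
  }
  // But `items` still retains everything parsed so far, so memory grows anyway.
}
```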
jq and JSON.sh have similar incremental "mini-object-before-complete" approaches to parsing JSON. However, they do include some tools to shape those mini-objects (pruning, selecting, and so on). Also, they're tuned for pipes (new line is the event), which caters to shell and text-processing tools. I wonder what would be the analogue for that in a higher language.
https://www.npmjs.com/package/bfj
EDIT: this is totally wrong and the question is right.
```json {"name": "Al"} {"name": "Ale"} ```
So the braces are always closed
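For context, a minimal consumption sketch based on how the thread describes the API (`parse` returning an AsyncIterable of successively more complete values, and accepting an async iterable of string chunks); the URL is made up:

```ts
import { parse } from 'jsonriver';

const response = await fetch('/user.json'); // hypothetical endpoint
const text = response.body!.pipeThrough(new TextDecoderStream());

for await (const partial of parse(text)) {
  // Each yielded value is well-formed JSON-shaped data:
  // {} -> {name: "Al"} -> {name: "Ale"} -> ... -> the final object.
  console.log(partial);
}
```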
Then added benchmarks and started doing optimization, getting it ~10x faster than my initial naive implementation. Then I threw agents at it, and between Claude, Gemini, and Codex we were able to make it an additional 2x faster.
So I'd definitely count strings as "unbounded" as well.
For my use case I wanted streaming parse of strings, I was rendering JSON produced by an LLM, for incrementally rendering a UI, and some of the strings were long enough (descriptions) that it was nice to see them render incrementally.
I _think_ the intended use of this is for people with bad internet connections so your UI can show data that's already been received without waiting for a full response. I.e. if their connection is 1KB/s and you send an 8KB JSON blob that's mostly a single text field, you can show them the first kilobyte after a second rather than waiting 8 seconds to get the whole blob.
At first I thought maybe it was for handling gigantic JSON blobs that you don't want to entirely load into memory, but the API looks like it still loads the whole thing into memory.
Why not at least wait until the key is complete - what's the use in a partial key?
{"cleanup_cmd":"rm -rf /home/foo/.tmp" }
I’m imagining it in my mental model as being typed “unknown”. Anything that prevents accidental use as if it were a whole string… I imagine a more complex type with an “isComplete” flag of sorts would be more powerful but a bit of a blunderbuss.
[1]: https://github.com/globalaiplatform/langdiff/tree/main/ts
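A hypothetical TypeScript sketch of that "isComplete" idea (not from any of the libraries discussed): a tagged type that forces callers to check completeness before treating the value as a whole string.

```ts
// Tag in-flight strings so a partial value can't accidentally be used as a
// finished one; the discriminant makes the check unavoidable at compile time.
type StreamingString =
  | { isComplete: true; value: string }
  | { isComplete: false; value: string }; // prefix received so far

const runValidation = (s: string) => { /* safe only on the whole string */ };
const showPreview = (s: string) => { /* display-only use of a prefix */ };

function renderName(name: StreamingString) {
  if (name.isComplete) {
    runValidation(name.value);
  } else {
    showPreview(name.value);
  }
}
```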
> As a consequence of 1 and 5, we only add a property to an object once we have the entire key and enough of the value to know that value's type.
name: A
name: Al
name: Ale
name: Alex
Which would suggest you are getting unfinished strings out in the stream.
[1] https://github.com/karminski/streaming-json-js
Does it create a new value each time, or just mutate the existing one and keep yielding it?
I wrote it when I was doing prototyping on doing streaming rendering of UIs defined by JSON generated by LLMs. Using constrained generation you can essentially hand the model a JSON serializable type, and it will always give you back a value that obeys that type, but the big models are slow enough that incremental rendering makes a big difference in the UX.
I'm pretty proud of the testing that's gone into this project. It's fairly exhaustively tested. If you can find a value that it parses differently than JSON.parse, or a place where it disobeys the 5+1 invariants documented in the README I'd be impressed (and thankful!).
This API, where you get a series of partial values, is designed to be easy to render with any of the `UI = f(state)` libraries like React or Lit, though you may need to short circuit some memoization or early exiting since whenever possible jsonriver will mutate existing values rather than creating new ones.
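A rough sketch of what that loop can look like with a `UI = f(state)` renderer; `root.render` stands in for React or Lit, and cloning is one way to defeat memoization when the parser mutates values in place:

```ts
import { parse } from 'jsonriver';

declare const root: { render(state: unknown): void }; // e.g. a React root
declare const body: AsyncIterable<string>;            // decoded response chunks

for await (const partial of parse(body)) {
  // jsonriver may mutate and re-yield the same object, so reference-equality
  // checks would skip updates; cloning gives the renderer fresh identities.
  root.render(structuredClone(partial));
}
```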
I can also imagine it being useful to have a mode where you never emit strings until they are final. I don't entirely understand why strings are emitted incrementally but numbers aren't.
Name: John Smith. Birth Year: A.D. 1 [Customer is a Senior: 2,024 years old]
Name: John Smith. Birth year: A.D. 19 [Customer is a Senior: 2,006 years old]
Name: John Smith. Birth year: A.D. 199 [Customer is a Senior: 1,826 years old]
Name: John Smith. Birth year: 1997
Seriously, you should be able to update the UI with a new character, and much more, at 60fps easily.
(but for other uses - nope)
For instance, imagine you don't fully control the backend to split up a large response into several smaller API calls, but you could render the top part of the UI, which may be the most useful part, from the first couple of keys in the JSON, while a large "transaction history" after that is still downloading.
> The parse function also matches JSON.parse's behavior for invalid input. If the input stream cannot be parsed as the start of a valid JSON document, then parsing halts and an error is thrown. More precisely, the promise returned by the next method on the AsyncIterable rejects with an Error. Likewise if the input stream closes prematurely.
As for why strings are emitted incrementally, it's just that I was often dealing with long strings produced slowly by LLMs. JSON encoded numbers can be big in theory, but there's no practical reason to do so as almost everyone decodes them as 64bit floats.
Previously the parser would get an array of tokens each time it pushed data into the tokenizer. This was easy to write, but it meant we needed to allocate token objects. Now the tokenizer has a reference to the parser and calls token-specific methods directly on it. Since most of the tokens carry no data, this keeps us from jumping all over the heap so much. If we were parsing a more complicated language this might become a huge pain in the butt, but JSON is simple enough, and the test suite is exhaustive enough, that we can afford a little nightmare spaghetti if it improves on speed.
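An illustrative sketch (not jsonriver's actual code) of the refactor described: dataless tokens become direct method calls on the parser instead of allocated token objects.

```ts
// Before: each push() returned allocated token objects like
//   {kind: 'beginObject'} | {kind: 'string', value: '...'}
// After: the tokenizer holds a reference to the parser and calls it directly.
interface ParserSink {
  beginObject(): void;
  endObject(): void;
  string(value: string): void;
}

class Tokenizer {
  constructor(private readonly sink: ParserSink) {}

  push(chunk: string): void {
    for (const ch of chunk) {
      if (ch === '{') this.sink.beginObject();    // no allocation for `{`
      else if (ch === '}') this.sink.endObject(); // or `}`
      // ...string, number, and literal handling elided...
    }
  }
}
```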
There might be room for some helper functions in something like a 'jsonriver/helpers.js' module. I'll poke around at it.
Then I see a Node style import and npm. When did Node/NPM stop being dependencies and become standardized by JavaScript? Where's my raw es6 module?
The library doesn't use any APIs beyond those in the JS standard, so I'm pretty confident it will work everywhere, but happy to publish in more places and run more tests. Any in particular that you'd like to see?
For some reason everybody in the JS world takes "download and execute random software from the Internet" as the only way to do things.
The Debian approach of having global versions of libraries seems like it's solving a different problem than the ones I have. I want each application to track and version its own dependencies, so that upgrading a dependency for one doesn't break another, and so that I can go back to an old project and be reasonably confident it'll still work. That ultimately led me to nix.
It's amazing how much the quality of installed software improves when you do this. Something our industry desperately needs.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
And node seems to be used only as a dev dependency, to test, benchmark and build/package the project. If you'd be inclined you can use the project's code as-is elsewhere, i.e. in the browser.
Allows parsing and streaming without any special libraries, and allows an unlimited amount of data (as long as individual objects are reasonably sized).
These files usually get the .jsonlines suffix when stored on disk.
Allows for batch process without requiring huge amounts of memory.
Newline Delimited JSON
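A minimal NDJSON reader sketch using Node's readline (`handleRecord` is hypothetical), showing why memory stays bounded by one record at a time:

```ts
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

const handleRecord = (record: unknown) => { /* batch-process, then drop */ };

const lines = createInterface({ input: createReadStream('data.jsonlines') });
for await (const line of lines) {
  if (!line.trim()) continue;       // tolerate blank lines
  handleRecord(JSON.parse(line));   // each line is a complete JSON document
}
```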
TIL
https://en.wikipedia.org/wiki/JSON_streaming
The title made me think of Star Trek DS9 and Nog talking about The Great Material Continuum.
“Nog: The river will provide”
I did something similar for streaming but built it with a streaming protocol at the frame level wrapping the JSON messages [1]. The streaming protocol has support for both the LF based scheme and the HTTP Content-Length header based scheme. It's for supporting MCP and LSP.
[1] https://github.com/williamw520/zigjr/?tab=readme-ov-file#str...
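For reference, a sketch of the Content-Length framing style mentioned (as used by LSP); this toy version works on strings and so assumes ASCII payloads, whereas real framing counts bytes:

```ts
// Each frame: "Content-Length: N\r\n\r\n" followed by N bytes of JSON.
function* deframe(buffer: string): Generator<unknown> {
  let pos = 0;
  for (;;) {
    const headerEnd = buffer.indexOf('\r\n\r\n', pos);
    if (headerEnd === -1) return; // incomplete header; wait for more input
    const match = /Content-Length:\s*(\d+)/i.exec(buffer.slice(pos, headerEnd));
    if (!match) throw new Error('missing Content-Length header');
    const length = Number(match[1]);
    const bodyStart = headerEnd + 4;
    if (buffer.length < bodyStart + length) return; // incomplete body
    yield JSON.parse(buffer.slice(bodyStart, bodyStart + length));
    pos = bodyStart + length;
  }
}
```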
Like yours, I'm sure, these incremental or online parser libraries are orders of magnitude faster[2] than alternatives for parsing LLM tool calls for the very simple reason that alternative approaches repeatedly parse the entire concatenated response, which requires buffering the entire payload, repeatedly allocating new objects, and for an N token response, you parse the first token N times! All of the "industry standard" approaches here are quadratic, which is going to scale quite poorly as LLMs generate larger and larger responses to meet application needs, and users want low latency outputs.
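To make the quadratic point concrete, here is the naive pattern (all names hypothetical): every chunk triggers a full re-parse of the whole accumulated buffer, so an N-chunk response does O(N²) work.

```ts
declare const tokenStream: AsyncIterable<string>;    // LLM output chunks
declare function repairTruncated(s: string): string; // close open braces/quotes
declare function render(snapshot: unknown): void;

let buffer = '';
for await (const chunk of tokenStream) {
  buffer += chunk;
  try {
    render(JSON.parse(repairTruncated(buffer))); // re-parses everything so far
  } catch {
    // not yet parseable; wait for more input
  }
}
```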
One of the most useful features of this approach is filtering LLM tool calls on the server and passing through a subset of the parse events to the client. This makes it relatively easy to put moderation, metadata capture, and other requirements in a single tool call, while still providing low latency streaming UI. It also avoids the problem with many moderation APIs where for cost or speed reasons, one might delegate to a smaller, cheaper model to generate output in a side-channel of the normal output stream. This not only doesn't scale, but it also means the more powerful model is unaware of these requirements, or you end up with a "flash of unapproved content" due to moderation delays, etc.
I found that it was extremely helpful to work at the level of parse events, but recognize that building partial values is also important, so I'm working on something similar in Rust[3], but taking a more holistic view and building more of an "AI SDK" akin to Vercel's, but written in Rust.
[1] https://github.com/aaronfriel/fn-stream
[2] https://github.com/vercel/ai/pull/1883
[3] https://github.com/aaronfriel/jsonmodem
(These are my own opinions, not those of my employer, etc. etc.)
Plus ça change, et plus c'est la même chose. ("The more things change, the more they stay the same.")
(The downside of JSON Merge Patch is it doesn't support concatenating string values, so you must send a value like `{"msg": "Hello World"}` as one message; you can't join `{"msg": "Hello"}` with `{"msg": " World"}`.)
[1] https://github.com/pierreinglebert/json-merge-patch
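A tiny RFC 7386-style merge sketch that shows the limitation: string values replace rather than concatenate.

```ts
// Recursive merge per RFC 7386: null deletes a key, objects merge,
// everything else (scalars, arrays) replaces wholesale.
function mergePatch(target: unknown, patch: unknown): unknown {
  if (patch === null || typeof patch !== 'object' || Array.isArray(patch)) {
    return patch;
  }
  const result: Record<string, unknown> =
    target && typeof target === 'object' && !Array.isArray(target)
      ? { ...(target as Record<string, unknown>) }
      : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) delete result[key];
    else result[key] = mergePatch(result[key], value);
  }
  return result;
}

mergePatch({ msg: 'Hello' }, { msg: ' World' }); // => { msg: ' World' }, not "Hello World"
```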
Roughly how does it compare with https://github.com/promplate/partial-json-parser-js ?
Concretely, it means I can call an LLM, wrap its output stream in a streaming string, and treat it like a regular string. No need for print loops, it’s all handled behind the scenes. I can chain transformations (joining strings, splitting them with regexes, capturing substrings, etc.) and serialize the results into JSON progressively, building lazy sequences or maps on the fly.
The benefit is that I can start processing and emitting structured data immediately, without waiting for the LLM’s full response. Filtered output can be shown to users as it arrives, with near-zero added latency (aside from regex lookaheads).
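In TypeScript terms (the original is presumably in another language), the equivalent plumbing is chained async generators; a hedged sketch of two such stages:

```ts
// Each stage emits as soon as it can rather than waiting for the full response.
async function* splitLines(chunks: AsyncIterable<string>): AsyncIterable<string> {
  let pending = '';
  for await (const chunk of chunks) {
    pending += chunk;
    const lines = pending.split('\n');
    pending = lines.pop()!; // keep the unterminated tail for the next chunk
    yield* lines;           // emit completed lines immediately
  }
  if (pending) yield pending;
}

async function* grep(re: RegExp, lines: AsyncIterable<string>): AsyncIterable<string> {
  for await (const line of lines) if (re.test(line)) yield line;
}

// Usage: for await (const hit of grep(/error/i, splitLines(llmStream))) show(hit);
```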
Install with `uv pip install jsonriver`
https://github.com/chrisschnabl/jsonriver-py
https://github.com/Qbix/Platform/blob/main/platform/classes/...
- trim off all trailing delimiters: },"
- then add on a fixed suffix: "]}
- then try parsing as a standard json. Ignore results if fails to parse.
This works since the schema I'm parsing had a fairly simple structure where everything of interest was at a specific depth in the hierarchy and values were all strings (sketched below).
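A sketch of that heuristic, tuned (as noted) to a schema where the interesting values are strings at a fixed depth:

```ts
function tryParsePrefix(prefix: string): unknown | undefined {
  const trimmed = prefix.replace(/[},"]*$/, ''); // 1. trim trailing delimiters
  const candidate = trimmed + '"]}';             // 2. append the fixed suffix
  try {
    return JSON.parse(candidate);                // 3. parse; ignore failures
  } catch {
    return undefined;
  }
}
```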
It's less about incrementally parsing objects, and more about picking paths and shapes out from a feed. If you're doing something like array/newline delimited json, it's a great tool for reading things out as they arrive. Also great for example for feed parsing.
We used the streaming parser to create an index of the file locally {json key: (byte offset, byte size)} and then simply used http range queries to access the data we needed.
Here is the full write up about it:
https://dinesh.cloud/2022/streaming-json-for-fun-and-profit/
And here is the open sourced code:
https://github.com/multiversal-ventures/json-buffet
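The core trick, sketched (the names and index shape here are assumptions based on the description, not the project's actual API):

```ts
type ByteExtent = { offset: number; size: number };
type Index = Record<string, ByteExtent>; // {json key: (byte offset, byte size)}

async function fetchValue(url: string, index: Index, key: string): Promise<unknown> {
  const { offset, size } = index[key];
  const res = await fetch(url, {
    // Ask the server for just this value's bytes.
    headers: { Range: `bytes=${offset}-${offset + size - 1}` },
  });
  return JSON.parse(await res.text());
}
```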
[1] https://github.com/chrchr/flojay
Given a schema and a JSON message prefix, parse the complete message but substitute missing field values with Promise objects. Likewise, represent lists as lazy sequences. Add a pubsub system.
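Sketched with modern JS promise plumbing (`Promise.withResolvers`, ES2024; all other names hypothetical), the consumer side of that design could look like:

```ts
// The parser owns a resolver per schema field and settles it once that
// field's bytes have fully arrived.
const { promise: name, resolve: resolveName } = Promise.withResolvers<string>();

// Consumers can await fields that haven't streamed in yet...
name.then((n) => console.log('name arrived:', n));

// ...while the parse loop resolves them as it completes each field.
function onFieldComplete(key: string, value: unknown): void {
  if (key === 'name') resolveName(value as string);
}
```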