From ca3f3cc49d981d2888ca91c4f3097d7c5ec1bd93 Mon Sep 17 00:00:00 2001
From: Daniel Lemire
Date: Fri, 20 Aug 2021 14:09:30 -0400
Subject: [PATCH] Update iterate_many.md

---
 doc/iterate_many.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/doc/iterate_many.md b/doc/iterate_many.md
index 74dfa35a..36036b6a 100644
--- a/doc/iterate_many.md
+++ b/doc/iterate_many.md
@@ -1,8 +1,14 @@
 iterate_many
 ==========
 
-An interface providing features to work with files or streams containing multiple small JSON documents.
-As fast and convenient as possible.
+An interface providing features to work with files or streams containing multiple small JSON documents. Given an input such as
+```JSON
+{"text":"a"}
+{"text":"b"}
+{"text":"c"}
+...
+```
+... you want to read the entries (individual JSON documents) as quickly and as conveniently as possible. Importantly, the input might span several gigabytes, but you want to use a small (fixed) amount of memory. Ideally, you'd also like to parallelize the processing (using more than one core) to speed up the process.
 
 Contents
 --------
@@ -226,4 +232,4 @@ This will print:
 39 bytes
 ```
 
-Importantly, you should only call `truncated_bytes()` after iterating through all of the documents since the stream cannot tell whether there are truncated documents at the very end when it may not have accessed that part of the data yet.
\ No newline at end of file
+Importantly, you should only call `truncated_bytes()` after iterating through all of the documents since the stream cannot tell whether there are truncated documents at the very end when it may not have accessed that part of the data yet.