mirror of https://github.com/stedolan/jq.git synced 2024-05-11 05:55:39 +00:00

Add Streaming parser (--stream)

Streaming means that outputs are produced as soon as possible.  With the
`foreach` syntax one can write programs which reduce portions of the
streaming parse of a large input (reduce into proper JSON values, for
example), and discard the rest, processing incrementally.
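
As an illustration of that incremental style, here is a minimal sketch in Python rather than jq. It assumes the event shape described in this commit message (`[path, leaf]` pairs plus one-element closing events). The names `demo_events` and `sum_under` are hypothetical, and the events are hard-coded instead of coming from a real parser. It folds over the events one at a time, keeps only a running total for one subtree, and discards everything else.

```python
def demo_events():
    """Hypothetical event stream for {"keep":[1,2],"skip":["x"]}."""
    yield [["keep", 0], 1]
    yield [["keep", 1], 2]
    yield [["keep", 1]]        # closing event for the "keep" array
    yield [["skip", 0], "x"]
    yield [["skip", 0]]        # closing event for the "skip" array
    yield [["skip"]]           # closing event for the top-level object


def sum_under(events, prefix):
    """Sum numeric leaves whose path starts with `prefix`.

    Only the running total is kept in memory; each event is dropped
    as soon as it has been examined.
    """
    total = 0
    for event in events:
        if len(event) != 2:
            continue  # closing events carry no leaf value
        path, leaf = event
        is_number = isinstance(leaf, (int, float)) and not isinstance(leaf, bool)
        if path[:len(prefix)] == prefix and is_number:
            total += leaf
    return total


print(sum_under(demo_events(), ["keep"]))  # prints 3
```

Because `demo_events` is a generator, the loop never holds more than one event at a time, which is the property that makes this approach suitable for very large inputs.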

This:

    $ jq -c --stream .

should produce the same output as this:

    $ jq -c '. as $dot | path(..) as $p | $dot | getpath($p) | [$p,.]'

The output of `jq --stream .` should be a sequence of `[[<path>],<leaf>]`
and `[[<path>]]` values.  The latter indicate that the array/object at
that path ended.

Scalars and empty arrays and objects are leaf values for this purpose.
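
To make the event format concrete, the following Python sketch models it for a value that is already parsed in memory. It is not jq's implementation (the real parser emits events while reading input), and `to_stream` is a hypothetical name. It also assumes that a closing event carries the path of the last element inside the container that ended, which is my understanding of how released jq versions behave.

```python
import json


def to_stream(value, path=()):
    """Yield [path, leaf] events and [path] closing events for a JSON value."""
    is_container = isinstance(value, (dict, list))
    if not is_container or len(value) == 0:
        # Scalars and empty arrays/objects count as leaves.
        yield [list(path), value]
        return
    items = value.items() if isinstance(value, dict) else enumerate(value)
    last_key = None
    for key, child in items:
        last_key = key
        yield from to_stream(child, path + (key,))
    # Assumption: the closing event names the last element of the container.
    yield [list(path) + [last_key]]


for event in to_stream(json.loads('[[],"a",["b"]]')):
    print(json.dumps(event, separators=(",", ":")))
# prints:
# [[0],[]]
# [[1],"a"]
# [[2,0],"b"]
# [[2,0]]
# [[2]]
```

Events with two elements are leaves. Events with one element mark the end of an array or object, so a consumer can tell when a subtree is complete without seeing the rest of the input.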

For example, a truncated input produces a path as soon as possible, then
later the error:

    $ printf '[0,\n'|./jq -c --stream .
    [[0],0]
    parse error: Unfinished JSON term at EOF at line 3, column 0
    $
Nicolas Williams
2014-12-22 23:06:27 -06:00
parent 906d2537b9
commit 5bfb9781f7
10 changed files with 480 additions and 80 deletions


@@ -103,6 +103,17 @@ sections:
RS. This mode also parses the output of jq without the `--seq`
option.
* `--stream`:
Parse the input in streaming fashion, outputting arrays of path
and leaf values (scalars and empty arrays or empty objects).
For example, `"a"` becomes `[[],"a"]`, and `[[],"a",["b"]]`
becomes `[[0],[]]`, `[[1],"a"]`, and `[[2,0],"b"]`.
This is useful for processing very large inputs. Use this in
conjunction with filtering and the `reduce` and `foreach` syntax
to reduce large inputs incrementally.
* `--slurp`/`-s`:
Instead of running the filter for each JSON object in the
@@ -2205,6 +2216,29 @@ sections:
input: '1'
output: ['[1,2,4,8,16,32,64]']
- title: 'I/O'
body: |
At this time jq has minimal support for I/O, mostly in the
form of control over when inputs are read. Two builtin functions
are provided for this, `input` and `inputs`, that read from the
same sources (e.g., `stdin`, files named on the command-line) as
jq itself. These two builtins, and jq's own reading actions, can
be interleaved with each other.
- title: "`input`"
body: |
Outputs one new input.
- title: "`inputs`"
body: |
Outputs all remaining inputs, one by one.
This is primarily useful for reductions over a program's
inputs.
- title: Assignment
body: |