8. The JSON I/O Layer
Subsections in this chapter are informative unless normativity is explicitly stated.
8.1. Introduction
While the original Waterloo docstring format is already designed to be machine-readable – both comprehensible to an LLM and parsable by a simple state machine – practical experience shows that JSON is the most suitable format for providing structured information to LLMs (last verified April 2026).
Compared to HTML, JSON avoids a significant amount of structural overhead. Even when presentation, layout, and functionality are strictly separated, HTML artifacts still contain a considerable amount of information that is irrelevant for machine interpretation (often referred to as “human-oriented noise”).
Notably, JSON is typically preferred over XML, even when XML is reduced to its essential content without styling or presentation-related elements. YAML is generally handled well, but in practice it is considered less robust because of its indentation-based syntax.
The following exchange (translated from German) illustrates a typical assessment of common data formats with respect to their suitability for large language models.
Question. Which format is best suited for LLM consumption when choosing between HTML (for human-readable browser output), XML (without styling), YAML, and JSON? Assume that all four formats can be transformed into each other by a suitable state machine, i.e. the information content is identical.
Answer.
1. JSON: The clear recommendation for LLMs
Advantages:
Structured hierarchy: JSON is natively hierarchical and supports nested objects, which is ideal for docstrings with normative sections, variables, and conditions.
LLM optimization: Most LLMs are heavily exposed to JSON during training and tend to parse it more reliably than XML or YAML. JSON schemas can also serve as formal specifications.
Tooling: Straightforward integration (e.g. json.loads in Python), validation via JSON Schema, and direct use in APIs.
Minimalism: No markup overhead as in HTML or XML, while remaining fully machine-readable.
Disadvantages:
Less intuitive for human readers compared to rendered HTML.
2. XML: Structured but unnecessarily complex
Advantages:
Expressive: Supports attributes and namespaces.
Validatable: DTD and XSD schemas are available.
Disadvantages:
Overhead: Tags and attributes increase token count and parsing effort.
LLM practice: Less robust handling of deeply nested structures.
Tooling: More complex parsing compared to JSON.
3. YAML: Human-friendly but risky for LLMs
Advantages:
Readable for humans.
Compact syntax without brackets or tags.
Disadvantages:
Whitespace-sensitive: Prone to indentation errors.
No universally enforced schema system.
Ambiguities (e.g. yes vs. "yes").
4. HTML: Unsuitable for LLMs
Advantages:
Ideal for human-facing rendering.
Disadvantages:
High noise level due to markup.
Large token overhead.
Structural complexity (DOM) that is irrelevant for semantic extraction.
This assessment was consistent across several LLMs and in informal tests.
For these reasons, the Waterloo toolkit provides JSON output for documentation artifacts and diagnostic data, and supports JSON input for more complex tasks. This chapter demonstrates how machine-readable normative documentation is generated from Waterloo docstrings, how it can be enriched with informative example code, and how the resulting JSON artifacts can be used to generate a conventional, human-readable interactive HTML documentation site.
8.2. Validating JSON input, output and diagnostics
Any Waterloo-related JSON artifact can be validated with
waterlint validate-json:
waterlint validate-json --in path/to/validate.json
It attempts to infer the category of the input data (documentation, section or subsection explanations, diagnostics, example references, or walk output) and validates it against the corresponding schema:
- schema/wtrl-json-*.*.*.schema.json for documentation
- schema/wtrl-explain-section-json-*.*.*.schema.json for section explanations
- schema/wtrl-explain-subsection-json-*.*.*.schema.json for subsection explanations
- schema/wtrl-tracer-json-*.*.*.schema.json for diagnostics
- schema/wtrl-example-refs-json-*.*.*.schema.json for example references
- schema/wtrl-walk-json-*.*.*.schema.json for structured and detailed output of subcommand walk
The JSON file to be validated contains the version number of the schema to validate against. If the category cannot be inferred, the schema can be specified explicitly using
--schema path/to/schema.json.
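How validate-json infers the category is not specified here. One plausible approach, shown as a minimal sketch under that assumption, reads the category marker and the schema version from the file name in "$schema". The function infer_category and the category labels are hypothetical, not part of waterlint.

```python
import re

# Hypothetical labels for the category markers used in the schema file names.
CATEGORIES = {
    "wtrl-json": "documentation",
    "wtrl-explain-section-json": "section explanation",
    "wtrl-explain-subsection-json": "subsection explanation",
    "wtrl-tracer-json": "diagnostics",
    "wtrl-example-refs-json": "example references",
    "wtrl-walk-json": "walk output",
}

# Matches "<marker>-<major>.<minor>.<patch>.schema.json" at the end of a URL or path.
_SCHEMA_RE = re.compile(r"(wtrl-[a-z-]+?)-(\d+\.\d+\.\d+)\.schema\.json$")


def infer_category(doc: dict) -> tuple[str, str]:
    """Return (category, schema_version) derived from the "$schema" entry."""
    schema_url = doc.get("$schema", "")
    match = _SCHEMA_RE.search(schema_url)
    if match is None or match.group(1) not in CATEGORIES:
        raise ValueError(f"cannot infer category from $schema={schema_url!r}; pass --schema explicitly")
    return CATEGORIES[match.group(1)], match.group(2)
```

If no known marker is found, the sketch gives up, which corresponds to the situation in which --schema has to be passed explicitly.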
The schema directory is a resource located in the package
directory of sdv.doc.waterloo. A list of available schemas
and their locations can be obtained with
waterlint list-schemas
The command supports the following options:
--out-diag path/to/diagnostics
for human-readable diagnostics, and
--out-diag-json path/to/diagnostics.json
for machine-readable diagnostics.
A summary of these options is displayed by
waterlint help --topic validate-json
8.3. Creating LLM-readable documentation
Given a module or a set of modules with Waterloo Docstrings, an LLM-readable JSON artifact can be created using the following command:
waterlint render-json --basedir path/to/basedir --obj module1 [module2 ...]
The two options are intentionally independent: --basedir must point
to the import root that makes the target modules resolvable, while
--obj names the importable modules themselves. In other words,
--basedir is not the module directory to document, but the directory that
contains the Python package root for the objects named by --obj.
For a project using the common src/ layout, that usually means pointing
--basedir at the src directory and passing fully qualified module
names such as sdv.doc.waterloo.waterlint.
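In Python terms, the distinction resembles putting a directory on sys.path and then importing a dotted module name. The following sketch illustrates that reading only; import_from_basedir is a hypothetical helper, and waterlint may resolve modules differently internally.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_from_basedir(basedir: str, module_name: str) -> ModuleType:
    """Import a fully qualified module name, using basedir as an import root.

    basedir plays the role of --basedir (e.g. a src directory),
    module_name the role of one --obj argument.
    """
    root = str(Path(basedir).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)  # make top-level packages below basedir importable
    importlib.invalidate_caches()  # pick up files created after interpreter start
    return importlib.import_module(module_name)
```

For a src/ layout, import_from_basedir("src", "sdv.doc.waterloo.waterlint") would mirror --basedir src --obj sdv.doc.waterloo.waterlint; passing the package directory itself as basedir would make the fully qualified name unresolvable.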
The output path is specified either by option
--out path/to/output.json
or by
--out-dir path/to/dir/
In the latter case, waterlint generates a filename
which contains the scope (e.g. “core”, “public”) and the flavour (usually “rfc-2119”)
as substrings, according to a fixed scheme.
If multiple modules are provided together with --out-dir, the option --out-prefix myprefix is required,
because a unique filename cannot be inferred from more than one input module.
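The example later in this section produces mypkg.test_module_minimal.wtrl.core.rfc-2119.json, which suggests the pattern <module>.wtrl.<scope>.<flavour>.json for a single module. The following sketch encodes that pattern; default_output_name is hypothetical, and using the --out-prefix value as the file stem for several modules is an assumption not confirmed by this chapter.

```python
from typing import Optional, Sequence


def default_output_name(modules: Sequence[str], scope: str = "core",
                        flavour: str = "rfc-2119",
                        prefix: Optional[str] = None) -> str:
    """Build a file name of the form <stem>.wtrl.<scope>.<flavour>.json.

    Single module without prefix: the stem is the module name.
    Otherwise a prefix is required; using it as the stem is an assumption.
    """
    if len(modules) == 1 and prefix is None:
        stem = modules[0]
    elif prefix is not None:
        stem = prefix
    else:
        raise ValueError("--out-prefix is required when --out-dir is used with several modules")
    return f"{stem}.wtrl.{scope}.{flavour}.json"
```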
Option --scope allows restricting the content to docstrings
with the given scope, taking into account the monotonicity rules
for scopes (e.g. the set of “core” docstrings contains “extension”
docstrings, which in turn contain “public” docstrings).
Option --flavour allows specifying how normativity keywords are rendered.
Since normativity in Waterloo Docstrings is defined by structure instead of typography,
this is mainly a matter of taste. Because the target audience (LLMs) is familiar with RFC 2119,
passing rfc-2119 (the default) is usually a good choice.
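In the example in this section, the docstring marker |Must| appears as MUST in the JSON output. A minimal sketch of such a rendering step, assuming markers of the form |Keyword|; only |Must| occurs in this chapter, so the remaining keyword spellings and the function render_rfc2119 are hypothetical.

```python
import re

# Assumed marker spellings; only "Must" is attested in this chapter.
KEYWORDS = ("Must not", "Must", "Shall not", "Shall", "Should not", "Should",
            "Required", "Recommended", "May", "Optional")
_MARKER_RE = re.compile(r"\|(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\|")


def render_rfc2119(text: str) -> str:
    """Replace |Keyword| markers by the upper-case RFC 2119 keyword."""
    return _MARKER_RE.sub(lambda m: m.group(1).upper(), text)
```

Restricting the pattern to known keywords avoids touching unrelated text that merely contains vertical bars.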
Diagnostics are written in human-readable form by default. The output paths for human- and machine-readable diagnostics are specified by
--out-diag path/to/diagnostics
--out-diag-json path/to/diagnostics.json
A summary of these options is displayed by
waterlint help --topic render-json
Note
When rendering large module trees, invalid objects may be reported as
standardized warnings with rule TOOL-009. In that case,
passing --ignore TOOL-009 is often useful if the
invalid objects are expected and should simply be skipped.
Example
Consider the following minimal module:
"""
Preamble:
profile:
module
normative_sections:
Contract
Contract:
general:
|Must| demonstrate the minimal module docstring.
"""
Assume it is stored as doc/input-python/mypkg/test_module_minimal.py.
We render this as JSON by running
waterlint render-json --basedir doc/input-python --obj mypkg.test_module_minimal --out-dir doc/output-json/
Since we did not explicitly specify a target file name, scope, or flavour, the resulting file is
doc/output-json/mypkg.test_module_minimal.wtrl.core.rfc-2119.json
Let us have a look at the details. The header provides a reference to the JSON Schema
for the output and a unique hashed identifier. Node __WTRL_VERSION__
contains the version of module sdv.doc.waterloo and the JSON Schema version
to validate against.
{
"$schema": "https://sci-d-vis.com/schema/wtrl-json-0.1.0.schema.json",
"$id": "urn:waterlint:wtrl-json:0.8.1:core:rfc-2119:3e64950b9b650...",
"__WTRL_VERSION__": {
"waterloo": "0.6.1",
"schema": "0.1.0"
},
"...":"..."
}
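Since "$schema" and __WTRL_VERSION__ both carry the schema version, a consumer can cross-check them before relying on the file. A minimal standard-library sketch; check_schema_version is a hypothetical helper, and waterlint validate-json remains the authoritative check.

```python
import re

# Version at the end of a schema URL, e.g. ".../wtrl-json-0.1.0.schema.json" -> "0.1.0".
_VERSION_RE = re.compile(r"-(\d+\.\d+\.\d+)\.schema\.json$")


def check_schema_version(doc: dict) -> bool:
    """Return True if "$schema" and __WTRL_VERSION__["schema"] name the same version."""
    match = _VERSION_RE.search(doc.get("$schema", ""))
    declared = doc.get("__WTRL_VERSION__", {}).get("schema")
    return match is not None and match.group(1) == declared
```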
The next node contains metadata such as creation time, scope and flavour.
{
"...":"...",
"__WTRL_META__": {
"generated_at": "2026-04-20T11:26:41+02:00",
"generator": "waterlint",
"scope": "core",
"flavour": "rfc-2119"
},
"...":"..."
}
Since docstrings are meant to be rendered as human-readable HTML (be it interactive or as Sphinx output), they contain semantic roles. Node __WTRL_ROLES__ explains these roles so that the LLM can interpret them:
{
"...":"...",
"__WTRL_ROLES__": {
"attr": "Attribute name",
"cmd": "Shell or CLI command",
"dfn": "Definition of a term",
"file": "File or path",
"func": "Function or callable",
"key": "Key on the keyboard",
"label": "Section/Subsection label",
"lit": "Literal text or code",
"mod": "Module name",
"op": "Operator symbol",
"opt": "Command-line option or flag",
"tag": "Tag or marker",
"term": "Domain-specific term",
"type": "Type name or annotation",
"value": "Concrete value",
"var": "Variable name",
"var_type": "Variable and type, like 'var:type'"
},
"...":"..."
}
In principle, Waterloo allows user-defined scopes, although this is not yet fully supported. The JSON artifact already reflects this capability by embedding the scope specification:
{
"...":"...",
"__WTRL_SCOPES__": {
"public": { "value": 10,"description": "" },
"extension": { "value": 20,"description": "" },
"core": { "value": 30,"description": "" }
},
"...":"..."
}
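The numeric values appear to encode the monotonicity rule described for --scope: a higher value includes all lower ones. Assuming that reading, inclusion reduces to a comparison, as in this sketch (visible_in and scopes_included are hypothetical helpers):

```python
# Scope values as embedded in __WTRL_SCOPES__ in the example above.
SCOPES = {
    "public": {"value": 10, "description": ""},
    "extension": {"value": 20, "description": ""},
    "core": {"value": 30, "description": ""},
}


def visible_in(docstring_scope: str, requested_scope: str, scopes: dict = SCOPES) -> bool:
    """Assumed rule: include a docstring if its scope value does not exceed the requested one."""
    return scopes[docstring_scope]["value"] <= scopes[requested_scope]["value"]


def scopes_included(requested_scope: str, scopes: dict = SCOPES) -> list[str]:
    """List all scopes rendered for requested_scope, ordered by value."""
    names = [s for s in scopes if visible_in(s, requested_scope, scopes)]
    return sorted(names, key=lambda s: scopes[s]["value"])
```

Because the rule only compares values, a user-defined scope would slot in by choosing a value between existing ones.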
The next block is the table of contents.
Documented objects are grouped by their category, and each entry points
to an entry in subtree __WTRL_OBJECTS__.
In our minimal case there is only a single module and no other objects, so we have:
{
"...":"...",
"__WTRL_TOC_MODULES__": {
"mypkg.test_module_minimal": "/__WTRL_OBJECTS__/mypkg.test_module_minimal"
},
"__WTRL_TOC_CLASSES__": {},
"__WTRL_TOC_CALLABLES__": {},
"__WTRL_TOC_TYPES__": {},
"__WTRL_TOC_VARIABLES__": {},
"__WTRL_TOC_CONSTANTS__": {},
"...":"..."
}
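The TOC values such as "/__WTRL_OBJECTS__/mypkg.test_module_minimal" have the form of JSON Pointers (RFC 6901). Assuming they are, a consumer can follow them with a small resolver like the following sketch (resolve_pointer is a hypothetical helper):

```python
def resolve_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = doc
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: decode "~1" to "/" first, then "~0" to "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

Iterating over doc["__WTRL_TOC_MODULES__"].items() and resolving each value then yields the corresponding __WTRL_OBJECTS__ entries.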
Finally, node __WTRL_OBJECTS__ contains the docstrings
in LLM-friendly form, i.e. sections and subsections are encoded as JSON nodes.
{
"...":"...",
"__WTRL_OBJECTS__": {
"mypkg.test_module_minimal": {
"path": "/path/to/doc/input-python/mypkg/test_module_minimal.py",
"doc": {
"Preamble": {
"profile": "module",
"normative_sections": [ "Contract" ]
},
"Contract": {
"general": [ "MUST demonstrate the minimal module docstring." ]
}
}
}
}
}
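Because each section is a JSON node, a consumer can extract exactly the normative statements of an object. The following sketch relies only on the structure shown above (Preamble.normative_sections naming the normative sections, each holding lists of statements); normative_statements is a hypothetical helper.

```python
def normative_statements(entry: dict) -> list[tuple[str, str, str]]:
    """Collect (section, subsection, statement) triples for one object.

    entry is a value of __WTRL_OBJECTS__; only sections listed in
    Preamble.normative_sections are visited.
    """
    doc = entry.get("doc", {})
    triples = []
    for section in doc.get("Preamble", {}).get("normative_sections", []):
        for subsection, statements in doc.get(section, {}).items():
            if isinstance(statements, str):  # tolerate a single statement stored as a string
                statements = [statements]
            for statement in statements:
                triples.append((section, subsection, statement))
    return triples
```

Applied to the entry for mypkg.test_module_minimal above, this yields the single triple ("Contract", "general", "MUST demonstrate the minimal module docstring.").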
8.4. Adding examples to a JSON document
When documentation based on Waterloo docstrings is rendered by Sphinx, code examples can easily be added in the reST code base. This raises the question of how to include code examples in the JSON output.
The solution is to introduce a JSON node __WTRL_EXAMPLES__,
added at the same level in the JSON tree as __WTRL_OBJECTS__.
In this section, we show how this can be done with waterlint.
Assume you have example programs or snippets for some of the objects documented in the JSON output. These examples are located in your project directory, and each file is an example for one or more Python objects. Technically, there is an m-to-n relation between documented objects and examples: any documented module, class, or function can have zero or more examples, and each example can be associated with one or more documented objects.
This relation is represented using a dedicated JSON format. A template can be generated with the following command:
waterlint gen-example-template-json
Apart from version numbers, the result should look like this:
{
"$schema": "https://sci-d-vis.com/schema/wtrl-example-refs-json-0.1.1.schema.json",
"$id": "urn:none:local:wtrl-example-refs-json:0.1.1",
"__WTRL_VERSION__": {
"waterloo": "0.8.1",
"waterlint_min": "0.1.0",
"schema": "0.1.1"
},
"__WTRL_EXAMPLE_REFS__": {
"my_module.my_function": [
"path/to/example1.py",
"path/to/example2.py"
]
}
}
Examples are added to node __WTRL_EXAMPLE_REFS__
by creating one entry per documented object and mapping it to a list
of paths pointing to example files.
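Seen from the example side, the same m-to-n relation maps each example file to the objects that reference it, which is what the referenced_by lists in the embedded output record. A minimal sketch of that inversion (invert_example_refs is a hypothetical helper, not a waterlint function):

```python
from collections import defaultdict


def invert_example_refs(refs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn object -> [example paths] into example path -> [objects]."""
    by_example = defaultdict(list)
    for obj_name, paths in refs.items():
        for path in paths:
            if obj_name not in by_example[path]:  # keep each object once per example
                by_example[path].append(obj_name)
    return dict(by_example)
```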
{
"...":"...",
"__WTRL_EXAMPLE_REFS__": {
"mymod.myfunc_1": [
"path/to/example_1_1.py",
"path/to/example_1_2.py",
"..."
],
"mymod.myfunc_2": [
"path/to/example_2_1.py",
"path/to/example_2_2.py",
"..."
],
"...":"..."
}
}
The examples are then added to the JSON document using the following command:
waterlint add-example-json --in path/to/doc.json --out path/to/doc_with_examples.json --examples path/to/examples.json --basedir path/to/examples/
Example
In the following example, we add a Python example to the JSON document from the previous section. Let us assume our files are located in the filesystem as follows:
doc
├── input-python
│   └── mypkg
│       └── test_module_minimal.py
├── input-json
│   └── test_module_minimal_examples.json
├── output-json
│   └── mypkg.test_module_minimal.wtrl.core.rfc-2119.json
└── examples-python
    └── example_module_minimal.py
Here, mypkg/test_module_minimal.py is the original module.
mypkg.test_module_minimal.wtrl.core.rfc-2119.json is the JSON document
generated in the previous section.
example_module_minimal.py is a corresponding Python example:
import mypkg.test_module_minimal as m

if __name__ == "__main__":
	print("Module mypkg.test_module_minimal imported.")
test_module_minimal_examples.json is the specification file
containing the mapping from documented objects to example paths:
{
"$schema": "https://sci-d-vis.com/schema/wtrl-example-refs-json-0.1.1.schema.json",
"$id": "urn:none:local:wtrl-example-refs-json:0.1.1",
"__WTRL_VERSION__": {
"waterloo": "0.6.1",
"waterlint_min": "0.8.1",
"schema": "0.1.1"
},
"__WTRL_EXAMPLE_REFS__": {
"mypkg.test_module_minimal": [
"example_module_minimal.py"
]
}
}
The specification file should be validated with
waterlint validate-json --in doc/input-json/test_module_minimal_examples.json
Then we embed the examples using the following command:
waterlint add-example-json --basedir doc/examples-python --in doc/output-json/mypkg.test_module_minimal.wtrl.core.rfc-2119.json --out doc/output-json/mypkg.test_module_minimal.with_examples.wtrl.core.rfc-2119.json --examples doc/input-json/test_module_minimal_examples.json
Option --basedir specifies the directory against which the example paths listed in test_module_minimal_examples.json are resolved.
The resulting JSON file mypkg.test_module_minimal.with_examples.wtrl.core.rfc-2119.json
looks similar to the input mypkg.test_module_minimal.wtrl.core.rfc-2119.json
but the documented object is now equipped with a reference to the example node:
{
"...":"...",
"__WTRL_OBJECTS__": {
"mypkg.test_module_minimal": {
"path": "...",
"doc": {
"Preamble": { "...":"..." },
"Contract": { "...":"..." }
},
"examples": [
"/__WTRL_EXAMPLES__/sha256_0a50ade00ffbebea..."
]
}
},
"...":"..."
}
The document also contains an additional node __WTRL_EXAMPLES__ with the example code
(formatted below for readability):
{
"...":"...",
"__WTRL_EXAMPLES__": {
"sha256_0a50ade00ffbebea...": {
"lang": "python",
"hash": "0a50ade00ffbebea...",
"code": "import mypkg.test_module_minimal as m\\n\\nif __name__ == \\\"__main__\\\":\\n\\tprint(\\\"Module mypkg.test_module_minimal imported.\\\")\\n",
"referenced_by": [
"mypkg.test_module_minimal"
]
}
}
}
Note that the example code is fully embedded in the JSON output. The resulting LLM-readable document therefore remains a single file.
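The structure above can be reproduced in a few lines, which also shows how the m-to-n relation is stored: each example's code once under __WTRL_EXAMPLES__, plus pointers from every referencing object. This is a minimal sketch, not waterlint's implementation; hashing the UTF-8 text with SHA-256, the key format sha256_<hexdigest>, and the default lang are inferred from the output shown and may differ in detail (embed_examples is hypothetical).

```python
import hashlib
from pathlib import Path


def embed_examples(doc: dict, refs: dict, basedir: str, lang: str = "python") -> dict:
    """Embed the files listed in refs["__WTRL_EXAMPLE_REFS__"] into doc (in place)."""
    examples = doc.setdefault("__WTRL_EXAMPLES__", {})
    for obj_name, paths in refs["__WTRL_EXAMPLE_REFS__"].items():
        obj = doc["__WTRL_OBJECTS__"].get(obj_name)
        if obj is None:
            continue  # object not documented in this JSON file
        for rel_path in paths:
            code = (Path(basedir) / rel_path).read_text(encoding="utf-8")
            digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
            key = f"sha256_{digest}"
            # Identical example files share one entry, whatever their path.
            entry = examples.setdefault(
                key, {"lang": lang, "hash": digest, "code": code, "referenced_by": []})
            if obj_name not in entry["referenced_by"]:
                entry["referenced_by"].append(obj_name)
            pointer = f"/__WTRL_EXAMPLES__/{key}"
            obj.setdefault("examples", [])
            if pointer not in obj["examples"]:
                obj["examples"].append(pointer)
    return doc
```

Keying entries by content hash means that an example shared by several objects is stored only once, while each object still points to it.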
8.5. JSON document categories and conventions
This section is normative.
The reference tooling emits and expects category-specific
$id values for these JSON categories.
- Waterloo API JSON (from render-json): urn:waterlint:wtrl-json:<waterlint-version>:<scope>:<flavour>:<hash>
- Explain-section JSON (from explain-section): urn:waterlint:wtrl-explain-section-json:<waterlint-version>:<timestamp>
- Explain-subsection JSON (from explain-subsection): urn:waterlint:wtrl-explain-subsection-json:<waterlint-version>:<timestamp>
- Tracer diagnostics JSON: urn:waterlint:wtrl-tracer-json:<waterlint-version>:<timestamp>
- Example-reference mapping JSON, recommended pattern: urn:<org-or-project>:<domain>:wtrl-example-refs-json:<schema-version>
- Output of walk: urn:waterlint:wtrl-walk-json:<waterlint-walk-version>:<timestamp>
The hash digest should be SHA256.
The $id value should be globally unique for each produced document.
For interoperability and diagnostics, the category marker
(wtrl-json, wtrl-explain-section-json, wtrl-explain-subsection-json, wtrl-tracer-json, wtrl-example-refs-json, wtrl-walk-json)
should be present.
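Because the $id patterns are colon-separated, a consumer can split them into their fields. The following is a minimal sketch for the render-json pattern only; ApiJsonId and parse_api_json_id are hypothetical names, and the sketch assumes that no field, in particular scope and flavour, contains a colon.

```python
from typing import NamedTuple


class ApiJsonId(NamedTuple):
    """Fields of a render-json $id."""
    waterlint_version: str
    scope: str
    flavour: str
    digest: str


def parse_api_json_id(doc_id: str) -> ApiJsonId:
    """Split urn:waterlint:wtrl-json:<waterlint-version>:<scope>:<flavour>:<hash>."""
    parts = doc_id.split(":")
    if len(parts) != 7 or parts[:3] != ["urn", "waterlint", "wtrl-json"]:
        raise ValueError(f"not a Waterloo API JSON $id: {doc_id!r}")
    return ApiJsonId(*parts[3:])
```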
8.6. Inspecting JSON documents with jq
In this section, we present a few examples of using the JSON
command-line processor jq with Waterloo JSON files.
You can try these examples with the accompanying file
sdv/doc/waterloo/doc-json/docitem.wtrl.core.rfc-2119.json,
which is shipped with this package. The commands below refer to it through a shell variable,
e.g. WTRL_JSON=sdv/doc/waterloo/doc-json/docitem.wtrl.core.rfc-2119.json
(do not use the name PATH, which would replace the shell's command search path).
The examples below illustrate only a small subset of what can be achieved
with jq. For a comprehensive reference, consult the official
jq documentation at jqlang.github.io/jq.
Extract a JSON node, in this case the table of contents of documented modules:
jq '.__WTRL_TOC_MODULES__' "${WTRL_JSON}"
Extract a selected entry of a JSON object:
jq '.__WTRL_TOC_MODULES__["sdv.doc.waterloo.docitem"]' "${WTRL_JSON}"
Extract the values (without keys) as JSON strings. When applied to an array,
[] emits each element; when applied to an object, it emits each value:
jq '.__WTRL_TOC_MODULES__[]' "${WTRL_JSON}"
Extract the values (without keys) as raw strings:
jq -r '.__WTRL_TOC_MODULES__[]' "${WTRL_JSON}"
Extract the qualified identifiers of all classes (objects whose Preamble.profile is class).
The filter to_entries[] converts the object into key-value pairs, which can then be accessed via .key and .value. The filter select passes through only those entries that satisfy the specified condition:
jq -r '.__WTRL_OBJECTS__ | to_entries[] | select(.value.doc.Preamble.profile == "class") | .key' "${WTRL_JSON}"
Find the keys of all functions that are marked with the trait
generator:
jq -r '.__WTRL_OBJECTS__ | to_entries[] | select(.value.doc.Preamble.profile == "function") | select(.value.traits | any(. == "generator")?) | .key' "${WTRL_JSON}"
Extract the examples assigned to a given object. The snippet below extracts the Python example in
doc-json/tde4_with_examples.wtrl.core.rfc-2119.json for the documented function tde4.getFirstCamera:
jq -r '. as $root | "__WTRL_EXAMPLES__" , ( $root.__WTRL_EXAMPLES__ | to_entries[] | select((.value.referenced_by // []) | index("tde4.getFirstCamera")) | "---- " + .key + " ----\n" + (.value.code // "") ) ' doc-json/tde4_with_examples.wtrl.core.rfc-2119.json
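Where jq is not available, the same queries can be written with Python's json module. The following sketch mirrors the last jq filter (without the leading "__WTRL_EXAMPLES__" line); examples_for and the script name in the usage comment are hypothetical.

```python
import json
import sys


def examples_for(doc: dict, qualname: str) -> list[tuple[str, str]]:
    """Return (key, code) for every example whose referenced_by contains qualname."""
    return [
        (key, entry.get("code") or "")
        for key, entry in doc.get("__WTRL_EXAMPLES__", {}).items()
        if qualname in (entry.get("referenced_by") or [])  # like `// []` in jq
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python examples_for.py doc-json/tde4_with_examples.wtrl.core.rfc-2119.json tde4.getFirstCamera
    with open(sys.argv[1], encoding="utf-8") as f:
        document = json.load(f)
    for key, code in examples_for(document, sys.argv[2]):
        print(f"---- {key} ----\n{code}")
```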