encoding

The encoding library provides functions to convert data to and from built-in arr.ai values. The following functions are available by accessing the `//encoding` attribute.

`//encoding.xml.decode(xml <: string|bytes) <: array`

decode takes either a string or bytes representing an XML document and transforms it into an array of arr.ai values: tuples for markup such as declarations and elements, and strings for text.

For details of how Arr.ai encodes XML, see Encoding below.

Usage:

| example | equals |
| --- | --- |
| `//encoding.xml.decode('<?xml version="1.0"?><root></root>')` | `[(xmldecl: 'version="1.0"'), (elem: 'root')]` |

`//encoding.xml.decoder(config <: (trimSurroundingWhitespace <: bool)).decode(xml <: string|bytes) <: array`

decoder takes a tuple used to configure decoding and returns a decoder. Call the decoder's decode function to decode with that configuration:

| config | description |
| --- | --- |
| `trimSurroundingWhitespace` | Strips newline strings `'\n'` that are used only to format the XML file. |

Usage:

| example | equals |
| --- | --- |
| `//encoding.xml.decoder((trimSurroundingWhitespace: true)).decode('<?xml version="1.0"?>\n')` | `[(xmldecl: 'version="1.0"')]` |
| `//encoding.xml.decoder((trimSurroundingWhitespace: false)).decode('<?xml version="1.0"?>\n')` | `[(xmldecl: 'version="1.0"'), '\n']` |

`//encoding.xml.encode(xml <: array) <: bytes`

encode takes an array of tuples and converts it into the bytes of an XML document.

For details of how Arr.ai encodes XML, see Encoding below.

For details of the limitations of XML encoding, see Limitations below.

Usage:

| example | equals |
| --- | --- |
| `//encoding.xml.encode([(xmldecl: 'version="1.0"')])` | `<?xml version="1.0"?>` |

`//encoding.csv.decode(csv <: string|bytes) <: array`

decode takes either a string or bytes representing CSV data and transforms it into a two-dimensional string array.

Usage:

| example | equals |
| --- | --- |
| `//encoding.csv.decode('a,b,c\n1,2,3')` | `[['a', 'b', 'c'], ['1', '2', '3']]` |

`//encoding.csv.decoder(config <: (comma <: int, comment <: int)) <: ((csv <: string|bytes) <: array)`

decoder takes a tuple used to configure decoding and returns the decoding function.

| config | description |
| --- | --- |
| `comma` | Configures the separator used (defaults to `%,`). |
| `comment` | Ignores lines from the input that start with the given character. By default, no lines are treated as comments. |
| `trimLeadingSpace` | If true, leading white space in a field is ignored. This applies even if the field delimiter, `comma`, is white space. |
| `fieldsPerRecord` | The number of expected fields per record. If positive, each record must have the given number of fields. If zero, each record must have the same number of fields as the first row. If negative, no check is made and records may have a variable number of fields. |
| `lazyQuotes` | If true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field. |

Usage:

| example | equals |
| --- | --- |
| `//encoding.csv.decoder((comma: %:))('a:b:c\n1:2:3')` | `[['a', 'b', 'c'], ['1', '2', '3']]` |
| `//encoding.csv.decoder((comment: %#))('a,b,c\n#1,2,3')` | `[['a', 'b', 'c']]` |
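
The option names and descriptions above closely match the fields of the `Reader` type in Go's `encoding/csv` package. Assuming the decoder follows the same semantics (an assumption, since this page does not say how it is implemented), the following Go sketch shows how the options interact. `csvConfig` and `decodeCSV` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// csvConfig is a hypothetical stand-in for the arr.ai config tuple.
type csvConfig struct {
	Comma            rune // field separator; 0 keeps the default ','
	Comment          rune // comment marker; 0 disables comment handling
	TrimLeadingSpace bool
	FieldsPerRecord  int
	LazyQuotes       bool
}

// decodeCSV parses input with Go's encoding/csv reader configured from cfg.
func decodeCSV(cfg csvConfig, input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	if cfg.Comma != 0 {
		r.Comma = cfg.Comma
	}
	r.Comment = cfg.Comment
	r.TrimLeadingSpace = cfg.TrimLeadingSpace
	r.FieldsPerRecord = cfg.FieldsPerRecord
	r.LazyQuotes = cfg.LazyQuotes
	return r.ReadAll()
}

func main() {
	cfg := csvConfig{Comma: ':', Comment: '#', TrimLeadingSpace: true}
	rows, err := decodeCSV(cfg, "a: b:c\n#skipped\n1:2:3")
	fmt.Println(rows, err) // [[a b c] [1 2 3]] <nil>

	// With FieldsPerRecord left at 0, every record must match the first row's width.
	_, err = decodeCSV(csvConfig{}, "a,b,c\n1,2")
	fmt.Println(err != nil) // true
}
```

Setting `FieldsPerRecord` to a negative value in this sketch accepts the ragged second input instead of reporting an error.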

`//encoding.csv.encode(csv <: array) <: bytes`

encode takes a two-dimensional string array and converts it into a CSV object.

Usage:

| example | equals |
| --- | --- |
| `//encoding.csv.encode([['a', 'b', 'c'], ['1', '2', '3']])` | `<<'a,b,c\n1,2,3'>>` |

`//encoding.csv.encoder(config <: (comma <: int, crlf <: bool)) <: ((csv <: array) <: bytes)`

encoder takes a tuple used to configure encoding and returns the encoding function:

| config | description |
| --- | --- |
| `comma` | Configures the separator used (defaults to `%,`). |
| `crlf` | Encodes new lines as either `'\r\n'` when `true` or `'\n'` when `false` (defaults to `false`). |

Usage:

| example | equals |
| --- | --- |
| `//encoding.csv.encoder((comma: %:))([['a', 'b', 'c'], ['1', '2', '3']])` | `<<'a:b:c\n1:2:3'>>` |
| `//encoding.csv.encoder((crlf: true))([['a', 'b', 'c'], ['1', '2', '3']])` | `<<'a,b,c\r\n1,2,3'>>` |
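
For comparison, the `comma` and `crlf` options correspond to the `Comma` and `UseCRLF` fields of Go's `encoding/csv` `Writer`. The sketch below assumes the same semantics, which this page does not confirm; `encodeCSV` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeCSV writes rows with Go's encoding/csv writer.
// A zero comma keeps the default ','.
func encodeCSV(comma rune, crlf bool, rows [][]string) ([]byte, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if comma != 0 {
		w.Comma = comma
	}
	w.UseCRLF = crlf
	// WriteAll writes every record and flushes before returning.
	if err := w.WriteAll(rows); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	rows := [][]string{{"a", "b", "c"}, {"1", "2", "3"}}
	out, err := encodeCSV(':', false, rows)
	fmt.Printf("%q %v\n", out, err) // "a:b:c\n1:2:3\n" <nil>
}
```

Go's writer terminates every record, including the last one, so its output ends with a line break. The examples above show no trailing line break; whether arr.ai trims it is not stated here.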

`//encoding.json.decode(json <: string|bytes) <: set`

decode takes either a string or bytes representing a JSON value and converts it to a built-in arr.ai value.

Because `""`, `false`, and `[]` are all indistinguishable from the empty set in arr.ai, decode maps incoming JSON values as follows:

| JSON encoding | maps to… | notes |
| --- | --- | --- |
| `"abc"` | `(s: "abc")` | |
| `[1, 2, 3]` | `(a: [1, 2, 3])` | |
| `false`/`true` | `(b: false)`/`(b: true)` | |
| `null` | `()` | |
| `{"a": [2, 4, 8]}` | `{"a": (a: [2, 4, 8])}` | Objects are mapped directly to dicts. |
| `42` | `42` | Numbers, including zero, cannot be confused with other values. |
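
The mapping in the table can be modelled outside arr.ai. The following Go sketch is an illustration only, not the library's implementation: `Tuple`, `wrapStrict`, and `decodeStrict` are hypothetical names, and wrapping the elements nested inside arrays is an assumption, since the table only shows arrays of numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Tuple models an arr.ai tuple such as (s: "abc"); plain maps model dicts.
type Tuple map[string]any

// wrapStrict applies the table's mapping to a value produced by encoding/json.
func wrapStrict(v any) any {
	switch x := v.(type) {
	case nil:
		return Tuple{} // null -> ()
	case string:
		return Tuple{"s": x}
	case bool:
		return Tuple{"b": x}
	case []any:
		items := make([]any, len(x))
		for i, item := range x {
			items[i] = wrapStrict(item) // assumption: elements are wrapped too
		}
		return Tuple{"a": items}
	case map[string]any:
		dict := make(map[string]any, len(x))
		for k, item := range x {
			dict[k] = wrapStrict(item)
		}
		return dict // objects map directly to dicts
	default:
		return x // numbers (float64) pass through unchanged
	}
}

// decodeStrict parses JSON text and wraps the result.
func decodeStrict(input string) (any, error) {
	var raw any
	if err := json.Unmarshal([]byte(input), &raw); err != nil {
		return nil, err
	}
	return wrapStrict(raw), nil
}

func main() {
	v, err := decodeStrict(`{"hi": "abc", "hello": 123, "none": null}`)
	fmt.Println(v, err)
}
```

The discriminating keys `s`, `a`, and `b` keep an empty string, an empty array, and `false` distinct after decoding, which is the point of the wrapping.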

Usage:

| example | equals |
| --- | --- |
| `//encoding.json.decode('{"hi": "abc", "hello": 123}')` | `{'hello': 123, 'hi': (s: 'abc')}` |

`//encoding.json.decoder(config <: (strict <: bool)) <: ((json <: string|bytes) <: set)`

decoder takes a tuple used to configure decoding and returns the decoding function:

| config | description |
| --- | --- |
| `strict` | For types that are indistinguishable when empty (strings, bools, and arrays), wrap values in tuples with a discriminating key (defaults to `true`). |

Usage:

| example | equals |
| --- | --- |
| `//encoding.json.decoder(())('{"arr": [1], "null": null, "str": "2"}')` | `{'arr': (a: [1]), 'null': (), 'str': (s: '2')}` |
| `//encoding.json.decoder((strict: false))('{"arr": [1], "null": null, "str": "2"}')` | `{'arr': [1], 'null': (), 'str': '2'}` |

`//encoding.json.encode(jsonDefinition <: set) <: bytes`

encode is the reverse of decode. It converts a built-in arr.ai value to bytes that represent a JSON value.

Usage:

| example | equals |
| --- | --- |
| `//encoding.json.encode({'hello': 123, 'hi': (s: 'abc'), 'yo': (a: [1,2,3])})` | `'{"hello":123,"hi":"abc","yo":[1,2,3]}'` |

`//encoding.json.encode_indent(jsonDefinition <: set) <: bytes`

encode_indent is like encode but indents the output to make it easier to read.

`//encoding.json.encoder(config <: (strict <: bool, prefix <: string, indent <: string, escapeHTML <: bool)) <: ((jsonDefinition <: set) <: bytes)`

encoder takes a tuple used to configure encoding and returns the encoding function:

| config | description |
| --- | --- |
| `prefix` | The string to prepend to each line of encoded output (defaults to `""`). |
| `indent` | The string to use for each indent on each line of encoded output (defaults to `""`). If empty, the output will be encoded on a single line. |
| `escapeHTML` | Whether problematic HTML characters should be escaped inside JSON quoted strings (defaults to `false`). |
| `strict` | For types that are indistinguishable when empty (strings, bools, and arrays), require values to be wrapped in tuples with a discriminating key (defaults to `true`). If `false`, all empty sets will be encoded as `null`. |

Example:

//encoding.json.encoder((prefix: '↘️', indent: '➡️', escapeHTML: true, strict: false))(
    (a: {"b": "c", "d": true}, bool: false, number: 0, set: {}, array: [], html: "<script/>")
)

{
↘️➡️"a": {
↘️➡️➡️"b": "c",
↘️➡️➡️"d": true
↘️➡️},
↘️➡️"array": null,
↘️➡️"bool": null,
↘️➡️"html": "\\u003cscript/\\u003e",
↘️➡️"number": 0,
↘️➡️"set": null
↘️}
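
The `prefix`, `indent`, and `escapeHTML` options have direct counterparts in Go's `encoding/json` package (`Encoder.SetIndent` and `Encoder.SetEscapeHTML`), and the `\u003c` escapes in the sample output match what that package produces. Assuming the same semantics, which this page does not confirm, the following Go sketch shows how the three options shape the output. `encodeJSON` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeJSON encodes v with Go's json.Encoder using the given prefix,
// indent, and HTML-escaping settings.
func encodeJSON(v any, prefix, indent string, escapeHTML bool) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetIndent(prefix, indent)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	value := map[string]any{"html": "<script/>", "number": 0}

	// Every line after the first starts with the prefix, then one indent per nesting level.
	indented, err := encodeJSON(value, "> ", "  ", true)
	if err != nil {
		panic(err)
	}
	fmt.Print(indented)

	// Empty prefix and indent give single-line output; '<' and '>' stay literal.
	compact, err := encodeJSON(value, "", "", false)
	if err != nil {
		panic(err)
	}
	fmt.Print(compact)
}
```

Go's encoder escapes HTML by default, whereas the table above documents `false` as the default for `escapeHTML`, so the sketch always sets the option explicitly.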

`//encoding.yaml.decode(yaml <: string|bytes) <: set`

Exactly the same as `//encoding.json.decode` but takes either a string or bytes representing a YAML document.

`//encoding.yaml.encode(yamlDefinition <: set) <: bytes`

Exactly the same as `//encoding.json.encode` but returns bytes that represent a YAML document.

`//encoding.yaml.encoder(config <: (strict <: bool, indent <: int)) <: ((yamlDefinition <: set) <: bytes)`

encoder takes a tuple used to configure encoding and returns the encoding function:

| config | description |
| --- | --- |
| `indent` | The number of spaces to indent sections with (defaults to `4`). |
| `strict` | For types that are indistinguishable when empty (strings, bools, and arrays), require values to be wrapped in tuples with a discriminating key (defaults to `true`). If `false`, all empty sets will be encoded as `null`. |

`//encoding.proto.descriptor(protobufDefinition <: bytes) <: tuple`

This method accepts protobuf binary files and returns a tuple representation of a FileDescriptorSet, which describes message types in the binary file. This tuple can be passed as the first parameter to decode.

For example:

//encoding.proto.descriptor(//os.file('sysl.pb'))

References: sysl.pb

`//encoding.proto.decode(descriptor <: tuple, messageTypeName <: string, messageBytes <: bytes) <: tuple`

This method accepts three parameters:

1. a tuple representation of a FileDescriptorSet (as produced by `//encoding.proto.descriptor`);
2. the name of the message type to be decoded;
3. the content of an encoded protobuf message.

It returns a tuple representation of the decoded message.

Sample code for converting a Sysl protobuf message to arr.ai values:

let syslDescriptor = //encoding.proto.descriptor(//os.file('sysl.pb'));
let shop = //encoding.proto.decode(syslDescriptor, 'Module', //os.file('petshop.pb'));
shop.apps('PetShopApi').attrs('package').s

It will output

'io.sysl.demo.petshop.api'

The first line constructs a protobuf file descriptor. //os.file('sysl.pb') is the binary output of compiling sysl.proto with protoc.

The second line uses the sysl file descriptor to parse //os.file('petshop.pb'), a compiled Sysl Module message.

The result is shop, a tuple representing a Module. It contains a field apps, which maps names to tuple representations of Application. Application contains a field attrs, which maps names to tuple representations of Attribute. The data type of the attribute package is string, so .s gets its string value.

More sample code and data details

`//encoding.xlsx.decodeToRelation((sheet <: int, headRow <: int) <: tuple, xlsx <: bytes) <: relation`

decodeToRelation transforms one sheet of an Excel workbook (XLSX format, loaded as bytes) into an arr.ai relation: a set of tuples (rows) whose attribute names correspond to the column headers and whose values correspond to the cells.

decodeToRelation can only decode relatively simple tabular spreadsheets with a single header row given by headRow. The decoding:

- ignores columns without heading values;
- ignores rows with no cell values;
- converts heading/column names to snake_case, replacing various special characters with _.
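
The three decoding rules can be sketched on plain rows of strings. The following Go model is illustrative only: `snakeCase` and `toRelation` are hypothetical names, `headRow` is treated as a 0-based index, a slice stands in for the set of tuples, and the exact snake_case rules are a guess because the text does not specify them.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snakeCase lower-cases a heading and replaces each run of characters other
// than letters and digits with a single underscore (a guessed rule set).
func snakeCase(heading string) string {
	var b strings.Builder
	pendingSep := false
	for _, r := range strings.TrimSpace(heading) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingSep && b.Len() > 0 {
				b.WriteByte('_')
			}
			pendingSep = false
			b.WriteRune(unicode.ToLower(r))
		} else {
			pendingSep = true
		}
	}
	return b.String()
}

// toRelation turns sheet rows into one map per data row, keyed by heading.
func toRelation(rows [][]string, headRow int) []map[string]string {
	if headRow < 0 || headRow >= len(rows) {
		return nil
	}
	header := rows[headRow]
	var relation []map[string]string
	for _, row := range rows[headRow+1:] {
		tuple := map[string]string{}
		hasValue := false
		for col, cell := range row {
			if col >= len(header) || strings.TrimSpace(header[col]) == "" {
				continue // ignore columns without a heading value
			}
			tuple[snakeCase(header[col])] = cell
			if cell != "" {
				hasValue = true
			}
		}
		if hasValue { // ignore rows with no values in headed columns
			relation = append(relation, tuple)
		}
	}
	return relation
}

func main() {
	rows := [][]string{
		{"First Name", "", "Unit Price ($)"},
		{"Ann", "ignored", "3.50"},
		{"", "", ""},
		{"Bob", "", "4.00"},
	}
	fmt.Println(toRelation(rows, 0))
}
```

In this model the unheaded middle column and the empty third row are dropped, and the headings become `first_name` and `unit_price`.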

Note that unlike standard decode functions, this is not reversible; its output cannot be passed to an encode function to produce the original XLSX. Expect this function to be superseded by more canonical decoding functions in the future.

XML

Encoding

| Description | XML Encoding | Arr.ai Encoding |
| --- | --- | --- |
| Declaration | `<?xml version="1.0"?>` | `[(xmldecl: 'version="1.0"')]` |
| Directive | `<!DOCTYPE foo <!ELEMENT foo (#PCDATA)>>` | `(directive: 'DOCTYPE foo <!ELEMENT foo (#PCDATA)>')` |
| Text | `Hello world` | `'Hello world'` |
| Comment | `<!-- hello world -->` | `(comment: " hello world ")` |
| Element | `<root><child/></root>` | `[(elem: 'root', children: [(elem: 'child')])]` |
| Element with namespace | `<root xmlns="foo"><child/></root>` | `[(elem: 'root', attrs: {(name: 'xmlns', value: 'foo')}, children: [(elem: 'child', ns: 'foo')], name: 'root', ns: 'foo')]` |
| Attribute | `<root key="value"/>` | `[(elem: 'root', attrs: {(name: 'key', value: 'value')})]` |
| Attribute with namespace | `<root xmlns:foo="foo.com" foo:key="value"/>` | `[(elem: 'root', attrs: {(name: 'foo', ns: 'xmlns', value: 'foo.com'), (name: 'key', ns: 'foo.com', value: 'value')})]` |
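
The categories in this table line up with the token types of Go's `encoding/xml` package (`ProcInst`, `Directive`, `CharData`, `Comment`, `StartElement`). Assuming a token stream of that shape sits underneath the decoder, which this page does not state, the following Go sketch lists the tokens of a small document, labelled with the tuple keys used above. `describeTokens` is a hypothetical helper.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// describeTokens lists the tokens Go's encoding/xml decoder reports for doc.
// End-element tokens and non-xml processing instructions are skipped.
func describeTokens(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.ProcInst:
			if t.Target == "xml" {
				out = append(out, "xmldecl: "+string(t.Inst))
			}
		case xml.Directive:
			out = append(out, "directive: "+string(t))
		case xml.Comment:
			out = append(out, "comment: "+string(t))
		case xml.CharData:
			out = append(out, "text: "+string(t))
		case xml.StartElement:
			out = append(out, "elem: "+t.Name.Local)
		}
	}
}

func main() {
	doc := `<?xml version="1.0"?><!-- hello world --><root key="value"><child/>Hello</root>`
	tokens, err := describeTokens(doc)
	if err != nil {
		panic(err)
	}
	for _, t := range tokens {
		fmt.Println(t)
	}
}
```

The comment token keeps its surrounding spaces, and the declaration token carries only the text after `<?xml`, which is consistent with the `xmldecl` and `comment` rows in the table.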

Limitations

XML encoding does not currently support documents that have items with explicit namespaces (e.g. `<namespace:element />` or `namespace:attribute="value"`). This is due to a limitation of the underlying XML parser. Attempting to encode an XML document that includes explicit namespaces may result in an invalid document.
