ASAsense Binary Table (ABT) format: difference between revisions

From ASAsense Documentation

Revision as of 4 Oct 2024, 10:42

The ASAsense binary table format is a binary alternative to the CSV format. A file is structured as follows:

{| class="wikitable"
! name !! length (bytes) !! type !! allowed values !! explanation
|-
| file_type || 1 || u8 || 1 || The internal number of the file type and version
|-
| n_cols || 1 || u8 || 1-255 || The number of columns
|-
| col_widths || 1 x n_cols || u8[n_cols] || 1-255 per column || n_cols u8 integers giving the width in bytes of each column
|-
| metadata || until '\0' || char[] || C-style string (ends at the first '\0' character, encoded in UTF-8) || JSON-encoded metadata
|-
| data || variable, until EOF || byte[] || any || The actual data
|}
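The header layout above can be read with a few lines of Python. This is a minimal sketch, not a reference implementation: the names `read_abt_header` and `iter_rows` are hypothetical, and the row iterator assumes that the data section consists of fixed-size rows stored back to back, each row being the columns in order with the widths given by col_widths. The table does not state that explicitly.

```python
import io
import json


def read_abt_header(stream):
    """Read file_type, n_cols, col_widths and the JSON metadata from a binary stream."""
    fixed = stream.read(2)
    if len(fixed) != 2:
        raise ValueError("truncated header")
    file_type, n_cols = fixed[0], fixed[1]
    if file_type != 1:
        raise ValueError(f"unsupported file_type {file_type}")
    if n_cols < 1:
        raise ValueError("n_cols must be between 1 and 255")

    col_widths = list(stream.read(n_cols))
    if len(col_widths) != n_cols or 0 in col_widths:
        raise ValueError("invalid col_widths")

    # The metadata is a C-style string: collect bytes up to the first '\0'.
    raw = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("metadata is not terminated by '\\0'")
        if byte == b"\0":
            break
        raw += byte
    metadata = json.loads(raw.decode("utf-8"))
    return file_type, col_widths, metadata


def iter_rows(stream, col_widths):
    """Yield each row as a list of raw byte fields (ASSUMPTION: fixed-size rows)."""
    row_size = sum(col_widths)
    while True:
        row = stream.read(row_size)
        if len(row) < row_size:
            break  # EOF; a trailing partial row is ignored in this sketch
        fields, pos = [], 0
        for width in col_widths:
            fields.append(row[pos:pos + width])
            pos += width
        yield fields


# Hypothetical two-column file: a 4-byte uint column and a 1-byte bool column.
sample = (
    bytes([1, 2, 4, 1])
    + b'{"columns": [{"datatype": "uint"}, {"datatype": "bool"}]}\0'
    + bytes([1, 0, 0, 0, 1, 2, 0, 0, 0, 0])
)
stream = io.BytesIO(sample)
file_type, col_widths, metadata = read_abt_header(stream)
rows = list(iter_rows(stream, col_widths))
```

The fields are returned as raw bytes on purpose, since their interpretation depends on the datatype given in the metadata.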

The JSON metadata should be an object with the following structure:

{| class="wikitable"
! field !! required !! type !! explanation
|-
| n_rows || false || number || The number of rows in the file, if known upfront (can be omitted for streaming data)
|-
| comment || false || string || A free-form string
|-
| columns || true || a list of column definition objects || A list of objects describing the columns in more detail (should have n_cols entries)
|}

A "column definition object" should have the following format:

{| class="wikitable"
! field !! required !! type !! explanation
|-
| name || false || string || An optional name of the column
|-
| comment || false || string || A free-form string
|-
| datatype || true || string || An identifier for the datatype of the column
|}
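As an illustration of the two object formats above, the following sketch builds a metadata object in Python and serializes it to the '\0'-terminated form used in the file. The helper name `encode_metadata`, the column names, and the comments are invented for this example. The checks only cover the fields marked as required.

```python
import json


def encode_metadata(metadata, n_cols):
    """Validate the required fields and return the metadata as NUL-terminated UTF-8 JSON."""
    columns = metadata.get("columns")
    if not isinstance(columns, list) or len(columns) != n_cols:
        raise ValueError("'columns' must be a list with n_cols entries")
    for column in columns:
        if not isinstance(column.get("datatype"), str):
            raise ValueError("every column definition needs a string 'datatype'")
    # json.dumps escapes control characters, so the JSON text itself
    # never contains a raw '\0' byte that would end the string early.
    return json.dumps(metadata).encode("utf-8") + b"\0"


# Hypothetical metadata for a file with two columns.
metadata = {
    "n_rows": 2,
    "comment": "hypothetical example file",
    "columns": [
        {"name": "time", "datatype": "unix_timestamp"},
        {"name": "level", "comment": "stored in tenths", "datatype": "int/10"},
    ],
}
blob = encode_metadata(metadata, n_cols=2)
```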

Datatype identifiers can be arbitrary strings, but a reader should support at least the following:

{| class="wikitable"
! identifier !! description
|-
| utf8 || Interpret the data as a UTF-8 string (fixed length; does not need to be terminated with '\0')
|-
| int || Interpret the data as a signed integer
|-
| uint || Interpret the data as an unsigned integer
|-
| float || Interpret the data as a floating-point number (only possible for 4- or 8-byte fields)
|-
| numtype/div || "div" should be replaced by any floating-point number. The data is first interpreted as "numtype" (int, uint or float) and then divided by "div". This is mainly useful for storing decimal numbers in integer fields.
|-
| unix_timestamp || Interpret the data as "uint", but treat the value as a Unix timestamp (seconds elapsed since the Unix epoch, 1 January 1970 at 00:00 UTC)
|-
| bool || Boolean value (0 = false, anything else = true)
|}

Any other, unknown datatype identifier should be treated by the reader as opaque binary data (bytes).
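The identifier rules above could be implemented roughly as follows. This is a sketch under stated assumptions: the function name `decode_field` is hypothetical, and the byte order is an assumption, because this page does not say whether multi-byte values are little-endian or big-endian. Little-endian is used as the default here and can be overridden.

```python
import struct


def decode_field(raw, datatype, byteorder="little"):
    """Decode one raw field (bytes) according to its datatype identifier.

    ASSUMPTION: byteorder defaults to little-endian; the format description
    does not specify it.
    """
    base, sep, div = datatype.partition("/")
    if sep:
        # "numtype/div": decode as numtype first, then divide by div.
        if base not in ("int", "uint", "float"):
            return raw
        try:
            divisor = float(div)
        except ValueError:
            return raw  # malformed divisor: treat as an unknown identifier
        return decode_field(raw, base, byteorder) / divisor
    if datatype == "utf8":
        # Padding of short strings is not specified, so nothing is stripped.
        return raw.decode("utf-8")
    if datatype == "int":
        return int.from_bytes(raw, byteorder, signed=True)
    if datatype in ("uint", "unix_timestamp"):
        # A unix_timestamp is returned as integer seconds since the epoch.
        return int.from_bytes(raw, byteorder, signed=False)
    if datatype == "float":
        if len(raw) not in (4, 8):
            raise ValueError("float columns must be 4 or 8 bytes wide")
        fmt = ("<" if byteorder == "little" else ">") + ("f" if len(raw) == 4 else "d")
        return struct.unpack(fmt, raw)[0]
    if datatype == "bool":
        return any(raw)  # true if any byte is non-zero
    return raw  # unknown identifier: opaque bytes


# A 2-byte integer field holding 123, stored with datatype "int/10".
level = decode_field(b"\x7b\x00", "int/10")
```

A caller that needs calendar dates could pass the result for `unix_timestamp` to `datetime.fromtimestamp(value, tz=timezone.utc)`. That step is left out here to keep the decoder close to the table.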