ASAsense Binary Table (ABT) format
From ASAsense Documentation
Version of 4 Oct 2024 11:26
The ASAsense Binary Table format is an alternative to CSV for binary data with fixed-length rows. A file has the following structure:
| name | length (bytes) | type | allowed values | explanation |
|---|---|---|---|---|
| file_type | 1 | u8 | 1 | Internal identifier of the file type and format version |
| n_cols | 4 | u32 | 1-u32::max | Number of columns |
| col_widths | 4 × n_cols | u32[n_cols] | 1-u32::max per column | n_cols u32 integers giving the byte width of each column |
| metadata | variable, up to and including the terminating '\0' | char[] | UTF-8 encoded C-style string (ends at the first '\0' character) | JSON-encoded metadata |
| data | variable, until EOF | byte[] | any | The table rows; each row is sum(col_widths) bytes long |
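As an illustration of the header layout above, here is a minimal Python sketch of a writer and a reader. It assumes little-endian byte order for the u32 fields (the format description does not state one) and that each row is its column values concatenated in column order; `write_abt` and `read_abt` are hypothetical names, not part of any ASAsense library.

```python
import json
import struct

# Assumption: u32 fields are little-endian; the format does not specify byte order.
U32 = struct.Struct("<I")


def write_abt(file_type: int, col_widths: list[int], metadata: dict, rows: list[bytes]) -> bytes:
    """Serialize the header fields in spec order, followed by the raw row bytes."""
    # json.dumps escapes control characters, so the encoded text never contains a raw NUL.
    meta = json.dumps(metadata).encode("utf-8")
    out = bytearray()
    out += struct.pack("<B", file_type)   # file_type: u8
    out += U32.pack(len(col_widths))      # n_cols: u32
    for w in col_widths:                  # col_widths: u32[n_cols]
        out += U32.pack(w)
    out += meta + b"\0"                   # metadata: NUL-terminated UTF-8 JSON
    row_width = sum(col_widths)
    for row in rows:
        if len(row) != row_width:
            raise ValueError("every row must be exactly sum(col_widths) bytes")
        out += row
    return bytes(out)


def read_abt(buf: bytes) -> tuple[int, list[int], dict, list[bytes]]:
    """Parse the header, then split the data section into fixed-width rows."""
    file_type = buf[0]
    (n_cols,) = U32.unpack_from(buf, 1)
    col_widths = [U32.unpack_from(buf, 5 + 4 * i)[0] for i in range(n_cols)]
    meta_start = 5 + 4 * n_cols
    meta_end = buf.index(b"\0", meta_start)   # the first NUL ends the metadata
    metadata = json.loads(buf[meta_start:meta_end].decode("utf-8"))
    data = buf[meta_end + 1:]                 # everything after the NUL, until EOF
    row_width = sum(col_widths)
    if row_width == 0:
        raise ValueError("rows must be at least one byte wide")
    if len(data) % row_width:
        raise ValueError("data length is not a multiple of the row width")
    rows = [data[i:i + row_width] for i in range(0, len(data), row_width)]
    return file_type, col_widths, metadata, rows
```

Because every row has the same width, the reader needs no delimiters: the row boundaries follow from sum(col_widths) alone.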
The JSON metadata should be an object with the following fields:
| field | required | type | explanation |
|---|---|---|---|
| n_rows | false | number | Number of rows in the file, if known in advance (can be omitted for streaming data) |
| comment | false | string | Free-form text |
| columns | true | array of column definition objects | A list of objects describing the columns in more detail (should have n_cols entries) |
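A reader can cross-check the metadata against the header. The hypothetical `validate_metadata` below is a sketch that assumes the row width is the sum of col_widths; treating an n_rows that disagrees with the size of the data section as an error is a design choice of this sketch, and a lenient reader might only warn instead.

```python
import json


def validate_metadata(metadata_json: str, n_cols: int, row_width: int, data_len: int) -> dict:
    """Check the JSON metadata against the binary header and return the parsed object."""
    meta = json.loads(metadata_json)
    if not isinstance(meta, dict):
        raise ValueError("metadata must be a JSON object")
    columns = meta.get("columns")
    if not isinstance(columns, list):
        raise ValueError("'columns' is required and must be a list")
    if len(columns) != n_cols:
        raise ValueError(f"expected {n_cols} column definitions, got {len(columns)}")
    if "comment" in meta and not isinstance(meta["comment"], str):
        raise ValueError("'comment' must be a string")
    n_rows = meta.get("n_rows")  # optional, e.g. absent for streaming data
    if n_rows is not None and n_rows * row_width != data_len:
        raise ValueError("n_rows does not match the size of the data section")
    return meta
```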
A "column definition object" should have the following fields:
| field | required | type | explanation |
|---|---|---|---|
| name | false | string | Name of the column |
| comment | false | string | Free-form text |
| datatype | true | string | Identifier of the column's datatype (see below) |
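One way to model column definitions in code is a small record type with the required datatype and the optional name and comment. The `ColumnDef` class and `build_metadata` helper below are illustrative assumptions; the sketch omits unset optional fields rather than writing them as JSON null, since the specification marks them optional but does not say whether null is accepted.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnDef:
    datatype: str                  # required, e.g. "uint" or "int/100"
    name: Optional[str] = None     # optional column name
    comment: Optional[str] = None  # optional free-form text

    def to_dict(self) -> dict:
        # Only emit optional fields that were actually set.
        d = {"datatype": self.datatype}
        if self.name is not None:
            d["name"] = self.name
        if self.comment is not None:
            d["comment"] = self.comment
        return d

    @classmethod
    def from_dict(cls, d: dict) -> "ColumnDef":
        if "datatype" not in d:
            raise ValueError("column definition lacks the required 'datatype'")
        return cls(datatype=d["datatype"], name=d.get("name"), comment=d.get("comment"))


def build_metadata(columns: list[ColumnDef], n_rows: Optional[int] = None,
                   comment: Optional[str] = None) -> str:
    """Assemble the metadata object and encode it as JSON."""
    meta: dict = {"columns": [c.to_dict() for c in columns]}
    if n_rows is not None:
        meta["n_rows"] = n_rows
    if comment is not None:
        meta["comment"] = comment
    return json.dumps(meta)
```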
Datatype identifiers are free-form, but a reader should support at least the following:
| identifier | description |
|---|---|
| utf8 | Interpret the data as a fixed-length UTF-8 string (it does not need to be terminated with '\0') |
| int | Interpret the data as a signed integer |
| uint | Interpret the data as an unsigned integer |
| float | Interpret the data as a floating-point number (only possible for 4- or 8-byte columns) |
| numtype/div | Replace "div" with any floating-point number. The data is first interpreted as "numtype" (int, uint or float) and then divided by "div". This is mainly useful for storing decimal numbers in integer columns; for example, "int/100" stores 12.34 as the integer 1234. |
| unix_timestamp | Interpret the data as "uint", representing a Unix timestamp (seconds since the Unix Epoch, 1 January 1970 00:00:00 UTC) |
| bool | Boolean value (0 = false, any other value = true) |
A reader should treat unknown datatype identifiers as opaque binary data (bytes).
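The datatype rules can be sketched as a single function that decodes one cell. This Python version is hypothetical and fills in details the specification leaves open: integers are read little-endian, trailing '\0' padding is stripped from utf8 fields, and unix_timestamp values are returned as timezone-aware UTC datetimes. Any identifier it does not recognize, including a numtype/div whose divisor does not parse as a number, falls back to raw bytes as required for unknown identifiers.

```python
import struct
from datetime import datetime, timezone


def decode_cell(raw: bytes, datatype: str):
    """Decode one fixed-width cell according to its column's datatype identifier."""
    # Assumption: little-endian byte order (the format does not specify one).
    if datatype == "utf8":
        # Fixed-length field; stripping trailing '\0' padding is an assumption.
        return raw.decode("utf-8").rstrip("\0")
    if datatype == "int":
        return int.from_bytes(raw, "little", signed=True)
    if datatype == "uint":
        return int.from_bytes(raw, "little", signed=False)
    if datatype == "float":
        if len(raw) == 4:
            return struct.unpack("<f", raw)[0]
        if len(raw) == 8:
            return struct.unpack("<d", raw)[0]
        raise ValueError("float columns must be 4 or 8 bytes wide")
    if datatype == "unix_timestamp":
        seconds = int.from_bytes(raw, "little", signed=False)
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    if datatype == "bool":
        return any(raw)  # the value is 0 only if every byte is 0
    numtype, sep, div = datatype.partition("/")
    if sep and numtype in ("int", "uint", "float"):
        try:
            divisor = float(div)
        except ValueError:
            return bytes(raw)  # malformed divisor: treat as unknown identifier
        return decode_cell(raw, numtype) / divisor
    return bytes(raw)  # unknown identifier: opaque binary data
```

For example, the two bytes 39 30 in a column declared as "int/100" decode to 12345 / 100 = 123.45.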