ASAsense Binary Table (ABT) format: difference between revisions

Revision as of 5 Oct 2024 07:37

The ASAsense binary table format is an alternative to the CSV format, intended for binary data with fixed-length rows. The structure is as follows:


name	length (bytes)	type	allowed values	explanation
file_type	1	u8	1	The internal number of the file type and version
n_cols	4	u32	1-u32::max	the number of columns
col_widths	4 x n_cols	u32[n_cols]	1-u32::max per column	n_cols u32 integers describing the byte-width of the columns
metadata_length	4	u32	0-u32::max	the length of the metadata field in bytes (can be 0 if the metadata is to be skipped)
metadata	metadata_length	char[]	string of fixed length (so no '\0' character to mark the end of the string)	JSON-encoded metadata
data	variable, until EOF	byte[]	any	The actual data
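The layout above can be read with a few fixed-size reads. The following is a minimal sketch, not a reference implementation. The function names `read_abt_header` and `iter_rows` are hypothetical, and the byte order is an assumption: the table does not specify endianness, so little-endian is used as a default and exposed as a parameter. The sketch also assumes that one row is the concatenation of all columns in order, so the row length is the sum of `col_widths`.

```python
import json
import struct


def read_abt_header(stream, byteorder="<"):
    """Read file_type, col_widths and the JSON metadata from a binary stream.

    byteorder is a struct prefix ("<" or ">"); little-endian is an assumption,
    the format description does not state the byte order.
    """
    def read_exact(n):
        buf = stream.read(n)
        if len(buf) != n:
            raise ValueError("unexpected end of file in ABT header")
        return buf

    (file_type,) = struct.unpack("B", read_exact(1))
    (n_cols,) = struct.unpack(byteorder + "I", read_exact(4))
    col_widths = list(struct.unpack(f"{byteorder}{n_cols}I", read_exact(4 * n_cols)))
    (metadata_length,) = struct.unpack(byteorder + "I", read_exact(4))
    # The metadata is a fixed-length string without a terminating '\0'.
    metadata = None
    if metadata_length:
        metadata = json.loads(read_exact(metadata_length).decode("utf-8"))
    return file_type, col_widths, metadata


def iter_rows(stream, col_widths):
    """Yield each row as a list of raw byte fields, until EOF."""
    row_size = sum(col_widths)  # assumption: a row is all columns back to back
    while True:
        row = stream.read(row_size)
        if not row:
            return
        if len(row) != row_size:
            raise ValueError("truncated row at end of file")
        fields, offset = [], 0
        for width in col_widths:
            fields.append(row[offset:offset + width])
            offset += width
        yield fields
```

A caller would open the file in binary mode, call `read_abt_header` once, and then pass the same stream to `iter_rows` to walk the data section.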

The JSON metadata should be an object with the following structure:


field	required	type	explanation
n_rows	false	number	The number of rows in the file, if known upfront (can be omitted for streaming data)
comment	false	string	a free string
columns	true	a list of column definition objects	a list of objects describing the columns in more detail (should have n_cols entries)
extra	false	JSON object	a JSON object that can freely be specified

A "column definition object" should have the following format:

field	required	type	explanation
name	false	string	an optional name of the column
comment	false	string	a free string
datatype	true	string	An identifier for the datatype of the column
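As an illustration of the two tables above, the sketch below builds a metadata object for a three-column file and checks it against the required fields. The column names, the comment text, the contents of `extra`, and the helper `check_metadata` are all made up for this example; only the field names and the required/optional rules come from the tables.

```python
import json

# Hypothetical metadata for a file with three columns.
metadata = {
    "n_rows": 2,                       # optional, may be omitted for streaming data
    "comment": "example file",         # optional free string
    "columns": [                       # required, one entry per column
        {"name": "time", "datatype": "unix_timestamp(uint/1000)"},
        {"name": "temperature", "datatype": "int/100"},
        {"datatype": "utf8", "comment": "name is optional"},
    ],
    "extra": {"station": "demo"},      # optional free-form object
}


def check_metadata(metadata, n_cols):
    """Return a list of problems found; an empty list means the object looks valid."""
    problems = []
    if not isinstance(metadata, dict):
        return ["metadata is not a JSON object"]
    columns = metadata.get("columns")
    if not isinstance(columns, list):
        return ["required field 'columns' is missing or not a list"]
    if len(columns) != n_cols:
        problems.append(f"'columns' has {len(columns)} entries, expected {n_cols}")
    for i, column in enumerate(columns):
        if not isinstance(column, dict) or not isinstance(column.get("datatype"), str):
            problems.append(f"column {i} lacks the required string field 'datatype'")
    return problems


# The metadata field of the file holds this string, encoded without a trailing '\0'.
encoded = json.dumps(metadata).encode("utf-8")
```

The length of `encoded` is the value that would go into `metadata_length`.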

The datatype identifiers can be anything, but the following should at least be supported by the reader:


identifier	description
utf8	interpret the data as a UTF-8 string (with fixed length, does not need to be terminated with '\0')
int	interpret the data as signed integer
uint	interpret the data as unsigned integer
float	interpret as floating point data (only possible for 4 or 8 byte fields)
bool	boolean value (0 = false, anything else = true)
numtype/div	"div" should be replaced by any floating point number. The data is then first interpreted as "numtype" (int, uint or float) and then divided by "div". This is mainly useful for storing decimal numbers in integer fields.
unix_timestamp(numtype/div)	Interprets the data as specified within the brackets (any numtype, with an optional divider) and treats the result as a Unix timestamp (offset in seconds since the Unix epoch, 1 January 1970 at 00:00 UTC)

Other, unknown datatype identifiers should be treated as opaque binary data (bytes) by the reader.
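The identifiers above can be mapped to a small decoder. This is a sketch under stated assumptions rather than a definitive reader: `decode_field` and its helper are hypothetical names, the byte order is assumed to be little-endian because the format description does not specify it, and a malformed "div" is treated like an unknown identifier (the raw bytes are returned).

```python
import re
import struct
from datetime import datetime, timezone

_NUMERIC = re.compile(r"^(int|uint|float)(?:/(.+))?$")
_TIMESTAMP = re.compile(r"^unix_timestamp\((.+)\)$")


def _decode_numeric(raw, spec, byteorder):
    """Decode 'numtype' or 'numtype/div'; return None if spec is not numeric."""
    match = _NUMERIC.match(spec)
    if match is None:
        return None
    numtype, div = match.groups()
    if numtype == "float":
        if len(raw) not in (4, 8):
            raise ValueError("float is only possible for 4 or 8 byte fields")
        prefix = "<" if byteorder == "little" else ">"
        value = struct.unpack(prefix + ("f" if len(raw) == 4 else "d"), raw)[0]
    else:
        value = int.from_bytes(raw, byteorder, signed=(numtype == "int"))
    if div is None:
        return value
    try:
        return value / float(div)
    except ValueError:
        return None  # "div" is not a number: fall back to opaque bytes


def decode_field(raw, datatype, byteorder="little"):
    """Decode one fixed-width field; byteorder is an assumption, not part of the spec."""
    if datatype == "utf8":
        return raw.decode("utf-8")
    if datatype == "bool":
        return any(raw)  # 0 = false, anything else = true
    timestamp = _TIMESTAMP.match(datatype)
    if timestamp:
        seconds = _decode_numeric(raw, timestamp.group(1), byteorder)
        if seconds is not None:
            return datetime.fromtimestamp(seconds, tz=timezone.utc)
        return raw
    value = _decode_numeric(raw, datatype, byteorder)
    return raw if value is None else value  # unknown identifiers stay opaque bytes
```

For instance, a 2-byte field holding the integer 1234 with datatype `uint/100` decodes to the decimal number 12.34, which is the use case the "numtype/div" row describes.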
