Zinc
Overview
Zinc stands for "Zinc Is Not CSV". Zinc is a plaintext syntax for serializing Haystack grids using a souped-up CSV format. Unlike CSV, Zinc supports typed scalar values (such as Bool, Int, Float, Str, Date, etc.) and arbitrary metadata at the grid and column level. Compared to JSON, Zinc produces a much more compact encoding for tabular data.
Zinc is represented by the def filetype:zinc.
Literals
The basic syntax of Zinc uses a custom literal syntax for each type:
- Null: N
- Marker: M
- Remove: R
- NA: NA
- Bool: T or F (for true, false)
- Number: 1, -34, 10_000, 5.4e-45, 9.23kg, 74.2°F, 4min, INF, -INF, NaN
- Str: "hello", "foo\nbar\"" (uses all the standard escape chars of C-like languages)
- Uri: `http://project-haystack.com/`
- Ref: @17eb0f3a-ad607713, @xyz "Display Name"
- Symbol: ^hot-water
- Date: 2010-03-13 (YYYY-MM-DD)
- Time: 08:12:05 (hh:mm:ss.FFF)
- DateTime: 2010-03-11T23:55:00-05:00 New_York or 2009-11-09T15:39:00Z
- Coord: C(37.55,-77.45)
- XStr: Type("value")
- List: [1, 2, 3]
- Dict: {dis:"Building" site area:35000ft²}
- Grid: <<ver:"3.0" ... >>
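To make the table concrete, here is a minimal Python sketch that renders a few native values as Zinc literals. The names `to_zinc`, `encode_str` and the `MARKER` sentinel are hypothetical, and only a subset of the types is covered (Null, Marker, Bool, unitless Number, Str, Date, Time, List, Dict). Refs, Uris, units, DateTimes and the remaining types are left out.

```python
import datetime
import math

MARKER = object()  # hypothetical sentinel standing in for the Marker value

# Short escapes for the quote, the backslash and common control chars.
_STR_ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r",
                "\t": "\\t", '"': '\\"', "\\": "\\\\"}


def encode_str(s: str) -> str:
    """Render a Python str as a quoted Zinc Str literal."""
    out = []
    for ch in s:
        if ch in _STR_ESCAPES:
            out.append(_STR_ESCAPES[ch])
        elif ord(ch) < 0x20:
            out.append("\\u%04x" % ord(ch))  # other control chars via \uXXXX
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def to_zinc(val) -> str:
    """Render a subset of Python values as Zinc literals (sketch only)."""
    if val is None:
        return "N"
    if val is MARKER:
        return "M"
    if isinstance(val, bool):  # test bool before int: bool subclasses int
        return "T" if val else "F"
    if isinstance(val, float):
        if math.isnan(val):
            return "NaN"
        if math.isinf(val):
            return "INF" if val > 0 else "-INF"
        return repr(val)  # e.g. 5.4e-45
    if isinstance(val, int):
        return str(val)
    if isinstance(val, str):
        return encode_str(val)
    if isinstance(val, datetime.datetime):
        raise TypeError("DateTime needs a timezone name; not covered here")
    if isinstance(val, (datetime.date, datetime.time)):
        return val.isoformat()  # YYYY-MM-DD or hh:mm:ss[.ffffff]
    if isinstance(val, list):
        return "[" + ",".join(to_zinc(v) for v in val) + "]"
    if isinstance(val, dict):
        tags = (k if v is MARKER else f"{k}:{to_zinc(v)}" for k, v in val.items())
        return "{" + " ".join(tags) + "}"
    raise TypeError(f"unsupported type: {type(val).__name__}")
```

For example, `to_zinc({"dis": "Building", "site": MARKER})` yields `{dis:"Building" site}`, and `to_zinc(float("-inf"))` yields `-INF`.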
Syntax
Every grid has one line of meta-data applied to the entire grid, followed by one line of column definitions, then zero or more lines of rows. Each line is separated by a "\n" newline character.
The meta-data line must always begin with a ver tag and a value of "3.0". Let's look at a simple example:
ver:"3.0"
firstName,bday
"Jack",1973-07-23
"Jill",1975-11-15
Note the first line defines the grid meta-data, which is just the version tag. The second line defines two columns named firstName and bday. There are two data rows each with a Str value for firstName and a Date value for bday. Every row must define a cell value for each column.
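This layout is simple enough to produce with a few lines of code: one meta line, one column line, and one line per row, each ending in "\n". The sketch below assumes rows are Python dicts keyed by column name and supports only Str and Date cells. `write_grid` and `_cell` are hypothetical helpers, not part of any Haystack library.

```python
import datetime


def _cell(val) -> str:
    """Encode one cell; only None, str and date are supported in this sketch."""
    if val is None:
        return "N"
    if isinstance(val, str):
        # Escape backslash first, then quote and newline; other control
        # chars are not handled in this sketch.
        esc = val.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{esc}"'
    if isinstance(val, datetime.date) and not isinstance(val, datetime.datetime):
        return val.isoformat()  # YYYY-MM-DD
    raise TypeError(f"unsupported cell type: {type(val).__name__}")


def write_grid(cols: list, rows: list) -> str:
    """Write a grid whose only meta tag is ver:"3.0"."""
    lines = ['ver:"3.0"', ",".join(cols)]
    for row in rows:
        # Every row needs a cell for every column; a missing key becomes N.
        lines.append(",".join(_cell(row.get(c)) for c in cols))
    return "\n".join(lines) + "\n"


zinc = write_grid(["firstName", "bday"], [
    {"firstName": "Jack", "bday": datetime.date(1973, 7, 23)},
    {"firstName": "Jill", "bday": datetime.date(1975, 11, 15)},
])
print(zinc, end="")
```

Running it prints the same four lines as the firstName/bday example.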
Metadata may be specified on the grid itself or on each column as a set of name/value tags. A tag is written as "name:val"; if the value is omitted, it is a marker tag. Tags are separated by a space. Here is an example:
ver:"3.0" database:"test" dis:"Site Energy Summary"
siteName dis:"Sites", val dis:"Value" unit:"kW"
"Site 1", 356.214kW
"Site 2", 463.028kW
It is common to have sparse tables where rows have a null value for a given column. This is indicated either with the N literal or by omitting the cell entirely. For example, these two rows are semantically identical:
"a",N,2,N,N,"z"
"a",,2,,,"z"
If there is only one column, then a null row must be represented with the N character.
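A reader therefore has to treat an empty cell and `N` alike. The following Python sketch, with the hypothetical name `split_row`, splits one row line on top-level commas and skips commas inside Str or Uri literals, lists and dicts. It maps empty or `N` cells to `None` and returns the other cells as raw literal text. It does not handle nested grids, which span several lines.

```python
def split_row(line: str) -> list:
    """Split a single-line Zinc row into raw cell strings; null cells become None."""
    cells, buf = [], []
    depth = 0      # nesting level of [...] and {...}
    quote = None   # '"' or '`' while inside a Str or Uri literal
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            buf.append(ch)
            if ch == "\\" and i + 1 < len(line):
                buf.append(line[i + 1])  # copy the escaped char verbatim
                i += 1
            elif ch == quote:
                quote = None
        elif ch in '"`':
            quote = ch
            buf.append(ch)
        elif ch in "[{":
            depth += 1
            buf.append(ch)
        elif ch in "]}":
            depth -= 1
            buf.append(ch)
        elif ch == "," and depth == 0:
            cells.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    cells.append("".join(buf).strip())
    # An empty cell and the N literal both mean null.
    return [None if c in ("", "N") else c for c in cells]
```

Both sparse rows shown earlier split into `['"a"', None, '2', None, None, '"z"']`.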
Nested lists, dicts, or grids may be used for any metadata value or cell:
ver:"3.0"
type,val
"list",[1,2,3]
"dict",{dis:"Dict!" foo}
"grid",<<
ver:"3.0"
a,b
1,2
3,4
>>
"scalar","simple string"
Nested dicts may optionally use a comma between name/value pairs. However, commas are not allowed in grid or column metadata.
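These tag rules can be sketched as a small scanner: bare names are markers, `name:val` forms a pair, tags are separated by spaces, and commas are accepted only inside nested dicts. `parse_tags` and the `MARKER` sentinel are hypothetical. Values are limited to Str, Bool and unitless Numbers, and Str unescaping only handles `\"` and `\\`.

```python
import re

MARKER = object()  # hypothetical stand-in for the Marker value

_ID = re.compile(r"[a-z][A-Za-z0-9_]*")
_STR = re.compile(r'"((?:[^"\\]|\\.)*)"')
_NUM = re.compile(r"-?[0-9][0-9_]*(?:\.[0-9][0-9_]*)?(?:[eE][+-]?[0-9][0-9_]*)?")


def _parse_val(text: str, i: int):
    """Parse a Str, Bool or unitless Number at position i; return (value, end)."""
    m = _STR.match(text, i)
    if m:
        # Only \" and \\ are unescaped in this sketch.
        return re.sub(r'\\(["\\])', r"\1", m.group(1)), m.end()
    if text[i:i + 1] in ("T", "F"):
        return text[i] == "T", i + 1
    m = _NUM.match(text, i)
    if m:
        num = m.group().replace("_", "")
        is_float = any(c in num for c in ".eE")
        return (float(num) if is_float else int(num)), m.end()
    raise ValueError(f"unsupported value at {i}: {text[i:]!r}")


def parse_tags(text: str, comma_ok: bool = False) -> dict:
    """Parse space-separated tags; commas are accepted only if comma_ok."""
    tags, i = {}, 0
    while True:
        while i < len(text) and (text[i] == " " or (comma_ok and text[i] == ",")):
            i += 1
        if i >= len(text):
            return tags
        m = _ID.match(text, i)
        if not m:
            raise ValueError(f"expected a tag name at {i}: {text[i:]!r}")
        name, i = m.group(), m.end()
        if text[i:i + 1] == ":":
            tags[name], i = _parse_val(text, i + 1)
        else:
            tags[name] = MARKER  # no value: marker tag
```

Calling `parse_tags('database:"test" dis:"Site Energy Summary"')` returns the grid metadata from the earlier example, while `parse_tags("a:1, b:2")` raises `ValueError` unless `comma_ok=True` is passed.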
Grammar
Grammar legend:
:=    is defined as
<x>   non-terminal
"x"   literal
[x]   optional
(x)   grouping
x+    one or more times
x*    zero or more times
x|x   or
The formal grammar for Zinc:
<grid> := <gridMeta> <cols> [<row>]*
<gridMeta> := <ver> <tagsNoComma> <nl>
<ver> := "ver:" <str> // must be "3.0"
<tagsNoComma> := <tag>* // separated by one space (0x20)
<tagsCommaOk> := (<tag> [","])* // trailing comma allowed/optional
<tag> := <tagMarker> | <tagPair>
<tagMarker> := <id> // val is assumed to be Marker
<tagPair> := <id> ":" <val>
<cols> := <col> ("," <col>)* <nl>
<col> := <id> <tagsNoComma>
<row> := <cell> ["," <cell>]* <nl>
<cell> := <val> // empty cell is same as null
<val> := <scalar> | <list> | <dict> | <nestedGrid>
<list> := "[" [<val> ("," <val>)* [","]] "]" // trailing comma allowed/optional
<dict> := "{" <tagsCommaOk> "}"
<nestedGrid> := "<<" <grid> ">>"
Zinc tokens:
<id> := <alphaLo> (<alphaLo> | <alphaHi> | <digit> | '_')*
<scalar> := <null> | <marker> | <remove> | <na> | <bool> | <ref> | <symbol> | <str> |
<uri> | <number> | <date> | <time> | <dateTime> | <coord> | <xstr>
<null> := "N"
<marker> := "M"
<remove> := "R"
<na> := "NA"
<bool> := "T" | "F"
<symbol> := "^" <refChar>+
<ref> := "@" <refChar>+ [ " " <str> ]
<refChar> := <alpha> | <digit> | "_" | ":" | "-" | "." | "~"
<str> := """ <strChar>* """
<uri> := "`" <uriChar>* "`"
<strChar> := <unicodeChar> | <strEscChar>
<uriChar> := <unicodeChar> | <uriEscChar>
<unicodeChar> := any 16-bit Unicode char >= 0x20 (except str/uri quote)
<strEscChar> := "\b" | "\f" | "\n" | "\r" | "\t" | "\"" | "\\" | "\$" | <uEscChar>
<uriEscChar> := "\:" | "\/" | "\?" | "\#" | "\[" | "\]" | "\@" | "\`" | "\\" | "\&" | "\=" | "\;" | <uEscChar>
<uEscChar> := "\u" <hexDigit> <hexDigit> <hexDigit> <hexDigit>
<xstr> := <xstrType> "(" <str> ")"
<xstrType> := <alphaHi> (<alphaLo> | <alphaHi> | <digit> | '_')*
<number> := <decimal> | "INF" | "-INF" | "NaN"
<decimal> := ["-"] <digits> ["." <digits>] [<exp>] [<unit>]
<exp> := ("e"|"E") ["+"|"-"] <digits>
<unit> := <unitChar>*
<unitChar> := <alpha> | "%" | "_" | "/" | "$" | any char > 128 // see Units
<date> := YYYY-MM-DD
<time> := hh:mm:ss.FFFFFFFFF
<dateTime> := YYYY-MM-DD'T'hh:mm:ss.FFFFFFFFFz zzzz
<coord> := "C(" <coordDeg> "," <coordDeg> ")"
<coordDeg> := ["-"] <digits> ["." <digits>]
<alphaLo> := ('a' - 'z')
<alphaHi> := ('A' - 'Z')
<alpha> := <alphaLo> | <alphaHi>
<digit> := ('0' - '9')
<digits> := <digit> (<digit> | "_")*
<hexDigit> := ('a'-'f') | ('A'-'F') | <digit>
The space character 0x20 is allowed between tokens.
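The `<strEscChar>` and `<uEscChar>` productions map directly onto an unescaping loop. The Python sketch below assumes it receives the characters between the quotes of a `<str>` token; the name `unescape_str` is hypothetical. Any escape the grammar does not list is rejected with `ValueError`.

```python
from string import hexdigits

# Single-char escapes from <strEscChar>.
_SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
           '"': '"', "\\": "\\", "$": "$"}


def unescape_str(body: str) -> str:
    """Resolve Zinc Str escapes in the text between the quotes."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 2
        elif esc == "u":
            # <uEscChar> := "\u" followed by exactly four hex digits
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not all(c in hexdigits for c in digits):
                raise ValueError(f"bad \\u escape at {i}")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            raise ValueError(f"invalid escape \\{esc} at {i}")
    return "".join(out)
```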
Notes
The following are notes for implementors:
Identifiers vs Keywords
Identifiers must start with a lowercase letter. Keywords begin with an uppercase letter: "N", "T", "F", "M", "NA", "INF", "NaN", etc.
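Because the first character decides the class, a tokenizer can branch on it directly. Here is a Python sketch built on the `<id>` production. The function name `classify_word` is hypothetical, and the keyword set is taken from the literals listed earlier, so it may not be exhaustive.

```python
import re

# <id> := <alphaLo> (<alphaLo> | <alphaHi> | <digit> | '_')*
_ID_RE = re.compile(r"[a-z][A-Za-z0-9_]*")

# Keywords from the literals table; this set is not necessarily exhaustive.
KEYWORDS = {"N", "M", "R", "NA", "T", "F", "INF", "NaN"}


def classify_word(tok: str) -> str:
    """Label a word as an identifier, a keyword, or another uppercase name."""
    if _ID_RE.fullmatch(tok):
        return "identifier"
    if tok in KEYWORDS:
        return "keyword"
    if "A" <= tok[:1] <= "Z":
        return "type name"  # e.g. the Type in an XStr like Type("value")
    return "invalid"
```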
URIs
Escape chars in URIs are used to remove special meaning for reserved characters. For example if a filename contains the # character, then it must be escaped so that the # is not treated as a fragment identifier:
`file \#2`
Parsers should be prepared to encounter and preserve the backslash in these cases.
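Here is a minimal sketch of one way to follow that advice. It assumes the parser receives the characters between the backticks and applies three rules: an escape of a character listed in `<uriEscChar>` keeps its backslash, `` \` `` becomes a bare backtick because it is the Uri delimiter, and `\uXXXX` is decoded. The function name `read_uri_body`, the decoding of the backtick escape, and the choice to keep the backslash for `\\` are all assumptions of this sketch.

```python
from string import hexdigits

# Reserved chars whose escapes keep their backslash (from <uriEscChar>).
_RESERVED = set(":/?#[]@\\&=;")


def read_uri_body(body: str) -> str:
    """Decode the text between the backticks of a Zinc Uri literal (sketch)."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt and nxt in _RESERVED:
            out.append("\\" + nxt)  # preserve: the char keeps its escaped meaning
            i += 2
        elif nxt == "`":
            out.append("`")  # the delimiter itself: assumed to be unescaped
            i += 2
        elif nxt == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not all(c in hexdigits for c in digits):
                raise ValueError(f"bad \\u escape at {i}")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            raise ValueError(f"invalid Uri escape at {i}")
    return "".join(out)
```

With these rules, `` `file \#2` `` keeps its backslash, so later URI processing still sees the `#` as escaped.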
Number Tokens
When parsing, a leading digit may be a number, date, time, or datetime. You can use the following technique to consume these scalars:
- consume all the various chars into a string
- if it has dashes and no colons, it must be a date
- if it has colons and no dashes, it must be a time
- if it has both colons and dashes, it must be a dateTime; check for Z or a timezone name
- otherwise it must be a number with an optional unit
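The heuristic can be sketched in Python as follows. `classify_numeric` and `split_number` are hypothetical helpers that operate on an already-consumed token. The rule of thumb glosses over one case: a number such as `5.4e-45` contains a dash. To handle it, this sketch counts a dash as date-like only when the token starts with four digits and a dash (`YYYY-`). That refinement is an assumption of the sketch.

```python
import re

# Assumption of this sketch: date-like tokens start with "YYYY-", which keeps
# an exponent dash (as in 5.4e-45) from being mistaken for a date separator.
_DATE_PREFIX = re.compile(r"[0-9]{4}-")
# <decimal> without its unit, per the grammar (digits may contain "_").
_DECIMAL = re.compile(r"-?[0-9][0-9_]*(?:\.[0-9][0-9_]*)?(?:[eE][+-]?[0-9][0-9_]*)?")


def classify_numeric(tok: str) -> str:
    """Classify a consumed token that began with a digit."""
    dashes = _DATE_PREFIX.match(tok) is not None
    colons = ":" in tok
    if dashes and not colons:
        return "date"
    if colons and not dashes:
        return "time"
    if colons and dashes:
        return "dateTime"  # caller then checks for "Z" or a timezone name
    return "number"


def split_number(tok: str):
    """Split a number token into (value, unit); the unit may be empty."""
    if tok in ("INF", "-INF", "NaN"):
        return float(tok), ""  # Python's float() accepts these spellings
    m = _DECIMAL.match(tok)
    if not m:
        raise ValueError(f"not a number: {tok!r}")
    return float(m.group().replace("_", "")), tok[m.end():]
```

The unit returned by `split_number` is not validated against the `<unitChar>` rule. A fuller reader would check it against the Units database.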
DateTime
DateTime scalars are encoded using both offset and the timezone name:
2010-11-28T07:23:02.773-08:00 Los_Angeles  // negative offset and timezone
2010-11-28T23:19:29.741+08:00 Taipei       // positive offset and timezone
2010-11-28T18:21:58+03:00 GMT-3            // timezone may include '-'
2010-11-28T12:22:27-03:00 GMT+3            // timezone may include '+'
2010-01-08T05:00:00Z UTC                   // UTC example
2010-01-08T05:00:00Z                       // UTC may omit timezone name
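Here is a minimal Python sketch of reading these values, using the hypothetical name `parse_zinc_datetime`. It splits off the timezone name and parses the rest with `datetime.fromisoformat`. It assumes at most six fractional-second digits (Python's datetime resolution). It also assumes that any value not ending in `Z` must carry a name. Mapping names such as `Los_Angeles` to real zone rules is out of scope.

```python
from datetime import datetime


def parse_zinc_datetime(text: str):
    """Split a Zinc DateTime into (aware datetime, timezone name)."""
    stamp, _, tz_name = text.strip().partition(" ")
    if stamp.endswith("Z"):
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
        stamp = stamp[:-1] + "+00:00"
        tz_name = tz_name or "UTC"  # UTC may omit the timezone name
    elif not tz_name:
        # Assumption: every non-UTC value carries a timezone name.
        raise ValueError(f"missing timezone name: {text!r}")
    return datetime.fromisoformat(stamp), tz_name
```

The offset and the name are returned separately, so a caller can check them against each other.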
Version History
Zinc 1.0
- initial version
- Bin format: Bin mime:"text/plain"
Zinc 2.0
- change hex RecId syntax to @ Ref syntax
- remove support for cell display strings and metadata
- remove support for column display strings (use dis metadata tag)
- update Bin format: Bin(text/plain)
Zinc 3.0
- add nested lists, dicts, grids
- add NA
- add XStr
- remove Bin format to use XStr syntax
Zinc 3.0 Haystack 4 features
- Version remains the same "3.0"
- Symbol literals
- Allow commas in nested dict literals