Zinc

Overview

Zinc stands for "Zinc Is Not CSV". Zinc is a plaintext syntax for serializing Haystack grids using a souped up CSV format. Unlike CSV, Zinc supports typed scalar values (such as Bool, Int, Float, Str, Date, etc) and arbitrary meta-data at the grid and column level. Unlike JSON, Zinc results in much higher compression for tabular data.

Zinc is represented by the def ZincFile.

Literals

The basic syntax of Zinc uses a custom literal syntax for each type:

  • Null: N
  • Marker: M
  • Remove: R
  • NA: NA
  • Bool: T or F (for true, false)
  • Number: 1, -34, 10_000, 5.4e-45, 9.23kg, 74.2°F, 4min, INF, -INF, NaN
  • Str: "hello", "foo\nbar\" (uses all standard escape chars as C like languages)
  • Uri: http://project-haystack.com/
  • Ref: @17eb0f3a-ad607713, @xyz "Display Name"
  • Symbol: ^hot-water
  • Date: 2010-03-13 (YYYY-MM-DD)
  • Time: 08:12:05 (hh:mm:ss.FFF)
  • DateTime: 2010-03-11T23:55:00-05:00 New_York or 2009-11-09T15:39:00Z
  • Coord: C(37.55,-77.45)
  • XStr: Type("value")
  • List: [1, 2, 3]
  • Dict: {dis:"Building" site area:35000ft²}
  • Grid: <<ver:"3.0" ... >>

Syntax

Every grid has one line of meta-data applied to the entire grid, followed by one line of column definitions, then zero or more lines of rows. Each line is separated by a "\n" newline character.

The meta-data line must always begin with a ver tag and a value of "3.0". Let's look at a simple example:

ver:"3.0"
firstName,bday
"Jack",1973-07-23
"Jill",1975-11-15

Note the first line defines the grid meta-data, which is just the version tag. The second line defines two columns named firstName and bday. There are two data rows each with a Str value for firstName and a Date value for bday. Every row must define a cell value for each column.

Metadata may be specified on the grid itself or on each column as a set of name/value tags. Tags are specified as "name: val" or if value is omitted, then it is a marker tag. Tags are separated by a space. Here is an example:

ver:"3.0" database:"test" dis:"Site Energy Summary"
siteName dis:"Sites", val dis:"Value" unit:"kW"
"Site 1", 356.214kW
"Site 2", 463.028kW

It is common to have sparse tables where rows have a null value for a given column. This is indicated either using the N literal or by omitting a the cell entirely. For example these two rows are semantically identical:

"a",N,2,N,N,"z"
"a",,2,,,"z"

If there is only one column, then a null row must be represented with the N character.

Nested lists, dicts, or grids may be used for any meta data value or cell:

ver:"3.0"
type,val
"list",[1,2,3]
"dict",{dis:"Dict!" foo}
"grid",<<
  ver:"2.0"
  a,b
  1,2
  3,4
  >>
"scalar","simple string"

Nested dicts are optionally allowed to use a comma between name value pairs. However, commas are not allowed for grid and column meta-data.

Grammar

Grammar legend:

:=      is defined as
<x>     non-terminal
"x"     literal
[x]     optional
(x)     grouping
x+      one or more times
x*      zero or more times
x|x     or

The formal grammar for Zinc:

<grid>        :=  <gridMeta> <cols> [<row>]*
<gridMeta>    :=  <ver> <tagsNoComma> <nl>
<ver>         :=  "ver:" <str>  // must be "3.0"
<tagsNoComma> :=  <tag>*   // separated by one space (0x20)
<tagsCommaOk> :=  (<tag>, [","])*  // trailing comma allowed/optional
<tag>         :=  <tagMarker> | <tagPair>
<tagMarker>   :=  <id>  // val is assumed to be Marker
<tagPair>     :=  <id> ":" <val>
<cols>        :=  <col> ("," <col>)* <nl>
<col>         :=  <id> <tagsNoComma>
<row>         :=  <cell> ["," <cell>]* <nl>
<cell>        :=  <val>  // empty cell is same as null
<val>         :=  <scalar> | <list> | <dict> | <grid>
<list>        :=  "[" (<val> ",")* "]"  // trailing comma allowed/optional
<dict>        :=  "{" <tagsCommaOk> "}"
<grid>        :=  "<<" <grid> ">>"

Zinc tokens:

<id>          :=  <alphaLo> (<alphaLo> | <alphaHi> | <digit> | '_')*

<scalar>      :=  <null> | <marker> | <remove> | <na> | <bool> | <ref> | <symbol> | <str> |
                  <uri> | <number> | <date> | <time> | <dateTime> | <coord> | <xstr>

<null>        := "N"
<marker>      := "M"
<remove>      := "R"
<na>          := "NA"
<bool>        := "T" | "F"

<symbol>      := "^" <refChar>+
<ref>         := "@" <refChar>+ [ " " <str> ]
<refChar>     := <alpha> | <digit> | "_" | ":" | "-" | "." | "~"

<str>         := """ <strChar>* """
<uri>         := "`" <uriChar>* "`"
<strChar>     := <unicodeChar> | <strEscChar>
<uriChar>     := <unicodeChar> | <uriEscChar>
<unicodeChar> := any 16-bit Unicode char >= 0x20 (except str/uri quote)
<strEscChar>  := "\b" | "\f" | "\n" | "\r" | "\r" | "\t" | "\"" | "\\" | "\$" | <uEscChar>
<uriEscChar>  := "\:" | "\/" | "\?" | "\#" | "\[" | "\]" | "\@" | "\`" | "\\" | "\&" | "\=" | "\;" | <uEscChar>
<uEscChar>    := "\u" <hexDigit> <hexDigit> <hexDigit> <hexDigit>

<xstr>        := <xstrType> "(" <str> ")"
<xstrType>    := <alphaHi> (<alphaLo> | <alphaHi> | <digit> | '_')*

<number>      := <decimal> | "INF" | "-INF" | "NaN"
<decimal>     := ["-"] <digits> ["." <digits>] [<exp>] [<unit>]
<exp>         := ("e"|"E") ["+"|"-"] <digits>
<unit>        := <unitChar>*
<unitChar>    := <alpha> | "%" | "_" | "/" | "$" | any char > 128  // see Units

<date>        := YYYY-MM-DD
<time>        := hh:mm:ss.FFFFFFFFF
<dateTime>    := YYYY-MM-DD'T'hh:mm:ss.FFFFFFFFFz zzzz

<coord>       := "C(" <coordDeg> "," <coordDeg> ")"
<coordDeg>    := ["-"] <digits> ["." <digits>]

<alphaLo>     := ('a' - 'z')
<alphaHi>     := ('A' - 'Z')
<alpha>       := <alphaLo> | <alphaHi>
<digit>       := ('0' - '9')
<digits>      := <digit> (<digit> | "_")*
<hexDigit>    := ('a'-'f') | ('A'-'F') | digit

The space character 0x20 is allowed between tokens.

Notes

The following are notes for implementators:

Identifiers vs Keywords

Identifiers must start with a lower case letter. Keywords begin with an upper case letter: "N", "T", "F", "M", "NA", "INF", "NaN", etc

URIs

Escape chars in URIs are used to remove special meaning for reserved characters. For example if a filename contains the # character, then it must be escaped so that the # is not treated as a fragment identifier:

 `file \#2`

Parsers should be prepared to encounter and preserve the backslash in these cases.

Number Tokens

When parsing, a leading digit may be a number, date, time, or datetime. You can use the following technique to consume these scalars:

  1. consume all the various chars into a string
  2. if dashes and no colons must be date
  3. if colons and no dashes must be time
  4. if colons and dashes must be dateTime, check for Z or timezone
  5. must be number with optional unit

DateTime

DateTime scalars are encoded using both offset and the timezone name:

2010-11-28T07:23:02.773-08:00 Los_Angeles // negative offset and timezone
2010-11-28T23:19:29.741+08:00 Taipei      // positive offset and timezone
2010-11-28T18:21:58+03:00 GMT-3           // timezone may include '-'
2010-11-28T12:22:27-03:00 GMT+3           // timezone may include '+'
2010-01-08T05:00:00Z UTC                  // UTC example
2010-01-08T05:00:00Z                      // UTC may omit timezone name

Version History

Zinc 1.0

  • initial version
  • Bin format: Bin mime:"text/plain"

Zinc 2.0

  • change hex RecId syntax to @ Ref syntax
  • remove support for cell display strings and metadata
  • remove support for column display strings (use dis metadata tag)
  • update Bin format: Bin(text/plain)

Zinc 3.0

  • add nested lists, dicts, grids
  • add NA
  • add XStr
  • remove Bin format to use XStr syntax

Zinc 3.0 Haystack 4 features

  • Version remains the same "3.0"
  • Symbol literals
  • Allow commas in nested dict literals

Haxall 4.0.5 ∙ 24-Feb-2026 14:33 EST