The DOT Format : Revision 2

Author:
=======
Anoop Kumar Narayanan
anoop (dot) kn (at) gmail (dot) com
anoop (dot) kn (at) live (dot) in

Revision:
=========
2 - 20/09/2017

Purpose:
========
Easily readable.
Easily editable.
Easily portable.
Growable.
Appendable.
Able to represent hierarchical data.
Compatible with XML.
Data can link to other nodes and attributes.
Requires no parser but can have one.
It's a data representation method, not a script.

Description
===========

The dot format is intended to be simple to understand, easily readable and portable, while still being able to represent hierarchical data. Unlike JSON and XML, the dot format relies on line-based information: each line represents data at one particular level. Lines are separated by a single '\n', not '\r\n'. Empty lines are dropped and are not considered data.

Each tag starts with a '.'. The first set of lines without a leading '.' holds configuration information, written in the same format as a DOT line. Comment lines start with a space.

Each tag is followed by attribute:value pairs, with a ':' between the attribute name and its value. Node-specific data uses the attribute name '.'. Spaces separate the pairs (including the node data pair), and an underscore stands for a space inside a value, so no quoting or other demarcation is needed. The asterisk '*' is used as a pointer to node data or to an attribute; multilevel pointers are not supported.

To represent a child, the tag on the next line must be preceded by one extra dot. A document cannot have multiple root nodes: if another root node is present, its data is appended to the original root node and the name of the new root node is discarded.
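The line rules above can be sketched as a small classifier. This is a minimal sketch, not a reference implementation: the names DotLine, parse_pairs() and classify() are hypothetical, escapes are ignored, comment lines are assumed to begin with a space as described, and the sketch does not enforce that configuration lines appear only before the first tag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DotLine:
    kind: str                  # "tag", "config" or "comment"
    level: int = 0             # number of leading dots (tag lines only)
    tag: str = ""              # an empty tag name is allowed, e.g. ".. .:text"
    attrs: Dict[str, str] = field(default_factory=dict)


def parse_pairs(tokens: List[str]) -> Dict[str, str]:
    """Split 'name:value' tokens; an unescaped '_' in a value stands for a space."""
    attrs: Dict[str, str] = {}
    for token in tokens:
        name, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected name:value, got {token!r}")
        attrs[name] = value.replace("_", " ")
    return attrs


def classify(line: str) -> Optional[DotLine]:
    """Classify one line (without its trailing newline); None means 'drop it'."""
    if line == "":
        return None                            # empty lines are not data
    if line.startswith(" "):
        return DotLine("comment")              # comment lines start with a space
    if line.startswith("."):
        head, *rest = line.split()
        level = len(head) - len(head.lstrip("."))
        return DotLine("tag", level, head[level:], parse_pairs(rest))
    return DotLine("config", attrs=parse_pairs(line.split()))
```

For example, "..body class:bodyclass .:This_is_a_body." becomes a level-2 "body" tag with the attributes class and '.'.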

Escapes:
========
In version 1 of the dot format, the backquote '`' is used as the escape character, because '\' is used as the path separator on Windows devices.

Colon is represented as '`:'.
Low dash (also called underscore, underline, underbar or low line) is represented as '`_'.
Asterisk is represented as '`*'.
Period is represented as '`.'.
Newline is represented as '`n'.
Tab is represented as '`t'.
At is represented as '`@'.
Hash is represented as '`#'.
Question mark is represented as '`?'.
Tilde is represented as '`~'.
Comma is represented as '`,'.
GT is represented as '`>'.
LT is represented as '`<'.
Equals is represented as '`='.
Plus is represented as '`+'.
Minus is represented as '`-'.
Backquote is represented as '``'.

A '.' needs no escaping, except when it starts a line or is immediately followed by a ':'.
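The escape table can be applied to a single value as follows. This is a sketch under stated assumptions: escape_value() and unescape_value() are hypothetical names, a space is written as an unescaped '_', and '.' is never escaped inside a value, since a value never starts a line and any ':' after it is escaped anyway.

```python
# Characters escaped as '`' followed by the character itself.
_LITERAL = set(":_*@#?~,><=+-`")
# Characters escaped as '`' followed by a letter.
_CONTROL = {"\n": "n", "\t": "t"}
_CONTROL_REV = {letter: ch for ch, letter in _CONTROL.items()}


def escape_value(text: str) -> str:
    """Encode text as a DOT value ('.' is deliberately left unescaped)."""
    out = []
    for ch in text:
        if ch in _CONTROL:
            out.append("`" + _CONTROL[ch])
        elif ch in _LITERAL:
            out.append("`" + ch)
        elif ch == " ":
            out.append("_")  # an unescaped '_' stands for a space
        else:
            out.append(ch)
    return "".join(out)


def unescape_value(text: str) -> str:
    """Decode a DOT value: '`x' -> x ('`n'/'`t' -> newline/tab), '_' -> space."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "`" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(_CONTROL_REV.get(nxt, nxt))
            i += 2
        else:
            out.append(" " if ch == "_" else ch)
            i += 1
    return "".join(out)
```

Because the backquote itself is written as '``', a literal "`n" in the text is encoded as "``n" and cannot be mistaken for a newline.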

Parsing:
========
It can be parsed with simple string operations such as readline(), split() and string substitution. Hence there is technically no need for a dedicated library.
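Following the readline()/split() approach above, a minimal tree builder might look like this. It is a sketch, not the author's parser: Node, parse_dot() and the error handling are assumptions, escapes and configuration lines are skipped, node data ('.') is stored as a text child so that text and child nodes keep their order, a nameless tag (such as '.. ') reuses the last node on that level, and a second root is merged into the first.

```python
import io
from dataclasses import dataclass, field
from typing import Dict, List, Optional, TextIO, Union


@dataclass
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    # Text nodes (str) and child nodes, kept in document order.
    children: List[Union[str, "Node"]] = field(default_factory=list)


def _merge(node: Node, pairs: List[str]) -> None:
    """Apply 'name:value' pairs to node; escapes are NOT handled here."""
    for pair in pairs:
        name, _, value = pair.partition(":")
        value = value.replace("_", " ")
        if name == ".":
            node.children.append(value)  # node data becomes a text node
        else:
            node.attrs[name] = value


def parse_dot(stream: TextIO) -> Optional[Node]:
    stack: List[Node] = []  # stack[i] is the last node seen on level i + 1
    for raw in iter(stream.readline, ""):
        line = raw.rstrip("\n")
        if not line.startswith("."):
            continue  # empty, comment or configuration line
        head, *pairs = line.split()
        level = len(head) - len(head.lstrip("."))
        tag = head[level:]
        if level > len(stack) + 1:
            raise ValueError(f"level {level} has no parent: {line!r}")
        if tag == "" or (level == 1 and stack):
            # Nameless tag: reuse the last node on this level.
            # Second root: merge into the first root, discarding the new name.
            if level > len(stack):
                raise ValueError(f"no earlier node on level {level}: {line!r}")
            node = stack[level - 1]
        else:
            node = Node(tag)
            if level > 1:
                stack[level - 2].children.append(node)
        del stack[level - 1:]
        stack.append(node)
        _merge(node, pairs)
    return stack[0] if stack else None


if __name__ == "__main__":
    print(parse_dot(io.StringIO(".html\n..body .:Hello\n.. .:World\n")))
```

Run on the first DOT example below, this yields one html root whose body holds a text node, an h1 node and a second text node; replacing '.. ' with '..body ' on the last line yields two body nodes instead.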

Example of XML:
===============

<html>
<head>
<title>
This is a title.
</title>
</head>
<body class="bodyclass">
This is a body.
<h1>
This is a header1 line.
</h1>
This is also a body.
</body>
</html>

Example of DOT representing the above data:

.html
..head
...title .:This_is_a_title.
..body class:bodyclass .:This_is_a_body.
...h1 .:This_is_a_header1_line.
.. .:This_is_also_a_body.

[Correct representation: creates two text nodes within the same body node by reusing the last node on the same level]

Or the explicit representation (this does not produce the same output as the previous example):

.html
..head
...title .:This_is_a_title.
..body class:bodyclass .:This_is_a_body.
...h1 .:This_is_a_header1_line.
..body .:This_is_also_a_body.

[Incorrect representation: creates two body nodes]

Or, with comments and configuration:

version:1.0
author:Anoop
This is a comment
This is also a comment
This is also a comment
.html
..head
...title .:This_is_a_title.
..body @:bodymarker1 class:bodyclass .:This_is_a_body.
...h1 .:This_is_a_header1_line.
This is also a comment
This is also a comment
This is also a comment
..body @:bodymarker1 .:This_is_also_a_body.

[Correct representation: creates two text nodes within the same body node by reusing the node with the same marker]
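The marker behaviour can be sketched as a lookup table consulted before a new node is created. Assumptions: Node, attach() and the markers dictionary are hypothetical, the marked node is reused even if the second line appears under a different parent, and the '@' value is kept as an ordinary attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(eq=False)
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union[str, "Node"]] = field(default_factory=list)


def attach(parent: Node, tag: str, attrs: Dict[str, str],
           markers: Dict[str, Node]) -> Node:
    """Add a tag line's node under parent, unless its '@' marker is already
    registered, in which case the marked node is reused and extended."""
    marker = attrs.get("@")
    node = markers.get(marker) if marker is not None else None
    if node is None:
        node = Node(tag)
        parent.children.append(node)
        if marker is not None:
            markers[marker] = node
    for name, value in attrs.items():
        if name == ".":
            node.children.append(value)  # node data becomes another text node
        else:
            node.attrs[name] = value
    return node
```

With this, the second "..body @:bodymarker1" line above adds "This is also a body." to the first body node instead of creating a new one.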

Special Attributes:
===================

. - Node data
@ - Unique Marker
# - Common Tag
- - Delete node(s) or attribute(s)
+ - Append attribute(s)

Examples:
---------
.:This_is_node_data_that_is_separated_by_low_dash.
@:UniqueMarker1278940 equivalent to id in HTML/XHTML
@:3456789
#:ram,ddr4,ddr4_2400Mhz equivalent to class in HTML/XHTML
-:UniqueMarker1278940
+:*3456789,attr1,HelloWorld
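One reading of the '-' and '+' examples is sketched below: '-:Marker' deletes the node carrying that marker, and '+:*Marker,attr,value' appends attr with the given value to the marked node. This interpretation, along with the names Node, add_child() and apply_special(), is an assumption; deleting attributes is not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def add_child(parent: Node, tag: str, attrs: Dict[str, str],
              markers: Dict[str, Node]) -> Node:
    """Create a child node and register its '@' marker, if any."""
    node = Node(tag, dict(attrs), parent=parent)
    parent.children.append(node)
    if "@" in attrs:
        markers[attrs["@"]] = node
    return node


def apply_special(name: str, value: str, markers: Dict[str, Node]) -> None:
    """Handle the '-' (delete) and '+' (append) special attributes."""
    if name == "-":
        # '-:Marker' removes the marked node from its parent.
        # Markers of its descendants are not cleaned up in this sketch.
        node = markers.pop(value)  # KeyError if the marker is unknown
        if node.parent is not None:
            node.parent.children.remove(node)
    elif name == "+":
        # '+:*Marker,attr,value' appends attr=value to the marked node.
        target, attr, attr_value = value.split(",", 2)
        if not target.startswith("*"):
            raise ValueError(f"expected a '*' pointer, got {target!r}")
        markers[target[1:]].attrs[attr] = attr_value
    else:
        raise ValueError(f"not a special attribute: {name!r}")
```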

Special Operators:
==================
The operator must be the first character of the value in the attribute:value pair, immediately after the ':'.

$ - Associate the attribute with a node, or a node's attribute, traced from the root node down through its children
* - Associate the attribute with the node marked with the given @ marker, or with one of that node's attributes
? - Associate the attribute with the set of nodes tagged with the given #tag
~ - Associate the attribute with the set of nodes not tagged with the given #tag (inverted tag selection)

Examples:
---------
result1:$html,body,h1 root->child->child
result2:$html,body,:bodyclass root->child->attribute
result3:*3456789,:attr1 getNodeWithMarker("3456789")->attribute
result4:*3456789 getNodeWithMarker("3456789")
result5:?ddr4_2400Mhz getNodesWithTag("ddr4_2400Mhz")
result6:~ddr4_2400Mhz,ddr4 getInvertedNodesWithTagInSuperset("ddr4_2400Mhz", "ddr4" )
result7:~ddr4_2400Mhz,ram getInvertedNodesWithTagInSuperset("ddr4_2400Mhz", "ram" )
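The four operators could be resolved against a tree roughly as follows. This is a hedged sketch: Node, walk(), tags_of() and resolve() are hypothetical, a ':name' component is read as an attribute name (as with ':attr1'), the first matching child is taken when several share a tag, '~' without a second tag searches the whole tree, and tags are compared as written, without unescaping.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Union


@dataclass(eq=False)
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield node and all its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def tags_of(node: Node) -> List[str]:
    """The node's '#' common tags, e.g. '#:ram,ddr4' -> ['ram', 'ddr4']."""
    return node.attrs["#"].split(",") if "#" in node.attrs else []


def _descend(node: Node, parts: List[str]) -> Union[Node, str]:
    """Follow child tag names; a ':name' part selects an attribute and ends the path."""
    for part in parts:
        if part.startswith(":"):
            return node.attrs[part[1:]]
        child: Optional[Node] = next((c for c in node.children if c.tag == part), None)
        if child is None:
            raise KeyError(part)
        node = child
    return node


def resolve(value: str, root: Node) -> Union[Node, str, List[Node]]:
    """Resolve a value that starts with one of the operators $ * ? ~."""
    if not value or value[0] not in "$*?~":
        return value  # plain value, no operator
    op, parts = value[0], value[1:].split(",")
    if op == "$":  # path traced from the root node
        if parts[0] != root.tag:
            raise KeyError(parts[0])
        return _descend(root, parts[1:])
    if op == "*":  # node marked with @
        marked = next((n for n in walk(root) if n.attrs.get("@") == parts[0]), None)
        if marked is None:
            raise KeyError(parts[0])
        return _descend(marked, parts[1:])
    if op == "?":  # nodes tagged with #tag
        return [n for n in walk(root) if parts[0] in tags_of(n)]
    # op == "~": nodes NOT tagged with #tag, optionally only within a superset tag
    pool = walk(root) if len(parts) == 1 else (n for n in walk(root) if parts[1] in tags_of(n))
    return [n for n in pool if parts[0] not in tags_of(n)]
```

Under these assumptions, "~ddr4_2400Mhz,ddr4" returns the nodes tagged ddr4 but not ddr4_2400Mhz, matching getInvertedNodesWithTagInSuperset above.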

Tuesday, 19 September 2017
