A Tree-sitter syntax requires two things: a dynamic link library containing a Tree-sitter parser (.dll on Windows, .so on Linux, or .dylib on macOS), and a Syntax010 file. A Syntax010 file is a text file with the extension .syntax010 that controls how symbols are mapped to syntax styles for syntax highlighting, how brace matching is done, and how section lines are calculated. Installing a new Tree-sitter syntax involves compiling the dynamic link library and creating the Syntax010 file. Modifying how syntax highlighting, brace matching, or section lines are calculated only requires editing the Syntax010 file; see Editing a Syntax010 File below.
Installing a New Tree-sitter Syntax
A variety of Tree-sitter syntaxes are available on the internet. A good place to find them is GitHub: search for tree-sitter-??? where ??? is the syntax name. If a syntax is found, download and extract the source code package. Creating the dynamic link library requires compiling only the file "src/parser.c" and, if the package includes one, "src/scanner.c" or "src/scanner.cc". For example, to compile a package that contains "src/scanner.c" using gcc, use:
gcc -o tree-sitter-???.so -shared src/parser.c src/scanner.c -I./src
If the parser.c file does not exist, install the Tree-sitter command line interface (tree-sitter-cli) and run:
tree-sitter generate
in the directory where the package was extracted. Creating an entirely new Tree-sitter syntax involves writing a 'grammar.js' file that describes the rules for the syntax. See the site:
https://tree-sitter.github.io/
for more information. When installing a new syntax, copy the file "queries/highlights.scm" to the file "???.syntax010". Next, open the Syntax Options dialog and click the Add... button to add the "???.syntax010" file to 010 Editor.
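The build and copy steps above could be scripted. The following is a minimal sketch in Python, assuming gcc is on the PATH and the package uses the layout described above ("src/parser.c", an optional "src/scanner.c", and "queries/highlights.scm"). The function names build_command and copy_highlights are hypothetical helpers made up for this example, and a C++ scanner ("src/scanner.cc") is not handled.

```python
import shutil
from pathlib import Path


def build_command(package_dir, name):
    """Return the gcc argument list for building tree-sitter-<name>.so."""
    src = Path(package_dir) / "src"
    sources = [str(src / "parser.c")]
    scanner = src / "scanner.c"
    if scanner.exists():  # the scanner file is optional
        sources.append(str(scanner))
    return ["gcc", "-o", f"tree-sitter-{name}.so", "-shared", *sources, f"-I{src}"]


def copy_highlights(package_dir, name, dest_dir="."):
    """Copy queries/highlights.scm to <name>.syntax010 as a starting point."""
    target = Path(dest_dir) / f"{name}.syntax010"
    shutil.copyfile(Path(package_dir) / "queries" / "highlights.scm", target)
    return target
```

The list returned by build_command can be passed to subprocess.run(..., check=True) to run the compiler. The resulting .syntax010 file still has to be added through the Syntax Options dialog.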
Syntax Trees
When creating syntax files, it is important to understand that the Tree-sitter dynamic link libraries parse source code into a syntax tree. For example, for the C code:
int main()
{
    return 5;
}
This code is parsed into the syntax tree:
(function_definition
  type: (primitive_type)
  declarator: (function_declarator
    declarator: (identifier)
    parameters: (parameter_list
      "("
      ")"))
  body: (compound_statement
    "{"
    (return_statement
      "return"
      (number_literal)
      ";")
    "}"))
Syntax trees contain symbol names (e.g. function_definition or identifier) and strings (e.g. "{" or "return"). Field names (e.g. type: or declarator: in the tree above) are followed by a colon. Field names are conceptually named edges in a node graph and can be ignored in many syntax files. To view the syntax tree created for a section of source code, see Printing Syntax Trees.
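To make the distinction between symbols, strings, and field names concrete, here is a small sketch in Python that models a tree node and renders it as an S-expression. The Node class and the to_sexp function are illustrative names invented for this example; they are not part of Tree-sitter or 010 Editor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                          # symbol name, or the text of a string node
    is_string: bool = False            # True for strings such as "{" or "return"
    field_name: Optional[str] = None   # e.g. "type" or "body"; None if the edge is unnamed
    children: List["Node"] = field(default_factory=list)


def to_sexp(node):
    """Render a node and its children as a one-line S-expression."""
    prefix = f"{node.field_name}: " if node.field_name else ""
    if node.is_string:
        return f'{prefix}"{node.kind}"'
    inner = "".join(" " + to_sexp(child) for child in node.children)
    return f"{prefix}({node.kind}{inner})"


# The return statement from the tree above.
ret = Node("return_statement", children=[
    Node("return", is_string=True),
    Node("number_literal"),
    Node(";", is_string=True),
])
print(to_sexp(ret))  # (return_statement "return" (number_literal) ";")
```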
Editing a Syntax010 File
Open the Syntax010 file for the current syntax by clicking the 'View > Syntax > Edit Syntax File' menu option. To create a new Syntax010 file, create a new text file and save it with the extension ".syntax010". The Syntax010 file can be edited as a regular text file. Once changes are made, click 'View > Syntax > Refresh Syntax' or click the icon in the top-right corner of the editor to apply the syntax changes to all open files that use that syntax. Syntax010 files are very similar to the "queries/highlights.scm" file that is included with many Tree-sitter packages but add some extensions. Separate sections for brace matching, injections and code folding can all be included in a Syntax010 file using the '#section' keyword as described below. Syntax010 files use ';' to indicate comments and contain a series of S-expressions.
S-Expressions
A Syntax010 file consists of a series of S-expressions. The simplest S-expression contains a single symbol in parentheses or a string, followed by '@' and a capture group name. For example:
(identifier) @variable
or
"return" @keyword
When a syntax is applied, the current file's syntax tree is searched for all occurrences of each S-expression, and every time an occurrence is found the matching node is returned along with its capture group. The capture groups may mean different things depending upon which #section of the Syntax010 file the S-expression is in. The first section of the Syntax010 file is always for syntax highlighting and the capture groups indicate syntax style names. In the example above, all nodes that match (identifier) would be colored with the "variable" syntax style and all "return" strings would be colored with the "keyword" syntax style.
S-expressions can search for child nodes as well using parentheses. For example, to only match (function_declarator) symbols that have an (identifier) symbol as a child use:
(function_declarator
  (identifier) @function)
If this S-expression is found in the tree then (identifier) would be colored with the "function" syntax style instead of the "variable" syntax style. If multiple child symbols are specified then all the child symbols have to exist under the parent but note the children can exist in any order. For example:
(assignment_expression
  (identifier) @variable
  (function))
would only match if (identifier) and (function) were both children of (assignment_expression). Any number of levels of hierarchy can be specified using nested parentheses. Field names can also be specified using ':' and if a field name is present then it must match the tree in order for the whole S-expression to match. For example, the following S-expression would require the "left" and "right" field names be found:
(assignment_expression
  left: (identifier) @variable
  right: (function))
An alternation is written using square brackets '[' and ']', and only one of the symbols inside the brackets has to match the syntax tree. For example, the following S-expression would color multiple strings using the "keyword" syntax style:
[
  "return"
  "for"
  "while"
  "if"
  "else"
] @keyword
A regular expression can be specified to match a symbol by using the "#match?" keyword, followed by the capture name, followed by the regular expression to match. Note that in the regular expressions, a double backslash '\\' should be used to indicate a single backslash '\'. For example, the following S-expression would match all (identifier) symbols that are all uppercase and assign them the "constant" syntax style:
((identifier) @constant
 (#match? @constant "^[A-Z][A-Z\\d_]*$"))
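The pattern can be tried out outside the editor before adding it to a Syntax010 file. The sketch below uses Python's re module, which is an assumption: the regular expression engine used by 010 Editor may differ in details. It also shows how the doubled backslash in the file corresponds to a single backslash in the expression itself.

```python
import re

# As written in the Syntax010 file, the string contains two backslashes.
as_written = r"^[A-Z][A-Z\\d_]*$"
# After unescaping, the regular expression contains a single backslash (\d).
pattern = as_written.replace("\\\\", "\\")
assert pattern == r"^[A-Z][A-Z\d_]*$"

constant = re.compile(pattern)
for name in ["MAX_SIZE", "X1", "maxSize", "_PRIVATE"]:
    # Only names that start with an uppercase letter and contain nothing but
    # uppercase letters, digits, and underscores are reported as matches.
    print(name, bool(constant.match(name)))
```

Here "MAX_SIZE" and "X1" match, while "maxSize" and "_PRIVATE" do not, because the first character must be an uppercase letter.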
Editing Syntax Highlighting
The Syntax010 file contains a series of S-expressions. All S-expressions before the first '#section' line in the file are considered to be for syntax highlighting. The syntax will search the syntax tree of the current file for all occurrences of the S-expressions and each capture group marked with '@' indicates the name of a syntax style to color the found symbols or strings. For example:
(identifier) @variable
(function_declarator
  (identifier) @function)
These statements would color all (identifier) symbols with the "function" syntax style if they have the parent (function_declarator), or would color them with the "variable" syntax style if they do not have the parent (function_declarator). The list of syntax style names is available at the end of the Theme/Color Options dialog.
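The effect of these two statements can be sketched in Python on a toy tree. This is only a model of the outcome described above, not how the editor is implemented; the tuple layout (symbol name, list of children) and the function name highlight are assumptions made for this example.

```python
def highlight(node, parent_kind=None):
    """Yield (parent symbol, style) for every identifier in the tree."""
    kind, children = node
    if kind == "identifier":
        if parent_kind == "function_declarator":
            yield parent_kind, "function"   # (function_declarator (identifier) @function)
        else:
            yield parent_kind, "variable"   # (identifier) @variable
    for child in children:
        yield from highlight(child, kind)


# A simplified tree: each node is (symbol name, children).
tree = ("function_definition", [
    ("primitive_type", []),
    ("function_declarator", [("identifier", []), ("parameter_list", [])]),
    ("compound_statement", [("return_statement", [("identifier", [])])]),
])

for parent, style in highlight(tree):
    print(parent, "->", style)
```

The identifier under (function_declarator) is reported with the "function" style and the identifier under (return_statement) with the "variable" style.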
Syntax style names are multi-level, meaning they may have multiple periods (.) in their name. If a syntax style is not found, the last period and the text after it are removed and the shortened name is searched for again. For example, if a syntax specifies the syntax style "function.builtin" but that syntax style is not found, the syntax highlighter will try to use the syntax style "function" instead. To use a separate color for "function.builtin", use the Theme/Color Options dialog to create the "function.builtin" syntax style and assign it a different color.
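The fallback rule for multi-level style names can be sketched as a short Python function. The name resolve_style and the use of a set for the defined styles are assumptions for illustration; what the editor does when no level of the name is defined is not stated here, so the sketch simply returns None in that case.

```python
def resolve_style(name, defined_styles):
    """Return the first defined style found by trimming trailing '.part' segments."""
    while name not in defined_styles:
        if "." not in name:
            return None  # nothing left to trim (behavior here is an assumption)
        name = name.rsplit(".", 1)[0]  # drop the last period and the text after it
    return name


print(resolve_style("function.builtin", {"function", "variable"}))          # function
print(resolve_style("function.builtin", {"function", "function.builtin"}))  # function.builtin
```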
Editing Matching Braces/Tags
The 'match' section of the Syntax010 file controls which braces or tags are used for Highlighting Matching Braces/Tags. Use "#section match" in the Syntax010 file to define the match section and all S-expressions after that statement are used for matching rules. For example:
#section match
(parenthesized_expression
  ("(" @match.brace)
  (")" @match.brace) ) @match
When defining which symbols or strings to match, use the capture group "@match.brace" with the two symbols or strings to match. The two symbols or strings should have a common parent. Also use the "@match" capture group for the whole expression. Once the file has been edited, use 'View > Syntax > Refresh Syntax' to apply the changes. To determine the syntax symbol or string for a piece of text in a source code file, select the piece of text and click 'View > Syntax > Print Syntax Tree'.
Editing Section Lines/Folds
The text editor can draw vertical dotted lines between code sections and these lines are called section lines. In the future these sections will be foldable. Control where section lines are drawn by creating a "#section fold" statement in the Syntax010 file. Each S-expression in the section controls which symbols in the file have section lines drawn. For example:
#section fold
(compound_statement) @fold
Mark symbols to draw section lines with the "@fold" capture group. Dotted lines are drawn vertically from the first non-whitespace character on the line where the symbol starts, to the line where the symbol ends. These lines can be hidden using the Syntax Menu.
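The geometry of a section line can be sketched in Python: the column comes from the first non-whitespace character on the line where the symbol starts, and the line runs down to the line where the symbol ends. The function name section_line is hypothetical, lines and columns are 0-based here, and offsets index into a Python string, which matches byte offsets only for ASCII text.

```python
def section_line(text, start, end):
    """Return (column, first_line, last_line) for a symbol spanning text[start:end]."""
    first_line = text.count("\n", 0, start)
    # Line of the symbol's last character (end is exclusive).
    last_line = text.count("\n", 0, max(start, end - 1))
    line_start = text.rfind("\n", 0, start) + 1
    line_end = text.find("\n", line_start)
    if line_end == -1:
        line_end = len(text)
    line = text[line_start:line_end]
    column = len(line) - len(line.lstrip())  # first non-whitespace character
    return column, first_line, last_line


code = "int main()\n{\n    if (x) {\n        y();\n    }\n}\n"
# Offsets of the inner compound statement: from the "{" after "if (x)" to its "}".
start = code.index("{", code.index("if"))
end = code.rindex("}", 0, code.rindex("}")) + 1
print(section_line(code, start, end))  # (4, 2, 4)
```

In this example the dotted line would be drawn at column 4 (under the "i" of "if") from line 2 to line 4, even though the "{" itself sits further to the right.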
Editing Injections
Use the 'injection' section of a Syntax010 file to specify how one syntax is injected into another syntax. Injections allow code from one syntax to be placed inside code from another and both syntaxes are highlighted according to their own rules. For example, HTML files may contain JavaScript or CSS code. To create an injection use "#section injection" in the Syntax010 file and afterwards define an S-expression using the "@injection.content" capture group. For example:
#section injection
((script_element
  (raw_text) @injection.content)
 (#set! injection.language "javascript"))
This will cause text inside the (raw_text) symbol to be treated as a different syntax. Use (#set! injection.language "<name>") to specify which syntax to inject. The <name> must match one of the injection names listed in the Injection Name field of the Syntax Options dialog.
Printing Syntax Trees
A Tree-sitter syntax parses a text file into a syntax tree. This syntax tree can be printed to the Output tab of the Output Panel by clicking 'View > Syntax > Print Syntax Tree'. If no selection is made, the syntax tree for the entire file is printed, and this tree can be very large for some files. If a selection is made, then only those nodes that intersect the selected range, along with their parent nodes, are printed. For example, for the following code:
struct {
    int value;
} file;
the printed syntax tree would be:
Syntax Tree 010:
(source_file [0 35]
  (declaration [0 31]
    type: (struct_specifier [0 25]
      "struct" [0 6]
      body: (compound_statement [7 25]
        "{" [7 8]
        (declaration [13 23]
          type: (primitive_type [13 16])
          declarator: (identifier [17 22])
          ";" [22 23])
        "}" [24 25]))
    declarator: (identifier [26 30])
    ";" [30 31])
The numbers in square brackets '[' and ']' indicate the starting and ending byte address for each symbol or string in the file. Printing the tree is useful for debugging problems with syntaxes and for identifying symbols within a syntax. Selecting a single word and printing the syntax tree will show which symbol name corresponds to that word.
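How a selection narrows the printed tree can be sketched in Python using the byte ranges from the tree above. This is a model only: the tuple layout and the name nodes_in_selection are assumptions for this example, ranges are treated as half-open (the end address is exclusive), and field names are left out for brevity.

```python
# (name, start, end, children) with the byte ranges printed above.
TREE = ("source_file", 0, 35, [
    ("declaration", 0, 31, [
        ("struct_specifier", 0, 25, [
            ('"struct"', 0, 6, []),
            ("compound_statement", 7, 25, [
                ('"{"', 7, 8, []),
                ("declaration", 13, 23, [
                    ("primitive_type", 13, 16, []),
                    ("identifier", 17, 22, []),
                    ('";"', 22, 23, []),
                ]),
                ('"}"', 24, 25, []),
            ]),
        ]),
        ("identifier", 26, 30, []),
        ('";"', 30, 31, []),
    ]),
])


def nodes_in_selection(node, sel_start, sel_end, depth=0):
    """Yield (depth, name) for nodes whose range overlaps [sel_start, sel_end)."""
    name, start, end, children = node
    if start < sel_end and sel_start < end:
        yield depth, name
        # A parent always contains its children, so parents of any
        # overlapping node are yielded before the node itself.
        for child in children:
            yield from nodes_in_selection(child, sel_start, sel_end, depth + 1)


# Selecting bytes 17 to 22 keeps only the chain of nodes down to that identifier.
for depth, name in nodes_in_selection(TREE, 17, 22):
    print("  " * depth + name)
```

With that selection the sketch lists source_file, declaration, struct_specifier, compound_statement, declaration, and identifier, which is the chain of parents leading to the selected word.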