pt_astree(3)



pt::ast(3tcl)                    Parser Tools                    pt::ast(3tcl)

______________________________________________________________________________

NAME
       pt::ast - Abstract Syntax Tree Serialization

SYNOPSIS
       package require Tcl  8.5

       package require pt::ast  ?1.1?

       ::pt::ast verify serial ?canonvar?

       ::pt::ast verify-as-canonical serial

       ::pt::ast canonicalize serial

       ::pt::ast print serial

       ::pt::ast bottomup cmdprefix ast

       cmdprefix ast

       ::pt::ast topdown cmdprefix pe

       ::pt::ast equal seriala serialb

       ::pt::ast new0 s loc ?child...?

       ::pt::ast new s start end ?child...?

______________________________________________________________________________

DESCRIPTION
       Are  you  lost ?  Do you have trouble understanding this document ?  In
       that case please read the overview  provided  by  the  Introduction  to
       Parser  Tools.  This document is the entrypoint to the whole system the
       current package is a part of.

       This package provides commands to work with the serializations  of  ab-
       stract  syntax  trees  as managed by the Parser Tools, and specified in
       section AST serialization format.

       This is a supporting package in the Core Layer of Parser Tools.

       IMAGE: arch_core_support

API
       ::pt::ast verify serial ?canonvar?
              This command verifies that the content of serial is a valid  se-
              rialization  of  an abstract syntax tree and will throw an error
              if that is not the case. The result of the command is the  empty
              string.

              If  the  argument canonvar is specified it is interpreted as the
              name of a variable in the calling context. This variable will be
              written  to  if and only if serial is a valid regular serializa-
              tion. Its value will be a boolean, with True indicating that the
              serialization  is not only valid, but also canonical. False will
              be written for a valid, but non-canonical serialization.

              For the specification of serializations see the section AST  se-
              rialization format.

       ::pt::ast verify-as-canonical serial
              This  command  verifies  that  the  content of serial is a valid
              canonical serialization of an  abstract  syntax  tree  and  will
              throw  an  error if that is not the case. The result of the com-
              mand is the empty string.

              For the specification of canonical serializations see  the  sec-
              tion AST serialization format.

       ::pt::ast canonicalize serial
              This command assumes that the content of serial is a valid regu-
              lar serialization of an abstract syntax and will throw an  error
              if that is not the case.

              It  will then convert the input into the canonical serialization
              of the contained tree and return it as its result. If the  input
              is already canonical it will be returned unchanged.

              For  the  specification  of regular and canonical serializations
              see the section AST serialization format.

       ::pt::ast print serial
              This command assumes that the argument serial contains  a  valid
              serialization  of  an  abstract syntax tree and returns a string
              containing that tree in a human readable form.

              The exact format of this form is not specified and cannot be re-
              lied on for parsing or other machine-based activities.

              For  the specification of serializations see the section AST se-
              rialization format.

       ::pt::ast bottomup cmdprefix ast
              This command walks the abstract syntax tree ast from the  bottom
              up  to  the root, invoking the command prefix cmdprefix for each
              node. This implies that the children of a node N are handled be-
              fore N.

              The command prefix has the signature

              cmdprefix ast
                     I.e.  it  is  invoked  with the ast node the walk is cur-
                     rently at.

                     The result returned by the command prefix replaces ast in
                     the  node  it was a child of, allowing transformations of
                     the tree.

                     This also means that for all inner node the  contents  of
                     the children elements are the results of the command pre-
                     fix invoked for the children of this node.

       ::pt::ast topdown cmdprefix pe
              This command walks the abstract syntax tree ast  from  the  root
              down  to  the  leaves, invoking the command prefix cmdprefix for
              each node. This implies that the children of a node N  are  han-
              dled after N.

              The  command  prefix has the same signature as for bottomup, see
              above.

              The result returned by the command prefix is ignored.

       ::pt::ast equal seriala serialb
              This command tests the two sbstract syntax trees seriala and se-
              rialb  for  structural  equality. The result of the command is a
              boolean value. It will be set to true if the trees  are  identi-
              cal, and false otherwise.

              String  equality  is  usable  only if we can assume that the two
              trees are pure Tcl lists.

       ::pt::ast new0 s loc ?child...?
              This command command constructs the ast for a  nonterminal  node
              refering  refering to the symbol s at position loc in the input,
              and the set of child nodes child ..., from left right. The  lat-
              ter may be empty. The constructed node is returned as the result
              of the command. The end position is loc-1,  i.e.  one  character
              before  the  start. This type of node is possible for rules con-
              taining optional parts.

       ::pt::ast new s start end ?child...?
              This command command constructs the ast for a  nonterminal  node
              refering  to  the symbol s covering the range of positions start
              to end in the input, and the set of child nodes child ...,  from
              left right. The latter may be empty. The constructed node is re-
              turned as the result of the command.

AST SERIALIZATION FORMAT
       Here we specify the format used by the Parser Tools  to  serialize  Ab-
       stract  Syntax Trees (ASTs) as immutable values for transport, compari-
       son, etc.

       Each node in an AST represents a nonterminal symbol of a  grammar,  and
       the  range of tokens/characters in the input covered by it. ASTs do not
       contain terminal symbols, i.e. tokens/characters. These can  be  recov-
       ered from the input given a symbol's location.

       We  distinguish  between regular and canonical serializations.  While a
       tree may have more than one regular serialization only exactly  one  of
       them will be canonical.

       Regular serialization

              [1]    The  serialization of any AST is the serialization of its
                     root node.

              [2]    The serialization of any node is a Tcl list containing at
                     least three elements.

                     [1]    The  first  element is the name of the nonterminal
                            symbol stored in the node.

                     [2]    The second and third element are the locations  of
                            the  first  and last token in the token stream the
                            node represents (covers).

                            [1]    Locations are provided as non-negative  in-
                                   teger offsets from the beginning of the to-
                                   ken stream, with the first token  found  in
                                   the stream located at offset 0 (zero).

                            [2]    The  end  location  has  to  be equal to or
                                   larger than the start location.

                     [3]    All elements after the first three  represent  the
                            children  of the node, which are themselves nodes.
                            This means that the serializations of nodes  with-
                            out  children, i.e. leaf nodes, have exactly three
                            elements.  The children are  stored  in  the  list
                            with  the  leftmost child first, and the rightmost
                            child last.

       Canonical serialization
              The canonical serialization of an abstract syntax tree  has  the
              format  as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among  all
              the possible serializations of this tree.

              [1]    The  string  representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not  con-
                     tain superfluous whitespace.

   EXAMPLE
       Assuming the parsing expression grammar below

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       and the input string

               120+5
       then a parser should deliver the abstract syntax tree below (except for
       whitespace)

              set ast {Expression 0 4
                  {Factor 0 4
                      {Term 0 2
                          {Number 0 2
                              {Digit 0 0}
                              {Digit 1 1}
                              {Digit 2 2}
                          }
                      }
                      {AddOp 3 3}
                      {Term 4 4
                          {Number 4 4
                              {Digit 4 4}
                          }
                      }
                  }
              }

       Or, more graphical

       .nf +- Digit 0 0 | 1 |            | +- Term 0 2  ---  Number  0  2  -+-
       Digit   1   1   |   2   |                            |             |  |
       +- Digit 2 2 | 0 |                                        |  Expression
       0  4  ---  Factor  0  4 -+----------------------------- AddOp 3 3 | + |
       | +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5 .fi

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will  undoubtedly  contain
       bugs  and other problems.  Please report such in the category pt of the
       Tcllib Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please  also
       report  any  ideas  for  enhancements  you  may have for either package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e the out-
       put of diff -u.

       Note  further  that  attachments  are  strongly  preferred over inlined
       patches. Attachments can be made by going  to  the  Edit  form  of  the
       ticket  immediately  after  its  creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       EBNF, LL(k), PEG, TDPL, context-free  languages,  expression,  grammar,
       matching,  parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down  parsing  languages,
       transducer

CATEGORY
       Parsing and Grammars

COPYRIGHT
       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                1.1                        pt::ast(3tcl)

Man(1) output converted with man2html
list of all man pages