grammar::fa(3tcl)    Finite automaton operations and usage   grammar::fa(3tcl)

______________________________________________________________________________

NAME
       grammar::fa - Create and manipulate finite automatons

SYNOPSIS
       package require Tcl  8.4

       package require snit  1.3

       package require struct::list

       package require struct::set

       package require grammar::fa::op  ?0.2?

       package require grammar::fa  ?0.4?

       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??

       faName option ?arg arg ...?

       faName destroy

       faName clear

       faName = srcFA

       faName --> dstFA

       faName serialize

       faName deserialize serialization

       faName states

       faName state add s1 ?s2 ...?

       faName state delete s1 ?s2 ...?

       faName state exists s

       faName state rename s snew

       faName startstates

       faName start add s1 ?s2 ...?

       faName start remove s1 ?s2 ...?

       faName start? s

       faName start?set stateset

       faName finalstates

       faName final add s1 ?s2 ...?

       faName final remove s1 ?s2 ...?

       faName final? s

       faName final?set stateset

       faName symbols

       faName symbols@ s ?d?

       faName symbols@set stateset

       faName symbol add sym1 ?sym2 ...?

       faName symbol delete sym1 ?sym2 ...?

       faName symbol rename sym newsym

       faName symbol exists sym

       faName next s sym ?--> next?

       faName !next s sym ?--> next?

       faName nextset stateset sym

       faName is deterministic

       faName is complete

       faName is useful

       faName is epsilon-free

       faName reachable_states

       faName unreachable_states

       faName reachable s

       faName useful_states

       faName unuseful_states

       faName useful s

       faName epsilon_closure s

       faName reverse

       faName complete

       faName remove_eps

       faName trim ?what?

       faName determinize ?mapvar?

       faName minimize ?mapvar?

       faName complement

       faName kleene

       faName optional

       faName union fa ?mapvar?

       faName intersect fa ?mapvar?

       faName difference fa ?mapvar?

       faName concatenate fa ?mapvar?

       faName fromRegex regex ?over?

______________________________________________________________________________

DESCRIPTION
       This  package  provides a container class for finite automatons (Short:
       FA).  It allows the incremental definition of the automaton, its manip-
       ulation  and  querying  of  the definition.  While the package provides
       complex operations on the automaton (via package  grammar::fa::op),  it
       does  not have the ability to execute a definition for a stream of sym-
       bols.  Use the packages grammar::fa::dacceptor  and  grammar::fa::dexec
       for that.  Another package related to this is grammar::fa::compiler. It
       turns a FA into an executor class which has the definition  of  the  FA
       hardwired into it. The output of this package is configurable to suit a
       large number of different implementation languages and paradigms.

       For more information about what a finite automaton is, see section
       FINITE AUTOMATONS.

API
       The package exports the API described here.

       ::grammar::fa faName ?=|:=|<--|as|deserialize src|fromRegex re ?over??
              Creates  a  new  finite  automaton with an associated global Tcl
              command whose name is faName. This command may be used to invoke
              various  operations  on the automaton. It has the following gen-
              eral form:

              faName option ?arg arg ...?
                     Option and the args determine the exact behavior  of  the
                     command.  See  section  FA METHODS for more explanations.
                     The new automaton will be empty if no src  is  specified.
                     Otherwise  it  will contain a copy of the definition con-
                     tained in the src.  The src has to be a FA object  refer-
                     ence  for all operators except deserialize and fromRegex.
                     The deserialize operator requires src to be the
                     serialization of a FA instead, and fromRegex takes a
                     regular expression in the form of a syntax tree. See
                     ::grammar::fa::op::fromRegex for more detail on that.
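
              As a minimal sketch of the constructor forms described above,
              the following creates an empty automaton and then a copy of it.
              The command names lights and lights2 are illustrative only.

```tcl
package require grammar::fa

# "lights" and "lights2" are illustrative command names.
::grammar::fa lights                 ;# a new, empty automaton
lights state  add Drive Brake
lights symbol add yellow
lights next Drive yellow --> Brake

# Copy constructor form: lights2 starts out as a copy of lights.
::grammar::fa lights2 = lights

puts [lsort [lights2 states]]
puts [lights2 next Drive yellow]
```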

FA METHODS
       All automatons provide the following methods for their manipulation:

       faName destroy
              Destroys  the automaton, including its storage space and associ-
              ated command.

       faName clear
              Clears out the definition of the automaton contained in  faName,
              but does not destroy the object.

       faName = srcFA
              Assigns the contents of the automaton contained in srcFA to
              faName. This is the assignment operator for automatons. The
              old contents of faName are deleted by this operation.

              This operation is in effect equivalent to

                  faName deserialize [srcFA serialize]

       faName --> dstFA
              This is the  reverse  assignment  operator  for  automatons.  It
              copies the automaton contained in the object faName over the
              automaton definition in the object dstFA.  The old  contents  of
              dstFA are deleted by this operation.

              This operation is in effect equivalent to

                  dstFA deserialize [faName serialize]
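
              A short sketch of both assignment directions, using the
              illustrative object names src, dstA and dstB. It also shows
              that a copy does not follow later changes to its source.

```tcl
package require grammar::fa

# Illustrative names: one source and two targets.
::grammar::fa src
::grammar::fa dstA
::grammar::fa dstB
src state add s0 s1

dstA = src       ;# assignment: dstA receives a copy of src
src --> dstB     ;# reverse assignment: dstB receives a copy of src

# The copies are independent of the source.
src state add s2
puts [lsort [dstA states]]
puts [lsort [src states]]
```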

       faName serialize
              This method serializes the automaton stored in faName. In other
              words, it returns a Tcl value completely describing that
              automaton. This allows, for example, the transfer of automatons
              over arbitrary channels, persistence, etc. This method is also
              the basis for both the copy constructor and the assignment
              operator.

              The  result of this method has to be semantically identical over
              all implementations of the grammar::fa interface. This  is  what
              will  enable us to copy automatons between different implementa-
              tions of the same interface.

              The result is a list of three elements with the following struc-
              ture:

              [1]    The constant string grammar::fa.

              [2]    A  list  containing the names of all known input symbols.
                     The order of elements in this list is not relevant.

              [3]    The last item in the list is a dictionary in which the
                     order of the keys is significant. The keys are the
                     states of the serialized FA, and their order is the order
                     in which to create the states when deserializing. This
                     preserves the order relationship between states.

                     The value of each dictionary entry is a list of three el-
                     ements describing the state in more detail.

                     [1]    A boolean flag. If its  value  is  true  then  the
                            state is a start state, otherwise it is not.

                     [2]    A  boolean  flag.  If  its  value is true then the
                            state is a final state, otherwise it is not.

                     [3]    The last element is a  dictionary  describing  the
                            transitions  for  the  state. The keys are symbols
                            (or the empty string), and the values are sets  of
                            successor states.

       Assuming  the  following FA (which describes the life of a truck driver
       in a very simple way :)

                  Drive -- yellow --> Brake -- red --> (Stop) -- red/yellow --> Attention -- green --> Drive
                  (...) is the start state.

       a possible serialization is

                  grammar::fa \
                  {yellow red green red/yellow} \
                  {Drive     {0 0 {yellow     Brake}} \
                   Brake     {0 0 {red        Stop}} \
                   Stop      {1 0 {red/yellow Attention}} \
                   Attention {0 0 {green      Drive}}}

       This is only one possible serialization, because I did not care about
       the creation order here.
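
       The structure can also be inspected programmatically. The sketch below
       builds the truck driver FA with the methods of this package and takes
       its serialization apart; the object name truck and the variable names
       are illustrative.

```tcl
package require grammar::fa

::grammar::fa truck                ;# illustrative name
truck state  add Drive Brake Stop Attention
truck symbol add yellow red green red/yellow
truck start  add Stop
truck next Drive     yellow     --> Brake
truck next Brake     red        --> Stop
truck next Stop      red/yellow --> Attention
truck next Attention green      --> Drive

set ser [truck serialize]
puts [lindex $ser 0]               ;# the type tag
puts [lsort [lindex $ser 1]]       ;# the symbols, in sorted order

# Third element: state -> {isStart isFinal transitions}
foreach {state detail} [lindex $ser 2] {
    foreach {isStart isFinal trans} $detail break
    puts "$state start=$isStart final=$isFinal transitions=$trans"
}
```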

       faName deserialize serialization
              This is the complement to serialize. It replaces  the  automaton
              definition in faName with the automaton described by the serial-
              ization value. The old contents of faName are  deleted  by  this
              operation.
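
              As a sketch, the serialization shown for serialize can be
              loaded into a fresh object and queried. The object name truck2
              is illustrative.

```tcl
package require grammar::fa

# The serialization shown for serialize, written as one Tcl list.
set ser {grammar::fa
    {yellow red green red/yellow}
    {Drive     {0 0 {yellow     Brake}}
     Brake     {0 0 {red        Stop}}
     Stop      {1 0 {red/yellow Attention}}
     Attention {0 0 {green      Drive}}}}

::grammar::fa truck2               ;# illustrative name
truck2 deserialize $ser

puts [truck2 startstates]
puts [truck2 next Stop red/yellow]
```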

       faName states
              Returns the set of all states known to faName.

       faName state add s1 ?s2 ...?
              Adds  the  states  s1,  s2,  et  cetera  to the FA definition in
              faName. The operation will fail if any of the new states is
              already declared.

       faName state delete s1 ?s2 ...?
              Deletes the states s1, s2, et cetera, and all associated
              information from the FA definition in faName. The latter means
              that the information about in- or outbound transitions is
              deleted as well. If a deleted state was a start or final state
              then this information is invalidated as well. The operation
              will fail if any of the states is not known to the FA.

       faName state exists s
              A predicate. It tests whether the state s is known to the FA  in
              faName.   The  result is a boolean value. It will be set to true
              if the state s is known, and false otherwise.

       faName state rename s snew
              Renames the state s to snew. Fails if s is not  a  known  state.
              Also fails if snew is already known as a state.
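
              The state methods can be combined as in the following sketch.
              The object name jobs and the state names are illustrative; the
              failing call is wrapped in catch so that the error can be
              inspected.

```tcl
package require grammar::fa

::grammar::fa jobs                 ;# illustrative name
jobs state add idle busy

puts [jobs state exists idle]      ;# a boolean value
jobs state rename busy working
jobs state delete idle

# Adding an already declared state is an error; catch it to inspect it.
set failed [catch {jobs state add working} msg]
puts "failed=$failed: $msg"
```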

       faName startstates
              Returns the set of states which are marked as start states, also
              known as initial states. See FINITE AUTOMATONS for an
              explanation of what this means.

       faName start add s1 ?s2 ...?
              Mark the states s1, s2, et cetera in the FA faName as start (aka
              initial).

       faName start remove s1 ?s2 ...?
              Mark the states s1, s2, et cetera in the FA faName as not start
              (aka not initial).

       faName start? s
              A  predicate.  It tests if the state s in the FA faName is start
              or not.  The result is a boolean value. It will be set  to  true
              if the state s is start, and false otherwise.

       faName start?set stateset
              A  predicate. It tests if the set of states stateset contains at
              least one start state. The operation will fail if the set
              contains an element which is not a known state. The result is a
              boolean value. It will be set  to  true  if  a  start  state  is
              present in stateset, and false otherwise.

       faName finalstates
              Returns the set of states which are marked as final states, also
              known as accepting states. See FINITE AUTOMATONS for an
              explanation of what this means.

       faName final add s1 ?s2 ...?
              Mark the states s1, s2, et cetera in the FA faName as final (aka
              accepting).

       faName final remove s1 ?s2 ...?
              Mark the states s1, s2, et cetera in the FA faName as not  final
              (aka not accepting).

       faName final? s
              A  predicate.  It tests if the state s in the FA faName is final
              or not.  The result is a boolean value. It will be set  to  true
              if the state s is final, and false otherwise.

       faName final?set stateset
              A  predicate. It tests if the set of states stateset contains at
              least one final state. The operation will fail if the set
              contains an element which is not a known state. The result is a
              boolean value. It will be set  to  true  if  a  final  state  is
              present in stateset, and false otherwise.
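
              A small sketch of marking and querying start and final states.
              The object name marks and the variable names are illustrative.

```tcl
package require grammar::fa

::grammar::fa marks                ;# illustrative name
marks state add a b c
marks start add a
marks final add c

set isStart  [marks start? a]          ;# true: a is a start state
set noFinal  [marks final?set {a b}]   ;# false: neither a nor b is final
set hasFinal [marks final?set {b c}]   ;# true: c is final

marks final remove c                   ;# c stays a state, but is not final
puts [marks startstates]
puts [marks finalstates]
```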

       faName symbols
              Returns the set of all symbols known to the FA faName.

       faName symbols@ s ?d?
              Returns the set of all symbols for which the state s has
              transitions. If the empty symbol is present then s has epsilon
              transitions. If two states are specified the result is the set
              of symbols which have transitions from s to d. This set may be
              empty if there are no transitions between the two specified
              states.

       faName symbols@set stateset
              Returns the set of all symbols for which at least one  state  in
              the set of states stateset has transitions.  In other words, the
              union of [faName symbols@ s] for all states s in  stateset.   If
              the empty symbol is present then at least one state contained in
              stateset has epsilon transitions.

       faName symbol add sym1 ?sym2 ...?
              Adds the symbols sym1, sym2, et cetera to the FA  definition  in
              faName. The operation will fail if any of the symbols is already
              declared. The empty string is not allowed as  a  value  for  the
              symbols.

       faName symbol delete sym1 ?sym2 ...?
              Deletes the symbols sym1, sym2 et cetera, and all associated in-
              formation from the FA definition in  faName.  The  latter  means
              that  all transitions using the symbols are deleted as well. The
              operation will fail if any of the symbols is not  known  to  the
              FA.

       faName symbol rename sym newsym
              Renames  the  symbol  sym to newsym. Fails if sym is not a known
              symbol. Also fails if newsym is already known as a symbol.

       faName symbol exists sym
              A predicate. It tests whether the symbol sym is known to the  FA
              in  faName.   The  result  is a boolean value. It will be set to
              true if the symbol sym is known, and false otherwise.
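
              The symbol methods, sketched on a two-state automaton. The
              object name alpha and the variable names are illustrative.

```tcl
package require grammar::fa

::grammar::fa alpha                ;# illustrative name
alpha state  add p q
alpha symbol add x y
alpha next p x --> q
alpha next p y --> p

set fromP    [lsort [alpha symbols@ p]]          ;# symbols leaving p
set fromPtoQ [alpha symbols@ p q]                ;# symbols leading from p to q
set either   [lsort [alpha symbols@set {p q}]]   ;# symbols leaving p or q

alpha symbol rename y z
alpha symbol delete x              ;# also deletes the transition p -x-> q
set known [alpha symbol exists x]
```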

       faName next s sym ?--> next?
              Define or query transition information.

              If next is specified, then the method will add a transition from
              the  state s to the successor state next labeled with the symbol
              sym to the FA contained in faName. The operation will fail if s,
              or  next  are not known states, or if sym is not a known symbol.
              An exception to the latter is that sym  is  allowed  to  be  the
              empty  string.  In  that  case  the new transition is an epsilon
              transition which will not consume input when traversed. The  op-
              eration  will also fail if the combination of (s, sym, and next)
              is already present in the FA.

              If next was not specified, then the method will return  the  set
              of  states  which can be reached from s through a single transi-
              tion labeled with symbol sym.

       faName !next s sym ?--> next?
              Remove one or more transitions from the FA in faName.

              If next was specified then the single transition from the  state
              s  to the state next labeled with the symbol sym is removed from
              the FA. Otherwise all transitions originating in state s and la-
              beled with the symbol sym will be removed.

              The  operation  will  fail  if  s  and/or  next are not known as
              states. It will also fail if a non-empty sym  is  not  known  as
              symbol.  The  empty string is acceptable, and allows the removal
              of epsilon transitions.

       faName nextset stateset sym
              Returns the set of states which can be reached by a single tran-
              sition  originating  in  a state in the set stateset and labeled
              with the symbol sym.

              In other words, this is the union of [faName next s sym] for
              all states s in stateset.
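
              The following sketch defines, queries and removes transitions.
              The object name hop and the variable names are illustrative.

```tcl
package require grammar::fa

::grammar::fa hop                  ;# illustrative name
hop state  add a b c
hop symbol add x
hop next a x  --> b
hop next a x  --> c                ;# second successor for (a, x)
hop next b "" --> c                ;# epsilon transition, consumes no input

set succ    [lsort [hop next a x]]           ;# query form of next
set fromSet [lsort [hop nextset {a b} x]]    ;# union over both states

hop !next a x --> c                ;# remove a single transition
set afterOne [hop next a x]
hop !next a x                      ;# remove all remaining (a, x) transitions
set afterAll [hop next a x]
```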

       faName is deterministic
              A  predicate. It tests whether the FA in faName is a determinis-
              tic FA or not.  The result is a boolean value. It will be set to
              true if the FA is deterministic, and false otherwise.

       faName is complete
              A  predicate. It tests whether the FA in faName is a complete FA
              or not. A FA is complete if it has at least one  transition  per
              state  and symbol. This also means that a FA without symbols, or
              states is also complete.  The result is a boolean value. It will
              be set to true if the FA is complete, and false otherwise.

              Note: When a FA has epsilon-transitions, transitions over a
              symbol for a state S can be indirect, i.e. not attached
              directly to S, but to a state in the epsilon-closure of S. The
              symbols for such indirect transitions count when computing
              completeness.

       faName is useful
              A predicate. It tests whether the FA in faName is a useful FA
              or not. A FA is useful if all states are reachable and useful.
              The result is a boolean value. It will be set to true if the FA
              is useful, and false otherwise.

       faName is epsilon-free
              A predicate. It tests whether the FA in faName is an
              epsilon-free FA or not. A FA is epsilon-free if it has no
              epsilon transitions. This definition means that all
              deterministic FAs are epsilon-free as well, and
              epsilon-freeness is a necessary precondition for determinism.
              The result is a boolean value. It will be set to true if the FA
              is epsilon-free, and false otherwise.
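
              A sketch of the four predicates on a two-state automaton which
              flips between its states on the symbol x. The object name flip
              is illustrative. Adding a single epsilon transition changes two
              of the answers.

```tcl
package require grammar::fa

::grammar::fa flip                 ;# illustrative name
flip state  add a b
flip symbol add x
flip next a x --> b
flip next b x --> a
flip start add a
flip final add b

# All four properties hold for this small automaton.
set before [list [flip is deterministic] [flip is complete] \
                 [flip is useful] [flip is epsilon-free]]

# An epsilon transition makes the FA neither epsilon-free nor deterministic.
flip next a "" --> b
set after [list [flip is epsilon-free] [flip is deterministic]]
```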

       faName reachable_states
              Returns the set of states which are reachable from a start state
              by one or more transitions.

       faName unreachable_states
              Returns the set of states which are not reachable from any start
              state by any number of transitions. This is

                 [faName states] - [faName reachable_states]

       faName reachable s
              A  predicate.  It tests whether the state s in the FA faName can
              be reached from a start state by one or more  transitions.   The
              result  is  a boolean value. It will be set to true if the state
              can be reached, and false otherwise.

       faName useful_states
              Returns the set of states which are able to reach a final  state
              by one or more transitions.

       faName unuseful_states
              Returns  the  set  of states which are not able to reach a final
              state by any number of transitions. This is

                 [faName states] - [faName useful_states]

       faName useful s
              A predicate. It tests whether the state s in the  FA  faName  is
              able to reach a final state by one or more transitions.  The re-
              sult is a boolean value. It will be set to true if the state  is
              useful, and false otherwise.

       faName epsilon_closure s
              Returns the set of states which are reachable from the state s
              in the FA faName by one or more epsilon transitions, i.e.
              transitions over the empty symbol, transitions which do not
              consume input. This is called the epsilon closure of s.
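
              The reachability and usefulness queries, sketched on an
              automaton with one state (d) that is cut off from the rest. The
              object name net and the variable names are illustrative.

```tcl
package require grammar::fa

::grammar::fa net                  ;# illustrative name
net state  add a b c d
net symbol add x
net start add a
net final add c
net next a x  --> b
net next b "" --> c                ;# epsilon transition
net next c x  --> a                ;# closes the cycle a -> b -> c -> a
net next d x  --> d                ;# d only ever leads back to itself

set reach   [lsort [net reachable_states]]
set unreach [net unreachable_states]
set useful  [lsort [net useful_states]]
set closure [net epsilon_closure b]
```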

       faName reverse

       faName complete

       faName remove_eps

       faName trim ?what?

       faName determinize ?mapvar?

       faName minimize ?mapvar?

       faName complement

       faName kleene

       faName optional

       faName union fa ?mapvar?

       faName intersect fa ?mapvar?

       faName difference fa ?mapvar?

       faName concatenate fa ?mapvar?

       faName fromRegex regex ?over?
              These methods provide more complex operations on the FA.  Please
              see  the  same-named commands in the package grammar::fa::op for
              descriptions of what they do.
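
              As a minimal sketch of one of these methods, the following
              determinizes a small non-deterministic automaton in place. The
              object name nfa is illustrative, and the optional mapvar
              argument is left out; the details of the operation are
              documented in grammar::fa::op.

```tcl
package require grammar::fa

::grammar::fa nfa                  ;# illustrative name
nfa state  add a b c
nfa symbol add x
nfa start add a
nfa final add b c
nfa next a x --> b
nfa next a x --> c                 ;# two successors for (a, x)

set before [nfa is deterministic]
nfa determinize                    ;# modifies nfa itself
set after  [nfa is deterministic]
```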

EXAMPLES
FINITE AUTOMATONS
       For the mathematically inclined, a FA is a 5-tuple (S,Sy,St,Fi,T) where

       o      S is a set of states,

       o      Sy a set of input symbols,

       o      St is a subset of S, the set of start states, also known as ini-
              tial states.

       o      Fi is a subset of S, the set of final states, also known as
              accepting states.

       o      T is a function from S x (Sy + epsilon) to {S},  the  transition
              function.   Here  epsilon  denotes the empty input symbol and is
              distinct from all symbols in Sy; and {S} is the set  of  subsets
              of  S.  In  other words, T maps a combination of State and Input
              (which can be empty) to a set of successor states.

       In computer theory a FA is most often shown as a graph where the  nodes
       represent  the states, and the edges between the nodes encode the tran-
       sition function: For all n in S' = T (s, sy) we have one  edge  between
       the  nodes  representing  s and n resp., labeled with sy. The start and
       accepting states are encoded through distinct visual markers, i.e. they
       are attributes of the nodes.

       FAs are used to process streams of symbols over Sy.

       A  specific  FA is said to accept a finite stream sy_1 sy_2 ... sy_n if
       there is a path in the graph of the FA beginning at a state in  St  and
       ending at a state in Fi whose edges have the labels sy_1, sy_2, etc. to
       sy_n.  The set of all strings accepted by the FA is the language of the
       FA. One important equivalence is that the set of languages which can be
       accepted by an FA is the set of regular languages.
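
       The definition of acceptance can be illustrated with the query methods
       of this package. The sketch below is not how streams should be
       processed in practice (see grammar::fa::dacceptor and
       grammar::fa::dexec for that); the procedures closure and accepts and
       the object name ab are hypothetical helpers written for this example
       and are not part of the package.

```tcl
package require grammar::fa

# Hypothetical helper: extends a set of states by everything reachable
# from its members over epsilon transitions.
proc closure {fa states} {
    set result $states
    foreach s $states {
        foreach t [$fa epsilon_closure $s] {
            if {[lsearch -exact $result $t] < 0} {lappend result $t}
        }
    }
    return $result
}

# Hypothetical helper: true if some path from a start state to a final
# state is labeled with the given list of symbols.
proc accepts {fa symbols} {
    set current [closure $fa [$fa startstates]]
    foreach sym $symbols {
        # No states left, or an unknown symbol: no path can match.
        if {[llength $current] == 0}    {return 0}
        if {![$fa symbol exists $sym]}  {return 0}
        set current [closure $fa [$fa nextset $current $sym]]
    }
    if {[llength $current] == 0} {return 0}
    return [$fa final?set $current]
}

# An automaton for "any number of a, followed by one b".
::grammar::fa ab
ab state  add s0 s1
ab symbol add a b
ab start add s0
ab final add s1
ab next s0 a --> s0
ab next s0 b --> s1

puts [accepts ab {a a b}]
puts [accepts ab {b a}]
```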

       Another important concept is that of deterministic FAs. A FA is said to
       be  deterministic  if for each string of input symbols there is exactly
       one path in the graph of the FA beginning at the start state and  whose
       edges  are labeled with the symbols in the string.  While it might seem
       that non-deterministic FAs have more power of recognition, this is
       not  so. For each non-deterministic FA we can construct a deterministic
       FA which accepts the same language (--> the subset construction, also
       known as the powerset construction).

       While  one of the premier applications of FAs is in parsing, especially
       in the lexer stage (where symbols == characters), this is not the  only
       possibility by far.

       Quite a lot of processes can be modeled as a FA, albeit with a possibly
       large set of states. For these the notion of accepting states is  often
       less  or  not relevant at all. What is needed instead is the ability to
       react to state changes in the FA, i.e. to generate some output in
       response to the input. This transforms a FA into a finite transducer,
       which has an additional set OSy of output symbols and also an
       additional output function O which maps from "S x (Sy + epsilon)" to
       "(OSy + epsilon)", i.e. a combination of state and input (possibly
       empty) to an output symbol, or nothing.

       For the graph representation this means that edges are additionally
       labeled with the output symbol to write when this edge is traversed
       while matching input. Note that for an application "writing an output
       symbol" can also be "executing some code".

       Transducers are not handled by this package. They will  get  their  own
       package in the future.

BUGS, IDEAS, FEEDBACK
       This  document,  and the package it describes, will undoubtedly contain
       bugs and other problems.  Please report such in the category grammar_fa
       of  the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].  Please
       also report any ideas for enhancements you may have for either  package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that  attachments  are  strongly  preferred  over  inlined
       patches.  Attachments  can  be  made  by  going to the Edit form of the
       ticket immediately after its creation, and  then  using  the  left-most
       button in the secondary navigation bar.

KEYWORDS
       automaton, finite automaton, grammar, parsing, regular expression, reg-
       ular grammar, regular languages, state, transducer

CATEGORY
       Grammars and finite automata

COPYRIGHT
       Copyright (c) 2004-2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                0.4                    grammar::fa(3tcl)
