RE2C(1)                                                                RE2C(1)

NAME
       re2c - convert regular expressions to C/C++ code

SYNOPSIS
       re2c [OPTIONS] FILE

DESCRIPTION
       re2c is a lexer generator for C/C++. It finds regular expression speci-
       fications inside of C/C++ comments and replaces them with a  hard-coded
       DFA.  The  user must supply some interface code in order to control and
       customize the generated DFA.

OPTIONS
       -? -h --help
              Show a short help screen.

       -b --bit-vectors
              Implies -s. Use bit vectors as well to try to coax  better  code
              out  of  the  compiler. Most useful for specifications with more
              than a few keywords (e.g., for most programming languages).

       -c --conditions
              Used for (f)lex-like condition support.

       -d --debug-output
              Creates a parser that dumps information about the current  posi-
              tion  and the state the parser is in.  This is useful for debug-
              ging parser issues and states. If you use this switch, you  need
              to  define a YYDEBUG macro, which will be called like a function
              with two parameters: void YYDEBUG  (int  state,  char  current).
              The  first  parameter  receives  the  state or -1 and the second
              parameter receives the input at the current cursor.

       -D --emit-dot
              Emit Graphviz dot data, which can then be processed  with  e.g.,
              dot -Tpng input.dot > output.png. Please note that scanners with
              many states may crash dot.

       -e --ecb
              Generate a parser that supports EBCDIC. The generated  code  can
              deal  with  any character up to 0xFF. In this mode, re2c assumes
              an input character size of 1 byte. This switch  is  incompatible
              with -w, -x, -u, and -8.

       -f --storable-state
              Generate a scanner with support for storable state.

       -F --flex-syntax
              Partial support for flex syntax. When this flag is active, named
              definitions must be  surrounded  by  curly  braces  and  can  be
              defined  without  an  equal  sign and the terminating semicolon.
              Instead, names are treated as direct double quoted strings.

       -g --computed-gotos
              Generate a scanner that utilizes  GCC's  computed-goto  feature.
              That  is,  re2c  generates jump tables whenever a decision is of
              certain complexity (e.g., a lot of if conditions would be other-
              wise necessary). This is only usable with compilers that support
              this feature.  Note that this implies -b and that the complexity
              threshold  can  be  configured using the cgoto:threshold inplace
              configuration.

       -i --no-debug-info
              Do not output #line information. This is useful when you want to
              use a CMS tool with re2c's output. You might want to do this if
              you do not want to impose re2c as a build requirement  for  your
              source.

       -o OUTPUT --output=OUTPUT
              Specify the OUTPUT file.

       -r --reusable
              Allows  reuse  of  scanner definitions with /*!use:re2c */ after
              /*!rules:re2c */.  In this mode, no /*!re2c */ block and exactly
              one  /*!rules:re2c  */ must be present.  The rules are saved and
              used by every /*!use:re2c */ block that follows.   These  blocks
              can  contain  inplace  configurations,  especially re2c:flags:e,
              re2c:flags:w,  re2c:flags:x,  re2c:flags:u,  and   re2c:flags:8.
              That  way  it  is  possible  to create the same scanner multiple
              times for different character types, different input mechanisms,
              or  different  output mechanisms.  The /*!use:re2c */ blocks can
              also contain additional rules that will be appended to  the  set
              of rules in /*!rules:re2c */.

       -s --nested-ifs
              Generate  nested ifs for some switches. Many compilers need this
              assist to generate better code.

       -t HEADER --type-header=HEADER
              Create a HEADER file that contains  types  for  the  (f)lex-like
              condition support. This can only be activated when -c is in use.

       -T --tags
              Enable submatch extraction with tags.

       -P --posix-captures
              Enable submatch extraction with POSIX-style capturing groups.

       -u --unicode
              Generate  a  parser that supports UTF-32. The generated code can
              deal with any valid Unicode character up to  0x10FFFF.  In  this
              mode,  re2c  assumes  an  input  character size of 4 bytes. This
              switch is incompatible with -e, -w, -x, and -8. This implies -s.

       -v --version
              Show version information.

       -V --vernum
              Show the version as a number in the MMmmpp (Major, minor,
              patch) format.
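
              The MMmmpp layout can be unpacked with plain integer arithmetic.
              A minimal sketch in C, assuming the number has already been
              parsed into an int; decode_vernum and struct version are
              hypothetical names, not part of re2c:

```c
/* Hypothetical helper type: the three parts of an MMmmpp version number. */
struct version {
    int major;
    int minor;
    int patch;
};

/* Split an MMmmpp number into major, minor, and patch. */
struct version decode_vernum(int vernum)
{
    struct version v;
    v.major = vernum / 10000;        /* MM: leading two digits  */
    v.minor = (vernum / 100) % 100;  /* mm: middle two digits   */
    v.patch = vernum % 100;          /* pp: trailing two digits */
    return v;
}
```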

       -w --wide-chars
              Generate  a  parser  that supports UCS-2. The generated code can
              deal with any valid Unicode character up  to  0xFFFF.   In  this
              mode,  re2c  assumes  an  input  character size of 2 bytes. This
              switch is incompatible with -e, -x, -u, and -8. This implies -s.

       -x --utf-16
              Generate a parser that supports UTF-16. The generated  code  can
              deal  with  any  valid Unicode character up to 0x10FFFF. In this
              mode, re2c assumes an input character  size  of  2  bytes.  This
              switch is incompatible with -e, -w, -u, and -8. This implies -s.

       -8 --utf-8
              Generate  a  parser  that supports UTF-8. The generated code can
              deal with any valid Unicode character up to  0x10FFFF.  In  this
              mode,  re2c  assumes  an  input  character  size of 1 byte. This
              switch is incompatible with -e, -w, -x, and -u.

       --case-insensitive
              Makes all strings case insensitive. This makes "-quoted  expres-
              sions behave as '-quoted expressions.

       --case-inverted
              Invert  the  meaning  of  single and double quoted strings. With
              this switch, single quotes are case sensitive and double  quotes
              are case insensitive.

       --no-generation-date
              Suppress date output in the generated file.

       --no-lookahead
              Use TDFA(0) instead of TDFA(1). This option only has an effect
              when used with the --tags or --posix-captures options.

       --no-optimize-tags
              Suppress optimization of tag variables (mostly used  for  debug-
              ging).

       --no-version
              Suppress version output in the generated file.

       --encoding-policy POLICY
              Specify  how  re2c  must treat Unicode surrogates. POLICY can be
              one of the following: fail (abort with an error when a surrogate
              is  encountered),  substitute  (silently replace surrogates with
              the error code point 0xFFFD), ignore (treat surrogates as normal
              code  points). By default, re2c ignores surrogates (for backward
              compatibility). The Unicode standard says that standalone surro-
              gates  are invalid code points, but different libraries and pro-
              grams treat them differently.
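
              The three policies can be pictured as a small decision function.
              The sketch below only models the behavior described above; it is
              not taken from re2c's sources, and every name in it is
              hypothetical:

```c
/* Hypothetical names that mirror the three POLICY values in the text. */
enum surrogate_policy { POLICY_FAIL, POLICY_SUBSTITUTE, POLICY_IGNORE };

/* Surrogates occupy the code point range U+D800..U+DFFF. */
int is_surrogate(unsigned long cp)
{
    return cp >= 0xD800UL && cp <= 0xDFFFUL;
}

/*
 * Returns 0 and stores the resulting code point in *out, or returns -1
 * when the policy is "fail" and cp is a surrogate.
 */
int apply_surrogate_policy(unsigned long cp, enum surrogate_policy policy,
                           unsigned long *out)
{
    if (is_surrogate(cp)) {
        if (policy == POLICY_FAIL) {
            return -1;               /* fail: abort with an error */
        }
        if (policy == POLICY_SUBSTITUTE) {
            *out = 0xFFFDUL;         /* substitute: error code point */
            return 0;
        }
    }
    *out = cp;                       /* ignore: keep the code point as is */
    return 0;
}
```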

       --input INPUT
              Specify re2c's input API. INPUT can be either default or custom.

       -S --skeleton
              Instead of embedding re2c-generated code into C/C++ source, gen-
              erate a self-contained program for the same DFA. Most useful for
              correctness and performance testing.

       --empty-class POLICY
              What to do if the user uses an empty character class. POLICY can
              be  one of the following: match-empty (match empty input: pretty
              illogical, but this is the default for  backwards  compatibility
              reasons), match-none (fail to match on any input), error (compi-
              lation error). Note that there are various ways to construct  an
              empty class, e.g., [], [^\x00-\xFF], [\x00-\xFF][\x00-\xFF].

       --dfa-minimization <table | moore>
              The  internal  algorithm  used  by  re2c  to  minimize  the  DFA
              (defaults to moore).  Both the table filling algorithm  and  the
              Moore  algorithm should produce the same DFA (up to states rela-
              beling).  The  table  filling  algorithm  is  much  simpler  and
              slower; it serves as a reference implementation.

       --eager-skip
              This  option  controls  when the generated lexer advances to the
              next input symbol  (that  is,  increments  YYCURSOR  or  invokes
              YYSKIP). By default, this happens after the transition to the
              next state, but --eager-skip overrides the default behavior and
              advances the input position immediately after reading an input
              symbol. This option is implied by --no-lookahead.

       --dump-nfa
              Generate .dot representation of NFA and dump it on stderr.

       --dump-dfa-raw
              Generate .dot representation of DFA under construction and  dump
              it on stderr.

       --dump-dfa-det
              Generate  .dot  representation  of  DFA immediately after deter-
              minization and dump it on stderr.

       --dump-dfa-tagopt
              Generate .dot representation of DFA after tag optimizations  and
              dump it on stderr.

       --dump-dfa-min
              Generate  .dot representation of DFA after minimization and dump
              it on stderr.

       --dump-adfa
              Generate .dot representation of DFA after tunneling and dump  it
              on stderr.

       -1 --single-pass
              Deprecated. Does nothing (single pass is the default now).

       -W     Turn on all warnings.

       -Werror
              Turn  warnings  into errors. Note that this option alone doesn't
              turn on any warnings; it only affects those warnings  that  have
              been turned on so far or will be turned on later.

       -W<warning>
              Turn on a warning.

       -Wno-<warning>
              Turn off a warning.

       -Werror-<warning>
              Turn  on  a  warning  and  treat  it  as  an error (this implies
              -W<warning>).

       -Wno-error-<warning>
              Don't treat this particular warning as an  error.  This  doesn't
              turn off the warning itself.

       -Wcondition-order
              Warn  if  the generated program makes implicit assumptions about
              condition numbering. You should use either the -t, --type-header
              option or the /*!types:re2c*/ directive to generate a mapping of
              condition names to numbers and then use the autogenerated condi-
              tion names.

       -Wempty-character-class
              Warn  if a regular expression contains an empty character class.
              Rationally, trying to match an empty character  class  makes  no
              sense: it should always fail. However, for backwards compatibil-
              ity reasons, re2c allows empty character classes and treats them
              as  empty  strings.  Use  the --empty-class option to change the
              default behavior.

       -Wmatch-empty-string
              Warn if a regular expression in a rule is nullable  (matches  an
              empty  string).  If the DFA runs in a loop and an empty match is
              unintentional (the input position is not advanced manually), the
              lexer may get stuck in an infinite loop.

       -Wswapped-range
              Warn  if  the  lower  bound of a range is greater than its upper
              bound. The default  behavior  is  to  silently  swap  the  range
              bounds.

       -Wundefined-control-flow
              Warn  if  some input strings cause undefined control flow in the
              lexer (the faulty patterns are reported). This is the most  dan-
              gerous and most common mistake. It can be easily fixed by adding
              the default rule (*) (this rule has the lowest priority, matches
              any code unit, and consumes exactly one code unit).

       -Wunreachable-rules
              Warn about rules that are shadowed by other rules and will never
              match.

       -Wuseless-escape
              Warn if a symbol is escaped when it shouldn't be.   By  default,
              re2c  silently  ignores such escapes, but this may as well indi-
              cate a typo or error in the escape sequence.

       -Wnondeterministic-tags
              Warn if a tag has n-th degree of nondeterminism, where n is
              greater than 1.

INTERFACE CODE
       The  user  must  supply interface code either in the form of C/C++ code
       (macros, functions, variables, etc.) or in the form of INPLACE CONFIGU-
       RATIONS.   Which symbols must be defined and which are optional depends
       on the particular use case.

       YYBACKUP ()
              Backup current input position (used only with generic API).

       YYBACKUPCTX ()
              Backup current input position for trailing  context  (used  only
              with generic API).

       YYCONDTYPE
              In  -c mode, you can use -t to generate a file that contains the
              enumeration used as conditions. Each of the values refers  to  a
              condition of a rule set.

       YYCTXMARKER
              l-value  of  type  YYCTYPE *.  The generated code saves trailing
              context backtracking information in YYCTXMARKER. The  user  only
              needs  to  define  this  macro  if  a scanner specification uses
              trailing context in one or more of its regular expressions.

       YYCTYPE
              Type used to hold an input symbol (code unit). Usually  char  or
              unsigned char for ASCII, EBCDIC  or UTF-8, or unsigned short for
              UTF-16 or UCS-2, or unsigned int for UTF-32.

       YYCURSOR
              l-value of type YYCTYPE * that points to the current input  sym-
              bol.  The  generated  code  advances  YYCURSOR  as  symbols  are
              matched. On entry, YYCURSOR is assumed to  point  to  the  first
              character  of the current token. On exit, YYCURSOR will point to
              the first character of the following token.

       YYDEBUG (state, current)
              This is only needed if the -d flag was specified. It allows easy
              debugging  of  the  generated  parser  by calling a user defined
              function for every state. The function should have the following
              signature:  void  YYDEBUG  (int  state, char current). The first
              parameter receives the state or  -1  and  the  second  parameter
              receives the input at the current cursor.
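
              A minimal sketch of such a function, assuming the trace should
              go to stderr. The helper yydebug_trace and the counter
              yydebug_calls are hypothetical names chosen for this
              illustration:

```c
#include <stdio.h>

/* Hypothetical counter so a caller can see how many states were visited. */
unsigned long yydebug_calls = 0;

/* Matches the documented signature: void YYDEBUG (int state, char current). */
void yydebug_trace(int state, char current)
{
    ++yydebug_calls;
    fprintf(stderr, "state %d, input 0x%02X\n", state,
            (unsigned)(unsigned char)current);
}

/* The generated code invokes YYDEBUG like a function with two arguments. */
#define YYDEBUG(state, current) yydebug_trace((state), (current))
```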

       YYFILL (n)
              The generated code "calls" YYFILL (n) when the buffer needs
              (re)filling: at least n additional  characters  should  be  pro-
              vided. YYFILL (n) should adjust YYCURSOR, YYLIMIT, YYMARKER, and
              YYCTXMARKER as needed. Note that for  typical  programming  lan-
              guages n will be the length of the longest keyword plus one. The
              user can place a comment of the form /*!max:re2c*/ to  insert  a
              YYMAXFILL define set to the maximum length value.
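
              One common way to satisfy this contract is to move the unread
              part of the buffer to its start and append fresh data after it.
              The sketch below assumes a small fixed-size buffer fed from an
              in-memory source, and that the marker never lags behind the
              token start; struct input, input_init, and input_fill are
              hypothetical names, and a real scanner would more likely read
              from a file:

```c
#include <stddef.h>
#include <string.h>

#define BUFSIZE 16               /* deliberately tiny, for illustration */

/* Hypothetical scanner state; none of these names come from re2c. */
struct input {
    char buf[BUFSIZE];
    char *cur;                   /* plays the role of YYCURSOR */
    char *mar;                   /* plays the role of YYMARKER */
    char *tok;                   /* start of the current token */
    char *lim;                   /* plays the role of YYLIMIT  */
    const char *src;             /* source data not yet buffered */
    size_t src_len;
};

void input_init(struct input *in, const char *src, size_t len)
{
    in->cur = in->mar = in->tok = in->lim = in->buf;  /* nothing buffered */
    in->src = src;
    in->src_len = len;
}

/* Returns 0 on success, -1 if `need` characters cannot be provided. */
int input_fill(struct input *in, size_t need)
{
    size_t used = (size_t)(in->tok - in->buf);  /* consumed, reclaimable */
    size_t kept = (size_t)(in->lim - in->tok);  /* still needed */
    size_t room, n;

    /* Move the live part of the buffer to the front and shift pointers. */
    memmove(in->buf, in->tok, kept);
    in->cur -= used;
    in->mar -= used;
    in->tok -= used;
    in->lim -= used;

    /* Append as much fresh data as fits. */
    room = BUFSIZE - kept;
    n = in->src_len < room ? in->src_len : room;
    memcpy(in->lim, in->src, n);
    in->lim += n;
    in->src += n;
    in->src_len -= n;

    return (size_t)(in->lim - in->cur) >= need ? 0 : -1;
}
```

              A scanner built on this could define YYFILL (n) to call
              input_fill and leave the lexer when it reports failure. The
              buffer would need to hold at least YYMAXFILL characters for
              such a scheme to work.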

       YYGETCONDITION ()
              This  define  is used to get the condition prior to entering the
              scanner code when using the -c switch. The value  must  be  ini-
              tialized with a value from the YYCONDTYPE enumeration type.

       YYGETSTATE ()
              The  user  only  needs  to  define this macro if the -f flag was
              specified. In that case, the generated code  "calls"  YYGETSTATE
              ()  at  the very beginning of the scanner in order to obtain the
              saved state. YYGETSTATE () must return  a  signed  integer.  The
              value  must be either -1, indicating that the scanner is entered
              for the first time, or a value previously  saved  by  YYSETSTATE
              (s).  In  the  second  case,  the scanner will resume operations
              right after where the last YYFILL (n) was called.

       YYLESSTHAN (n)
              Check if less than n input characters are left (used  only  with
              generic API).

       YYLIMIT
              An expression of type YYCTYPE * that marks the end of the buffer
              (YYLIMIT[-1] is the last character in the buffer). The generated
              code  repeatedly  compares YYCURSOR to YYLIMIT to determine when
              the buffer needs (re)filling.

       YYMARKER
              An l-value of type YYCTYPE *.  The generated  code  saves  back-
              tracking information in YYMARKER. Some simple scanners might not
              use this.

       YYMTAGP (t)
              Append current input position to the history of tag t.

       YYMTAGN (t)
              Append default value to the history of tag t.

       YYMAXFILL
              This will be automatically defined by  /*!max:re2c*/  blocks  as
              explained above.

       YYMAXNMATCH
              This will be automatically defined by /*!maxnmatch:re2c*/.

       YYPEEK ()
              Get current input character (used only with generic API).

       YYRESTORE ()
              Restore input position (used only with generic API).

       YYRESTORECTX ()
              Restore  input position from the value of trailing context (used
              only with generic API).

       YYRESTORETAG (t)
              Restore input position from the value of tag t (used  only  with
              generic API).

       YYSETCONDITION (c)
              This define is used to set the condition in transition rules.
              It is only used when -c is active and transition rules are
              present.
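
              A minimal sketch of condition storage, assuming two hypothetical
              conditions named init and comment and the default yyc prefix.
              The enumeration is written out by hand here as a stand-in for
              the header that -t would generate:

```c
/* Stand-in for the generated enumeration; the condition names are made up. */
enum YYCONDTYPE { yycinit, yyccomment };

/* Hypothetical scanner state holding the current condition. */
struct scanner {
    enum YYCONDTYPE cond;
};

struct scanner sc = { yycinit };

/* Read the condition before entering the scanner code. */
#define YYGETCONDITION()  (sc.cond)

/* Store the new condition requested by a transition rule. */
#define YYSETCONDITION(c) (sc.cond = (c))
```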

       YYSETSTATE (s)
              The  user  only  needs  to  define this macro if the -f flag was
              specified. In that case, the generated code  "calls"  YYSETSTATE
              just before calling YYFILL (n). The parameter to YYSETSTATE is a
              signed integer that uniquely identifies the specific instance of
              YYFILL  (n)  that is about to be called. Should the user wish to
              save the state of the scanner and have YYFILL (n) return to  the
              caller, it is enough to store that unique identifier in a
              variable. Later, when the scanner is called again, it will  call
              YYGETSTATE  () and resume execution right where it left off. The
              generated code will contain both YYSETSTATE (s)  and  YYGETSTATE
              even if YYFILL (n) is disabled.
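
              A minimal sketch of the state storage, assuming a single scanner
              object that outlives individual calls to the lexer; struct
              scanner_state and the variable st are hypothetical names:

```c
/* Hypothetical scanner object that persists between calls to the lexer. */
struct scanner_state {
    int state;   /* -1 until the generated code saves a state */
};

struct scanner_state st = { -1 };

/* Return the saved state, or -1 when the scanner is entered the first time. */
#define YYGETSTATE()  (st.state)

/* Remember which YYFILL (n) instance is about to be called. */
#define YYSETSTATE(s) (st.state = (s))
```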

       YYSKIP ()
              Advance  input  position  to  the next character (used only with
              generic API).

       YYSTAGP (t)
              Save current input position to tag t  (used  only  with  generic
              API).

       YYSTAGN (t)
              Save default value to tag t (used only with generic API).
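
       With the generic API, each of the primitives above can be mapped onto
       ordinary pointer operations. The following is one possible set of
       definitions, assuming the input is a contiguous in-memory buffer and
       that a null pointer serves as the default tag value; the variable
       names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical input state; the generic API does not prescribe a layout. */
const char *cursor, *marker, *ctxmarker, *limit;

#define YYPEEK()         (*cursor)               /* current input character */
#define YYSKIP()         (++cursor)              /* advance by one character */
#define YYBACKUP()       (marker = cursor)       /* save position */
#define YYBACKUPCTX()    (ctxmarker = cursor)    /* save trailing context */
#define YYRESTORE()      (cursor = marker)       /* return to saved position */
#define YYRESTORECTX()   (cursor = ctxmarker)    /* return to trailing context */
#define YYRESTORETAG(t)  (cursor = (t))          /* return to tag t */
#define YYLESSTHAN(n)    ((size_t)(limit - cursor) < (size_t)(n))
#define YYSTAGP(t)       ((t) = cursor)          /* tag gets current position */
#define YYSTAGN(t)       ((t) = NULL)            /* tag gets default value */
```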

SYNTAX
       Code  for  re2c consists of a set of RULES, NAMED DEFINITIONS, CODE and
       INPLACE CONFIGURATIONS.

   RULES
       Each rule consists of a regular expression (see REGULAR EXPRESSIONS)
       accompanied by a block of C/C++ code which is to be executed when the
       associated regular expression is matched. You can either start the code
       with  an  opening curly brace or the sequence :=. If you use an opening
       curly brace, re2c will count brace depth  and  stop  looking  for  code
       automatically.  Otherwise,  curly braces are not allowed and re2c stops
       looking for code at the first line that does not begin with whitespace.
       If two or more rules overlap, the first rule is preferred.

       There is one special rule that can be used instead of a regular
       expression: the default rule *. Note that the default rule * differs
       from [^]: the default rule has the lowest priority, matches any code
       unit (either valid or invalid) and always consumes exactly one code
       unit. [^], on the other hand, matches any valid code point (not the
       same as a code unit) and can consume multiple code units. In fact,
       when a variable-length encoding is used, * is the only possible way to
       match an invalid input character.

       In general, all rules have the form:
          regular-expression-or-* code
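
       To illustrate rule priority and the default rule, here is a
       hand-written approximation (not actual re2c output) of the kind of
       hard-coded matcher that corresponds to a rule for [0-9]+ followed by
       the default rule *. It assumes NUL-terminated input of type char; the
       token names and the lex function are hypothetical:

```c
/*
 * Hand-written approximation of a lexer for the two rules
 *
 *     [0-9]+ { return TOKEN_NUMBER; }
 *     *      { return TOKEN_OTHER;  }
 *
 * On return, *cursor points to the first character of the following token.
 */
enum token { TOKEN_OTHER, TOKEN_NUMBER };

enum token lex(const char **cursor)
{
    const char *p = *cursor;
    char c = *p++;   /* both rules consume at least one code unit */

    if (c < '0' || c > '9') {
        /* Default rule: lowest priority, consumes exactly one code unit. */
        *cursor = p;
        return TOKEN_OTHER;
    }

    /* [0-9]+ : keep consuming while the next code unit is a digit. */
    while (*p >= '0' && *p <= '9') {
        ++p;
    }
    *cursor = p;
    return TOKEN_NUMBER;
}
```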

       If -c is active, then each regular expression is preceded by a list  of
       comma-separated condition names. Besides the normal naming rules, there
       are two special cases: <*> (these rules are merged to  all  conditions)
       and <> (these rules cannot have an associated regular expression; their
       code is merged to all actions). Non-empty rules may furthermore specify
       the  new condition. In that case, re2c will generate the necessary code
       to change the condition automatically. Rules can use :=> as a  shortcut
       to  automatically  generate  code  that not only sets the new condition
       state but also continues execution with the new state. A shortcut  rule
       should  not  be used in a loop where there is code between the start of
       the loop and the re2c block unless re2c:cond:goto is  changed  to  con-
       tinue.  If some code is needed before all rules (though not before sim-
       ple jumps),  you can insert it with <!> pseudo-rules.
          <condition-list-or-*> regular-expression-or-* code

          <condition-list-or-*> regular-expression-or-* => condition code

          <condition-list-or-*> regular-expression-or-* :=> condition

          <> code

          <> => condition code

          <> :=> condition

          <!condition-list> code

          <!> code

   NAMED DEFINITIONS
       Named definitions are of the form:
          name = regular-expression;

       If -F is active, then named definitions are also of the form:
          name { regular-expression }

   INPLACE CONFIGURATIONS
       re2c:cgoto:threshold = 9;
              When -g is active, this value specifies the complexity threshold
              that  triggers  the generation of jump tables rather than nested
              ifs and decision bitfields. The threshold is compared against  a
              calculated  estimation  of  ifs  needed  where every used bitmap
              divides the threshold by 2.

       re2c:cond:divider = '/* *********************************** */';
              Allows one to customize the divider for  condition  blocks.  You
              can  use  @@  to  put the name of the condition or customize the
              placeholder using re2c:cond:divider@cond.

       re2c:cond:divider@cond = @@;
              Specifies the placeholder that will be replaced with the  condi-
              tion name in re2c:cond:divider.

       re2c:condenumprefix = yyc;
              Allows one to specify the prefix used for condition values. That
              is, the text to be prepended to condition  enum  values  in  the
              generated output file.

       re2c:cond:goto@cond = @@;
              Specifies  the placeholder that will be replaced with the condi-
              tion label in re2c:cond:goto.

       re2c:cond:goto = 'goto @@;';
              Allows one to customize the condition goto statements used  with
              :=> style rules. You can use @@ to put the name of the condition
              or customize the placeholder using re2c:cond:goto@cond. You  can
              also change this to continue;, which would allow you to continue
              with the next loop cycle including any code  between  your  loop
              start and your re2c block.

       re2c:condprefix = yyc;
              Allows one to specify the prefix used for condition labels. That
              is, the text to be prepended to condition labels in  the  gener-
              ated output file.

       re2c:define:YYBACKUPCTX = 'YYBACKUPCTX';
              Replaces YYBACKUPCTX identifier with the specified string.

       re2c:define:YYBACKUP = 'YYBACKUP';
              Replaces YYBACKUP identifier with the specified string.

       re2c:define:YYCONDTYPE = 'YYCONDTYPE';
              Enumeration used for condition support with -c mode.

       re2c:define:YYCTXMARKER = 'YYCTXMARKER';
              Replaces  the YYCTXMARKER placeholder with the specified identi-
              fier.

       re2c:define:YYCTYPE = 'YYCTYPE';
              Replaces the YYCTYPE placeholder with the specified type.

       re2c:define:YYCURSOR = 'YYCURSOR';
              Replaces the YYCURSOR placeholder with the specified identifier.

       re2c:define:YYDEBUG = 'YYDEBUG';
              Replaces the YYDEBUG placeholder with the specified identifier.

       re2c:define:YYFILL@len = '@@';
              Any occurrence of this text inside of  a  YYFILL  call  will  be
              replaced with the actual argument.

       re2c:define:YYFILL:naked = 0;
              Controls the argument in the parentheses after YYFILL and the
              following semicolon. If zero, the argument is generated unless
              re2c:yyfill:parameter is set to zero, and the semicolon is
              generated unconditionally. If non-zero, both the argument and
              the semicolon are omitted.

       re2c:define:YYFILL = 'YYFILL';
              Define  a  substitution  for  YYFILL. Note that by default, re2c
              generates an argument  in  parentheses  and  a  semicolon  after
              YYFILL. If you need to make YYFILL an arbitrary statement rather
              than a call, set re2c:define:YYFILL:naked to  a  non-zero  value
              and use re2c:define:YYFILL@len to set a placeholder for the for-
              mal parameter inside of your YYFILL body.

       re2c:define:YYGETCONDITION:naked = 0;
              Controls the parentheses after YYGETCONDITION. If zero, the
              parentheses are generated. If non-zero, the parentheses are
              omitted.

       re2c:define:YYGETCONDITION = 'YYGETCONDITION';
              Substitution for YYGETCONDITION. Note that by default, re2c gen-
              erates  parentheses after YYGETCONDITION. Set re2c:define:YYGET-
              CONDITION:naked to non-zero to omit the parentheses.

       re2c:define:YYGETSTATE:naked = 0;
              Controls the parentheses that follow YYGETSTATE. If zero, the
              parentheses are generated. If non-zero, they are omitted.

       re2c:define:YYGETSTATE = 'YYGETSTATE';
              Substitution  for  YYGETSTATE. Note that by default, re2c gener-
              ates  parentheses  after  YYGETSTATE.   Set   re2c:define:YYGET-
              STATE:naked to non-zero to omit the parentheses.

       re2c:define:YYLESSTHAN = 'YYLESSTHAN';
              Replaces YYLESSTHAN identifier with the specified string.

       re2c:define:YYLIMIT = 'YYLIMIT';
              Replaces the YYLIMIT placeholder with the specified identifier.

       re2c:define:YYMARKER = 'YYMARKER';
              Replaces the YYMARKER placeholder with the specified identifier.

       re2c:define:YYMTAGN = 'YYMTAGN';
              Replaces YYMTAGN identifier with the specified string.

       re2c:define:YYMTAGP = 'YYMTAGP';
              Replaces YYMTAGP identifier with the specified string.

       re2c:define:YYPEEK = 'YYPEEK';
              Replaces YYPEEK identifier with the specified string.

       re2c:define:YYRESTORECTX = 'YYRESTORECTX';
              Replaces YYRESTORECTX identifier with the specified string.

       re2c:define:YYRESTORE = 'YYRESTORE';
              Replaces YYRESTORE identifier with the specified string.

       re2c:define:YYRESTORETAG = 'YYRESTORETAG';
              Replaces YYRESTORETAG identifier with the specified string.

       re2c:define:YYSETCONDITION@cond = '@@';
              Any occurrence of this text inside  of  YYSETCONDITION  will  be
              replaced with the actual argument.

       re2c:define:YYSETCONDITION:naked = 0;
              Controls the argument in parentheses and the semicolon after
              YYSETCONDITION. If zero, both the argument and the semicolon are
              generated. If non-zero, both the argument and the semicolon are
              omitted.

       re2c:define:YYSETCONDITION = 'YYSETCONDITION';
              Substitution for YYSETCONDITION. Note that by default, re2c gen-
              erates  an  argument  in parentheses followed by semicolon after
              YYSETCONDITION. If you need to make YYSETCONDITION an  arbitrary
              statement   rather  than  a  call,  set  re2c:define:YYSETCONDI-
              TION:naked to non-zero and  use  re2c:define:YYSETCONDITION@cond
              to  denote  the  formal  parameter  inside of the YYSETCONDITION
              body.

       re2c:define:YYSETSTATE:naked = 0;
              Controls the argument in parentheses and the semicolon after
              YYSETSTATE. If zero, both the argument and the semicolon are
              generated. If non-zero, both the argument and the semicolon are
              omitted.

       re2c:define:YYSETSTATE@state = '@@';
              Any  occurrence  of  this  text  inside  of  YYSETSTATE  will be
              replaced with the actual argument.

       re2c:define:YYSETSTATE = 'YYSETSTATE';
              Substitution for YYSETSTATE. Note that by default,  re2c  gener-
              ates  an  argument  in parentheses followed by a semicolon after
              YYSETSTATE. If you need to make YYSETSTATE an arbitrary
              statement rather than a call, set re2c:define:YYSETSTATE:naked
              to non-zero and use re2c:define:YYSETSTATE@state to denote the
              formal parameter inside of your YYSETSTATE body.

       re2c:define:YYSKIP = 'YYSKIP';
              Replaces YYSKIP identifier with the specified string.

       re2c:define:YYSTAGN = 'YYSTAGN';
              Replaces YYSTAGN identifier with the specified string.

       re2c:define:YYSTAGP = 'YYSTAGP';
              Replaces YYSTAGP identifier with the specified string.

       re2c:flags:8 or re2c:flags:utf-8
              Same as -8 --utf-8 command-line option.

       re2c:flags:b or re2c:flags:bit-vectors
              Same as -b --bit-vectors command-line option.

       re2c:flags:case-insensitive = 0;
              Same as --case-insensitive command-line option.

       re2c:flags:case-inverted = 0;
              Same as --case-inverted command-line option.

       re2c:flags:d or re2c:flags:debug-output
              Same as -d --debug-output command-line option.

       re2c:flags:dfa-minimization = 'moore';
              Same as --dfa-minimization command-line option.

       re2c:flags:eager-skip = 0;
              Same as --eager-skip command-line option.

       re2c:flags:e or re2c:flags:ecb
              Same as -e --ecb command-line option.

       re2c:flags:empty-class = 'match-empty';
              Same as --empty-class command-line option.

       re2c:flags:encoding-policy = 'ignore';
              Same as --encoding-policy command-line option.

       re2c:flags:g or re2c:flags:computed-gotos
              Same as -g --computed-gotos command-line option.

       re2c:flags:i or re2c:flags:no-debug-info
              Same as -i --no-debug-info command-line option.

       re2c:flags:input = 'default';
              Same as --input command-line option.

       re2c:flags:lookahead = 1;
              Same as inverted --no-lookahead command-line option.

       re2c:flags:optimize-tags = 1;
              Same as inverted --no-optimize-tags command-line option.

       re2c:flags:P or re2c:flags:posix-captures
              Same as -P --posix-captures command-line option.

       re2c:flags:s or re2c:flags:nested-ifs
              Same as -s --nested-ifs command-line option.

       re2c:flags:T or re2c:flags:tags
              Same as -T --tags command-line option.

       re2c:flags:u or re2c:flags:unicode
              Same as -u --unicode command-line option.

       re2c:flags:w or re2c:flags:wide-chars
              Same as -w --wide-chars command-line option.

       re2c:flags:x or re2c:flags:utf-16
              Same as -x --utf-16 command-line option.

       re2c:indent:string = '\t';
              Specifies  the  string to use for indentation. Requires a string
              that should contain only whitespace unless  you  need  something
              else for external tools. The easiest way to specify spaces is to
              enclose them in single or double quotes.  If you  do   not  want
              any indentation at all, you can simply set this to ''.

       re2c:indent:top = 0;
              Specifies  the  minimum amount of indentation to use. Requires a
              numeric value greater than or equal to zero.

       re2c:labelprefix = 'yy';
              Allows one to change the prefix of numbered labels. The  default
              is yy. Can be set to any string that is valid in a label name.

       re2c:label:yyFillLabel = 'yyFillLabel';
              Overrides the name of the yyFillLabel label.

       re2c:label:yyNext = 'yyNext';
              Overrides the name of the yyNext label.

       re2c:startlabel = 0;
              If  set  to a non-zero integer, then the start label of the next
              scanner block will be generated even if it  isn't  used  by  the
              scanner  itself.  Otherwise,  the normal yy0-like start label is
              only generated if needed. If set to a text value, then  a  label
              with  that text will be generated regardless of whether the nor-
              mal start label is used or not. This setting is reset to 0 after
              a start label has been generated.

       re2c:state:abort = 0;
              When  not  zero and the -f switch is active, then the YYGETSTATE
              block will contain a default case that aborts and a -1 case will
              be used for initialization.

       re2c:state:nextlabel = 0;
              Used  when  -f is active to control whether the YYGETSTATE block
              is followed by a yyNext: label line.  Instead of  using  yyNext,
              you  can  usually  also  use configuration startlabel to force a
              specific start label or default to yy0 as a start label. Instead
              of  using  a dedicated label, it is often better to separate the
              YYGETSTATE code from  the  actual  scanner  code  by  placing  a
              /*!getstate:re2c*/ comment.

       re2c:tags:expression = '@@';
              Allows one to customize the way re2c addresses tag variables: by
              default it emits expressions of the form yyt<N>, but this  might
              be  inconvenient  if  tag  variables  are defined as fields in a
              struct, or for any other reason require special accessors.   For
              example,  setting  re2c:tags:expression  =  p->@@ will result in
              p->yyt<N>.

       re2c:tags:prefix = 'yyt';
              Allows one to override prefix of tag variables.

       re2c:variable:yyaccept = 'yyaccept';
              Overrides the name of the yyaccept variable.

       re2c:variable:yybm = 'yybm';
              Overrides the name of the yybm variable.

       re2c:variable:yych = 'yych';
              Overrides the name of the yych variable.

       re2c:variable:yyctable = 'yyctable';
              When both -c and -g are active, re2c will use this  variable  to
              generate a static jump table for YYGETCONDITION.

       re2c:variable:yystable = 'yystable';
              Deprecated.

       re2c:variable:yytarget = 'yytarget';
              Overrides the name of the yytarget variable.

       re2c:yybm:hex = 0;
              If set to zero, a decimal table will be used. Otherwise, a hexa-
              decimal table will be generated.

       re2c:yych:conversion = 0;
              When this setting is non-zero, re2c automatically generates
              conversion code whenever yych gets read. In this case, the type
              must be defined using re2c:define:YYCTYPE.

       re2c:yych:emit = 1;
              Set this to zero to suppress the generation of yych.

       re2c:yyfill:check = 1;
              This can be set to 0 to suppress the generation of YYCURSOR- and
              YYLIMIT-based precondition checks. This option is useful when
              YYLIMIT + YYMAXFILL is always accessible.

       re2c:yyfill:enable = 1;
              Set this to zero to suppress the generation of YYFILL (n).  When
              using  this,  be  sure to verify that the generated scanner does
              not read beyond the available input, as allowing  such  behavior
              might introduce severe security issues to your programs.

       re2c:yyfill:parameter = 1;
              Controls  the argument in the parentheses that follow YYFILL. If
              zero, the argument is omitted.  If  non-zero,  the  argument  is
              generated unless re2c:define:YYFILL:naked is set to non-zero.
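
       The situation that re2c:yyfill:check refers to, where  YYLIMIT  +
       YYMAXFILL is always accessible, can be arranged by padding the input
       buffer. The following is a minimal hand-written sketch, not code
       generated by re2c. The helper name make_padded_buffer and the value
       chosen for YYMAXFILL are assumptions made for illustration; in a real
       scanner YYMAXFILL would come from re2c itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value for illustration only; re2c normally provides YYMAXFILL. */
#define YYMAXFILL 4

/* Hypothetical helper: copy len bytes of src into a fresh buffer that has
 * YYMAXFILL zero bytes of padding after the input. On success *limit points
 * at the end of the real input, so limit[0] .. limit[YYMAXFILL - 1] are
 * readable. Returns NULL if allocation fails; the caller frees the buffer. */
static char *make_padded_buffer(const char *src, size_t len, char **limit)
{
    char *buf;
    assert(src != NULL && limit != NULL);
    buf = malloc(len + YYMAXFILL);
    if (buf == NULL) return NULL;
    memcpy(buf, src, len);
    memset(buf + len, 0, YYMAXFILL);
    *limit = buf + len;
    return buf;
}
```

       With such a buffer, YYCURSOR would start at the returned pointer and
       YYLIMIT at *limit.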

   REGULAR EXPRESSIONS
       "foo"  literal string "foo". ANSI-C escape sequences can be used.

       'foo'  literal string "foo" (case insensitive for characters [a-zA-Z]).
              ANSI-C escape sequences can be used.

       [xyz]  character class; in this case, the regular expression matches x,
              y, or z.

       [abj-oZ]
              character  class  with  a  range in it; matches a, b, any letter
              from j through o, or Z.

       [^class]
              inverted character class.

       r \ s  match any r which isn't s. r and s must be  regular  expressions
              which can be expressed as character classes.

       r*     zero or more occurrences of r.

       r+     one or more occurrences of r.

       r?     optional r.

       (r)    r; parentheses are used to override precedence.

       r s    r followed by s (concatenation).

       r | s  r or s (alternative).

       r / s  r  but  only  if it is followed by s. Note that s is not part of
              the matched text. This type  of  regular  expression  is  called
              "trailing context". Trailing context can only be at the end of a
              rule and cannot be part of a named definition.

       r{n}   matches r exactly n times.

       r{n,}  matches r at least n times.

       r{n,m} matches r at least n times, but not more than m times.

       .      match any character except newline.

       name   matches a named definition as specified by name only  if  -F  is
              off.  If  -F is active then this behaves like it was enclosed in
              double quotes and matches the string "name".

       @stag  save input position at which @stag matches in a  variable  named
              stag.

       #mtag  save  all  input  positions at which #mtag matches in a variable
              named mtag (multiple positions are possible if #mtag is enclosed
              in a repetition subexpression that matches several times).

       Character  classes and string literals may contain octal or hexadecimal
       character definitions and the following set of  escape  sequences:  \a,
       \b,  \f,  \n,  \r, \t, \v, \\. An octal character is defined by a back-
       slash followed by its three octal  digits  (e.g.,  \377).   Hexadecimal
       characters  from  0  to 0xFF are defined by a backslash, a lower case x
       and two hexadecimal digits (e.g., \x12).  Hexadecimal  characters  from
       0x100 to 0xFFFF are defined by a backslash, a lower case u or an upper
       case X, and four hexadecimal digits (e.g., \u1234). Hexadecimal
       characters from 0x10000 to 0xFFFFFFFF are defined by a backslash, an
       upper case U, and eight hexadecimal digits (e.g., \U12345678).

       The only portable "any" rule is the default rule, *.

SUBMATCH EXTRACTION
       re2c supports two kinds of submatch extraction.

       The first option is -P  --posix-captures:  it  enables  POSIX-compliant
       capturing  groups.   In  this  mode  parentheses in regular expressions
       denote the beginning and the end of capturing groups; the whole regular
       expression is group number zero.  The number of groups for the matching
       rule is stored in a variable yynmatch, and submatch results are  stored
       in yypmatch array.  Both yynmatch and yypmatch should be defined by the
       user; note that yypmatch size must be at least [yynmatch  *  2].   re2c
       provides  a  directive  /*!maxnmatch:re2c*/  that  defines  a  constant
       YYMAXNMATCH: the maximal value of yynmatch among all rules.  Note  that
       re2c  implements  POSIX-compliant  disambiguation:  each  subexpression
       matches as long as possible, and subexpressions that start  earlier  in
       regular expression have priority over those starting later.
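
       As a minimal sketch of how the yypmatch array might be consumed after
       a rule has matched, the hand-written C helper below (not generated by
       re2c) copies one capturing group into a caller-supplied buffer. It
       assumes that group k is delimited by the pointer pair yypmatch[2 * k]
       and yypmatch[2 * k + 1]; the helper name copy_group is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy capturing group k into out (NUL-terminated).
 * Assumes yypmatch holds 2 * yynmatch pointers, with group k delimited by
 * yypmatch[2 * k] (start) and yypmatch[2 * k + 1] (end).
 * Returns the group length, or -1 if the group is absent or does not fit. */
static long copy_group(const char *const *yypmatch, size_t yynmatch,
                       size_t k, char *out, size_t out_size)
{
    assert(yypmatch != NULL && out != NULL);
    if (k >= yynmatch) return -1;
    const char *start = yypmatch[2 * k];
    const char *end = yypmatch[2 * k + 1];
    if (start == NULL || end == NULL || end < start) return -1;
    size_t len = (size_t)(end - start);
    if (len + 1 > out_size) return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return (long)len;
}
```

       Group 0 is the whole match, so copy_group (yypmatch, yynmatch, 0, ...)
       would return the full matched text.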

       The second option is -T --tags. With this option one can use standalone
       tags of the form @stag and  #mtag  instead  of  capturing  parentheses,
       where stag and mtag are arbitrary user-defined names.  Tags can be used
       anywhere inside of a regular expression;  semantically  they  are  just
       position  markers.   Tags  of  the  form  @stag are called s-tags: they
       denote a single submatch value (the last input position where this  tag
       matched).  Tags of the form #mtag are called m-tags: they denote multi-
       ple submatch values (the whole history of  repetitions  of  this  tag).
       All  tags  should  be  defined by the user as variables with the corre-
       sponding names.  With standalone tags re2c uses leftmost greedy  disam-
       biguation:  submatch positions correspond to the leftmost matching path
       through the regular expression.

       With both --posix-captures and --tags options re2c generates  a  number
       of  tag variables that are used by the lexer to track multiple possible
       versions of each tag (multiple versions are caused by possible  ambigu-
       ity  of  submatch).  When a rule matches, ambiguity is resolved and all
       tags of this rule (or capturing parentheses, which are also implemented
       as  tags) are initialized with the values of appropriate tag variables.
       Note that there is no one-to-one correspondence between  tag  variables
       and  tags:  the same tag variable may be reused for different tags, and
       one tag may require multiple tag variables to hold  all  its  ambiguous
       versions.   The  exact  number of tag variables is unknown to the user;
       this number is determined by re2c.  However, tag  variables  should  be
       defined  by  the  user, because it might be necessary to update them in
       YYFILL   and   store   them   between   invocations   of   lexer   with
       --storable-state    option.    Therefore   re2c   provides   directives
       /*!stags:re2c ... */ and /*!mtags:re2c ...  */  that  can  be  used  to
       declare, initialize and manipulate tag variables.

       S-tags must support the following operations:

       o save  input  position  to  s-tag:  t  = YYCURSOR with default API, or
         user-defined operation YYSTAGP (t) with generic API

       o save  default  value  to  s-tag:  t  =  NULL  with  default  API,  or
         user-defined operation YYSTAGN (t) with generic API

       o copy one s-tag to another: t1 = t2

       M-tags must support the following operations:

       o append  input  position  to m-tag: user-defined operation YYMTAGP (t)
         with both default and generic API

       o append default value to m-tag:  user-defined  operation  YYMTAGN  (t)
         with both default and generic API

       o copy one m-tag to another: t1 = t2

       S-tags  can  be  implemented  as  scalar  values (pointers or offsets).
       M-tags need a more complex representation, as  they  need  to  store  a
       sequence  of tag values.  The most naive and inefficient representation
       of m-tag is a list (array, vector) of tag values; a more efficient rep-
       resentation  is  to  store  all  m-tags in a prefix-tree represented as
       array of nodes (v, p), where v is tag value and p is a pointer to  par-
       ent node.
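
       The prefix-tree representation can be sketched in C as follows. This
       is an illustrative sketch under stated assumptions, not the layout
       used by any particular re2c example: nodes live in a fixed-size array,
       the parent "pointer" p is an array index, tag values are input
       offsets, and all names (mtag_tree, mtag_append, mtag_history) are
       hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* MTAG_ROOT marks an empty history; MTAG_NONE stands for the default value. */
enum { MTAG_ROOT = -1, MTAG_NONE = -1, MTAG_CAPACITY = 64 };

/* One tree node: v is the tag value, p is the index of the parent node. */
typedef struct { long v; int p; } mtag_node;
typedef struct { mtag_node nodes[MTAG_CAPACITY]; int count; } mtag_tree;

/* An m-tag is an index into the tree, so copying one m-tag to another is a
 * plain assignment (t1 = t2) and both histories share their common prefix. */
typedef int mtag;

/* Append v to the history of *t, as a user-defined YYMTAGP or YYMTAGN might.
 * Returns 0 on success, -1 if the tree is full. */
static int mtag_append(mtag_tree *tree, mtag *t, long v)
{
    assert(tree != NULL && t != NULL);
    if (tree->count >= MTAG_CAPACITY) return -1;
    tree->nodes[tree->count].v = v;
    tree->nodes[tree->count].p = *t;
    *t = tree->count++;
    return 0;
}

/* Unwind the history of t into out, oldest value first. Returns the history
 * length; out is filled only if that length does not exceed max. */
static size_t mtag_history(const mtag_tree *tree, mtag t, long *out, size_t max)
{
    size_t n = 0;
    assert(tree != NULL && out != NULL);
    for (mtag x = t; x != MTAG_ROOT; x = tree->nodes[x].p) n++;
    if (n > max) return n;   /* caller's buffer is too small */
    size_t i = n;
    for (mtag x = t; x != MTAG_ROOT; x = tree->nodes[x].p)
        out[--i] = tree->nodes[x].v;
    return n;
}
```

       Appending costs one node per value regardless of history length,
       which is the efficiency gain over copying a whole list on each t1 =
       t2.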

       For further details see the http://re2c.org/examples/examples.html page
       on the website or the re2c/examples/ subdirectory of the re2c
       distribution.

SCANNER WITH STORABLE STATES
       When the -f flag is specified, re2c generates a scanner that can  store
       its  current  state,  return to its caller, and later resume operations
       exactly where it left off.

       The default mode of operation in re2c is  a  "pull"  model,  where  the
       scanner  asks  for extra input whenever it needs it. However, this mode
       of operation assumes that the scanner is the  "owner"  of  the  parsing
       loop, and that may not always be convenient.

       Typically,  if  there  is  a  preprocessor  ahead of the scanner in the
       stream, or for that matter, any other procedural source  of  data,  the
       scanner  cannot  "ask"  for  more  data unless both the scanner and the
       source live in separate threads.

       The -f flag is useful exactly for situations like that: it  lets  users
       design  scanners  that work in a "push" model, i.e., a model where data
       is fed to the scanner chunk by chunk. When the scanner runs out of data
       to  consume,  it  stores its state and returns to the caller. When more
       input data is fed to the scanner, it resumes operations  exactly  where
       it left off.

       Changes needed compared to the "pull" model:

       o The user has to supply macros named YYSETSTATE (state) and
         YYGETSTATE ().

       o The -f option inhibits declaration of yych and yyaccept, so the  user
         has to declare them and save and restore them where required.  In the
         examples/push_model/push.re example, these are declared as fields  of
         a  (C++)  class of which the scanner is a method, so they do not need
         to be saved/restored explicitly. For C, they  could,  e.g.,  be  made
         macros  that select fields from a structure passed in as a parameter.
         Alternatively, they could be declared as local variables, saved by
         YYFILL (n) when it decides to return and restored upon entering the
         function. Also, it could be more efficient to save the state from
         YYFILL (n) because YYSETSTATE (state) is called unconditionally.
         However, YYFILL (n) does not get state as a parameter, so
         YYSETSTATE (state) would have to store state in a local variable.

       o Modify  YYFILL  (n)  to return (from the function calling it) if more
         input is needed.

       o Modify the caller to recognize if more input is  needed  and  respond
         appropriately.

       o The  generated  code  will  contain  a  switch  block that is used to
         restore the last state by jumping behind the corresponding YYFILL (n)
         call.  This  code  is  automatically generated in the epilogue of the
         first /*!re2c */ block. It is possible to trigger generation  of  the
         YYGETSTATE  () block earlier by placing a /*!getstate:re2c*/ comment.
         This is especially useful when the scanner  code  should  be  wrapped
         inside a loop.

       Please see examples/push_model/push.re for an example of a "push" model
       scanner. The generated code can be tweaked with inplace  configurations
       state:abort and state:nextlabel.
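
       The control flow of a "push" model scanner can be illustrated with a
       small hand-written C sketch. It is not output of re2c -f and does not
       use the real YYGETSTATE / YYSETSTATE / YYFILL macros; it only mimics
       their roles: a switch at the top resumes at the saved state, and
       running out of data stores the state and returns to the caller. The
       scanner sums decimal numbers separated by other characters; all names
       are hypothetical, and overflow of long is ignored.

```c
#include <assert.h>
#include <stddef.h>

enum { BETWEEN = 0, IN_NUMBER = 1 };

typedef struct {
    int state;   /* where to resume; what YYSETSTATE (state) would record */
    long value;  /* partially scanned number, kept across chunks */
    long sum;    /* sum of all completed numbers */
    int count;   /* how many numbers were completed */
} scanner;

static void scanner_init(scanner *s)
{
    assert(s != NULL);
    s->state = BETWEEN; s->value = 0; s->sum = 0; s->count = 0;
}

/* Feed one chunk of input. Returns when the chunk is exhausted. */
static void scanner_push(scanner *s, const char *chunk, size_t len)
{
    const char *cur = chunk, *lim = chunk + len;
    assert(s != NULL && chunk != NULL);
    switch (s->state) {          /* plays the role of the YYGETSTATE () block */
    case IN_NUMBER: goto in_number;
    default:        goto between;
    }
between:
    if (cur == lim) { s->state = BETWEEN; return; }   /* out of data */
    if (*cur >= '0' && *cur <= '9') { s->value = 0; goto in_number; }
    cur++;                       /* skip a separator */
    goto between;
in_number:
    if (cur == lim) { s->state = IN_NUMBER; return; } /* out of data */
    if (*cur >= '0' && *cur <= '9') {
        s->value = s->value * 10 + (*cur++ - '0');
        goto in_number;
    }
    s->sum += s->value; s->count++;
    goto between;
}

/* Signal end of input: complete a number that was still being scanned. */
static void scanner_finish(scanner *s)
{
    assert(s != NULL);
    if (s->state == IN_NUMBER) {
        s->sum += s->value; s->count++; s->state = BETWEEN;
    }
}
```

       Because the partial number lives in the scanner structure rather than
       in a local variable, a number split across two chunks is scanned
       correctly.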

SCANNER WITH CONDITION SUPPORT
       You can precede regular expressions with a list of condition names when
       using the -c switch. re2c will then generate a scanner block  for  each
       condition, and each of the generated blocks will have its own
       precondition. The precondition is given by the interface define
       YYGETCONDITION() and must be of type YYCONDTYPE.

       There are two special rule types. First, the rules of the condition <*>
       are merged to all conditions (note that they have a lower priority than
       other  rules  of  that condition). And second, the empty condition list
       allows one to provide a code block that does not have a  scanner  part,
       meaning  it does not allow any regular expressions. The condition value
       referring to this special block is always the one with the  enumeration
       value 0. This way the code of this special rule can be used to
       initialize a scanner. These rules are not required, but sometimes it is
       helpful to have a dedicated uninitialized condition state.

       Non-empty rules allow one to specify the new condition, which makes
       them transition rules. Besides generating calls for the YYSETCONDITION
       define, no other special code is generated.

       There  is  another kind of special rule that allows one to prepend code
       to any code block of all rules of a certain set of conditions or to all
       code  blocks  of  all rules. This can be helpful when some operation is
       common among rules. For instance, this can be used to store the  length
       of the scanned string. These special setup rules start with an exclama-
       tion mark followed by either a list of conditions <! condition,  ...  >
       or  a  star  <!*>.  When re2c generates the code for a rule whose state
       does not have a setup rule and a starred setup  rule  is  present,  the
       starred setup code will be used as setup code.
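
       The dispatch on conditions can be pictured with the following
       hand-written C sketch. It is not generated by re2c -c; it only shows
       the shape of the interface: a YYCONDTYPE enumeration whose value 0 is
       an initialization state, and YYGETCONDITION / YYSETCONDITION defines
       that read and update the current condition. The condition names
       (yycinit, yyccode, yyccomment) and the function count_code_chars are
       assumptions made for illustration. The function counts characters
       that lie outside of /* ... */ comments.

```c
#include <assert.h>
#include <stddef.h>

enum YYCONDTYPE { yycinit = 0, yyccode, yyccomment };

#define YYGETCONDITION()  (cond)
#define YYSETCONDITION(c) (cond = (c))

static size_t count_code_chars(const char *s)
{
    enum YYCONDTYPE cond = yycinit;
    size_t n = 0;
    assert(s != NULL);
    for (;;) {
        switch (YYGETCONDITION()) {
        case yycinit:       /* block without a scanner part: initialize */
            n = 0;
            YYSETCONDITION(yyccode);
            break;
        case yyccode:       /* rules that apply outside of comments */
            if (*s == '\0') return n;
            if (s[0] == '/' && s[1] == '*') {
                s += 2;
                YYSETCONDITION(yyccomment);   /* transition rule */
            } else {
                s++;
                n++;
            }
            break;
        case yyccomment:    /* rules that apply inside of comments */
            if (*s == '\0') return n;
            if (s[0] == '*' && s[1] == '/') {
                s += 2;
                YYSETCONDITION(yyccode);      /* transition rule */
            } else {
                s++;
            }
            break;
        }
    }
}
```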

ENCODINGS
       re2c  supports  the  following encodings: ASCII (default), EBCDIC (-e),
       UCS-2 (-w), UTF-16 (-x), UTF-32 (-u) and UTF-8 (-8).  See also  inplace
       configuration re2c:flags.

       The  following  concepts  should be clarified when talking about encod-
       ings.  A code point is an abstract number that represents a single sym-
       bol.   A code unit is the smallest unit of memory, which is used in the
       encoded text (it corresponds to one character in the input stream). One
       or  more  code  units  may  be needed to represent a single code point,
       depending on the encoding. In a fixed-length encoding, each code  point
       is  represented  with an equal number of code units. In variable-length
       encodings, different code points can be represented with different num-
       ber of code units.

       o ASCII  is a fixed-length encoding. Its code space includes 0x100 code
         points, from 0 to 0xFF. A code point is represented with exactly  one
         1-byte  code  unit,  which  has the same value as the code point. The
         size of YYCTYPE must be 1 byte.

       o EBCDIC is a fixed-length encoding. Its code space includes 0x100 code
         points,  from 0 to 0xFF. A code point is represented with exactly one
         1-byte code unit, which has the same value as  the  code  point.  The
         size of YYCTYPE must be 1 byte.

       o UCS-2  is  a  fixed-length  encoding. Its code space includes 0x10000
         code points, from 0 to 0xFFFF. One code  point  is  represented  with
         exactly  one  2-byte  code unit, which has the same value as the code
         point. The size of YYCTYPE must be 2 bytes.

       o UTF-16 is a variable-length encoding. Its  code  space  includes  all
         Unicode  code  points,  from 0 to 0xD7FF and from 0xE000 to 0x10FFFF.
         One code point is represented with one or two 2-byte code units.  The
         size of YYCTYPE must be 2 bytes.

       o UTF-32  is  a fixed-length encoding. Its code space includes all Uni-
         code code points, from 0 to 0xD7FF and from 0xE000 to  0x10FFFF.  One
         code point is represented with exactly one 4-byte code unit. The size
         of YYCTYPE must be 4 bytes.

       o UTF-8 is a variable-length encoding. Its code space includes all Uni-
         code  code  points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One
         code point is represented with a sequence of one, two, three, or four
         1-byte code units. The size of YYCTYPE must be 1 byte.

       In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
       valid code points. Any encoded sequence of code units that would map
       to a code point in this range is ill-formed. The user can control how
       re2c treats such ill-formed sequences with the --encoding-policy
       <policy> switch.

       For  some  encodings,  there are code units that never occur in a valid
       encoded stream (e.g., 0xFF byte in UTF-8).  If  the  generated  scanner
       must  check  for invalid input, the only correct way to do so is to use
       the default rule (*). Note that the full range rule ([^])  won't  catch
       invalid  code  units when a variable-length encoding is used ([^] means
       "any valid code point", whereas the default rule (*) means "any  possi-
       ble code unit").

GENERIC INPUT API
       re2c  usually  operates on input with pointer-like primitives YYCURSOR,
       YYMARKER, YYCTXMARKER, and YYLIMIT.

       The generic input API (enabled with the --input custom  switch)  allows
       customizing input operations. In this mode, re2c will express all oper-
       ations on input in terms of the following primitives:

                    +-----------------+----------------------------+
                    |YYPEEK ()        | get current input  charac- |
                    |                 | ter                        |
                    +-----------------+----------------------------+
                    |YYSKIP ()        | advance to next character  |
                    +-----------------+----------------------------+
                    |YYBACKUP ()      | backup current input posi- |
                    |                 | tion                       |
                    +-----------------+----------------------------+
                    |YYBACKUPCTX ()   | backup current input posi- |
                    |                 | tion for trailing context  |
                    +-----------------+----------------------------+
                    |YYSTAGP (t)      | save  current  input posi- |
                    |                 | tion to tag t              |
                    +-----------------+----------------------------+
                    |YYSTAGN (t)      | save default value to  tag |
                    |                 | t                          |
                    +-----------------+----------------------------+
                    |YYMTAGP (t)      | append  input  position to |
                    |                 | the history of tag t       |
                    +-----------------+----------------------------+
                    |YYMTAGN (t)      | append  default  value  to |
                    |                 | the history of tag t       |
                    +-----------------+----------------------------+
                    |YYRESTORE ()     | restore    current   input |
                    |                 | position                   |
                    +-----------------+----------------------------+
                    |YYRESTORECTX ()  | restore   current    input |
                    |                 | position for trailing con- |
                    |                 | text                       |
                    +-----------------+----------------------------+
                    |YYRESTORETAG (t) | restore   current    input |
                    |                 | position from tag t        |
                    +-----------------+----------------------------+
                    |YYLESSTHAN (n)   | check if less than n input |
                    |                 | characters are left        |
                    +-----------------+----------------------------+
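
       As a rough illustration of what user definitions of some of these
       primitives might look like, the sketch below implements YYPEEK,
       YYSKIP, YYBACKUP, YYRESTORE and YYLESSTHAN over an index into a
       buffer instead of over pointers. The matcher that uses them is
       hand-written and only imitates the style of generated code; the input
       structure and the function match_a_or_abc are assumptions made for
       illustration. It matches the longer of "a" and "abc".

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *buf;  /* input data */
    size_t len;       /* number of characters in buf */
    size_t pos;       /* current input position */
    size_t mark;      /* position saved by YYBACKUP () */
} input;

/* The primitives expect a variable named in that points to the input. */
#define YYPEEK()      (in->pos < in->len ? in->buf[in->pos] : '\0')
#define YYSKIP()      (++in->pos)
#define YYBACKUP()    (in->mark = in->pos)
#define YYRESTORE()   (in->pos = in->mark)
#define YYLESSTHAN(n) (in->len - in->pos < (size_t)(n))

/* Returns the length of the match at the current position, or 0 if none.
 * On return in->pos is just past the matched text. */
static size_t match_a_or_abc(input *in)
{
    assert(in != NULL && in->pos <= in->len);
    size_t start = in->pos;
    if (YYLESSTHAN(1) || YYPEEK() != 'a') return 0;
    YYSKIP();
    YYBACKUP();               /* "a" has matched: remember this position */
    if (YYPEEK() == 'b') {
        YYSKIP();
        if (YYPEEK() == 'c') {
            YYSKIP();
            return in->pos - start;
        }
    }
    YYRESTORE();              /* longer alternative failed: fall back to "a" */
    return in->pos - start;
}
```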

       A couple of useful links that provide some examples:

       1. http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html

       2. http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html

SEE ALSO
       You can find more information  about  re2c  at:  http://re2c.org.   See
       also: flex(1), lex(1), quex (http://quex.sourceforge.net).

AUTHORS
       Peter Bumbulis   peter@csg.uwaterloo.ca

       Brian Young      bayoung@acm.org

       Dan Nuffer       nuffer@users.sourceforge.net

       Marcus Boerger   helly@users.sourceforge.net

       Hartmut Kaiser   hkaiser@users.sourceforge.net

       Emmanuel Mogenet mgix@mgix.com

       Ulya Trofimovich skvadrik@gmail.com

VERSION INFORMATION
       This manpage describes re2c version 1.0.1, package date 11 Aug 2017.

                                                                       RE2C(1)
