LaTeX-like grammar with mixed whitespace, UTF-8 text and commands


I'm trying to implement a LaTeX-like grammar that lets me parse this kind of sentence:

    \title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }

As you can see, the title can contain several kinds of items:

  • a UTF-8 string without quotes and with whitespace, which I'd like to
    keep in one token

  • a variable call such as \variable_name

  • keywords followed by parentheses or by braces, for instance \draw( utf8 \var \if{ ... }) or \if{ idem }

These items can be nested.

I took inspiration from the XML parser presented in the ANTLR 4 book and tried to use lexer modes. I have a problem with the recognition of the closing braces or closing parentheses. I also have a problem with some whitespace, for instance the space that follows \variable_name (I get: extraneous input ' ').

Here is my lexer grammar:

    lexer grammar OEFLexer;
    // Default mode rules (the SEA)
    SEA_WS : (' '|'\t'|'\r'? '\n')+ ;
    TITLE : '\\title';
    OB : '{';
    OP : '(';
    BSLASH : '\\' -> mode(CALLREFERENCE) ;
    TEXT : ~[\\({]+; // clump all text together
    // ----------------- Everything Callreference ---------------------
    mode CALLREFERENCE;

    CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ; // back to SEA mode
    CB : '}' -> mode(DEFAULT_MODE) ; // back to SEA mode
    CP : ')' -> mode(DEFAULT_MODE) ; // back to SEA mode

    DRAW : 'draw' OP;
    IF : 'if' OB;
    ID : [a-zA-Z]+ ; // match/send ID in tag to parser

Here is my parser grammar:

    parser grammar OEFParser;
    options { tokenVocab=OEFLexer; }

    document: TITLE OB ( callreference

    line 1:37 extraneous input ' ' expecting {'\', TEXT, ')'}
    line 1:45 mismatched input 'expression' expecting {'\', TEXT, '}'}
    line 1:75 extraneous input '<EOF>' expecting {'\', TEXT, ')'}

This is the parse tree generated by grun:

[image: parse tree]

Thanks for helping me tackle this issue.
Chris

Tags: whitespace, grammar, antlr4


edited Mar 27 at 6:34 by Bart Kiers
asked Mar 26 at 21:57 by chrisb06

1 Answer

The problem is the space after expression:

    \title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
                                                            ^
                                                            ^
                                                            ^

which causes the lexer to switch back to DEFAULT_MODE:

    CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ;

That is not what you want, because at that point you are (obviously) still inside the CALLREFERENCE context.
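To make the effect concrete, here is a minimal Python sketch of a lexer that keeps only one current mode, as the grammar in the question does. It is a simplified character-level model, not ANTLR itself: the function name modes_with_single_switch and the per-character rules are assumptions made for illustration, whereas a real ANTLR lexer matches whole tokens.

```python
def modes_with_single_switch(text):
    """Return (char, mode) pairs using one current mode and no stack.

    Simplified model of the question's lexer: a backslash enters
    CALLREFERENCE; a space, '}' or ')' seen in CALLREFERENCE switches
    back to DEFAULT_MODE.
    """
    mode = "DEFAULT_MODE"
    trace = []
    for ch in text:
        trace.append((ch, mode))  # mode in effect when ch is read
        if mode == "DEFAULT_MODE" and ch == "\\":
            mode = "CALLREFERENCE"
        elif mode == "CALLREFERENCE" and ch in " })":
            mode = "DEFAULT_MODE"
    return trace


trace = modes_with_single_switch(r"\if{expression kjlkjé}")
print(trace[-1])  # ('}', 'DEFAULT_MODE')
```

In this model the closing } is read after the space has already switched the mode back to DEFAULT_MODE, so it can no longer be tokenized as CB.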



One way to handle this is to use the -> pushMode(...) and -> popMode lexer commands, which maintain a stack of CALLREFERENCE modes. Whenever you encounter a \...( or a \...{ you push a new CALLREFERENCE onto this stack, and you pop one off when you see a ) or a }.
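The stack idea can be sketched in a few lines of Python. This is only a rough model under stated assumptions: the helper call_depths is hypothetical, and it pushes on every ( or { it sees, whereas an ANTLR lexer would push only in the rules that carry a pushMode command.

```python
def call_depths(text):
    """Return the nesting depth after each character, using a mode stack."""
    stack = ["DEFAULT_MODE"]
    depths = []
    for ch in text:
        if ch in "({":
            stack.append("CALLREFERENCE")  # models -> pushMode(CALLREFERENCE)
        elif ch in ")}":
            if len(stack) == 1:
                raise ValueError("closing delimiter without a matching opener")
            stack.pop()  # models -> popMode
        depths.append(len(stack) - 1)
    return depths


sample = r"\title{a \draw( 200\if{expression kjlkjé} ) b }"
depths = call_depths(sample)
space_after_expression = sample.index("expression") + len("expression")
print(depths[space_after_expression])  # 3
print(depths[-1])                      # 0
```

The space after expression no longer changes anything: the depth is still 3 there (inside the title, draw and if calls), and it only returns to 0 once every call has been closed.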



A quick lexer grammar demo:

    lexer grammar OEFLexer;

    TITLE : '\\title' S? OB -> pushMode(CALLREFERENCE);

    fragment OB : '{';
    fragment OP : '(';
    fragment S : [ \t\r\n]+;

    mode CALLREFERENCE;

    CB : '}' -> popMode;
    CP : ')' -> popMode;

    DRAW : '\\draw' S? OP -> pushMode(CALLREFERENCE);
    IF : '\\if' S? OB -> pushMode(CALLREFERENCE);

    BSLASH : '\\';
    ID : [a-zA-Z]+;
    CR_OTHER : .;

and the parser grammar:

    parser grammar OEFParser;

    options { tokenVocab=OEFLexer; }

    document
      : TITLE ( callreference | string )* CB EOF
      ;

    string
      : CR_OTHER+
      | ID
      ;

    commandDraw
      : DRAW ( callreference | string )* CP
      ;

    commandIf
      : IF ( callreference | string )* CB
      ;

    callreference
      : BSLASH ID
      | commandDraw
      | commandIf
      ;
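If you want to see roughly which tokens such a lexer hands to the parser without running ANTLR, the following Python sketch imitates the lexer rules with regular expressions and a mode stack. It is an approximation written for illustration only: the RULES table and the tokenize function are hypothetical names, the regular expressions are my transcription of the rules, and the sketch raises an error where ANTLR would report a token recognition error and continue.

```python
import re

S = r"[ \t\r\n]*"  # stands in for the optional S fragment

# (token name, pattern, mode action) per mode, in grammar order
RULES = {
    "DEFAULT_MODE": [
        ("TITLE", re.compile(r"\\title" + S + r"\{"), "push"),
    ],
    "CALLREFERENCE": [
        ("CB", re.compile(r"\}"), "pop"),
        ("CP", re.compile(r"\)"), "pop"),
        ("DRAW", re.compile(r"\\draw" + S + r"\("), "push"),
        ("IF", re.compile(r"\\if" + S + r"\{"), "push"),
        ("BSLASH", re.compile(r"\\"), None),
        ("ID", re.compile(r"[a-zA-Z]+"), None),
        ("CR_OTHER", re.compile(r".", re.DOTALL), None),
    ],
}


def tokenize(text):
    """Return (token name, matched text) pairs for the whole input."""
    stack = ["DEFAULT_MODE"]
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        # Longest match wins; on a tie the earlier rule is kept.
        for name, pattern, action in RULES[stack[-1]]:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos} in {stack[-1]}")
        name, m, action = best
        tokens.append((name, m.group()))
        if action == "push":
            stack.append("CALLREFERENCE")
        elif action == "pop":
            stack.pop()
        pos = m.end()
    return tokens


for token in tokenize(r"\title{a \VAR \draw( 2\if{x y} ) }"):
    print(token)
```

Spaces inside a call come out as ordinary CR_OTHER tokens, so they end up in the string rule instead of ending the call.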


Parsing your example input results in the following parse tree:

[image: parse tree]

answered Mar 27 at 7:59 by Bart Kiers