Grammar LaTeX like with mixed whitespace utf and commands


I've tried to implement a LaTeX-like grammar that would let me parse this kind of sentence:

\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }


As you can see, the title can contain several kinds of items:

  • a UTF-8 string without quotes and with whitespace, which I'd like to
    keep in one token
  • a variable call, written as \variable_name
  • some keywords followed by parentheses and others followed by braces: for instance \draw( utf8 \var \if{...} ) or \if{ idem }

These items can be nested.



I took inspiration from the XML parser presented in the ANTLR 4 book and tried to use lexer modes. I have a problem with the recognition of the closing braces or closing parentheses. I also have a problem with some whitespace, for instance the space that follows \variable_name (I get: extraneous input ' ').

Here is my lexer grammar:



lexer grammar OEFLexer;
// Default mode rules (the SEA)
SEA_WS : (' '|'\t'|'\r'? '\n')+ ;
TITLE : '\\title';
OB : '{';
OP : '(';
BSLASH : '\\' -> mode(CALLREFERENCE) ;
TEXT : ~[\\(]+; // clump all text together
// ----------------- Everything Callreference ---------------------
mode CALLREFERENCE;

CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ; // back to SEA mode
CB : '}' -> mode(DEFAULT_MODE) ; // back to SEA mode
CP : ')' -> mode(DEFAULT_MODE) ; // back to SEA mode

DRAW : 'draw' OP;
IF : 'if' OB;
ID : [a-zA-Z]+ ; // match/send ID in tag to parser

Here is my parser grammar:

parser grammar OEFParser;
options { tokenVocab=OEFLexer; }

document: TITLE OB ( callreference

line 1:37 extraneous input ' ' expecting '', TEXT, ')'
line 1:45 mismatched input 'expression' expecting '', TEXT, ''}
line 1:75 extraneous input '<EOF>' expecting '', TEXT, ')'


With this parse tree generated by grun:

(image: generated parse tree)

Thanks for your help in tackling this issue.
Chris

whitespace grammar antlr4

asked Mar 26 at 21:57 by chrisb06
edited Mar 27 at 6:34 by Bart Kiers

1 Answer

The problem is the space after expression:

\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
                                                        ^

which causes the lexer to go back to DEFAULT_MODE:

CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ;

That is not what you want, because at that point you are still inside the CALLREFERENCE context.

One way to handle this is to use the -> pushMode(...) and -> popMode commands, which build up a stack of CALLREFERENCE modes. Whenever you stumble upon a \...( or a \...{ you push a new CALLREFERENCE onto this stack, and you pop one off when you see a ) or a }.
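To make the difference between the three lexer commands concrete, here is a rough Python model of how they treat the current mode. The class ModeTracker and its method names are made up for this illustration and are not part of the ANTLR runtime; the sketch only tries to mirror the behaviour of -> mode(...), -> pushMode(...) and -> popMode.

```python
class ModeTracker:
    """Toy model of a lexer's current mode and mode stack (hypothetical helper)."""

    def __init__(self):
        self.current = "DEFAULT_MODE"
        self.stack = []

    def mode(self, name):
        # -> mode(name): overwrite the current mode, remember nothing
        self.current = name

    def push_mode(self, name):
        # -> pushMode(name): save the current mode, then switch
        self.stack.append(self.current)
        self.current = name

    def pop_mode(self):
        # -> popMode: go back to whatever mode was saved last
        self.current = self.stack.pop()
        return self.current


tracker = ModeTracker()
tracker.push_mode("CALLREFERENCE")   # \title{
tracker.push_mode("CALLREFERENCE")   # \draw(
tracker.push_mode("CALLREFERENCE")   # \if{
print(tracker.pop_mode())            # }  closes \if, still inside \draw(
print(tracker.pop_mode())            # )  closes \draw, still inside \title{
print(tracker.pop_mode())            # }  closes \title, back at the top level
```

The first two pops stay in CALLREFERENCE and only the last one returns to DEFAULT_MODE, whereas a plain mode(DEFAULT_MODE) on the first closer would have dropped straight to the top level.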



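A quick lexer grammar demo:

lexer grammar OEFLexer;

// Default mode: only \title{ is recognised, and it opens the first CALLREFERENCE level
TITLE : '\\title' S? OB -> pushMode(CALLREFERENCE);

fragment OB : '{';
fragment OP : '(';
fragment S : [ \t\r\n]+;

mode CALLREFERENCE;

// Closers pop back to the enclosing level
CB : '}' -> popMode;
CP : ')' -> popMode;

// Nested commands push another CALLREFERENCE level
DRAW : '\\draw' S? OP -> pushMode(CALLREFERENCE);
IF : '\\if' S? OB -> pushMode(CALLREFERENCE);

BSLASH : '\\';
ID : [a-zA-Z]+;
CR_OTHER : .; // any other single character, including whitespace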


and the parser grammar:

parser grammar OEFParser;

options { tokenVocab=OEFLexer; }

document
 : TITLE ( callreference | string )* CB EOF
 ;

string
 : CR_OTHER+
 | ID
 ;

commandDraw
 : DRAW ( callreference | string )* CP
 ;

commandIf
 : IF ( callreference | string )* CB
 ;

callreference
 : BSLASH ID
 | commandDraw
 | commandIf
 ;


Parsing your example input will result in the following parse tree:

(image: parse tree of the example input)
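For readers without ANTLR at hand, the lexer demo above can be imitated with a few regular expressions in Python. This is only a sketch under assumptions: the tokenize function, the MODES table and the shortened SAMPLE input are hypothetical, and trying the patterns in order stands in for ANTLR's longest-match rule, which gives the same result for this particular rule set.

```python
import re

WS = r"[ \t\r\n]*"  # plays the role of the optional S? fragment

# Per mode: (token name, pattern, action), tried in order.
MODES = {
    "DEFAULT_MODE": [
        ("TITLE", re.compile(r"\\title" + WS + r"\{"), "push"),
    ],
    "CALLREFERENCE": [
        ("CB", re.compile(r"\}"), "pop"),
        ("CP", re.compile(r"\)"), "pop"),
        ("DRAW", re.compile(r"\\draw" + WS + r"\("), "push"),
        ("IF", re.compile(r"\\if" + WS + r"\{"), "push"),
        ("BSLASH", re.compile(r"\\"), None),
        ("ID", re.compile(r"[a-zA-Z]+"), None),
        ("CR_OTHER", re.compile(r".", re.DOTALL), None),
    ],
}


def tokenize(text):
    """Return (token name, matched text, nesting depth) triples."""
    stack = []             # saved modes, like the lexer's mode stack
    mode = "DEFAULT_MODE"
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern, action in MODES[mode]:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"token recognition error at {pos}: {text[pos]!r}")
        tokens.append((name, m.group(), len(stack)))
        if action == "push":
            stack.append(mode)
            mode = "CALLREFERENCE"
        elif action == "pop":
            mode = stack.pop()   # an unbalanced closer raises IndexError here
        pos = m.end()
    return tokens


# Shortened, assumed variant of the example input
SAMPLE = r"\title{a \VAR \draw( 200\if{expression kjlkjé} ) b}"
for name, text, depth in tokenize(SAMPLE):
    print(f"{'  ' * depth}{name} {text!r}")
```

The indentation of the printed tokens follows the depth of the mode stack, so the space after expression shows up as an ordinary CR_OTHER token inside the \if{ level instead of ending the command.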






answered Mar 27 at 7:59 by Bart Kiers