Lexer definition with ANTLR instead of JFlex


Post by plaidflannel » Fri Feb 24, 2012 4:22 am

I'm working on a Java application that uses ANTLR for lexing, parsing, and code generation. I recently started investigating RSyntaxTextArea to support editing my domain-specific languages.

I did not relish the prospect of learning JFlex and maintaining two versions of my grammars, and it also seemed that JFlex was a bit clumsy because of the required manual editing of the output. Because I already had a significant investment in ANTLR grammars, I tried defining my RSTA TokenMaker by wrapping the lexer produced by ANTLR. It was simpler than I expected. My custom token maker needed about 25 lines of code to implement getTokenList(), plus a few more lines in a method to map ANTLR token type codes (which are chosen by the lexer, not by the user) to RSTA token type codes.
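As a rough illustration of the type-mapping method described above, here is a minimal sketch. It is not the code from my application: the class name, the `ANTLR_*` constants, and the `EDITOR_*` constants are all hypothetical stand-ins. In a real project the first set would come from the generated lexer class and the second from RSTA's token type codes.

```java
import java.util.HashMap;
import java.util.Map;

class TokenTypeMapping {
    // Hypothetical stand-ins for the int constants a generated lexer defines.
    // The real values are assigned by the lexer generator, not by the user.
    static final int ANTLR_KEYWORD = 4;
    static final int ANTLR_ID = 5;
    static final int ANTLR_INT = 6;
    static final int ANTLR_COMMENT = 7;
    static final int ANTLR_WS = 8;

    // Hypothetical stand-ins for the editor's token type codes.
    static final int EDITOR_RESERVED_WORD = 100;
    static final int EDITOR_IDENTIFIER = 101;
    static final int EDITOR_NUMBER = 102;
    static final int EDITOR_COMMENT = 103;
    static final int EDITOR_WHITESPACE = 104;

    private static final Map<Integer, Integer> MAP = new HashMap<>();
    static {
        MAP.put(ANTLR_KEYWORD, EDITOR_RESERVED_WORD);
        MAP.put(ANTLR_ID, EDITOR_IDENTIFIER);
        MAP.put(ANTLR_INT, EDITOR_NUMBER);
        MAP.put(ANTLR_COMMENT, EDITOR_COMMENT);
        MAP.put(ANTLR_WS, EDITOR_WHITESPACE);
    }

    // Translate a lexer token type into an editor token type.
    // Unmapped types fall back to the plain identifier style (a design choice
    // for this sketch; an error style would be equally reasonable).
    static int toEditorType(int antlrType) {
        return MAP.getOrDefault(antlrType, EDITOR_IDENTIFIER);
    }
}
```

Keeping the mapping in one table means that regenerating the lexer (which may renumber its token types) only requires the symbolic constants to be recompiled, not the mapping logic to be rewritten.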

I believe that ANTLR creates a recursive descent lexer, while JFlex creates a deterministic finite state machine lexer. I don't have enough information at this time to determine if that would have a significant performance impact.

If there is interest among the forum readers, I can put together a cleaned-up and tested example to illustrate this approach. (My current version is still crude and not thoroughly tested; I may have overlooked some error conditions that should be caught.)

Re: Lexer definition with ANTLR instead of JFlex

Post by robert » Sat Feb 25, 2012 10:11 pm

I would be interested in seeing this. The hand-modified JFlex parsers have been quite an impediment to folks adding support for their own languages. People have done it of course, and there's TokenMakerMaker, but it's pretty limited to languages with C-style grammars.

As an extra data point, there's been at least one other user who's wrapped a Ragel parser in a custom TokenMaker. ANTLR is of course more popular, so an example of integrating it would be beneficial for more potential users...

Re: Lexer definition with ANTLR instead of JFlex

Post by plaidflannel » Tue Feb 28, 2012 5:54 pm

I have created a small program that demonstrates using an ANTLR lexer with RSTA. A zip file is available at http://share.plaidflannel.com/ANTLR_RSTA_Demo.zip.

The distribution file includes source code for the demo main program (ANTLR_RSTA_Demo.java), the interface between an ANTLR lexer and RSyntaxTextArea (ANTLRTokenMaker.java), the ANTLR grammar and lexer (DemoLexer.g and DemoLexer.java), a syntax scheme (DemoSyntaxScheme.java), and the javadoc documentation for the classes (doc/index.html).

The most significant piece of this demo is the method "getTokenList()" in ANTLRTokenMaker.java. The javadoc for that class describes the translation from ANTLR tokens to JFlex/RSTA tokens.
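For readers who do not want to download the zip, the general shape of such a getTokenList() loop is sketched below. This is not the code from ANTLRTokenMaker.java. To keep it self-contained, `ToyLexer`, `Tok`, and `EditorToken` are hypothetical stand-ins for the generated lexer, its token objects, and the editor's tokens, and the method takes a plain String rather than the editor's own text type. The idea it shows is: lex one line, pull tokens until end of input, and shift each token's line-relative position by the line's offset in the document.

```java
import java.util.ArrayList;
import java.util.List;

class LineTokenizerSketch {
    static final int EOF = -1, WORD = 1, NUMBER = 2, SPACE = 3, OTHER = 4;

    /** Stand-in for a lexer token: type plus inclusive start/stop indices in the line. */
    static final class Tok {
        final int type, start, stop;
        Tok(int type, int start, int stop) { this.type = type; this.start = start; this.stop = stop; }
    }

    /** Stand-in for a generated lexer: one token per call, EOF when input is exhausted. */
    static final class ToyLexer {
        private final String input;
        private int pos;
        ToyLexer(String input) { this.input = input; }

        Tok nextToken() {
            if (pos >= input.length()) return new Tok(EOF, pos, pos);
            int start = pos;
            char c = input.charAt(pos);
            int type;
            if (Character.isLetter(c)) {
                type = WORD;
                while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            } else if (Character.isDigit(c)) {
                type = NUMBER;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            } else if (Character.isWhitespace(c)) {
                type = SPACE;
                while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            } else {
                type = OTHER;
                pos++;
            }
            return new Tok(type, start, pos - 1);
        }
    }

    /** Stand-in for an editor token: text, type, and absolute offset in the document. */
    static final class EditorToken {
        final String text; final int type; final int offset;
        EditorToken(String text, int type, int offset) { this.text = text; this.type = type; this.offset = offset; }
    }

    /** Lex a single line and convert each lexer token into an editor token. */
    static List<EditorToken> getTokenList(String line, int startOffset) {
        List<EditorToken> out = new ArrayList<>();
        ToyLexer lexer = new ToyLexer(line);
        for (Tok t = lexer.nextToken(); t.type != EOF; t = lexer.nextToken()) {
            // The lexer reports positions relative to the line; the editor
            // wants positions relative to the whole document.
            String text = line.substring(t.start, t.stop + 1);
            out.add(new EditorToken(text, t.type, startOffset + t.start));
        }
        return out;
    }
}
```

A real token maker would also translate each lexer token type into an editor token type and would need to decide what to do with constructs that span several lines, since the editor asks for tokens one line at a time; this sketch ignores both.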

I did not include an executable jar file, to avoid the complexities of redistributing the required ANTLR and RSyntaxTextArea libraries.

