language support

Postby Nate » Thu Apr 07, 2011 4:36 am

Hi, long time!

I like RSyntaxTextArea, but I wish language support was easier. I've recently been using Ragel for a few projects (TableLayout, silly XML parser, etc). Here is a quick go at using Ragel for syntax highlighting with RSyntaxTextArea:

http://n4te.com/temp/rageltest.zip

Ragel has a scanner operator that would probably work well. I may dig into making a more complex parser, but thought I would post it so you can play and in case I don't find time. It seems a lot more sane than JFlex, though I admit to hardly giving it a glance. :)
Nate
 
Posts: 15
Joined: Fri Dec 04, 2009 6:45 am

Re: language support

Postby Nate » Thu Apr 07, 2011 5:21 am

I see that getTokenList can re-parse only a portion of the text. I made a Ragel feature request to better support this:
http://www.complang.org/pipermail/ragel ... 02593.html
Until that request is implemented, is there a way to have it always parse the whole document? I don't mind the performance penalties for now.

Also, I'm not sure my parser will react correctly. Starting the parser in the middle means it doesn't have context about how many levels the current construct is nested, or what the parent constructs are, etc. I guess syntax highlighting is not the same as compiling and I'll work it out as I go...
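
The worry about starting in the middle can be made concrete. Below is a minimal sketch, assuming the tokenizer is handed the state the previous line ended in; all class, method, and constant names are hypothetical and this is not RSyntaxTextArea's API. It tracks only one piece of context: whether the line ends inside a /* ... */ comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not RSyntaxTextArea code: resume scanning each line
// from the state the previous line ended in.
class LineStateSketch {
    static final int NORMAL = 0;      // outside any multi-line construct
    static final int IN_COMMENT = 1;  // inside an unterminated /* ... */ comment

    /** Scans one line starting in startState and returns the state at the end of the line. */
    static int scanLine(String line, int startState) {
        int state = startState;
        int i = 0;
        while (i < line.length()) {
            if (state == NORMAL && line.startsWith("/*", i)) {
                state = IN_COMMENT;   // comment opens
                i += 2;
            } else if (state == IN_COMMENT && line.startsWith("*/", i)) {
                state = NORMAL;       // comment closes
                i += 2;
            } else {
                i++;
            }
        }
        return state;
    }

    /** Returns the end state of every line; each line resumes from the previous line's end state. */
    static List<Integer> endStates(List<String> lines) {
        List<Integer> states = new ArrayList<>();
        int state = NORMAL;
        for (String line : lines) {
            state = scanLine(line, state);
            states.add(state);
        }
        return states;
    }
}
```

With the end state of each line cached, re-parsing from line N only needs the cached state of line N-1. A nesting depth could be carried the same way as long as it fits in the state value; whether that is enough for a given grammar is an open question here.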
Nate
 
Posts: 15
Joined: Fri Dec 04, 2009 6:45 am

Re: language support

Postby Guest » Thu Apr 07, 2011 9:10 am

My issue above doesn't seem to surface. Maybe it would in a larger document than I've been playing with.

I got something working, keywords and all:
http://code.google.com/p/table-layout/s ... kenizer.rl

JWS:
http://table-layout.googlecode.com/svn/ ... ditor.jnlp

Works great!

One issue during development is that Ragel will sometimes match multiple machines at the same time, e.g.:

([0-9]+ %stuff | alnum+ %otherstuff)

The first alternative matches one or more digits and invokes the "stuff" action; the second matches one or more alphanumeric characters and invokes the "otherstuff" action. Input like "123" matches both, so Ragel goes down both paths simultaneously and runs both actions. This adds duplicate tokens, which totally messes up RSyntaxTextArea. Can RSyntaxTextArea throw an error when this happens? Currently I have to look for visual anomalies. Fixing it by restructuring my Ragel isn't a problem.
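
Until the library reports this itself, a check like the following could sit inside the tokenizer's own add-token path. This is a minimal sketch with hypothetical names (TokenSpanChecker is not part of RSyntaxTextArea), assuming tokens are added left to right with inclusive start and end offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RSyntaxTextArea: records token spans and
// fails fast when a new token overlaps or duplicates the previous one.
class TokenSpanChecker {
    private final List<int[]> spans = new ArrayList<>();
    private int lastEnd = -1; // inclusive end offset of the last accepted token

    /** Records a token covering [start, end] inclusive; throws on overlap with the previous token. */
    void add(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("Token end " + end + " is before start " + start);
        }
        if (start <= lastEnd) {
            throw new IllegalStateException("Token [" + start + ", " + end
                    + "] overlaps previous token ending at " + lastEnd);
        }
        spans.add(new int[] {start, end});
        lastEnd = end;
    }

    /** Number of tokens accepted so far. */
    int count() {
        return spans.size();
    }
}
```

For the "123" case above, both actions would call add(0, 2), and the second call throws instead of silently producing a duplicate token.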
Guest
 

Re: language support

Postby Guest » Thu Apr 07, 2011 9:11 am

BTW, your forums log me out and don't auto log me in, and LastPass doesn't work with the forums. Very annoying! :p
Guest
 

Re: language support

Postby Nate » Thu Apr 07, 2011 8:21 pm

Note: I had to make TokenMakerBase public.
Nate
 
Posts: 15
Joined: Fri Dec 04, 2009 6:45 am

Re: language support

Postby robert » Fri Apr 08, 2011 3:20 am

Wow, this is a lot of stuff to look at. If you think it's a simpler means of creating TokenMakers via grammars, maybe it's worth a look. I'll see if I can dig into this sometime soon.
robert
 
Posts: 760
Joined: Sat May 10, 2008 5:16 pm

Re: language support

Postby Guest » Fri Apr 08, 2011 6:13 am

Gah, I give up even logging in. I'll just sign my posts so you know who it is. :p

Well, I didn't go down the JFlex route, so I don't know how hard that really is. But from reading your posts and from the fact that adding keywords is hard, I would guess Ragel is easier. You basically write regex and you can attach code (aka "actions") on any state changes. As with most regex, it is daunting at first glance, but Ragel has a very good documentation PDF that is all you need. Quick example:

Code:
'\'' @stringChar
^'\''* >buffer %string
'\'' @stringChar


This defines 3 "machines". I put them on different lines so you can more easily see each machine. Here is what they do:

1) This matches a literal single quote. The @ means "finished", so when the single quote is matched, the code snippet (aka action) called "stringChar" is run.

2) '\'' matches a single quote, ^ negates it to match any character except a single quote, and * means zero or more. So this matches zero or more characters that are not a single quote. The > means "entering", so the "buffer" action is called when the machine is entered. The buffer action is code I wrote which stores the current offset into the char[]. The % means "leaving" (slightly different from @, but don't worry about that for now), so the "string" action is called when the machine is exited. This action uses the stored offset and the current offset to determine which characters were matched so it can add a token.

3) This is the same machine as 1.
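
For readers who think in Java rather than Ragel, here is a hand-written approximation of what the three machines and their actions do. It is a sketch only: Ragel generates quite different code, the class and method names are hypothetical, and it assumes "stringChar" simply emits a token for the quote character, which the post does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation of the three machines; not Ragel output.
class QuotedStringSketch {
    /** Scans a single-quoted string starting at data[pos]; returns tokens as "TYPE:text". */
    static List<String> scan(char[] data, int pos) {
        List<String> tokens = new ArrayList<>();
        if (pos >= data.length || data[pos] != '\'') {
            return tokens;                      // machine 1 did not match
        }
        tokens.add("QUOTE:'");                  // machine 1: @stringChar (assumed to emit the quote)
        pos++;
        int mark = pos;                         // machine 2: >buffer stores the current offset
        while (pos < data.length && data[pos] != '\'') {
            pos++;                              // machine 2: ^'\''* consumes non-quote characters
        }
        if (pos == data.length) {
            return tokens;                      // no closing quote, so the machine is never left
        }
        // machine 2: %string uses the stored offset and the current offset
        tokens.add("STRING:" + new String(data, mark, pos - mark));
        tokens.add("QUOTE:'");                  // machine 3: same as machine 1
        return tokens;
    }
}
```

Calling scan("'abc' x".toCharArray(), 0) yields a quote token, a string token for abc, and a closing quote token.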

See the TableLayoutTokenizer I linked above. I only parse for "keywords" without knowing which keyword it is, then I use a HashSet to check whether the matched word really is a keyword. This makes it super simple for someone to come along and add keywords.
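
The HashSet lookup described above might look roughly like this. It is a minimal sketch: the class name, the token type strings, and the three example keywords are placeholders, not taken from TableLayoutTokenizer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of "match any word, then look it up".
class KeywordLookupSketch {
    // Placeholder keywords for illustration; adding a keyword is a one-line change here.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("width", "height", "align"));

    /** Classifies the word in data[start, end), with end exclusive. */
    static String classify(char[] data, int start, int end) {
        String word = new String(data, start, end - start);
        return KEYWORDS.contains(word) ? "KEYWORD" : "IDENTIFIER";
    }
}
```

The grammar itself never changes when the keyword list grows, which is the design point: the state machine only finds word boundaries and the set decides what the word means.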
Guest
 

