Nested multiline-comments

Questions on using RSyntaxTextArea should go here.



Postby Stefan » Fri Jan 29, 2010 12:28 pm

Hello,

The programming language I want to edit with RSyntaxTextArea contains nested multiline comments. Here is an example:

/* outer multiline comment
/* inner multiline comment */
*/

Currently I cannot find an easy way to keep track of the nesting depth of the comments.

Has anyone already encountered this requirement, and does anyone have suggestions for how it could be implemented?

Greetings
Stefan

Re: Nested multiline-comments

Postby robert » Sun Jan 31, 2010 4:44 am

The only way to currently implement this is to "cheat" and have one state in your TokenMaker implementation per level of nested comment. So, for example, you could create 10 multi-line comment states and support comments nested up to 10 levels deep. Anything deeper would cause a highlighting error (the depth count would top out at 10, so the "end" of the outer MLCs, those less than 10 levels deep, wouldn't be colored as a comment, if that makes sense). In practice this would probably be more than sufficient; the common case would likely be comments nested 2 deep, but not much more.
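A minimal sketch of that workaround, assuming the hypothetical lexer states MLC_DEPTH_1 through MLC_DEPTH_10 are modelled as the integers 1 to 10 (0 meaning "not in a comment") and ignoring string literals. It only counts comment openers and closers on a line, to show where the depth saturates:

```java
/**
 * Models the "one state per nesting level" workaround. State 0 means
 * "not in a comment"; states 1..MAX_STATES stand in for hypothetical
 * lexer states MLC_DEPTH_1..MLC_DEPTH_10. String literals are ignored.
 */
final class FixedDepthStates {
    static final int MAX_STATES = 10;

    private FixedDepthStates() {}

    /** Scans one line starting in {@code state}; returns the state at line end. */
    static int scanLine(String line, int state) {
        int i = 0;
        while (i + 1 < line.length()) {
            if (line.startsWith("/*", i)) {
                // No state exists beyond MAX_STATES, so deeper openers are lost.
                state = Math.min(state + 1, MAX_STATES);
                i += 2;
            } else if (state > 0 && line.startsWith("*/", i)) {
                state--;
                i += 2;
            } else {
                i++;
            }
        }
        return state;
    }
}
```

For example, a line of eleven openers scanned from state 0 ends in state 10, and a following line of ten closers then ends in state 0, even though the text is still inside the outermost comment. Every line after that would be highlighted as code, which is the error described above.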

If you're creating your own TokenMaker implementation, feel free to email me and I can help you get it implemented. There are languages I would like to add proper support for in RSTA that need this feature.

Arbitrary nesting of comments is something RSTA should properly support, however, so please add a Feature Request so it can be tracked and I don't forget about it.

Re: Nested multiline-comments

Postby Guest » Wed Feb 03, 2010 9:38 am

Hello Robert,

You are right, support for arbitrary nesting is best implemented directly in RSTA.

One could add nesting-level information to multiline-comment tokens (in the Token class) and pass it to getTokenList() in addition to initialTokenType, so that the method can restore the appropriate nesting depth.

Thanks for your great work, I will add a Feature Request.

Greetings
Stefan

Re: Nested multiline-comments

Postby robert » Wed Feb 03, 2010 1:47 pm

I'm thinking that even better than explicit nested-comment support would be a field that a TokenMaker can use for arbitrary per-line information, besides its last token type. Each TokenMaker could use this field for whatever information it needs. I hate to make the API any more complex than it already is (not that most users create their own TokenMakers, but still), so I'll have to think over the best way to implement this.

Thanks for the feedback!

Re: Nested multiline-comments

Postby Guest » Wed Feb 03, 2010 8:56 pm

Hi Robert,

By "field", do you mean a field of the Java type Object?

Greetings from Munich
Stefan

Re: Nested multiline-comments

Postby robert » Thu Feb 04, 2010 1:47 pm

I was thinking along the lines of a new method to go alongside getLastTokenTypeOnLine(), something like "getExtraDataForLine(int)". It would return arbitrary data that is meaningful to the current TokenMaker, but whose meaning could vary from one TokenMaker to the next. The current TokenMakers, for example, wouldn't need it. This information could be used to specify things such as nested comment depth, or the current "section" of the source code in languages whose source is divided into discrete sections.

The implementation wouldn't use a new int per line; rather, it would use space in the current "lastTokenTypeOnLine" list of ints. There would be a new limit on the number of states (say 256, which should be more than enough), and the remaining 24 bits would be used for the "extra data." So there would be no extra space or time overhead for languages that don't use the feature.
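A sketch of that packing, assuming (as proposed above) 8 bits for the token type and 24 bits for the extra data. The class and method names are hypothetical, not RSTA API. The token type is stored as a signed byte so that negative internal states survive the round trip:

```java
/**
 * Hypothetical per-line int: low 8 bits hold the last token type (as a
 * signed byte, so 256 distinct states including negative internal ones),
 * high 24 bits hold TokenMaker-specific extra data such as comment depth.
 */
final class PackedLineState {
    private static final int STATE_BITS = 8;
    private static final int STATE_MASK = 0xFF;
    static final int MAX_EXTRA = (1 << 24) - 1;

    private PackedLineState() {}

    static int pack(int tokenType, int extraData) {
        if (tokenType < Byte.MIN_VALUE || tokenType > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("token type does not fit in 8 bits: " + tokenType);
        }
        if (extraData < 0 || extraData > MAX_EXTRA) {
            throw new IllegalArgumentException("extra data does not fit in 24 bits: " + extraData);
        }
        return (extraData << STATE_BITS) | (tokenType & STATE_MASK);
    }

    /** Low 8 bits, sign-extended back to the original (possibly negative) type. */
    static int tokenType(int packed) {
        return (byte) packed;
    }

    /** High 24 bits, read as an unsigned value. */
    static int extraData(int packed) {
        return packed >>> STATE_BITS;
    }
}
```

Using the unsigned shift (>>>) in extraData() matters: when the top bit of the extra data is set, the packed int is negative, and a signed shift would smear that sign bit into the result.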

I'm starting to question the need for this, though. Implementing it would require slightly modifying, recompiling, and re-testing several current TokenMakers, and I'm still not convinced this can't be done with the current implementation. For example, instead of my previous proposal of a fixed number of states representing "comment depth," say you had a single state represent "comment, 1 deep" and each succeeding (more negative) token type represent an extra comment layer. Since negative token types are currently used for states internal to a particular TokenMaker, you could define:

Code:
/**
 * Token type this TokenMaker uses as the "last token type on line" for multi-line comments 1 level
 * deep.  Anything less than this specifies more layers; i.e. "-11" means "2 levels deep," "-12" means
 * "3 levels deep," etc.  This allows arbitrary nested comment depth.  Any other internal states would
 * have to have values in the range -1..-9 in this case.
 */
public static final int INTERNAL_MLC_DEPTH_1  = -10;


Then, end each line that finishes inside an unclosed MLC with the token type (INTERNAL_MLC_DEPTH_1-depth+1) instead of COMMENT_MULTILINE. Your parsing code would decode the lastTokenType of the previous line, set a "depth" field, then parse the current line with that knowledge.
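A simplified, stand-alone sketch of that encode/scan/decode cycle. It is not a real TokenMaker: NOT_IN_MLC (0) is an assumed stand-in for Token.NULL, only comment openers and closers are recognized, and string or character literals are not handled:

```java
/**
 * Per-line depth tracking with the INTERNAL_MLC_DEPTH_1 scheme:
 * depth 1 is stored as -10, depth 2 as -11, and so on.
 */
final class NestedCommentLineScanner {
    static final int NOT_IN_MLC = 0;              // assumed stand-in for Token.NULL
    static final int INTERNAL_MLC_DEPTH_1 = -10;  // value from the post above

    private NestedCommentLineScanner() {}

    /** End-of-line state for a line that finishes inside an MLC nested depth deep. */
    static int encodeDepth(int depth) {
        return INTERNAL_MLC_DEPTH_1 - depth + 1;
    }

    /** Depth implied by the previous line's last token type (0 = not in an MLC). */
    static int decodeDepth(int lastTokenType) {
        return lastTokenType <= INTERNAL_MLC_DEPTH_1
                ? INTERNAL_MLC_DEPTH_1 - lastTokenType + 1
                : 0;
    }

    /** Scans one line and returns the value to store as its "last token type". */
    static int scanLine(String line, int previousLineState) {
        int depth = decodeDepth(previousLineState);
        int i = 0;
        while (i + 1 < line.length()) {
            if (line.startsWith("/*", i)) {
                depth++;            // opener: one level deeper
                i += 2;
            } else if (depth > 0 && line.startsWith("*/", i)) {
                depth--;            // closer: one level shallower
                i += 2;
            } else {
                i++;
            }
        }
        return depth == 0 ? NOT_IN_MLC : encodeDepth(depth);
    }
}
```

Feeding it the three lines of the example from the first post, each starting from the previous line's result (0 for the first), yields -10, -10, and 0: the first two lines end one comment level deep, and the third closes the outer comment.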

You mentioned earlier that you felt this functionality should be "built in" to RSTA. I agree to a certain extent, but if a language supports nested comments, its TokenMaker will have to contain some code to support them whether RSTA has built-in support or not. So I'm not sure how built-in support for a "comment depth" field, or an arbitrary per-line info field, really helps over my proposal above.

Or am I missing something? :D Suggestions are welcome of course.

Re: Nested multiline-comments

Postby robert » Wed Feb 08, 2012 8:02 pm

Zombie thread...

More notes for anyone interested in how this can be implemented today. The API changes discussed above were never implemented, but here's how you could implement something like my previous suggestion:

This can be done in an AbstractJFlexCTokenMaker subclass with some clever handling of "internal states" used for line end tokens. See PHPTokenMaker.flex for an example of this - it uses such a trick to remember what "parent" token type it's in when it encounters a "<?php" PHP start token. Here's a summary of how to go about doing it as well:

Define an internal state for being in a multi-line comment. If you define other internal states, leave a large space between the multi-line comment one and any others. This will allow you to "encode" the current MLC depth in your end token:

Code:
public static final int INTERNAL_IN_MLC = -1;
public static final int INTERNAL_ANOTHER_STATE = -512;


With the values above, you'd be able to have up to 510 levels of nested comments (end-token values -2 through -511). Should be plenty!
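The arithmetic can be checked with a small helper. This is a hypothetical class, not part of RSTA; it mirrors the end-token expression used in the JFlex snippet below (INTERNAL_IN_MLC - mlcDepth) and the range test used in getTokenList():

```java
/** Encodes nested-MLC depth into end-token values between the two internal states. */
final class MlcEndTokenCodec {
    static final int INTERNAL_IN_MLC = -1;
    static final int INTERNAL_ANOTHER_STATE = -512;
    /** Deepest nesting that still fits strictly between the two constants. */
    static final int MAX_DEPTH = INTERNAL_IN_MLC - INTERNAL_ANOTHER_STATE - 1;

    private MlcEndTokenCodec() {}

    /** End-token type for a line that ends inside an MLC nested depth deep. */
    static int encode(int depth) {
        if (depth < 1 || depth > MAX_DEPTH) {
            throw new IllegalArgumentException("unsupported MLC depth: " + depth);
        }
        return INTERNAL_IN_MLC - depth;
    }

    /** True if initialTokenType is one of the encoded MLC end tokens. */
    static boolean isMlcEndToken(int tokenType) {
        return tokenType < INTERNAL_IN_MLC && tokenType > INTERNAL_ANOTHER_STATE;
    }

    /** Inverse of encode(), same as -(tokenType - INTERNAL_IN_MLC). */
    static int decode(int tokenType) {
        if (!isMlcEndToken(tokenType)) {
            throw new IllegalArgumentException("not an MLC end token: " + tokenType);
        }
        return INTERNAL_IN_MLC - tokenType;
    }
}
```

MAX_DEPTH works out to -1 - (-512) - 1 = 510, matching the figure above; depth 1 is stored as -2 and depth 510 as -511.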

Next, in your JFlex file, in your MLC state, be sure to include the MLC depth in your end token value when encountering "\n" or <<EOF>>, and check for MLC start/end tokens to increase/decrease the MLC depth. For example (untested):

Code:
// In the user code section (%{ ... %}): keep track of nested MLC depth.
private int mlcDepth = 0;

// ...

// Macros matching the start and end of an MLC.
MlcStart = ("/*")
MlcEnd = ("*/")

// ...

<YYINITIAL> {
   // ...
   // Start an MLC as usual, but note our depth of 1.
   {MlcStart}          { mlcDepth = 1; start = zzMarkedPos-2; yybegin(MLC); }
   // ...
}

// ...

<MLC> {
   [^\n\*/]+     {}
   {MlcStart}    { mlcDepth++; }
   {MlcEnd}      { if (--mlcDepth==0) { addToken(start,zzStartRead+1, Token.COMMENT_MULTILINE); yybegin(YYINITIAL); } }
   [\*/]         {}
   // addEndToken() as defined in PHPTokenMaker.flex; the end token carries the current depth.
   \n            { addToken(start,zzStartRead-1, Token.COMMENT_MULTILINE); addEndToken(INTERNAL_IN_MLC - mlcDepth); return firstToken; }
   <<EOF>>       { addToken(start,zzStartRead-1, Token.COMMENT_MULTILINE); addEndToken(INTERNAL_IN_MLC - mlcDepth); return firstToken; }
}


While likely incomplete, this example gives you the idea. Subtracting mlcDepth from the end token state allows you to then retrieve it in getTokenList() like so:

Code:
public Token getTokenList(Segment text, int initialTokenType, int startOffset) {

   resetTokenList();
   this.offsetShift = -text.offset + startOffset;
   mlcDepth = 0; // Probably not necessary, just to be safe

   // Start off in the proper state.
   int state = Token.NULL;
   switch (initialTokenType) {
      // ...
      default:
         // End tokens -2 .. -511 encode MLC depths 1 .. 510.
         if (initialTokenType<INTERNAL_IN_MLC && initialTokenType>INTERNAL_ANOTHER_STATE) {
            mlcDepth = -(initialTokenType - INTERNAL_IN_MLC);
            state = MLC;
            start = text.offset;
         }
         break;
   }

   // ... then reset the scanner, yybegin(state), and return yylex() as usual.
}

Re: Nested multiline-comments

Postby Guest » Wed Apr 04, 2012 4:17 pm

Hi Robert,

Thanks for your last "zombie" answer.

Unfortunately, it seems I do not fully understand when to create tokens.

Normally I would expect to create a token for each "token" in the input (e.g. an identifier or a variable).

But do I also have to create a token at the end of a line when some construct (e.g. a multi-line comment) spans multiple lines?

Stefan

Re: Nested multiline-comments

Postby robert » Thu Apr 05, 2012 1:18 pm

Yes, you have to use the addEndToken() method like in the example in my last post. You can see its implementation in PHPTokenMaker.flex. It's a pattern used in a few different built-in TokenMakers. It simply adds a Token to the Token list at the last offset in the line, of length 0, so that it doesn't get painted. Since it's the last Token in the Token list generated, its type is used as the starting type of the next line, but since it isn't painted, it can specify your funky internal-only type numbers.
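To illustrate why this works, here is a toy model of the end-token trick. It does not use RSTA's Token classes, and every name in it is hypothetical; offsets are half-open [start, end) here, so a token with start == end has length 0:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy token list: zero-length end tokens are never painted but still set the next line's state. */
final class EndTokenModel {
    static final class SimpleToken {
        final int start;
        final int end;   // exclusive
        final int type;

        SimpleToken(int start, int end, int type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        int length() {
            return end - start;
        }
    }

    private final List<SimpleToken> tokens = new ArrayList<>();

    void addToken(int start, int end, int type) {
        tokens.add(new SimpleToken(start, end, type));
    }

    /** Appends a zero-length token carrying an internal-only type at the end of the line. */
    void addEndToken(int lineEnd, int internalType) {
        addToken(lineEnd, lineEnd, internalType);
    }

    /** The tokens a painter would draw; zero-length tokens are skipped. */
    List<SimpleToken> paintable() {
        List<SimpleToken> result = new ArrayList<>();
        for (SimpleToken t : tokens) {
            if (t.length() > 0) {
                result.add(t);
            }
        }
        return result;
    }

    /** The type the next line starts in: that of the last token (0 if the list is empty). */
    int lastTokenType() {
        return tokens.isEmpty() ? 0 : tokens.get(tokens.size() - 1).type;
    }
}
```

A line holding one visible comment token followed by addEndToken(lineEnd, -11) paints only the comment, while the next line starts with initial type -11, "2 levels deep" under the INTERNAL_MLC_DEPTH_1 scheme discussed earlier.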

Re: Nested multiline-comments

Postby Guest » Thu Apr 05, 2012 2:59 pm

Thanks for your answer!

Stefan
