Need help with TokenTypes and TokenMap

Questions on using RSyntaxTextArea should go here.

Moderator: robert

Post by plaidflannel » Fri Feb 24, 2012 4:01 am

I'm trying to understand the interactions among token makers, token types, syntax schemes, and token maps. There seems to be some ambiguity and perhaps unnecessary redundancy.

Suppose I have a lexer grammar that distinguishes identifiers and reserved words. My RSTA has a SyntaxScheme that specifies different colors for the identifier and reserved word token types. My token maker (a custom subclass of AbstractTokenMaker) does not have a TokenMap. The RSTA seems to display everything in the correct styles.

Question 1: Have I overlooked something that TokenMap would do for me?

Suppose, instead, that I have a lexer grammar that does not distinguish identifiers and reserved words. All "words" are turned into "identifier" tokens (or, alternatively, all "words" are turned into "reserved word" tokens). Also suppose that I put all (and only) the reserved words in a TokenMap. The SyntaxScheme is the same as above.

Question 2: How would the RSTA paint the identifiers and reserved words?

The javadoc comment in class TokenTypes says that it defines "All token types supported by RSyntaxTextArea." Suppose my token maker produces an additional token type "goober", and assume that my RSTA's SyntaxScheme has a style for tokens of type "goober". If necessary, suppose that the integer type for a "goober" is borrowed from one of the supported token types that I don't need.

Question 3: Would my goober tokens be painted according to the specified style, or would the RSTA ignore those tokens because they are not in the list of "all token types supported"?

Interface TokenTypes specifies several token types. In many cases, the name of the token type is reasonably self-explanatory. However, I could not determine the difference between RESERVED_WORD and RESERVED_WORD_2, nor could I determine how the four ERROR_... token types are intended to be used.

Question 4: Is there additional documentation on the token types? Or could someone provide a quick answer here in the forum?

Thanks.
plaidflannel
 
Posts: 8
Joined: Wed Feb 22, 2012 4:02 am

Re: Need help with TokenTypes and TokenMap

Post by robert » Sat Feb 25, 2012 9:53 pm

plaidflannel wrote:
Suppose I have a lexer grammar that distinguishes identifiers and reserved words. My RSTA has a SyntaxScheme that specifies different colors for the identifier and reserved word token types. My token maker (a custom subclass of AbstractTokenMaker) does not have a TokenMap. The RSTA seems to display everything in the correct styles.

Question 1: Have I overlooked something that TokenMap would do for me?


The vast majority of the TokenMakers I've made have been JFlex-based, so I haven't looked at the other approach in a while. I realize now that the Javadoc is incorrect: AbstractTokenMakers don't use their TokenMap by default. It's assumed that the concrete class will parse keywords, etc. as IDENTIFIERs (for example) and override an addToken() method to call into the TokenMap for the actual token type. For example, in WindowsBatchTokenMaker:

Code: Select all
@Override
public void addToken(Segment segment, int start, int end, int tokenType, int startOffset) {

   switch (tokenType) {
      // Since reserved words, functions, and data types are all passed
      // into here as "identifiers," we have to see what the token
      // really is...
      case Token.IDENTIFIER:
         // A result of -1 means the text is not in the TokenMap,
         // so the token stays an identifier.
         int value = wordsToHighlight.get(segment, start, end);
         if (value != -1) {
            tokenType = value;
         }
         break;
   }

   super.addToken(segment, start, end, tokenType, startOffset);

}


The Javadoc needs to be updated; it's out of date because it was written before the JFlex TokenMakers became the standard for built-in syntaxes.

If your TokenMaker is parsing keywords on its own, it's probably more efficient than using a TokenMap, since the latter approach effectively requires re-scanning the characters of each identifier to determine its token type. What a TokenMap would buy you is simpler parsing code.

plaidflannel wrote:
Suppose, instead, that I have a lexer grammar that does not distinguish identifiers and reserved words. All "words" are turned into "identifier" tokens (or, alternatively, all "words" are turned into "reserved word" tokens). Also suppose that I put all (and only) the reserved words in a TokenMap. The SyntaxScheme is the same as above.

Question 2: How would the RSTA paint the identifiers and reserved words?


I think the answer to this ties into my answer to the first question. RSTA will paint them exactly as they are parsed by your TokenMaker, unless your TokenMaker extends AbstractTokenMaker, you put keywords, functions, etc. in your TokenMap, and you override one of the addToken() overloads like that seen in WindowsBatchTokenMaker above.

As I type this out, I think AbstractTokenMaker should be changed to always check its TokenMap for IDENTIFIER token types. By default, getTokenMap() would return null, which would imply that the TokenMaker identifies keywords, etc. by itself. Users could then use a TokenMap if they chose to, without having to override a method. I might make things work that way for the next release.
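A minimal sketch of that proposed behavior, using stand-in classes rather than RSTA code. Everything here is hypothetical: the class names, the `resolveType` method, the integer constants, and the use of a plain `java.util.Map` in place of TokenMap are assumptions made only to show the opt-in hook.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed base class; NOT an RSTA class.
class SketchTokenMaker {
    static final int IDENTIFIER = 1;    // assumed value, for illustration only
    static final int RESERVED_WORD = 2; // assumed value, for illustration only

    // Default: no map, meaning the lexer classifies keywords by itself.
    protected Map<String, Integer> getTokenMap() {
        return null;
    }

    // Only IDENTIFIER tokens are looked up, and only when a map is supplied.
    int resolveType(String lexeme, int tokenType) {
        Map<String, Integer> map = getTokenMap();
        if (tokenType == IDENTIFIER && map != null) {
            Integer mapped = map.get(lexeme);
            if (mapped != null) {
                return mapped;
            }
        }
        return tokenType;
    }
}

// A subclass opts in by supplying a map; no addToken() override is needed.
class BatchLikeTokenMaker extends SketchTokenMaker {
    private final Map<String, Integer> words = new HashMap<>();

    BatchLikeTokenMaker() {
        words.put("echo", RESERVED_WORD);
        words.put("goto", RESERVED_WORD);
    }

    @Override
    protected Map<String, Integer> getTokenMap() {
        return words;
    }
}
```

With this shape, a token maker that lexes keywords itself leaves `getTokenMap()` alone and pays no lookup cost, while one that emits every word as an identifier only has to fill in the map.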

plaidflannel wrote:
The javadoc comment in class TokenTypes says that it defines "All token types supported by RSyntaxTextArea." Suppose my token maker produces an additional token type "goober", and assume that my RSTA's SyntaxScheme has a style for tokens of type "goober". If necessary, suppose that the integer type for a "goober" is borrowed from one of the supported token types that I don't need.

Question 3: Would my goober tokens be painted according to the specified style, or would the RSTA ignore those tokens because they are not in the list of "all token types supported"?


Your goober tokens would be painted with the specified style. There are actually built-in TokenMakers that "borrow" a style labeled as something else for tokens that don't have an explicit type in TokenTypes.
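To illustrate why borrowing works, here is a small stand-in model, not RSTA code. It assumes the style lookup is keyed purely by the token's integer type; the class, the constants, their values, and the string "colors" are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a style table keyed by integer token type.
class StyleLookupSketch {
    // Assumed type numbers; NOT the values defined in TokenTypes.
    static final int IDENTIFIER = 1;
    static final int UNUSED_BUILTIN = 7;      // a built-in type this language never emits
    static final int GOOBER = UNUSED_BUILTIN; // "goober" borrows that slot

    private final Map<Integer, String> colorByType = new HashMap<>();

    void setStyle(int tokenType, String color) {
        colorByType.put(tokenType, color);
    }

    // The lookup sees only the integer, never the name the type was given.
    String colorFor(int tokenType) {
        return colorByType.getOrDefault(tokenType, "black");
    }
}
```

In this model, a style registered for `GOOBER` is exactly the style returned for the borrowed built-in type, because both names refer to the same integer.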

plaidflannel wrote:
Interface TokenTypes specifies several token types. In many cases, the name of the token type is reasonably self-explanatory. However, I could not determine the difference between RESERVED_WORD and RESERVED_WORD_2, nor could I determine how the four ERROR_... token types are intended to be used.

Question 4: Is there additional documentation on the token types? Or could someone provide a quick answer here in the forum?


As you noticed, you can actually use the token types however you want. However, if you want to use them as they were "intended," the idea is:

  • RESERVED_WORD_2 is for languages that have two logically disparate kinds of keywords. Alternatively, you can use it to specially highlight certain keywords with important implications; for example, JavaTokenMaker highlights the "return" keyword as RESERVED_WORD_2 and all other keywords as RESERVED_WORD, since "return" is somewhat special in that it affects program flow. This mimics how in Eclipse, you can colorize "return" differently than other Java keywords. Most TokenMakers will only use RESERVED_WORD.
  • The ERROR_* types stem from the earliest days of RSTA highlighting code. It attempted to identify as much lexical information as possible from the code and highlight it as such. Thus, invalid Strings could be highlighted differently than invalid int literals, for example. A single "Token.ERROR" type, to be used for all invalid tokens, would probably have been simpler, but that's just not the way it was originally built and it's never changed.

You could use ERROR_STRING_DOUBLE, for example, to highlight strings with invalid escape sequences, such as "Foobar \u00Ge bas" in Java. You could also use it to colorize unclosed string literals. You could use ERROR_NUMBER_FORMAT for invalid number literals, such as 400g or 123.4e-12q or 0xCAFEBABEz. If you don't want or need to highlight invalid tokens, you don't have to use those token types, and can just highlight the invalid tokens as IDENTIFIERs or whatever.
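As a rough sketch of the ERROR_NUMBER_FORMAT idea, the stand-in below classifies a lexeme that the lexer has already grouped as "something starting with a digit". It is not RSTA code: the class name, the constant values, and the deliberately simplified number grammar (hex integers, or decimals with an optional fraction and exponent) are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical classifier for number-like lexemes; NOT an RSTA class.
class NumberLiteralSketch {
    // Assumed values, not the ones defined in TokenTypes.
    static final int LITERAL_NUMBER = 10;
    static final int ERROR_NUMBER_FORMAT = 11;

    // Simplified grammar: 0x followed by hex digits...
    private static final Pattern HEX =
            Pattern.compile("0[xX][0-9a-fA-F]+");
    // ...or digits with an optional fraction and an optional exponent.
    private static final Pattern DECIMAL =
            Pattern.compile("[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

    // The whole lexeme must match; trailing junk makes it an error token.
    static int classify(String lexeme) {
        if (HEX.matcher(lexeme).matches() || DECIMAL.matcher(lexeme).matches()) {
            return LITERAL_NUMBER;
        }
        return ERROR_NUMBER_FORMAT;
    }
}
```

Under these assumptions, 400g, 123.4e-12q, and 0xCAFEBABEz all come back as the error type, while 123.4e-12 and 0xCAFEBABE are ordinary number literals. A token maker that doesn't care about invalid literals could skip this step and emit them as whatever type it likes.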
robert
 
Posts: 774
Joined: Sat May 10, 2008 5:16 pm

