Multi-line token as a single token

Postby rachanak » Mon Feb 24, 2014 3:59 pm

Hi Robert,
I have an unusual problem caused by RSTA tokenizing on a per-line basis.

I have variables of the form
Code:
${variable_name}


I want them to be colored differently based on whether they exist in a list of pre-defined variables or not. This check becomes impossible if I have a multi-line variable name like
Code:
${variable_
name}

since I cannot get the whole variable name as a single token. Is there a way I can deal with this? I'd appreciate any help.
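One way to look at the check described above is to run it over the whole document text instead of over per-line tokens. The following is only a minimal sketch under that assumption: VariableScanner, VariableRef and scan are hypothetical names, not RSyntaxTextArea API, and joining the pieces of a name by dropping line breaks is an assumed normalization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of RSyntaxTextArea. */
final class VariableScanner {

    /** One ${...} reference; start/end are offsets into the full text. */
    static final class VariableRef {
        final int start;
        final int end;
        final String name;
        final boolean known;

        VariableRef(int start, int end, String name, boolean known) {
            this.start = start;
            this.end = end;
            this.name = name;
            this.known = known;
        }

        @Override
        public String toString() {
            return name + " [" + start + "," + end + ") known=" + known;
        }
    }

    // A negated character class also matches line breaks,
    // so a reference may span several lines.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]*)\\}");

    /** Finds every ${...} reference and checks its name against the predefined set. */
    static List<VariableRef> scan(String text, Set<String> predefined) {
        List<VariableRef> refs = new ArrayList<>();
        Matcher m = VARIABLE.matcher(text);
        while (m.find()) {
            // Assumption: a name split across lines is rejoined by dropping the line breaks.
            String name = m.group(1).replaceAll("[\\r\\n]", "");
            refs.add(new VariableRef(m.start(), m.end(), name, predefined.contains(name)));
        }
        return refs;
    }

    public static void main(String[] args) {
        Set<String> predefined = new HashSet<>(Arrays.asList("variable_name"));
        String text = "a = ${variable_\nname}\nb = ${missing}";
        for (VariableRef ref : scan(text, predefined)) {
            System.out.println(ref);
        }
    }
}
```

The offsets returned by scan could then drive whatever coloring mechanism is available outside the per-line tokenizer; how that is wired into RSTA is left open here.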

Thanks,
Rachana
rachanak
 
Posts: 2
Joined: Mon Feb 24, 2014 3:46 pm

Re: Multi-line token as a single token

Postby robert » Thu Feb 27, 2014 3:53 am

That is unusual. So it's valid for variable names to be split into two by a newline?

Unfortunately, I'm not sure this is possible. Feel free to open a feature request on GitHub if you'd like to see this supported in a future release.
robert
 
Posts: 798
Joined: Sat May 10, 2008 5:16 pm

Re: Multi-line token as a single token

Postby rachanak » Thu Feb 27, 2014 4:05 pm

Hello Robert,
Yes, it's valid for variable names to be split across multiple lines. The application is used worldwide, and German names, for example, can be very long. A user might want to break a long name with a newline to make it more readable without having to scroll through hundreds of characters. The lexer simply strips any newline characters.

Thanks for your reply, though. I hope this feature is implemented in the near future.

Thanks,
Rachana
rachanak
 
Posts: 2
Joined: Mon Feb 24, 2014 3:46 pm

Re: Multi-line token as a single token

Postby preditcon » Mon Mar 03, 2014 1:08 pm

I've had a somewhat similar issue. The language I'm implementing consists (greatly simplified) of keywords and arguments to those keywords, where arguments may be identifiers that are also used as keywords. I wanted to keep track of the preceding non-whitespace token in my lexer, so that I could decide from its position whether a piece of text is a keyword or an argument (they mostly come in pairs). It worked nicely until newlines got involved (between a keyword and an argument). My JFlex grammar uses states, so proper initialization of the initial lexer state is a must. This is easy to do for comments and string literals (even Rachana's example), which always start and end with a specific character sequence, but not for identifiers.

I don't quite recall which method it was (getTokenList, I think) that failed to provide enough information about previous tokens in its callback, but my lexer could be implemented easily if that method provided more than just the last token of the previous line. That token is mostly a null token anyway.

Example:
keyword argument keyword argument keyword
argument /* wrong: this should not be bold, but it is, because I have no way of setting up the proper lexer members at this point (the token provided for doing so is a null token) */
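
The situation in the example can be modeled with a toy per-line lexer whose state is handed from the end of one line to the start of the next. This is a sketch only: PairLexer, lexLine and lexAll are hypothetical names, the strict keyword/argument alternation is a simplification, and nothing here is RSTA API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a per-line lexer; hypothetical, not RSTA API. */
final class PairLexer {
    static final int EXPECT_KEYWORD = 0;
    static final int EXPECT_ARGUMENT = 1;

    /**
     * Classifies each whitespace-separated word on one line.
     * Returns the state at the end of the line so the caller can
     * pass it in as the initial state of the next line.
     */
    static int lexLine(String line, int initialState, List<String> out) {
        int state = initialState;
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank line: the state passes through unchanged
            }
            if (state == EXPECT_KEYWORD) {
                out.add("KEYWORD:" + word);
                state = EXPECT_ARGUMENT;
            } else {
                out.add("ARGUMENT:" + word);
                state = EXPECT_KEYWORD;
            }
        }
        return state;
    }

    /** Lexes all lines in order, carrying the end state across line breaks. */
    static List<String> lexAll(String[] lines) {
        List<String> out = new ArrayList<>();
        int state = EXPECT_KEYWORD;
        for (String line : lines) {
            state = lexLine(line, state, out);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"keyword argument keyword argument keyword", "argument"};
        for (String token : lexAll(lines)) {
            System.out.println(token);
        }
    }
}
```

With the end state carried over, the word on the second line is classified as an argument. The open question in this thread is how to obtain an equivalent carried-over value inside a TokenMaker when the last token of the previous line is a null token.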

If Rachana could also keep track of preceding tokens, regardless of any whitespace in between, I assume his/her issue would be solvable. It would require mutators for existing tokens though.
preditcon
 
Posts: 27
Joined: Wed Jan 25, 2012 10:09 am

Re: Multi-line token as a single token

Postby robert » Tue Mar 04, 2014 1:34 pm

It may or may not fit your needs, since TokenMakers abstract away even the RSyntaxDocument, but there are methods in the RSyntaxUtilities class that allow you to find the previous "important" token from a specific location in the document.

java code:

/**
 * Returns the last non-whitespace, non-comment token, starting with the
 * specified line.
 *
 * @param doc The document.
 * @param line The line at which to start looking.
 * @return The last non-whitespace, non-comment token, or <code>null</code>
 *         if there isn't one.
 * @see #getNextImportantToken(Token, RSyntaxTextArea, int)
 * @see #getPreviousImportantTokenFromOffs(RSyntaxDocument, int)
 */
public static final Token getPreviousImportantToken(RSyntaxDocument doc, int line);

/**
 * Returns the last non-whitespace, non-comment token, before the
 * specified offset.
 *
 * @param doc The document.
 * @param offs The ending offset for the search.
 * @return The last non-whitespace, non-comment token, or <code>null</code>
 *         if there isn't one.
 * @see #getPreviousImportantToken(RSyntaxDocument, int)
 * @see #getNextImportantToken(Token, RSyntaxTextArea, int)
 */
public static final Token getPreviousImportantTokenFromOffs(RSyntaxDocument doc, int offs);


Again, I'm just throwing out some ideas. I doubt this fully solves your scenario, but it might help or at least be a starting point.
robert
 
Posts: 798
Joined: Sat May 10, 2008 5:16 pm

Re: Multi-line token as a single token

Postby preditcon » Tue Mar 04, 2014 2:50 pm

Calling the getTokenList method from inside itself (with different parameters) is probably a bad idea. I get the feeling I'd end up lexing the entire document up to some starting offset each time, since those utility methods call getTokenList in their implementation.

Perhaps implementing my own cache in the lexer would be better. Does RSTA lex the entire document when it is initially populated, or can there be lines before an arbitrary offset that have not been lexed yet? Does it instantiate multiple instances of a TokenMaker or just a single one?

Edit01
I found out that it's client code that does the instantiation, so there's a single instance. Also, RSTA initially lexes the entire document (actually, it is the document itself that does it). A specialized cache seems doable to me at the moment. It would be similar to the RSyntaxDocument.lastTokensOnLines member. If Token.setType() can be called at any time, the OP could also take advantage of such a cache.
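
A rough sketch of what such a cache might look like, assuming lines are lexed top to bottom and the lexer state at the end of a line fits in an int. LineStateCache and its methods are hypothetical names; this is not how RSyntaxDocument stores its own data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-line cache of lexer end states; not RSTA API. */
final class LineStateCache {
    private static final int UNKNOWN = Integer.MIN_VALUE;

    private final List<Integer> endStates = new ArrayList<>();

    /**
     * State to start lexing the given line with: the cached end state of
     * the previous line, or defaultState for line 0 or an uncached line.
     */
    int initialStateFor(int line, int defaultState) {
        int prev = line - 1;
        if (prev < 0 || prev >= endStates.size()) {
            return defaultState;
        }
        int state = endStates.get(prev);
        return state == UNKNOWN ? defaultState : state;
    }

    /**
     * Records the end state of a line. Returns true when the value changed,
     * which means the following line should be lexed again.
     */
    boolean update(int line, int endState) {
        while (endStates.size() <= line) {
            endStates.add(UNKNOWN);
        }
        int old = endStates.set(line, endState);
        return old != endState;
    }

    /** Forgets everything from the given line onward, e.g. after an edit. */
    void invalidateFrom(int line) {
        if (line >= 0 && line < endStates.size()) {
            endStates.subList(line, endStates.size()).clear();
        }
    }
}
```

The return value of update is the useful part: when the end state of a line changes, the next line needs to be lexed again, and so on until the states stop changing.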
preditcon
 
Posts: 27
Joined: Wed Jan 25, 2012 10:09 am

