Multi-line token as a single token


Re: Multi-line token as a single token

Post by preditcon » Tue Mar 04, 2014 2:50 pm

Calling the getTokenList method from within itself (with different parameters) is probably a bad idea. I suspect I'd end up re-lexing the entire document up to some starting offset each time. Those utility methods call getTokenList in their implementation.

Perhaps implementing my own cache in the lexer would be better. Does RSTA lex the entire document when it is first populated, or can there be lines before an arbitrary offset that have not been lexed yet? Does it instantiate multiple TokenMaker instances or just a single one?

Edit:
I found out that it's client code that does the instantiation, so there's a single instance. Also, RSTA initially lexes the entire document (actually, it is the document itself that does it). A specialized cache seems doable to me at the moment. It would be similar to the RSyntaxDocument.lastTokensOnLines member. If Token.setType() can be called at any time, the OP could also take advantage of such a cache.
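A minimal sketch of such a per-line cache, assuming the lexer's end-of-line state fits in an int. LineStateCache, commentLexer and their methods are hypothetical names, and this is not how RSyntaxDocument actually implements lastTokensOnLines; it only shows why caching each line's end state avoids re-lexing everything before an edit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical per-line cache of lexer end states. Entry i is the state the
 * lexer was in at the end of line i, so line i + 1 can be lexed without
 * rescanning anything before it.
 */
class LineStateCache {
    static final int INITIAL = 0;

    private final List<Integer> endStates = new ArrayList<>();
    // (start state, line text) -> end state; stands in for a real single-line lexer.
    private final BiFunction<Integer, String, Integer> lexLine;
    int linesLexed = 0; // how many times lexLine ran, to make the saved work visible

    LineStateCache(BiFunction<Integer, String, Integer> lexLine) {
        this.lexLine = lexLine;
    }

    /** The state line i should start in: the end state of line i - 1. */
    int startStateFor(int line) {
        return line == 0 ? INITIAL : endStates.get(line - 1);
    }

    int endStateOf(int line) {
        return endStates.get(line);
    }

    /** Re-lexes from firstChangedLine onward, reusing cached states before it. */
    void update(List<String> lines, int firstChangedLine) {
        // Discard entries that the edit may have invalidated.
        while (endStates.size() > firstChangedLine) {
            endStates.remove(endStates.size() - 1);
        }
        for (int i = endStates.size(); i < lines.size(); i++) {
            endStates.add(lexLine.apply(startStateFor(i), lines.get(i)));
            linesLexed++;
        }
    }
}

class LineStateCacheDemo {
    /** Toy lexer: returns 1 if the line ends inside a block comment, else 0. */
    static int commentLexer(int state, String line) {
        int i = 0;
        while (i < line.length() - 1) {
            if (state == 0 && line.startsWith("/*", i)) { state = 1; i += 2; }
            else if (state == 1 && line.startsWith("*/", i)) { state = 0; i += 2; }
            else i++;
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(List.of("a /* b", "c", "d */ e", "f"));
        LineStateCache cache = new LineStateCache(LineStateCacheDemo::commentLexer);
        cache.update(lines, 0); // initial pass lexes all 4 lines
        lines.set(2, "d");      // the comment terminator is deleted
        cache.update(lines, 2); // only lines 2 and 3 are re-lexed
        System.out.println(cache.linesLexed + " " + cache.startStateFor(3)); // 6 1
    }
}
```

A natural refinement would be to stop re-lexing as soon as a line's new end state matches its previously cached one.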

Re: Multi-line token as a single token

Post by robert » Tue Mar 04, 2014 1:34 pm

It may or may not fit your needs, since TokenMakers abstract away even the RSyntaxDocument, but there are methods in the RSyntaxUtilities class that allow you to find the previous "important" token from a specific location in the document.

java code:

/**
* Returns the last non-whitespace, non-comment token, starting with the
* specified line.
*
* @param doc The document.
* @param line The line at which to start looking.
* @return The last non-whitespace, non-comment token, or <code>null</code>
* if there isn't one.
* @see #getNextImportantToken(Token, RSyntaxTextArea, int)
* @see #getPreviousImportantTokenFromOffs(RSyntaxDocument, int)
*/
public static final Token getPreviousImportantToken(RSyntaxDocument doc, int line);

/**
* Returns the last non-whitespace, non-comment token, before the
* specified offset.
*
* @param doc The document.
* @param offs The ending offset for the search.
* @return The last non-whitespace, non-comment token, or <code>null</code>
* if there isn't one.
* @see #getPreviousImportantToken(RSyntaxDocument, int)
* @see #getNextImportantToken(Token, RSyntaxTextArea, int)
*/
public static final Token getPreviousImportantTokenFromOffs(RSyntaxDocument doc, int offs);


Again, I'm just throwing out some ideas. I don't think this will solve your scenario, but it may help or serve as a starting point.
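To illustrate what "previous important token" means without depending on RSTA, here is a simplified model over per-line token lists. Kind, SimpleToken and previousImportant are hypothetical names, not RSTA's Token API, and this is not RSyntaxUtilities' actual implementation; it just walks backwards, across line boundaries, skipping whitespace and comments.

```java
import java.util.List;

enum Kind { WHITESPACE, COMMENT, IDENTIFIER, KEYWORD }

/** Hypothetical token: a kind plus its text. */
record SimpleToken(Kind kind, String text) {
    boolean isImportant() {
        return kind != Kind.WHITESPACE && kind != Kind.COMMENT;
    }
}

class PreviousTokenFinder {
    /**
     * Returns the last non-whitespace, non-comment token strictly before
     * token index tokenIndex on the given line, searching earlier lines
     * as needed; returns null if there isn't one.
     */
    static SimpleToken previousImportant(List<List<SimpleToken>> lines, int line, int tokenIndex) {
        for (int l = line; l >= 0; l--) {
            List<SimpleToken> tokens = lines.get(l);
            int start = (l == line) ? tokenIndex - 1 : tokens.size() - 1;
            for (int t = start; t >= 0; t--) {
                if (tokens.get(t).isImportant()) {
                    return tokens.get(t);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<List<SimpleToken>> doc = List.of(
                List.of(new SimpleToken(Kind.KEYWORD, "keyword"),
                        new SimpleToken(Kind.WHITESPACE, " "),
                        new SimpleToken(Kind.COMMENT, "/* note */")),
                List.of(new SimpleToken(Kind.WHITESPACE, "  "),
                        new SimpleToken(Kind.IDENTIFIER, "argument")));
        // The important token before "argument" (line 1, token 1) is on line 0.
        System.out.println(previousImportant(doc, 1, 1).text()); // keyword
    }
}
```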

Re: Multi-line token as a single token

Post by preditcon » Mon Mar 03, 2014 1:08 pm

I've had a somewhat similar issue. The language I'm implementing consists of keywords and arguments to those keywords (greatly simplified), where arguments may be identifiers that are also used as keywords. I wanted to keep track of the preceding non-whitespace token in my lexer, so I could decide whether a word that follows another word is a keyword or an argument (they mostly come in pairs). It worked nicely until newlines got involved between a keyword and its argument. My JFlex grammar uses states, so correctly initializing the lexer state at the start of each line is a must. This is easy for comments and string literals (even Rachana's example), which always start and end with a specific character sequence, but not for identifiers.

I don't quite recall which method it was (getTokenList, I think), but its callback didn't provide enough information about previous tokens for me. My lexer would be easy to implement if the method provided more than just the last token of the previous line, which is usually a null token anyway.

Example:
keyword argument keyword argument keyword
argument /* wrong: should not be bold, but is, because at this point I have no way of setting up the proper lexer members (the token provided for that is a null token) */
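One way to supply the missing context is to make each line's end state carry it. Since the lexer receives the last token of the previous line, a custom lexer could, in principle, end each line with a token type that encodes "argument expected". Whether RSTA accepts such a custom value is an assumption here; the self-contained sketch below (class, state and type names are hypothetical) only shows the state-passing idea, with keywords and arguments strictly alternating.

```java
import java.util.ArrayList;
import java.util.List;

class KeywordArgumentLexer {
    // Hypothetical line-end states; a real lexer would need to encode these
    // in whatever value is handed to it for the start of the next line.
    static final int EXPECT_KEYWORD = 0;
    static final int EXPECT_ARGUMENT = 1;

    /** Result of lexing one line: a type per word plus the state to carry over. */
    static final class LineResult {
        final List<String> types = new ArrayList<>();
        int endState;
    }

    static LineResult lexLine(String line, int initialState) {
        LineResult result = new LineResult();
        int state = initialState;
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // blank line: the state passes through unchanged
            if (state == EXPECT_KEYWORD) {
                result.types.add("KEYWORD");
                state = EXPECT_ARGUMENT;
            } else {
                result.types.add("ARGUMENT");
                state = EXPECT_KEYWORD;
            }
        }
        result.endState = state;
        return result;
    }

    public static void main(String[] args) {
        String[] lines = { "keyword argument keyword argument keyword", "argument" };
        int state = EXPECT_KEYWORD;
        for (String line : lines) {
            LineResult r = lexLine(line, state);
            System.out.println(r.types);
            state = r.endState; // feed this line's end state into the next line
        }
        // Second line prints [ARGUMENT] because the first line ended expecting one.
    }
}
```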

If Rachana could also keep track of preceding tokens, regardless of any whitespace in between, I assume his/her issue would be solvable. It would require mutators for existing tokens though.
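With mutable tokens, a variable fragment could be emitted as "pending" and retyped once the rest of its name turns up on a later line. Whether RSTA allows changing tokens already produced for earlier lines is exactly what the thread is unsure about, so this sketch uses its own hypothetical MutableToken class rather than RSTA's Token; the lexer, its type strings and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RetypingLexer {
    /** Hypothetical mutable token (not RSTA's Token class). */
    static final class MutableToken {
        final int line;
        final String text;
        String type;

        MutableToken(int line, String text, String type) {
            this.line = line;
            this.text = text;
            this.type = type;
        }

        void setType(String type) {
            this.type = type;
        }
    }

    /**
     * Lexes line by line. Fragments of a ${...} variable are emitted as PENDING
     * until the closing '}' is found; then every fragment, including those on
     * earlier lines, is retyped to KNOWN_VAR or UNKNOWN_VAR.
     */
    static List<MutableToken> lex(List<String> lines, Set<String> knownVars) {
        List<MutableToken> all = new ArrayList<>();
        List<MutableToken> pending = new ArrayList<>(); // fragments of the open variable
        StringBuilder name = new StringBuilder();
        boolean inVar = false;
        for (int ln = 0; ln < lines.size(); ln++) {
            String text = lines.get(ln);
            int pos = 0;
            while (pos < text.length()) {
                int fragStart, nameFrom;
                if (!inVar) {
                    int start = text.indexOf("${", pos);
                    int textEnd = start < 0 ? text.length() : start;
                    if (textEnd > pos) all.add(new MutableToken(ln, text.substring(pos, textEnd), "TEXT"));
                    if (start < 0) break;
                    inVar = true;
                    name.setLength(0);
                    fragStart = start;
                    nameFrom = start + 2;
                } else {
                    fragStart = pos; // continuation of a name begun on an earlier line
                    nameFrom = pos;
                }
                int close = text.indexOf('}', nameFrom);
                int nameEnd = close < 0 ? text.length() : close;
                name.append(text, nameFrom, nameEnd); // the newline itself is never appended
                int fragEnd = close < 0 ? nameEnd : close + 1;
                MutableToken frag = new MutableToken(ln, text.substring(fragStart, fragEnd), "PENDING");
                all.add(frag);
                pending.add(frag);
                pos = fragEnd;
                if (close >= 0) {
                    String type = knownVars.contains(name.toString()) ? "KNOWN_VAR" : "UNKNOWN_VAR";
                    for (MutableToken t : pending) t.setType(type); // retroactive retyping
                    pending.clear();
                    inVar = false;
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<MutableToken> tokens = lex(List.of("x ${variable_", "name} y"), Set.of("variable_name"));
        for (MutableToken t : tokens) {
            System.out.println(t.line + " " + t.type + " '" + t.text + "'");
        }
    }
}
```

In an editor, retyping a token on an earlier line would also require that line to be repainted.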

Re: Multi-line token as a single token

Post by rachanak » Thu Feb 27, 2014 4:05 pm

Hello Robert,
Yes, variable names may be split across multiple lines. The application is used worldwide, and German names, for example, can be very long. The user might want to insert a newline to make a long name readable without having to scroll through hundreds of characters. The lexer simply strips any newline characters.

Thanks for your reply, though. I hope this feature gets implemented in the near future.

Thanks,
Rachana

Re: Multi-line token as a single token

Post by robert » Thu Feb 27, 2014 3:53 am

That is unusual. So it's valid for variable names to be split into two by a newline?

Unfortunately, I'm not sure this is possible. Feel free to open a feature request on GitHub if you'd like to see it supported in a future release.

Multi-line token as a single token

Post by rachanak » Mon Feb 24, 2014 3:59 pm

Hi Robert,
I have an unusual problem caused by RSTA tokenizing on a per-line basis.

I have variables of the form:
${variable_name}


I want them to be colored differently depending on whether or not they appear in a list of predefined variables. This check becomes impossible if a variable name spans multiple lines, like:
${variable_
name}

since I cannot get the whole variable name as a single token. Is there a way I can deal with this? I'd appreciate any help.

Thanks,
Rachana
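Another option, sketched under the assumption that the lexer knows the absolute document offset of each line it is asked to lex, is a pre-pass over the whole text that finds every ${...} span, strips newlines from the name (as the language's own lexer does), and records whether the joined name is predefined. A per-line lexer could then color each fragment by looking up its offset. VariableIndex, Status and statusAt are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical whole-document index of ${...} variables, so a per-line lexer
 * can ask whether an offset lies in a known variable even when the
 * variable's name is split across lines.
 */
class VariableIndex {
    enum Status { NONE, KNOWN, UNKNOWN }

    private record Span(int start, int end, boolean known) {} // [start, end) offsets

    private final List<Span> spans = new ArrayList<>();

    VariableIndex(String doc, Set<String> predefined) {
        int from = 0;
        while (true) {
            int open = doc.indexOf("${", from);
            if (open < 0) break;
            int close = doc.indexOf('}', open + 2);
            if (close < 0) break; // unterminated variable: leave it unindexed
            // Join the name the way the language does: drop the newlines.
            String name = doc.substring(open + 2, close).replace("\r", "").replace("\n", "");
            spans.add(new Span(open, close + 1, predefined.contains(name)));
            from = close + 1;
        }
    }

    /** Status of the character at the given absolute document offset. */
    Status statusAt(int offset) {
        for (Span s : spans) { // linear scan; fine for a sketch
            if (offset >= s.start() && offset < s.end()) {
                return s.known() ? Status.KNOWN : Status.UNKNOWN;
            }
        }
        return Status.NONE;
    }

    public static void main(String[] args) {
        String doc = "x ${variable_\nname} y ${other}";
        VariableIndex index = new VariableIndex(doc, Set.of("variable_name"));
        // Offset 15 is inside "name}" on the second line; 25 is inside ${other}.
        System.out.println(index.statusAt(15) + " " + index.statusAt(25)); // KNOWN UNKNOWN
    }
}
```

The index would need rebuilding (or incremental updating) after every edit, and any line containing part of a changed variable would need repainting.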
