A 39-line generic lexer. Lexing is always the easy part, but this guy is pretty sweet for quick-and-dirty token parsing.
[Edit] There is some redundant code in this revision. As I was driving to a dinner engagement, I realized I had posted the wrong version of the lexer. This one does work, but I'll post a more compact follow-up as soon as I get home.
You're probably wondering what the purpose is behind this little guy. A member of one of the MS newsgroups was curious how hard it would be to process a configuration file in a non-XML format. The format is actually somewhat popular: it uses curly ("French") braces for nesting and key=value; pairs for settings. This gives us plenty of well-formedness to work with, since we have statement terminators for the key=value pairs and nesting for complex data types (I wouldn't really call them complex data types; they're more like named configuration sections).
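To make that concrete, here is a rough sketch of what such a file might look like. The section and key names are made up for illustration; the newsgroup poster's actual file isn't shown here.

```text
server {
    name=example;
    port=8080;
    logging {
        level=debug;
    }
}
```

Each key=value pair ends with a semicolon, and each named section wraps its contents in braces, so sections can nest.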
Below is the lexer we'll be using. It's really simple and doesn't do much. It splits the input on single-character token delimiters ("breakers"), and it lets us toss certain breakers out instead of returning them as tokens. For instance, the override I've created tosses out spaces, tabs, carriage returns, and linefeeds. When I actually create the parser I'll be using a slightly different lexer, since I need to keep whitespace, but we'll get to that later. For now, enjoy!
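using System.Collections; // ArrayList

// A single lexed token; TokenData holds the raw text.
public class Token {
    public string TokenData;
    public Token(string tokenData) { TokenData = tokenData; }
}

public class BasicLex {
    // Default override: break on whitespace and common punctuation, toss the whitespace.
    public static Token[] StringToTokens(string tokenString) {
        return StringToTokens(tokenString, " \n\r\t{}\"=;.()[],", " \t\r\n");
    }

    // breakers: single-character delimiters that end the current token.
    // toss: breakers that are dropped rather than emitted as tokens of their own.
    public static Token[] StringToTokens(string tokenString, string breakers, string toss) {
        ArrayList tokens = new ArrayList();
        int tokenStart = 0, tokenPointer = 0;
        TOKENLOOP: while(tokenPointer < tokenString.Length) {
            // Redundant loop: the IndexOf test below does not depend on i.
            for(int i = 0; i < breakers.Length; i++) {
                if ( breakers.IndexOf(tokenString[tokenPointer]) > -1 ) {
                    // Flush any text accumulated before the breaker.
                    if ( tokenStart != tokenPointer ) {
                        tokens.Add(new Token(tokenString.Substring(tokenStart, tokenPointer - tokenStart)));
                    }
                    // Emit the breaker itself unless it is in the toss list.
                    if ( toss.IndexOf(tokenString[tokenPointer]) == -1 ) {
                        tokens.Add(new Token(tokenString.Substring(tokenPointer, 1)));
                    }
                    // Step past the breaker and start the next token there.
                    tokenStart = ++tokenPointer;
                    goto TOKENLOOP;
                }
            }
            tokenPointer++;
        }
        // Flush the trailing token, if any.
        if ( tokenStart != tokenPointer ) {
            tokens.Add(new Token(tokenString.Substring(tokenStart, tokenPointer - tokenStart)));
        }
        return (Token[]) tokens.ToArray(typeof(Token));
    }
}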
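To see what the default override actually produces, here is a small usage sketch. It assumes the Token and BasicLex classes above are compiled into the same project; LexDemo, Join, and the sample input are hypothetical names made up for this example.

```csharp
using System;

// Assumes Token and BasicLex from the listing above are in the same project.
public static class LexDemo {
    // Joins the token text with '|' so the token boundaries are easy to see.
    public static string Join(string input) {
        Token[] tokens = BasicLex.StringToTokens(input);
        string[] parts = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++) {
            parts[i] = tokens[i].TokenData;
        }
        return string.Join("|", parts);
    }

    public static void Main() {
        // Hypothetical sample input: whitespace is tossed, braces and punctuation are kept.
        Console.WriteLine(Join("server { port=8080; }"));
        // prints: server|{|port|=|8080|;|}
    }
}
```

Note that the default breaker list includes the period, so an input like x=1.5; comes back as x|=|1|.|5|; rather than keeping 1.5 together. If that isn't what you want, call the three-argument overload with your own breaker list.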