Real Time Colorizing - Some initial thoughts
As Justin mentioned yesterday, I'm building a new parser. This parser is responsible for parsing and colorizing MarkUp and non-markup code; so it could therefore fully MarkUp an .aspx page which contained: Html, Xml, clientside script and serverside scripts within it.
I've built a few small tools to colorize code in the past and I even have an article published on Msdn about it. For this parser however, I'm hoping to create a set of parsing routines which are efficient enough to allow me to re-build (and re-render) the parse tree "on the fly".
This is my first real foray into building a true parser and also my first attempt at real time parsing.
I believe that the secret to success will be to ensure that I only ever re-parse the smallest fragment of tree necessary. That is, if a user is typing into a textbox which I'm using to build a tree behind the scenes then, rather than re-building the tree from the root at each keystroke, I need to have some way to evaluate the nodes which are within the "current context" and parse from that root down when changes are made to the underlying document.
There's 2 challenges which spring immediately to mind here which, once solved should see me substantially closer to my end goal.
Defining the Current Context
Presently I'm thinking about what information I might need to store in a shared Context object to assist me to be able to define which point in the tree a user is working on when the cursor is located within the textbox containing the document. Consider the following (the gold pointer indicated the position of the cursor and the tree below it indicates a possible tree for that structure):
text is <foo>[ ]
- Document - HtmlElement( current state :: open ) text is <foo>asdf[ ]
- Document - HtmlElement( current state :: open ) - TextNode
text is <foo>asdf<foo>[ ]
- Document - HtmlElement( current state :: closed ) - TextNode
Would it be fair enough to presume that, for the 3 scenario's listed above that the Node in context would be: 1) HtmlElement, 2) TextNode and 3) Document? What if, for scenario 2 the user's next action is a "paste" action which inserts the text "</foo>"?
Rebuilding the Tree
Hopefully as a result of doing minimal parsing I'll also be doing a minimal amount of tree re-building. Here's an issue that I'm currently trying to work through with regarding to tree operations. Given the following tree:
- Document
- HtmlElement1
- Child1
- Child2
- Child3
- HtmlElement2
- Child1
- Child2
- Child3
- HtmlElement3
- Child1
- Child2
- Child3
- HtmlElement1
And, supposing that each node in the tree is an instance of a class named "Node" having the following members:
string Text ; int StartingIndex ;
... if the current node in context is HtmlElement1::Child2...
- Document
- HtmlElement1
- Child1
- Child2[ ]
- Child3
- HtmlElement2
- Child1
- Child2
- Child3
- HtmlElement3
- Child1
- Child2
- Child3
- HtmlElement1
... which is a chunk of text, and the user's next action is to append to that node. The resulting operations on the tree might include:
- Parse the newly entered text to see if it is a node or just raw text
- If it was just text then, append to the current node : HtmlElement1::Child2
- Re-parse HtmlElement1::Child2 to see if it is a node or just raw text
- If it is just text then update it in the tree.
- Enumerate all nodes after the current node and update the starting indexes
- Re-render the affected nodes: HtmlElement1::Child2
If the user's next action resulted in the current node becoming a closing tag for HtmlElement1 then a whole different set of actions would take place:
- Parse the newly entered text to see if it is a node or just raw text
- If it was just text then, append to the current node : HtmlElement1::Child2
- Re-parse HtmlElement1::Child2 to see if it is a node or just raw text
- It's an ending tag, walk up the tree to find it's opening tag and close it.
- Set the current context as the parent of the node which we just closed
- Re-parse that node.
- Enumerate all nodes after the current node and update the starting indexes
- Re-render the affected nodes.
That seems like a lot of work, and that's only for an add operation - what about a deletion of text? I'm sure that what I've come up with here is not really the correct way - but it's the best that I could come up with for today's bus ride and at least it's a starting point to document some initial thoughts.
Hopefully throughout the week I'll uncover some more strategy to solve the 2 challenges and I'll post my discoveries - who knows, maybe at the end of it all I'll have a nice little parser which can do real time colorizing and/or intellisense :-)