ShowUsYour<Blog>

Irregular expressions regularly

April 2004 - Posts

How to build a comma-separated list

Dave has an interesting little thread going about how to build a comma-separated list and what you can do with that annoying, left-over ',':

    http://weblogs.asp.net/dburke/archive/2004/04/28/122332.aspx

Who'd've thought that there would be so many options!

Real Time Colorizing - Some initial thoughts

As Justin mentioned yesterday, I'm building a new parser. This parser is responsible for parsing and colorizing both markup and non-markup code, so it could fully mark up an .aspx page which contains Html, Xml, client-side script and server-side script.

I've built a few small tools to colorize code in the past and I even have an article published on Msdn about it. For this parser however, I'm hoping to create a set of parsing routines which are efficient enough to allow me to re-build (and re-render) the parse tree "on the fly".

This is my first real foray into building a true parser and also my first attempt at real time parsing.

I believe that the secret to success will be to ensure that I only ever re-parse the smallest fragment of tree necessary. That is, if a user is typing into a textbox which I'm using to build a tree behind the scenes then, rather than re-building the tree from the root at each keystroke, I need to have some way to evaluate the nodes which are within the "current context" and parse from that root down when changes are made to the underlying document.

There are 2 challenges which spring immediately to mind here and which, once solved, should see me substantially closer to my end goal.

Defining the Current Context

Presently I'm thinking about what information I might need to store in a shared Context object to help me work out which point in the tree a user is working on when the cursor is located within the textbox containing the document. Consider the following (the [ ] marker indicates the position of the cursor and the tree below each line of text indicates a possible tree for that structure):

1) text is <foo>[ ]
- Document
  - HtmlElement( current state :: open )

2) text is <foo>asdf[ ]
- Document
  - HtmlElement( current state :: open )
    - TextNode

3) text is <foo>asdf</foo>[ ]
- Document
  - HtmlElement( current state :: closed )
    - TextNode

 

Would it be fair enough to presume that, for the 3 scenarios listed above, the Node in context would be: 1) HtmlElement, 2) TextNode and 3) Document? What if, for scenario 2, the user's next action is a "paste" action which inserts the text "</foo>"?
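One possible rule that produces those three answers is sketched below in Python (not the language the parser itself is written in). Everything here is an assumption made for illustration: the `start`, `end` and `is_open` fields, the `claims` and `node_in_context` helpers, and the idea that a node "claims" the cursor when the cursor sits strictly inside it or sits at the end of a node that can still grow. Scenario 3 assumes the element was closed with `</foo>`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str
    start: int             # index of the node's first character
    end: int               # index just past the node's last character
    is_open: bool = False  # True while text can still be appended to the node
    children: List["Node"] = field(default_factory=list)


def claims(node: Node, cursor: int) -> bool:
    """A node claims the cursor if the cursor is strictly inside it,
    or if the node is still open and the cursor sits at its end."""
    if node.start < cursor < node.end:
        return True
    return node.is_open and cursor == node.end


def node_in_context(root: Node, cursor: int) -> Node:
    """Descend from the root to the deepest node that claims the cursor."""
    current = root
    while True:
        for child in current.children:
            if claims(child, cursor):
                current = child
                break
        else:
            # No child claims the cursor, so the current node is the context.
            return current


# Scenario 1: "<foo>" with the cursor at index 5
s1 = Node("Document", 0, 5, True, [Node("HtmlElement", 0, 5, True)])

# Scenario 2: "<foo>asdf" with the cursor at index 9
s2 = Node("Document", 0, 9, True, [
    Node("HtmlElement", 0, 9, True, [Node("TextNode", 5, 9, True)]),
])

# Scenario 3: "<foo>asdf</foo>" with the cursor at index 15
s3 = Node("Document", 0, 15, True, [
    Node("HtmlElement", 0, 15, False, [Node("TextNode", 5, 9, False)]),
])

print(node_in_context(s1, 5).kind)   # HtmlElement
print(node_in_context(s2, 9).kind)   # TextNode
print(node_in_context(s3, 15).kind)  # Document
```

The paste question is not answered by this rule: it only locates the context, and a pasted "</foo>" would still have to be parsed to discover that it closes the element.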

Rebuilding the Tree

Hopefully as a result of doing minimal parsing I'll also be doing a minimal amount of tree re-building. Here's an issue that I'm currently trying to work through with regard to tree operations. Given the following tree:

  • Document
    • HtmlElement1
      • Child1
      • Child2
      • Child3
    • HtmlElement2
      • Child1
      • Child2
      • Child3
    • HtmlElement3
      • Child1
      • Child2
      • Child3

And, supposing that each node in the tree is an instance of a class named "Node" having the following members:

    string Text ;
    int StartingIndex ;

... if the current node in context is HtmlElement1::Child2...

  • Document
    • HtmlElement1
      • Child1
      • Child2[ ]
      • Child3
    • HtmlElement2
      • Child1
      • Child2
      • Child3
    • HtmlElement3
      • Child1
      • Child2
      • Child3

... which is a chunk of text, and the user's next action is to append to that node, then the resulting operations on the tree might include:

  • Parse the newly entered text to see if it is a node or just raw text
  • If it is just text, then append it to the current node : HtmlElement1::Child2
  • Re-parse HtmlElement1::Child2 to see if it is a node or just raw text
  • If it is just text then update it in the tree.
  • Enumerate all nodes after the current node and update the starting indexes
  • Re-render the affected nodes: HtmlElement1::Child2
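The "append, then update the starting indexes" steps above can be sketched as follows. This is a minimal Python illustration under some assumptions of my own: the `text` and `starting_index` fields mirror the Text and StartingIndex members described earlier, the `children` list and the `walk` and `append_text` helpers are hypothetical, container nodes carry no text of their own, and the node being appended to is a leaf.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    text: str
    starting_index: int
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def append_text(root: Node, target: Node, new_text: str) -> List[Node]:
    """Append new_text to a leaf node, then shift the starting index of
    every node that comes after it in document order.
    Returns the list of nodes whose index was shifted."""
    target.text += new_text
    shifted = []
    seen_target = False
    for node in walk(root):
        if node is target:
            seen_target = True
        elif seen_target:
            node.starting_index += len(new_text)
            shifted.append(node)
    return shifted


# A cut-down version of the tree above, with made-up text.
child1, child2, child3 = Node("aa", 0), Node("bb", 2), Node("cc", 4)
element1 = Node("", 0, [child1, child2, child3])
other_child = Node("dd", 6)
element2 = Node("", 6, [other_child])
document = Node("", 0, [element1, element2])

shifted = append_text(document, child2, "XY")
print(child2.text)                          # bbXY
print([n.starting_index for n in shifted])  # [6, 8, 8]
```

Every node after the edit point is visited on each change, which is the cost that the "lot of work" worry below is about.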

If the user's next action resulted in the current node becoming a closing tag for HtmlElement1 then a whole different set of actions would take place:

  • Parse the newly entered text to see if it is a node or just raw text
  • If it is just text, then append it to the current node : HtmlElement1::Child2
  • Re-parse HtmlElement1::Child2 to see if it is a node or just raw text
  • It's an ending tag, walk up the tree to find its opening tag and close it.
  • Set the current context as the parent of the node which we just closed
  • Re-parse that node.
  • Enumerate all nodes after the current node and update the starting indexes
  • Re-render the affected nodes.
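The "walk up the tree to find its opening tag" step, and the change of context that follows it, might look something like this Python sketch. The `name`, `is_open` and `parent` fields and the `close_element` helper are hypothetical, and the choice to leave the context untouched when no matching open element exists is just one possible way to handle a stray closing tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    is_open: bool = True
    parent: Optional["Node"] = None


def close_element(current: Node, tag_name: str) -> Node:
    """Walk up from current to the nearest open element named tag_name,
    mark it closed and return the new node in context (its parent).
    If nothing matches, the context is returned unchanged."""
    node = current
    while node is not None:
        if node.name == tag_name and node.is_open:
            node.is_open = False
            return node.parent if node.parent is not None else node
        node = node.parent
    return current


document = Node("#document")
foo = Node("foo", parent=document)
text = Node("#text", parent=foo)

context = close_element(text, "foo")
print(context.name, foo.is_open)  # #document False
```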

That seems like a lot of work, and that's only for an add operation - what about a deletion of text? I'm sure that what I've come up with here is not really the correct way - but it's the best that I could come up with for today's bus ride and at least it's a starting point to document some initial thoughts.

Hopefully throughout the week I'll uncover some more strategies to solve the 2 challenges and I'll post my discoveries - who knows, maybe at the end of it all I'll have a nice little parser which can do real time colorizing and/or intellisense :-)

A VB Language Bug or Feature?

Following on from post http://weblogs.asp.net/dneimke/archive/2004/04/22/117856.aspx ...

Another issue with this "feature" is that I'd almost class it as a bug in VB.NET that MsgBox doesn't handle the Null Terminator for you. For example:

    Dim a As String = "Foo"
    Dim b As String = CHR(0)
    Dim c As String = "Bar"
    MsgBox( a & b & c ) '    <== displays "Foo"

Whereas...

    Dim a As String = "Foo"
    Dim b As String = CHR(0)
    Dim c As String = "Bar"
    Console.WriteLine( a & b & c )    ' <== displays "FooBar" 

I'd call that inconsistent with my expectation as a language "user". So what would you call this? A VB Language Bug or Feature?

Posted: Apr 22 2004, 12:10 PM by digory | with 4 comment(s)

Null Terminator character

A zero Char is added to a char array to indicate its end point; this character is referred to as the "Null Terminator". It is added to the end of each string to mark its ending boundary - the String classes in .NET do this automagically. So, declaring something like:

    string s = "Foo" ;

Results in the following character array being generated under the covers:

    char[] s = {'F','o','o','\0'} ;

This will almost never be a problem because the .NET string classes, and methods which return strings, always hand back a "safe" string rather than the raw character array. But imagine if that wasn't the case and you had to deal with the "Null Terminator" in your own code; forgetting to do so could easily produce unexpected results:

    Dim a As String = "Foo"
    MsgBox( a )    ' displays "Foo"
    Dim b As String = "Foo" & Chr(0) & "Bar"
    MsgBox( b )    ' displays "Foo"!!!

Yesterday, it did actually bite me because I was dealing with some API calls to get volume information about a machine:

     Private Declare Function GetVolumeInformation Lib "kernel32" Alias "GetVolumeInformationA" _
          (ByVal lpRootPathName As String, _
          ByVal lpVolumeNameBuffer As String, _
          ByVal nVolumeNameSize As Int32, _
          ByRef lpVolumeSerialNumber As System.UInt32, _
          ByRef lpMaximumComponentLength As Int32, _
          ByRef lpFileSystemFlags As Int32, _
          ByVal lpFileSystemNameBuffer As String, _
          ByVal nFileSystemNameSize As Int32) As Int32

     Dim drvserial As System.UInt32
     Dim drvlbl As String = Space(200)
     Dim filesys As String = Space(200)
     Dim i As Int32
     Dim j As Int32
     Dim k As Int32
     k = GetVolumeInformation("C:\", drvlbl, 200, drvserial, i, j, filesys, 200)
     MsgBox( drvserial.ToString & "-" & drvlbl & "-" & filesys )

What happened is that, when I ran this code, the value of drvlbl was being returned with the Chr(0) character appended at the end of 200 spaces. Therefore, I was only ever seeing the value of drvserial being displayed. The easy way to fix that - in VB at least - is to use the VB Replace function to clean up the Chr(0) character:

     MsgBox( drvserial.ToString & "-" & Replace( drvlbl, Chr(0), "" ) & "-" & filesys )
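An alternative to stripping the Chr(0) characters is to cut the buffer off at the first one, which also discards whatever padding follows it. Here is a minimal sketch of that idea in Python rather than VB; the `trim_at_nul` helper is hypothetical, and the "DATA" label and the buffer layout are made up for illustration rather than captured from a real call.

```python
def trim_at_nul(buffer: str) -> str:
    """Return the part of buffer that precedes the first NUL character.
    A buffer with no NUL in it is returned unchanged."""
    return buffer.split("\0", 1)[0]


# A simulated 200-character buffer: a label, a NUL, then leftover padding.
raw = "DATA" + "\0" + " " * 195

print(repr(trim_at_nul(raw)))      # 'DATA'
print(len(raw.replace("\0", "")))  # 199 (the padding is still there)
```

Removing the NUL is enough to get the whole string displayed, while trimming at the NUL keeps only the text that precedes it.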

I'm not really sure whether this behaviour would be exhibited when making the same API call via C# or whether it's just a "feature" of the VB language; I suspect that it will only be an issue in VB!

Posted: Apr 22 2004, 11:45 AM by digory | with 7 comment(s)

Me is always foolin' fa this stuff...
Grammar God!
You are a GRAMMAR GOD!

If your mission in life is not already to
preserve the English tongue, it should be.
Congratulations and thank you!

How grammatically sound are you?
brought to you by Quizilla
Posted: Apr 22 2004, 10:08 AM by digory | with 3 comment(s)

Generating Marked-up code snippets in my blog

In a recent comment - http://weblogs.asp.net/dneimke/archive/2004/04/09/110222.aspx#112468 - Brian asks how I do my colorization of code snippets in my blog.

I recently wrote this entry describing how I use my MarkUp component ( which is freely available ) to emit stylized code snippets.  The component also emits a Css stylesheet for rendering the code:

    http://weblogs.asp.net/dneimke/archive/2004/02/03/66621.aspx

My current Css block for a code window looks like this:

PRE.CodeSnippet
{
    background-color: #f0f0f0;
    margin: 0px 2em;
    padding: 1em;
    border: 1px dotted gray;
    font-family: "Lucida Console";
    font-size: 1em;
    width: 800px;
    color: #000000;
    overflow: auto;
}


And here is the Css block for a command line window:

PRE.Console
{
    background-color: #000000; 
    color: #00ff00;
}

So, to create a normal code snippet, you simply run your snippet through the tool and whack it inside a PRE tag with the "CodeSnippet" class applied to it.  To create a console window you whack your text inside a PRE tag with both the "CodeSnippet" and "Console" classes applied.


Note: To grab the MarkUp tool, go to http://Workspaces.gotdotnet.com/MarkUp and download the binaries labelled: "SimpleMarkUpTool 1.0.0.0 - Binaries Only "

Posted: Apr 14 2004, 09:52 AM by digory | with 1 comment(s)

Back from the Summit

I'm writing this from the Tom Bradley International Airport in L.A. where I have an 8 hour wait for my plane back to Melbourne, Australia...

BillW

I arrived in Seattle on Saturday 3rd April and met Bill Wilkinson and his wife Beverley that night for dinner. We ate at a seafood restaurant at Pike Place Market and had a supreme evening of fish and idle chatter amidst a beautiful sunset over the Olympic mountain range.

Bill is an amazing guy; in fact, I would go so far as to say that if it wasn't for his mentorship I most probably wouldn't have even got the opportunity to make this trip.  Bill works in the Visual Basic language group within Microsoft and has challenged my knowledge throughout most of my programming life.

Positive Impressions

The conference presented me with an opportunity to view and hear about some of the newest thinking which is coming out of Microsoft.

My most lasting impression thus far was the day when the execs themselves told us of their vision for the community and the role which each of us can play in helping that to become a reality.  I think that in the next few years there will be a rapid development of "community" where we have more ability to shape how we conduct our online relationships and the products needed to support that.

Great Discussion

Wednesday was quite an amazing day... most people - nah, *all* people were very tired after a big party the night before.  I managed to get together early in the day with James Avery, Andy Smith and Scott Mitchell to have a bit of a brainstorming session about where we see certain areas of the online community heading in the coming 12-24 months.  Imagine that!  Getting to discuss "forums usability" with Scott Mitchell, "Server control development" with Andy Smith and "Blogging Community" with James Avery.

It didn't finish there either; after about 3 hours of brainstorming I headed off to listen to a discussion by Paul Vick about Visual Basic's support for Generics in Whidbey. After that I went next door to listen in on a similar discussion by Anders Hejlsberg about the generics implementation in C#.

After all of that, I managed to spend some time with Chris Sells to discuss the state of the UI and what possibilities there might be in Longhorn timeframe when Avalon technology becomes available.  This conference and my chats with several people (including Chris) have aroused my interest in the future of the UI and it will certainly be interesting to see what - exactly - Avalon can do for us.  Chris has pointed me to some reference material which I'll link to in the next few days (after I've had the opportunity to read it for myself).


Just Hanging Out

Because I didn't leave Seattle until Saturday I was able to have 2 free days just hanging out and catching up with people on campus.  I spent most of my time with Justin Rogers and Andy Smith - there were some very interesting discussions between those 2 guys ;-)

Thursday morning we went to campus and ran into Paul Vick who was able to spend about 20 minutes chatting.  It was great to be able to ask him questions directly about his recent book!  After that Andy and I went to see Kent Sharkey - we managed to bug him for over an hour!  (Sorry about that Kent :)

On Friday I went back to campus by myself to catch up with Wayne King (ASP.NET team) and the team responsible for the development of classes in the System.Text.RegularExpressions namespace.  I was lucky enough to have lunch that day with Justin Rogers and Joel Pobar.  Joel is an Aussie guy who is working on some really cool, low-level parts of the runtime and has a leading hand in the Rotor community source drops.  Joel and I talked about Brisbane (where he came from and I worked) and some of our mutual friendships - such as Greg Lowe and Joseph Cooney.  That was a fun time.

Friday finished with Justin, Amy (his fiancee) and myself going to the movies at Redmond to watch Starsky and Hutch.  I can highly recommend that movie as it was an absolute blast!

The End

Well, that was it - and of course I've only mentioned a few of the highlights.  Overall it was a great opportunity to get together with some of the most knowledgeable and prolific members of the community; some people I met for the first time, and some friendships were renewed.


Thanks everyone for making my time in Seattle memorable.  Let's do it again sometime ;-)

Posted: Apr 12 2004, 02:09 PM by digory | with 2 comment(s)

Regex's in JScript.NET

I started reading JScript.NET Programming today.

This language offers a very different way to write .NET code. I'm reading it mostly for two reasons: A) I like the authors' manner of offering more abstract samples and sensible advice than you tend to get in a book of this size; B) so that I can learn about how Regular Expressions are implemented within the JScript.NET language. I'm keen to write a dirt simple regex based routine, using it to lex through a string with multiple regex's within a loop, similar to how you might do it using Perl; it might be nice if you could write and compile regex parsing routines in a js module and consume it from another language of choice.

So, anyway, I started on Chapter 12 : Regular Expressions and wrote my first app. with it tonight...

import System ;
function DoMatch( str ) {
    var re = /\w+/g ;
    while( re.test( str ) ) {
        Console.WriteLine( 
            RegExp.index + " - " + RegExp.lastIndex +
            "\t" + RegExp.lastMatch + "." ) ;
    }
}
DoMatch( "This is a group of words" ) ;

 

C:\>jsc /t:exe /fast- JS_One.js
Microsoft (R) JScript .NET Compiler version 7.10.3052
for Microsoft (R) .NET Framework version 1.1.4322
Copyright (C) Microsoft Corporation 1996-2002. All rights reserved.

C:\>JS_One
0 - 4   This.
5 - 7   is.
8 - 9   a.
10 - 15 group.
16 - 18 of.
19 - 24 words.
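The Perl-style idea mentioned above, lexing through a string with several regexes inside one loop, could look something like the following. It is sketched in Python rather than JScript.NET, and the token names, the patterns in `TOKEN_SPEC` and the `lex` function are all made up for illustration. At each position the patterns are tried in order, and the first one that matches at that exact position produces the next token.

```python
import re

# Hypothetical token table: tried in order, first match at the position wins.
TOKEN_SPEC = [
    ("NUMBER", re.compile(r"\d+")),
    ("WORD", re.compile(r"[A-Za-z_]\w*")),
    ("SPACE", re.compile(r"\s+")),
    ("PUNCT", re.compile(r"[^\w\s]")),
]


def lex(text):
    """Return a list of (token name, start, end, matched text) tuples.
    Whitespace is consumed but not reported."""
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            # Pattern.match(text, pos) only succeeds if the match starts at pos.
            match = pattern.match(text, pos)
            if match:
                if name != "SPACE":
                    tokens.append((name, match.start(), match.end(), match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no token matches at index {pos}")
    return tokens


for name, start, end, value in lex("This is a group of words"):
    print(f"{start} - {end}\t{value}.")
```

For the sample sentence this reports the same index pairs as the JScript.NET run above, since each word is matched from its first character to just past its last.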

 

Posted: Apr 09 2004, 03:16 PM by digory | with 3 comment(s)

Registration day

Tonight we had the registration for the Summit; it was great to meet up with many guys with whom I've e-mailed and IM'ed for the past 2 or 3 years but never met in person.

The summit registration took place in a huge room at the state convention center and was very impressive. Many of the product groups had a "stand" there where we could go and be dazzled by cool new stuff or simply ask usability questions about the product - I had a great chat with a guy from the Msn team about their product and how they are faring within their market space.

After spending quite a bit of time at the bar and several other "stands" I headed over to the MSDN section and caught up with Kent. One of the things that I mentioned to Kent was that they needed a link on the site which lists all articles in chronological order.  This is because I'll often remember reading an article, and can kind of remember *when* I read it, but cannot find it via search or the tree.

Kent pulled up the ASP.NET dev. center and pointed to a link titled "Headline Archive" on the main navigation menu on the left hand side of the page; then he just said, "What, like this you mean?" - and smiled!

     http://msdn.microsoft.com/asp.net/archive/default.aspx

What can I say, you learn new things every day! :-)

Posted: Apr 05 2004, 01:33 PM by digory | with no comments

Touchdown!

Just arrived in Seattle after ~23 hours of flights and connections - man that's a long time.  I only managed about 2 hours of sleep between Melbourne and LA, but managed to sleep pretty much the whole 2.5 hours between LA and Seatac.

 

Posted: Apr 04 2004, 09:12 AM by digory | with no comments