ShowUsYour<Blog>

Irregular expressions regularly

June 2003 - Posts

ASP.NET Starter Kits - mining for jewels

As I was ranting to Dave: "Somehow people are missing the Starter Kits and I'm not really sure why".

Here's some advice that I give people to get them looking at ASP.NET Starter Kits and how to reap some value out of them.

Imagine for a moment that you are a shopkeeper standing behind your counter and a customer comes in and says, "I'd like some food please", to which you reply, "Sure sir, I have all sorts of food.  What do you feel like?  Hot?  Cold?  Sweet?  Savoury...?"

This goes on for 10-15 minutes and you've nearly narrowed this guy down (mind you, the queue behind him has started to build up quite alarmingly!).  From your probing you've ascertained that he would like a small sweet item, with some variety in it, that cannot contain nuts but would preferably contain chocolate and, if possible, caramel.

"Cool!", you say.  "That will be $1.40 thanks", and you hand him a Mars Bar.

Now, that would obviously never happen because the customer can come in and basically pick and choose for themselves what they want, and luckily, Mars Bars are well known and built to the same "spec" so that wherever you are in the world, a Mars Bar will always be exactly that - a Mars Bar!

A problem exists within our game, however, that means that you cannot always compare apples with apples.  Take reporting for example: there are really only 8 or so "popular" types of report, namely:

 - Tabular Report
 - Visual Report
 - Cross Tab Report
 - Master Details Report
 - Simple Report
 - Text Report
 - Hierarchical Report
 - Drill-Down Report

These report types are covered by the Reports Starter Kit.

So, to parallel the example I used for the customer that wanted a Mars Bar... how many times have you sat down and tried to understand what a client wanted when they said:

"I want to see the quarterly sales figures by region."

You might have thought about it for a while and finally come back with a picture, PowerPoint mock-up or Html mock-up and said, "Does it look like this? {insert picture or mock-up}".  Inevitably the customer has explained that you clearly weren't listening.  "Don't you understand!" they say, "I want to see the quarterly sales figures by region."

This goes on for another 3 or 4 iterations, relations become strained, the project fails, your wife walks out on you, and your goldfish dies.  YOU STUPID BASTARD!  Why weren't you using the Starter Kits?

If you were using the starter kits however, it might go something like this:

Customer: "I want to see the quarterly sales figures by region."

You:  { browses to reports selector } http://www.asp.net/ReportsStarterKit/
You:  Does it look like this?  http://www.asp.net/ReportsStarterKit/SourceViewer/srcview.aspx?path=crosstab.src&file=crosstab&rows=5

Customer:  Yes, it does :-)

[ realizing that you're on a roll, you click "run" and browse the live version of that report ]

You: Does it behave like this?
Customer:  Yes, it does :-)

Using this approach the client selects the remaining 9 reports... and the final order looks like this:

1 Visual Report
2 Tabular Reports
1 Master Details Report
3 Simple Reports 
2 Drill-Down Reports

You: Great, that'll be $2500 thanks.  You'll have your reports by the end of the week :-)

Now, that's really cool and everything, but the story doesn't end there.  What happens next is that you ship the order to your developers with a link to the "spec" for building the sprocs, middle-tier and UI layer (the plans) for each report - http://www.asp.net/ReportsStarterKit/SourceViewer/srcview.aspx?path=crosstab.src&file=crosstab&rows=5.

The developer delivers a consistent product on-time, your customer is happy, your wife tells you that she is pregnant with your second child, you win the lottery and your circle of friends increases.

That's the secret of the starter kits!

Posted: Jun 25 2003, 05:25 PM by digory | with 7 comment(s)
GotDotNet User Samples Rss feed

Just saw a blog entry from Duncan about a new Rss feed for User Samples on GDN... 3 words "subscribe to it":

     http://www.gotdotnet.com/community/resources/rss.aspx?feed=sample

I regularly save myself from having to write widgets by first browsing through the User Samples area.

I went to the site to find the link for this resource and it's pretty difficult to notice (see if you can spot it).

While I was there I checked out the new Workspaces V1 application.  It too looks cool and, according to some text on the home page, it also has an Rss feed to which you can subscribe, although I could find no actual evidence that one exists!

GDN is a great site, although it often appears chaotic and unorganized in terms of how it's managed.  It regularly crashes, e-mails and forum posts go unanswered, and new features often go unannounced.  It's still a great resource though :-)

Experiences Shared ( that's static to you )

I've been having a nice chat with ( d.o.t.d. ) Dave about our respective growth as developers :-)

You can muse over our ramblings here: http://weblogs.asp.net/dburke/posts/9191.aspx

Mort by day, Elvis on the bus.

I've committed to buying the book that Kent mentioned - Build Your Own .NET Language and Compiler.  It sounds like just the sort of book that I'd love for reading on the bus on the way to work and back.  In fact this paragraph from the blurb sums up my own feelings very well:

All software developers use languages – it’s the fundamental tool of the trade. Yet despite widespread curiosity about how languages work, few developers actually learn how they work. For one thing, most texts on language and compiler development are highly academic and theoretical tomes intended for use in college level computer science programs. This is a shame, because the techniques used to make a language work have widespread applications in general programming.

Actually, I love reading about raw data structures, and I was only chatting to a friend last night about storage, access and data structures.  It's quite fascinating to think that, in a tightly packed BinaryTree you can store ONE MILLION different pieces of data and take no more than 20 "hops" to find any single piece!

To quote one of my favourite books:

"When searching a tightly packed 1,000,000 element binary tree, no more than 20 comparisons need to be made because 2^20 > 1,000,000"

So that's a maximum of 20 comparisons as opposed to 1,000,000 in a simple linked list.
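
To see that number fall out in practice, here is a minimal sketch. It uses a binary search over a sorted array as a stand-in for a perfectly packed BinaryTree (the probing pattern is the same), and the names BinarySearchSketch and CountProbes are mine, made up for the illustration. Each "probe" counts as one comparison against a stored element.

```vbnet
Imports System

Module BinarySearchSketch
    ' Returns how many elements had to be inspected to find target
    ' in a sorted array using binary search.
    Function CountProbes(ByVal sorted() As Integer, ByVal target As Integer) As Integer
        Dim low As Integer = 0
        Dim high As Integer = sorted.Length - 1
        Dim probes As Integer = 0
        While low <= high
            Dim middle As Integer = low + (high - low) \ 2
            probes += 1
            If sorted(middle) = target Then
                Return probes
            ElseIf sorted(middle) < target Then
                low = middle + 1
            Else
                high = middle - 1
            End If
        End While
        Return probes
    End Function

    Sub Main()
        ' 1,000,000 sorted values: 0 .. 999999
        Dim data(999999) As Integer
        For i As Integer = 0 To data.Length - 1
            data(i) = i
        Next

        ' Search for every element and remember the worst case.
        Dim worst As Integer = 0
        For Each value As Integer In data
            worst = Math.Max(worst, CountProbes(data, value))
        Next
        Console.WriteLine(worst)   ' worst case for 1,000,000 elements: 20
    End Sub
End Module
```

The worst case is 20 because 2^19 = 524,288 is less than 1,000,000 while 2^20 = 1,048,576 is more, so 20 halvings are always enough.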

Now, granted, it's not often that you will have to roll your own BinaryTrees, but that sort of knowledge and problem solving can be readily applied to many pieces of the architectural puzzle methinks.

Anyways, I find that sort of titbit fascinating and I'm hoping that this book is written for the kind of people that do ;-)

Here's a link to a piece that I wrote about 18 months ago that discusses BinaryTrees and what they are:

    http://www.flws.com.au/showusyourcode/codeLib/code/BinaryTree.asp?catID=5

Some quotes to keep me sane....

"Twenty years from now you will be more disappointed by the
things you didn't do than by the ones you did do. So throw off
the bowlines. Sail away from the safe harbor. Catch the trade
winds in your sails. Explore. Dream. Discover."
- Mark Twain

"Don't get hung up in the past, but move on and continue in a
forward motion no matter what happens. Even if a piece of work
is an overall failure, there will still be some element in that
work that is better than you've ever done before. It is that
small element that you must thrive on, and use to give yourself
the motivation to continue."
- Robert Suguita

- Why is it that bullets ricochet off of Superman's chest, but he
ducks when the gun is thrown at him?

Please feel free to add any of your favourite quotes to this list, especially if they help describe your current mood, situation, or outlook ;-)

I found a home... thanks INETA ;-)

After spending the past year without any local developer affiliations I finally found - via the INETA site - that there was in fact a .NET Users group in my home town - http://www.ineta.org/GroupDetail.aspx?GroupID=569&tabindex=1.

As it turned out, the group were on the lookout (so to speak) for people that were able and willing to share .NET experiences - so of course I've offered to do what I can :-) My first talk is on the 9th of July and will be the start of a series that I think I'll call "Mining for Nuggets in the ASP.NET Starter Kits" :-) Here's a link to the page that contains the agenda for that evening:

ADNUG Notices

If you come along that night don't forget to say "hi"!

Changed Hex2Color algorithm

I'm not sure that this is the most efficient Hex2Dec2Color code around, but it will do me for the time being. I ended up writing a tool to wrap the following function so that I could easily enter a Hex string and see the color representation of it.

    Dim c As Color = Color.FromArgb( _
            ByteFromHexString("#88A8B2", 0), _
            ByteFromHexString("#88A8B2", 2), _
            ByteFromHexString("#88A8B2", 4) _
        )


    Private Function ByteFromHexString(ByVal hexStrng As String, ByVal rgb As Byte) As Byte
        ' Strip the optional leading "#" and normalise to upper case
        If Microsoft.VisualBasic.Left(hexStrng, 1) = "#" Then
            hexStrng = Mid(hexStrng, 2)
        End If
        hexStrng = UCase(hexStrng)

        Dim _hex As String = "0123456789ABCDEF"
        Dim iOut As Byte
        Try
            ' The first character of the pair is the high nibble, the second is the low nibble
            iOut = CByte((_hex.IndexOf(hexStrng.Chars(rgb)) * 16))
            iOut += CByte((_hex.IndexOf(hexStrng.Chars(rgb + 1))))
        Catch ex As System.OverflowException
            ' non-hex character (IndexOf returned -1)
        Catch ex As System.IndexOutOfRangeException
            ' bad formatted hexstring
        End Try
        Return iOut
    End Function

I must admit that I was amazed that I couldn't find a Hex2Dec method in the BCL though!
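
For what it's worth, the BCL does have a few ways of parsing a hex string, even if none of them is called Hex2Dec. The sketch below is not the tool mentioned above, just a small illustration; the module name HexColorSketch and the variable names are made up, and it assumes a well-formed "#RRGGBB" string and a reference to System.Drawing.

```vbnet
Imports System
Imports System.Drawing
Imports System.Globalization

Module HexColorSketch
    Sub Main()
        Dim hexText As String = "#88A8B2"

        ' Convert.ToByte with a base of 16 parses a two-character hex pair.
        Dim r As Byte = Convert.ToByte(hexText.Substring(1, 2), 16)
        ' Byte.Parse with NumberStyles.HexNumber does the same job.
        Dim g As Byte = Byte.Parse(hexText.Substring(3, 2), NumberStyles.HexNumber)
        Dim b As Byte = Convert.ToByte(hexText.Substring(5, 2), 16)
        Dim c As Color = Color.FromArgb(r, g, b)

        ' ColorTranslator.FromHtml accepts the whole "#RRGGBB" string in one go.
        Dim c2 As Color = ColorTranslator.FromHtml(hexText)

        Console.WriteLine("{0} {1} {2}", c.R, c.G, c.B)      ' 136 168 178
        Console.WriteLine("{0} {1} {2}", c2.R, c2.G, c2.B)
    End Sub
End Module
```

Unlike the hand-rolled function, these calls throw (for example a FormatException) on a badly formed string rather than quietly returning a partial value, so wrap them in a Try block if the input comes from a user.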

Alright, alright, so I've taken the pledge, happy now?

The last 2 nights have been spent feeling guilty for the crimes that I've committed.  They were never really *big* crimes, more like indiscretions I'd say.  Anyway, reading the new article on MSDN titled Writing Faster Managed Code was a timely reminder about what is required to write quality code.

Basically, the author implores you to go to the trouble of "measuring" things as opposed to going with "gut feel".  That sounds fine, but measuring things is not always as easy as you might think, so if you're a NEWBIE like me and you're used to going with "gut feel", you'll probably need to go through another period of adjustment too.  Don't get me wrong, you're not being asked to measure every single operation within every single line of code, just to understand the cost implications of things such as object initialization, data type choice and storage access times, and other lower-level things such as boxing and being friendly to the GC process.  To steal a performance-related quote from Jeffrey Friedl:

"I use these exact numbers not because the precision is important, but rather to be more concrete than words such as 'lots', 'few', 'many', 'better', 'not too much' and so forth.  I don't want to imply that {blah} is an exercise in counting tests or backtracks; I just want to acquaint you with the relative qualities of the samples."

So, now that I've said it:

"I promise I will not ship slow code. Speed is a feature I care about. Every day I will pay attention to the performance of my code. I will regularly and methodically measure its speed and size. I will learn, build, or buy the tools I need to do this. It's my responsibility."


...go ahead, you say it too!

Displaying text on a web page

You are creating an app that will allow a user to "fetch, fix and format" regular expressions.  The database table that stores the regexes is called Pattern and looks like this:

 PatternId - int
 Pattern - varchar(1000)

Prepare a web page that will display the following 4 patterns on a single page in raw text as well as inside a TEXTAREA element:


patternA = "^(?:(?:(?:0?[13578]|1[02])(\/|-" _
   & "|\.)31)\1|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\." _
   & ")(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\" _
   & "d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?" _
   & ":1[6-9]|[2-9]\d)?(?:0[48]|[2468][048" _
   & "]|[13579][26])|(?:(?:16|[2468][048]|" _
   & "[3579][26])00))))$|^(?:(?:0?[1-9])|" _
   & "(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[" _
   & "0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$"

patternB = "<script\s[^\>]>(.*?)\<\/script\>"

patternC = "<script>self.location = 'raceyRhonda.com" _
   & "'</script>"

patternD = "</textarea><script>self.location = 'raceyRhonda" _
   & ".com'</script>"


Here's a mock-up of the Html page structure to get you started:

<form>
<h3>First Pattern</h3>
<p><font face="Courier New" size="-1">
  PATTERN HERE</font></p>
<textarea rows="4" cols="70">PATTERN HERE TOO!</textarea>
<hr>

<h3>Second Pattern</h3>
<p><font face="Courier New" size="-1">
  PATTERN HERE</font></p>
<textarea rows="4" cols="70">PATTERN HERE TOO!</textarea>
<hr>
<input type="submit" />
</form>
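
As a nudge in one possible direction (a sketch, not the only answer): the catch with patternC and patternD is that they are markup as far as the browser is concerned, so they need to be HTML-encoded before being written into the page. The console snippet below uses System.Net.WebUtility.HtmlEncode so that it can run on its own; inside an ASP.NET page you would more likely reach for Server.HtmlEncode or HttpUtility.HtmlEncode. The module name EncodeSketch is made up for the example.

```vbnet
Imports System
Imports System.Net

Module EncodeSketch
    Sub Main()
        Dim patternD As String = "</textarea><script>self.location = 'raceyRhonda" _
           & ".com'</script>"

        ' Encoding turns < and > into &lt; and &gt;, so the browser shows the
        ' text instead of closing the TEXTAREA and running the script.
        Dim safeText As String = WebUtility.HtmlEncode(patternD)
        Console.WriteLine(safeText)
    End Sub
End Module
```

The encoded string is what would go in both the PATTERN HERE and PATTERN HERE TOO! slots of the mock-up.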

 

Tokenizing

Two friends have recently asked me to provide a definition for the term "Tokenize" as in: "I'm going to tokenize this chunk of text.", and I didn't really provide an answer. I guess that they asked me because it's a term that I've used quite a bit in the past - and in the future too no doubt ;-)

Merriam-Webster provides several definitions for the term "Token", a couple of which are:

    a distinguishing feature : CHARACTERISTIC
    a small part representing the whole :

For the record, now that I've had time to give it some thought, I'd like to give an example of what I mean when I use it...

Imagine that you've been assigned the task of building a service that would provide word-wrap functionality to applications. Applications could supply a body of text and a LineLength property to the service and it would return the original chunk of text with lines formatted to "no longer than" the LineLength limit. Additionally, subscribers to this service would be afforded the option of toggling between word-wrap states via the Wrapped and UnWrapped values of the WrapMode enumerated datatype.

To convert raw text into formatted "wrapped" text, you decide to apply an algorithm similar to the one shown here http://www.namesuppressed.com/syneryder/code-phpwordwrap.shtml; that is:

    - Find all paragraphs
    - For Each paragraph
        - Remove linebreaks
        - Split on spaces
        - Enumerate the words and append them to a string    
        - When the length of the string reaches the LineLength limit insert a linebreak

Given the following chunk of raw text:

    This is a paragraph of text.
    This is a yet another
    paragraph of boring 
    old text.

A LineLength of 18 would see it formatted as:

    This is a 
    paragraph of text.
    This is a yet 
    another paragraph
    of boring old 
    text.
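
The wrapping steps listed above can be sketched roughly as follows. This is only an illustration: the names WordWrapSketch and WrapParagraph are made up, the caller is assumed to have already found the paragraphs, and a word longer than LineLength is simply left on a line of its own.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module WordWrapSketch
    ' Greedy word-wrap of a single paragraph.
    Function WrapParagraph(ByVal paragraph As String, ByVal lineLength As Integer) As String
        ' Remove linebreaks, then split on spaces.
        Dim flat As String = paragraph.Replace(vbCrLf, " ").Replace(vbLf, " ")
        Dim words() As String = flat.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        Dim result As New StringBuilder()
        Dim currentLength As Integer = 0
        For Each word As String In words
            ' Start a new line if adding a space plus this word would exceed the limit.
            If currentLength > 0 AndAlso currentLength + 1 + word.Length > lineLength Then
                result.Append(Environment.NewLine)
                currentLength = 0
            End If
            If currentLength > 0 Then
                result.Append(" ")
                currentLength += 1
            End If
            result.Append(word)
            currentLength += word.Length
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(WrapParagraph("This is a paragraph of text.", 18))
        Console.WriteLine(WrapParagraph("This is a yet another" & vbLf & _
            "paragraph of boring" & vbLf & "old text.", 18))
    End Sub
End Module
```

Run against the two sample paragraphs with a LineLength of 18, this produces the same six lines as the formatted example above.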

To return the text in its original, raw format, you might store 2 versions of the data in private fields and, depending on which version is requested, simply return it from that location. That is, after the initial "formatting", you'd write the formatted version to a private field and you would have already stored the raw value in another field, i.e.:

Private mstrRawValue As String
Private mstrFormattedValue As String
Private mCurrentState As WrapMode

Public Function GetText() As String
  If Me.mCurrentState = WrapMode.Wrapped Then
       ' Format lazily, the first time the wrapped version is requested
       If mstrFormattedValue Is Nothing OrElse mstrFormattedValue.Length = 0 Then
            FormatRawText()
       End If
       Return mstrFormattedValue
  Else
       Return mstrRawValue
  End If
End Function

That's probably fine, even though the amount of memory required is approximately double the size of the raw string alone, but you might find it difficult to scale if the client asks for another one, or two, or twenty-two different WrapMode states, or if they ask you to provide an "offline" version of the formatted text!

In situations such as the one mentioned above, if I think that there's a chance that I might need to perform more than one operation on a string I'll often "tokenize" it after the first pass. When I say tokenize, what I'm referring to is that I leave small, descriptive "marks" in the text that can be read at a later date to describe a given state. To show what I'm referring to, here's the algorithm above, amended for "tokenizing":

    - Find all paragraphs
    - For Each paragraph
        - Split on linebreaks and wrap with "<raw>...</raw>" tokens
        - Split on spaces
        - Enumerate the words and append them to a string    
        - When the length of the string reaches the LineLength limit insert a linebreak and wrap with "<formatted>...</formatted>" tokens

And, again, given the following chunk of raw text:

    This is a paragraph of text.
    This is a yet another
    paragraph of boring 
    old text.

A LineLength of 18 would see it formatted as:

<raw><formatted>This is a</formatted>

<formatted>paragraph of text.</formatted></raw>

<raw><formatted>This is a yet</formatted>

<formatted>another</raw> <raw>paragraph</formatted>

<formatted>of boring</raw> <raw>old</formatted>

<formatted>text.</raw></formatted>

This allows the amount of memory required to store the data to be roughly halved as the document is now self-describing of its states. The algorithm for returning text is like so:

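' Requires Imports System.Text.RegularExpressions at the top of the file
Private mstrStoredValue As String
Private mblnIsMarked As Boolean = False
Private mCurrentState As WrapMode

Public Function GetText() As String
  Dim tmpStrng As String
  ' Tokenize the stored text the first time it is requested
  If Not mblnIsMarked Then
       FormatRawText()
  End If
  If Me.mCurrentState = WrapMode.Wrapped Then
       ' Drop the <formatted>, <raw> and </raw> marks; each </formatted> ends a wrapped line
       tmpStrng = Regex.Replace(mstrStoredValue, "(\<formatted\>|\<\/?raw\>)", "")
       Return Regex.Replace(tmpStrng, "\<\/formatted\>", Environment.NewLine)
  Else
       ' Drop the <raw>, <formatted> and </formatted> marks; each </raw> ends a raw line
       tmpStrng = Regex.Replace(mstrStoredValue, "(\<raw\>|\<\/?formatted\>)", "")
       Return Regex.Replace(tmpStrng, "\<\/raw\>", Environment.NewLine)
  End If
End Function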

Either way, because the document is now "described" via the tokenizing process, the presentation of the data can now be separated from the logic required for the formatting of it.

Well, that pretty much covers the "Darren" interpretation of Tokenizing. I should add, however, that anyone with compiler or interpreter experience will probably have a different interpretation, where the term is generally used to refer to the "substitution" of text rather than marking, or adding to, text.
