September 2005 - Posts - Jon Galloway

September 2005 - Posts

Splitting Camel Case with RegEx

Phil posted some code to Split Pascal/Camel Cased Strings a few days ago. We had an offline discussion on doing this via RegEx.

I like the RegEx approach since it's only one line of code:

output = System.Text.RegularExpressions.Regex.Replace(
    input,
    "([A-Z])",
    " $1",
    System.Text.RegularExpressions.RegexOptions.Compiled).Trim();

This matches all capital letters, replaces them with a space and the letter we found ($1), then trims the result to remove the initial space if there was a capital letter at the beginning.
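As a quick check of that description, here is a minimal sketch that wraps the one-liner in a method. The names CamelCaseSketch and SplitOnCapitals are made up for illustration, and RegexOptions.Compiled is left out to keep the example short. Note that the pattern treats every capital letter as the start of a word, so acronyms get split letter by letter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelCaseSketch
{
    // Hypothetical helper name; the body is the one-line Replace shown above.
    public static string SplitOnCapitals(string input)
    {
        // " $1" re-inserts the matched capital letter, preceded by a space.
        // Trim() drops the leading space produced by an initial capital.
        return Regex.Replace(input, "([A-Z])", " $1").Trim();
    }
}
```

With this sketch, SplitOnCapitals("SampleSplitText") gives "Sample Split Text", while SplitOnCapitals("parseXMLFile") gives "parse X M L File", which may or may not be what you want.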

So, which would you use?

Arguments for Phil's C# approach:

  1. Easier for other programmers to read - not everyone knows RegEx
  2. Faster (see comparison below)
  3. All compiled code, so errors are more likely to be caught in development

Arguments for my RegEx approach:

  1. Simpler (in my opinion)
  2. RegEx is a string, so it can be put in a configuration file

So, let's compare performance. Now, this is mostly academic since this kind of function would likely be called fewer than 25 times, but it's still worth a look. Here are the results for the sample string "SampleSplitText":

Approach                 Repetitions   Time (seconds)
RegEx Replace                   1000   .0312500
RegEx Replace                 100000   .3125000
RegEx Replace               10000000   29.1562500
Code Approach                   1000   0 (not measurable)
Code Approach                 100000   .0156250
Code Approach               10000000   1.6562500
RegEx Delegate Replace          1000   0 (not measurable)
RegEx Delegate Replace        100000   .0937500
RegEx Delegate Replace      10000000   7.5000000

The only reason for calling this out is to show the exceptionally slow performance of the RegEx replace method for high iterations. For under a thousand iterations, I'd definitely go with the RegEx replace. For high repetitions, I'd consider using a RegEx replace with a MatchEvaluator delegate (see the code below). For my very simple test, it was just about as fast for anything under 100000 repetitions.
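One caveat about the numbers above: every non-zero time is a whole multiple of 0.015625 seconds, which suggests the DateTime.Now clock used by the test program only advances about every 1/64 of a second, so the short runs can't be resolved. Here is a minimal sketch of a finer-grained measurement. It assumes a newer framework than this post targets (Stopwatch arrived in .NET 2.0, the parameterless Action delegate in .NET 3.5), and the names TimingSketch and Measure are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs the action the requested number of times and returns the elapsed time.
    // Stopwatch uses the high-resolution performance counter when one is available.
    public static TimeSpan Measure(Action action, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A call such as TimingSketch.Measure(delegate { SplitTest.SplitUpperCaseToString("SampleSplitText"); }, 1000) would then report a small but non-zero time instead of "not measurable".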

(updated - fixed a code error with delegate method)

using System;
using System.Collections;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

public class SplitTest
{
    public static void Main()
    {
        string input;
        int iterations;

        for(;;)
        {
            Console.WriteLine("Enter CamelCase text to split (defaults to SampleSplitText):");
            input = Console.ReadLine();
            if(input==string.Empty)
                input="SampleSplitText";

            iterations = 0;
            Console.WriteLine("Enter number of operations ( enter 0 to quit):");
            try
            {
                iterations = int.Parse(Console.ReadLine());
            }
            catch
            {
                Console.WriteLine("Exiting");
                break;
            }

            if(iterations==0)
                break;

            System.DateTime start;
            start = System.DateTime.Now;
            Console.WriteLine(string.Format("Output from Inline RegEx approach: {0}", InlineRegExTest(input, iterations)));
            Console.WriteLine(string.Format("Inline RegEx approach took {0} seconds for {1} iterations.",System.DateTime.Now-start,iterations));

            start = System.DateTime.Now;
            Console.WriteLine(string.Format("Output from RegEx / MatchEvaluator approach: {0}", DelegateRegExTest(input, iterations)));
            Console.WriteLine(string.Format("RegEx / MatchEvaluator approach took {0} seconds for {1} iterations.",System.DateTime.Now-start,iterations));

            start = System.DateTime.Now;
            Console.WriteLine(string.Format("Output from Code approach: {0}", CodeTest(input, iterations)));
            Console.WriteLine(string.Format("Code approach took {0} seconds for {1} iterations.",System.DateTime.Now-start,iterations));
            Console.ReadLine();
        }
    }

    
    private static string InlineRegExTest(string input, int iterations)
    {
        string output = "Failed";

        for(int i=0;i<iterations;i++)
        {
            output = System.Text.RegularExpressions.Regex.Replace(input,"([A-Z])"," $1",System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
        }
        return output;
    }

    
    private static string DelegateRegExTest(string input, int iterations)
    {
        System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.Compiled;
        Regex reg = new Regex("(?<Word>[A-Z])",options);
        string output = "Failed";

        for(int i=0;i<iterations;i++)
        {
            output = reg.Replace( input, new MatchEvaluator( FormatWord ) ) ;
        }
        return output;
    }

    
    private static string FormatWord(Match m)
    {
        if( m.Groups["Word"].Success )
        {
            string word = m.Groups["Word"].Value ;
            return " " + word;
        }
        else
            return m.Value ;
    }

    
    private static string CodeTest(string input, int iterations)
    {
        string output = "Failed";

        for(int i=0;i<iterations;i++)
        {
            output = SplitUpperCaseToString(input);
        }
        return output;
    }
    
        
    /// <summary>
    /// Parses a camel cased or pascal cased string and returns a new
    /// string with spaces between the words in the string.
    /// </summary>
    /// <example>
    /// The string "PascalCasing" will return the string "Pascal Casing".
    /// </example>
    /// <param name="source"></param>
    /// <returns></returns>
    public static string SplitUpperCaseToString(string source)
    {
        return string.Join(" ", SplitUpperCase(source));
    }
    
    
    /// <summary>
    /// Parses a camel cased or pascal cased string and returns an array
    /// of the words within the string.
    /// </summary>
    /// <example>
    /// The string "PascalCasing" will return an array with two
    /// elements, "Pascal" and "Casing".
    /// </example>
    /// <param name="source"></param>
    /// <returns></returns>
    public static string[] SplitUpperCase(string source)
    {
        if(source == null)
            return new string[] {}; //Return empty array.

        if(source.Length == 0)
            return new string[] {""};

        StringCollection words = new StringCollection();
        int wordStartIndex = 0;

        char[] letters = source.ToCharArray();
        // Skip the first letter. we don't care what case it is.
        for(int i = 1; i < letters.Length; i++)
        {
            if(char.IsUpper(letters[i]))
            {
                //Grab everything before the current index.
                words.Add(new String(letters, wordStartIndex, i - wordStartIndex));
                wordStartIndex = i;
            }
        }

        //We need to have the last word.
        words.Add(new String(letters, wordStartIndex, letters.Length - wordStartIndex));

        //Copy to a string array.
        string[] wordArray = new string[words.Count];
        words.CopyTo(wordArray, 0);
        return wordArray;
    }
}
Posted by Jon Galloway | 2 comment(s)

[Fix] View Style Information disabled on Firefox Web Developer Toolbar

Some of the tools in the Firefox Web Developer Toolbar extension require that you install Firefox with the Developer's Tools option. If the "View Style Information" menu option is disabled, the DOM Inspector is not installed.

Fix:
1. Uninstall Firefox. Your user settings should be maintained - mine were.
2. Download Firefox
3. Do a Custom Install
4. Check the "Developer's Tools" checkbox


From/Select and Select/From in LINQ

I'd been wondering about the FROM/WHERE/SELECT syntax in DLINQ. I'm used to the SQL SELECT/FROM/WHERE approach. Turns out that I'm late to this party - this was (of course) discussed at the PDC, and has been under some discussion since then.

At first glance, I wanted my SELECT first. I've written more than my share of SQL, so this "reverse polish SQL" syntax doesn't feel natural. Plus, on the surface, it looks like Microsoft is ignoring decades of precedent and standards by doing their own thing.

After some reading, FROM/WHERE/SELECT makes sense to me. It will allow better Intellisense, it's based on standard XQuery FLWOR syntax (with some modifications - I presume because return is a reserved word in C#?), and it's honestly more intuitive to me.
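To illustrate the Intellisense argument, here is a small sketch over an in-memory collection. It uses the query syntax as it eventually shipped in C# 3.0, so details may differ from the PDC preview, and the class, method, and data are invented for the example. Because the from clause comes first, the editor already knows that c is a string by the time you type the where and select clauses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FromSelectSketch
{
    // Hypothetical helper: returns the matching city names, sorted and upper-cased.
    public static List<string> CitiesStartingWith(IEnumerable<string> cities, string prefix)
    {
        var query =
            from c in cities                                     // c is known to be a string from here on
            where c.StartsWith(prefix, StringComparison.Ordinal)
            orderby c
            select c.ToUpper();

        return query.ToList();
    }
}
```

With SELECT first, the editor would have to offer completions for c.ToUpper() before it had seen where c comes from.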

There's a lot of discussion, so I'll just link to it:

Paul Vick on why VB will support SELECT/FROM and C# will support FROM/SELECT
Cyrus' Blather on the reasons for FROM/SELECT
Wesner Moise on From/Select and Select/From
Article on the difference between XQuery and SQL syntax
Highlights from the C# Language Enhancements Chat (a bit long, but worthwhile to skim)
Discussion on MSDN Technical Forums


[link] TransferBigFiles.com

TransferBigFiles.com makes it easy to send files that are way too big for e-mail - up to 1GB. You upload the file(s) via their website and enter the recipient's e-mail (and some optional things, like a password). The recipient gets an e-mail with a download link and has five days to pick it up. It's a free service from the good folks at Axosoft.

I've used Dropload for this kind of thing before, but their limit is 100MB. YouSendIT handles up to 1GB, but the guys behind TransferBigFiles said it's not reliable and pretty limited functionally.

Here's their post about how they put it together in 20 hours:
Just a Weekend Project! The Implementation of TransferBigFiles.com

They used Mediachase FileUploader.NET to handle the actual uploads. Looks like a good product to remember.


[fix] IIS Unexpected error 0x8ffe2740 occurred - Skype on Port 80

"Unexpected error 0x8ffe2740 occurred" is not a particularly helpful error message. It just means IIS can't start a website because port 80 is already in use.

MS KB article 816944 has more info.

A likely suspect is Skype, which will use port 80 if it's free. This is a tricky one: IIS might work fine most of the time because it starts up before Skype, but if you stop and restart IIS, Skype sees port 80 is free and snakes it. 0x8ffe2740 for you!

IIS-Resources has the answer but not enough Google Juice. Here's the fix:

1. In Skype, go to File -> Options -> Connection
2. Uncheck "Use Port 80 as an alternative for incoming connections"
3. Restart the default website
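If you want to confirm that something is listening on port 80 before digging through Skype's options, a rough sketch like the following could help. It assumes .NET 2.0 or later for the System.Net.NetworkInformation namespace; the names PortCheck and IsTcpPortInUse are hypothetical, and it only tells you that the port is taken, not which program took it.

```csharp
using System;
using System.Net;
using System.Net.NetworkInformation;

public static class PortCheck
{
    // Returns true when some process on this machine is already listening on the TCP port.
    public static bool IsTcpPortInUse(int port)
    {
        IPEndPoint[] listeners = IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpListeners();
        foreach (IPEndPoint endPoint in listeners)
        {
            if (endPoint.Port == port)
                return true;
        }
        return false;
    }
}
```

Calling PortCheck.IsTcpPortInUse(80) while the default website is stopped should show whether another program has grabbed the port.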


[tool] Make Property / Refactoring Plug-in for VS.NET

DPack's "Surround With" feature is great, but the "property" function just writes out a template. Since VS2005 will include this feature, it probably won't make it into DPack (http://www.usysware.com/forums/viewtopic.php?t=36).

But I have variables that need to be upgraded to properties NOW!!! What to do?

Stephan Meyn's Refactoring Plug-in does the trick. It's got an interesting extensibility model, but I think most people would probably work with the DXCore system that powers CodeRush instead. I don't have time to think about that, though - too busy refactoring!

[1] What is this? I'm sure not gonna click on that. You tell me what it does.
[2] Yes, the screenshot doesn't show "Make Property" because it's not applicable for the selected text. Trust me on this one.
[3] No, I've never heard of CodeRush.
[4] No, I had no idea that VS.NET 2005 will support Refactoring without plugins.


[tool] GIF Plug-in for Cropper 1.6

Cropper is one of my most-used utilities. It's a lightweight screenshot application that's easy to use, unobtrusive, and free. It's never supported GIF output, though. I actually hacked together a working but ugly GIF output option about a year ago, but it never made the official release.

The 1.6 release has great plugin support, and E. W. Bachtal finally wrote that missing GIF Plugin. Now my battleship is fully operational!

Here's what Cropper looks like, if you haven't seen it:


Avoiding DataBinder.Eval in ASP.NET

I've used this tip at least thrice, so following Phil's "Rule of Three" it's time to do something with it. Link to it now, I shall.

It would be easy to pass this one up if you're not using ASP.NET 2.0, but this is applicable for ASP.NET 1.x, too. Key points:

  1. The DataBinder.Eval syntax uses reflection and should be avoided if you can determine the object type at design time (see Scott Galloway's writeup on this here and here).
  2. A quick way to see what the Container object holds is to bind directly to it: <asp:Label ID="Label2" runat="server" Text='<%# Container.DataItem%>'></asp:Label>

Sure, the syntax got easier. Instead of the cumbersome:

<%# DataBinder.Eval(Container.DataItem, "url") %>

We get to save some strokes and remove the entire confusion around "what the heck is Container.DataItem?":

<%# Eval("url") %>

But, this isn't all it's cracked up to be. Eval() STILL uses reflection to evaluate expressions, therefore for every bound column/row displayed in your ASP.NET pages, you are adding overhead, unnecessarily. Of course, what this really means is, just like with 1.1, you should be using explicit casts to cast Container.DataItem to its actual type:

<%# ((System.Data.DataRowView)Container.DataItem)["url"] %>

Of course the trick is to know...you guessed it...what the heck is Container.DataItem??? A quick way to find this out for various objects you may choose to employ in binding, is to bind just to Container.DataItem as a test. In the attached example I bound the GridView control to the Web configuration sections:

Configuration webConfig = System.Web.Configuration.WebConfigurationManager.OpenWebConfiguration(Request.ApplicationPath);

ConfigurationSectionCollection webConfigSections = webConfig.Sections;

GridView1.DataSource = webConfigSections;

In the GridView declaration I included these labels in a template column:

<asp:Label ID="Label2" runat="server" Text='<%# Container.DataItem%>'></asp:Label>:

<asp:Label ID="Label3" runat="server" Text='<%# ((ConfigurationSection)Container.DataItem).SectionInformation.SectionName %>'></asp:Label>

Now you can consider yourself early bound.
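The difference between the late-bound and early-bound forms can be sketched outside ASP.NET. This is only an analogy: it does not call DataBinder.Eval, the reflection lookup stands in for the kind of by-name resolution Eval has to do, and LinkItem and BindingSketch are hypothetical names.

```csharp
using System;

// Hypothetical data item type, standing in for whatever Container.DataItem holds.
public class LinkItem
{
    public string Url { get; set; }
}

public static class BindingSketch
{
    // Late bound: the property is looked up by name at run time via reflection.
    public static object LateBound(object dataItem, string propertyName)
    {
        return dataItem.GetType().GetProperty(propertyName).GetValue(dataItem, null);
    }

    // Early bound: the cast lets the compiler resolve the property,
    // so no name lookup happens at run time.
    public static string EarlyBound(object dataItem)
    {
        return ((LinkItem)dataItem).Url;
    }
}
```

Both calls return the same value for the same item. The early-bound version also fails at compile time if the property name is mistyped, whereas the late-bound version only fails when the page runs.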

ConfigurationUtility.zip (60.58 KB)

Source: dasBlonde (Michele Leroux Bustamante)


LINQ looks good, but DLINQ scares me

One of Microsoft's announcements at the PDC this week has been LINQ (Language INtegrated Query). Here's the elevator speech version: "LINQ enables developers to query objects, databases and XML using a unified programming model because LINQ makes data transforms and queries first class .NET citizens."

There's a lot more about it on the MSDN LINQ page, including 101 LINQ code samples. Paul Vick has a pretty good introductory article on LINQ, as well. Barry Gervin has come up with a nice summary of how the development community is reacting to LINQ.

On the whole, I think it's a nice advance. While it's nice to simplify things a bit mentally by separating procedural logic into applications and declarative, set-based logic into the database, that line has been blurring lately with the introduction of the .NET CLR to SQL Server 2005 (and of course, JServer back with Oracle 8i). I'll probably cling to this distinction where performance and / or maintainability are driving factors, but I definitely see how SQL syntax queries against collections can boost productivity.

XLINQ (extensions to allow LINQ to operate on XML data) looks good, too. It's still way too hard to query XML, and XLINQ looks like it would solve that.

DLINQ scares me, though. Not for the reasons Frans Bouma and Paul Wilson are talking about, although I do respect their OR/M expertise. Paul says: "What's wrong with DLinq? Here's the list I have so far: attribute-based, MS Sql Server only, overly complicated, poor stored proc support, no server-side paging, very limited functionality, and no WinFS/OPath integration."

I'm just scared about code maintainability.

I actually have some relevant experience here - several years ago I was tasked with migrating a legacy PowerBuilder application to VB COM and T-SQL. PowerBuilder allows you to intermingle data access SQL with your procedural logic, and the application developers had made full use of this capability. Over the years, this application had grown to the point that it was mission critical, but very difficult to maintain. Additionally, there was a need to migrate from inline SQL to stored procedures for performance and data management reasons.

The project to upgrade to VB COM and T-SQL failed. The program flow was nearly impossible to follow, and the set-based and procedural logic were intermingled in such a way that separating them was just too expensive.

This is being pushed as a productivity enhancement for developers, and I totally agree with that. DLINQ makes it very easy to write database oriented applications very quickly. Properly architected applications could, of course, isolate the data access to layers or components to future-proof. DLINQ has support for custom UpdateMethods which could call stored procedures, for example. Developers and projects focused on productivity probably won't bother with any of that, in my experience. They'll code their SQL inline and move on to the next project.
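As a rough sketch of what "isolate the data access" could look like, the query can live behind a small interface so callers never see how the data is fetched. Everything here is hypothetical (Customer, ICustomerStore, InMemoryCustomerStore), and it uses LINQ over an in-memory list as a stand-in for DLINQ, written in the query syntax as it later shipped in C# 3.0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with just the two fields needed for the example.
public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
}

// Callers depend on this interface only, so the implementation could later
// be swapped for one that calls a stored procedure.
public interface ICustomerStore
{
    IList<Customer> GetByCity(string city);
}

// Stand-in implementation backed by an in-memory list.
public class InMemoryCustomerStore : ICustomerStore
{
    private readonly List<Customer> customers;

    public InMemoryCustomerStore(IEnumerable<Customer> seed)
    {
        customers = new List<Customer>(seed);
    }

    public IList<Customer> GetByCity(string city)
    {
        // The query is confined to this one method.
        return (from c in customers
                where c.City == city
                select c).ToList();
    }
}
```

The point of the design is that moving Cities into a link table, or replacing the query with a stored procedure, would touch one class instead of every screen that lists customers.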

I've had a similar concern with the SQL-CLR integration, but I think DLINQ is much more subject to abuse. If the SQL-CLR makes it easy to shoot yourself in the foot, DLINQ is a Junior H-Bomb Activity Kit. Take a look at the following code samples (from the LINQ Code Samples page and the DLINQ Hands On Lab), and imagine a three year old application with thousands of lines of code which use DLINQ for data access:

DLINQ - Simple Select example
// DataContext takes a connection string 
DataContext db = new DataContext("c:\\northwind\\northwnd.mdf");

// Get a typed table to run queries
Table<Customer> Customers = db.GetTable<Customer>();

// Query for customers from London
var q =
      from c in Customers
      where c.City == "London"
      select c;

foreach (var cust in q)
      Console.WriteLine("id = {0}, City = {1}", cust.CustomerID, cust.City);

That's not too bad, but it wouldn't be very easy to migrate a complex query to a stored procedure or move the Cities into a link table. Here's a database update:

DLINQ - Demonstration of a database update
// Use a standard connection string for updates
Northwind db = new Northwind(@"C:\Temp\northwnd.mdf");

using(TransactionScope ts = new TransactionScope())
{
    var q =
      from p in db.Products
      where p.ProductID == 15
      select p;

    Product prod = q.First();

    // Show UnitsInStock before update
    Console.WriteLine("In stock before update: {0}", prod.UnitsInStock);

    if (prod.UnitsInStock > 0) prod.UnitsInStock--;
    db.SubmitChanges();
    ts.Complete();
    Console.WriteLine("Transaction successful");
}
Console.ReadLine();

Now imagine we've got multiple nested selects, inserts, and updates. Complicated, right?

"Hey, PM, I can't figure out what's causing this weird bug and QA doesn't want to retest the binary, but I can fix it with an INSERT TRIGGER on the database..." Pretty soon we've got an unmaintainable application on our hands.


Fun with Generics - Currying

Sriram writes about an interesting use of C# 2.0 Generics to implement "Currying," a technique which is normally reserved for functional programming languages.

Currying builds a new function by fixing one of a function's arguments to a value, which removes that argument:

  In computer science, currying is the technique of transforming a function taking multiple arguments into a function that takes a single argument (the first of the arguments to the original function) and returns a new function that takes the remainder of the arguments and returns the result. The technique was named by Christopher Strachey after logician Haskell Curry, though it was invented by Moses Schönfinkel and Gottlob Frege.

Intuitively, currying says "if you fix some arguments, you get a function of the remaining arguments". So if you take the function in two variables y^x, and fix y = 2, then you get the function in one variable 2^x.

[Wikipedia]
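The y^x example from the quote can be sketched in C#. This is not Sriram's code: it relies on lambdas and the Func delegates from C# 3.0 / .NET 3.5 rather than the C# 2.0 anonymous delegates he used, and the names CurrySketch and Curry are made up for the example.

```csharp
using System;

public static class CurrySketch
{
    // Turns a two-argument function into a function that takes the first
    // argument and returns a function waiting for the second.
    public static Func<T1, Func<T2, TResult>> Curry<T1, T2, TResult>(Func<T1, T2, TResult> f)
    {
        return a => b => f(a, b);
    }
}
```

Given Func<double, double, double> power = Math.Pow;, the call CurrySketch.Curry(power)(2.0) fixes y = 2 and yields a one-argument function computing 2^x, so passing it 3.0 returns 8.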

It seems a lot more academic than practical to me, but it's interesting to see the kind of thing that C# Generics will enable.
