Split Name into First and Last

Not too long ago we received a text file of customer names. The problem was that the file had just one field, "Name". Our database application needed separate first name and last name fields so that our customer could search for their customers on either one. As a result, we had to split the data from that single field into two.

As we looked into this file, we noticed that the data was in complete disarray! Some of the names were entered like "First Last", and some in the format "Last, First". So, we needed to perform a little string parsing to get the data into our database table. The following is a simplified version of the routine we wrote to attempt to handle each format and create a new Customer object that we could then use to populate our database table.

First we needed a Customer class. We then created an instance of it and passed the "name" value read from the text file to its NameSplit() method.

C#
class Customer
{
  public string FirstName { get; set; }
  public string LastName { get; set; }

  public void NameSplit(string name)
  {
    // Skip null or empty input (name.Length alone would throw on null)
    if(!string.IsNullOrEmpty(name))
    {
      // Check for a comma
      if(name.IndexOf(",") > 0)
      {
        LastName = name.Substring(0, name.IndexOf(",")).Trim();
        FirstName = name.Substring(name.IndexOf(",") + 1).Trim();
      }
      else if(name.IndexOf(" ") > 0)
      {
        FirstName = name.Substring(0, name.IndexOf(" ")).Trim();
        LastName = name.Substring(name.IndexOf(" ") + 1).Trim();
      }
    }
  }
}


Visual Basic
Public Class Customer
  Private mFirstName As String
  Private mLastName As String

  Public Property FirstName() As String
    Get
      Return mFirstName
    End Get
    Set(ByVal Value As String)
      mFirstName = Value
    End Set
  End Property

  Public Property LastName() As String
    Get
      Return mLastName
    End Get
    Set(ByVal Value As String)
      mLastName = Value
    End Set
  End Property

  Public Sub NameSplit(ByVal name As String)
    ' Skip Nothing or empty input (name.Length alone would throw on Nothing)
    If Not String.IsNullOrEmpty(name) Then
      ' First check for a comma to see if they entered Last, First
      If name.IndexOf(",") > 0 Then
        mLastName = name.Substring(0, name.IndexOf(",")).Trim()
        mFirstName = name.Substring(name.IndexOf(",") + 1).Trim()
      ElseIf name.IndexOf(" ") > 0 Then
        ' Otherwise split on the first space, matching the C# version
        mFirstName = name.Substring(0, name.IndexOf(" ")).Trim()
        mLastName = name.Substring(name.IndexOf(" ") + 1).Trim()
      End If
    End If
  End Sub
End Class

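To test this method, use the following code. It calls MessageBox.Show, so it assumes a project where that class is available, such as Windows Forms: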

C#
private void TestCustomerSplit()
{
  Customer cust;

  cust = new Customer();
  cust.NameSplit("Bruce Jones");
  MessageBox.Show(cust.FirstName + " " + cust.LastName);

  cust = new Customer();
  cust.NameSplit("Jones, Bruce");
  MessageBox.Show(cust.FirstName + " " + cust.LastName);
}

Visual Basic
Private Sub TestCustomerSplit()
  Dim cust As Customer

  cust = New Customer()
  cust.NameSplit("Bruce Jones")
  MessageBox.Show(cust.FirstName & " " & cust.LastName)

  cust = New Customer()
  cust.NameSplit("Jones, Bruce")
  MessageBox.Show(cust.FirstName & " " & cust.LastName)
End Sub

Another way to accomplish this task would have been to use the Split() method of the String class. You can split a string on a specific character like a space or a comma. You then end up with an array containing each part of the name. I will leave that as an exercise for you to do! :)
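
If you want a starting point for that exercise, here is one possible sketch in C#. It is not the routine we used on the project. The NameSplitter class and its SplitName method are hypothetical names made up for this illustration, and returning a lone word as the first name is an assumption (the NameSplit() method above leaves both properties unset in that case). It also assumes a compiler that supports tuples (C# 7 or later).

```csharp
using System;

// Hypothetical helper for illustration; not part of the original Customer class.
static class NameSplitter
{
  // Returns the first and last name; a part is empty if it cannot be determined.
  public static (string First, string Last) SplitName(string name)
  {
    if (string.IsNullOrWhiteSpace(name))
      return ("", "");

    // "Last, First": split on the first comma only (at most 2 parts).
    string[] parts = name.Split(new[] { ',' }, 2);
    if (parts.Length == 2)
      return (parts[1].Trim(), parts[0].Trim());

    // "First Last": split on the first space; any extra words stay in the last name.
    parts = name.Trim().Split(new[] { ' ' }, 2);
    if (parts.Length == 2)
      return (parts[0], parts[1].Trim());

    // Assumption: a single word is treated as a first name.
    return (parts[0], "");
  }
}
```

Calling NameSplitter.SplitName("Bruce Jones") and NameSplitter.SplitName("Jones, Bruce") should both give First = "Bruce" and Last = "Jones". Passing a count of 2 to Split() keeps middle names or extra commas from producing more array elements than the code expects.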

You just have to love the string parsing capabilities of .NET! Just a little bit of code was all it took to turn these different formats into something much easier to work with.

Good Luck With Your Coding,
Paul Sheriff


3 Comments

  • No offence - but that is way too simple to correctly parse the type of data you are talking about.

    Many many years ago I worked on similar parsing for large (100,000+ people per database) files for snail mailouts, and the parser ended up having to handle hundreds of different cases of formatting to achieve 99+ percent accuracy.

    The most unfortunate case was that in one of the databases the company had inserted "DEAD" after the name of anyone deceased. So we unfortunately sent a number of customers letters starting:
    Dear Mr Dead,

    oops.

    Good luck.

    Dave

  • Dave,

    None taken. This is an excellent point! We often focus on just the plain technical side of things, but we need to consider business rules and other factors in our jobs.

    In fact, for this project we had set a flag on each record that meant the record needed to be checked for accuracy. Luckily, there were only a couple of thousand records, so the customer was able to go through them by hand. Otherwise we would also have had to write many more business rules to check for just the situation you mentioned.

    Thanks!
    Paul

  • Helpful code. Thanks
