Code Puzzle #2 - Generate random fake surnames

I'm talking to you, Mr. Thymmet! Step forward and be counted, Ms. Betusen! It's time for another Code Puzzle! My solution will be posted on Friday, 1/12. Get to work!

UPDATE: Read the recap here

Task: Write a single function which generates fake but passable surnames.

Rules / Notes:

  • Post your solution in the comments along with 25 results.
  • Names should be pronounceable and passable as real surnames. If you saw them on a business card, your reaction should at most be an inward chuckle.
  • Your solution should be capable of generating a large number of random names. (My solution generated 1.7+ million distinct names in a simple test.)
  • My solution creates surnames which would make sense in the US. Feel free to write something that targets different ethnicities, but please let us know which you're targeting.
  • Keep it simple. My solution is 33 lines and 2,700 characters, including some comments. My function takes no parameters and returns a string.
  • Use whatever language you want; please specify if it's not VB.NET or C# using .NET 2.0. Yeah, I know you can probably write this in 8 characters in PowerShell, Ruby, Python, and Boo. So do it!
  • Your code cannot reference any external resources (no internet access, no file or data access, etc.).
  • Your code doesn't need to worry about filtering obscenities (but please keep your obscene results to yourself).
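
To make the rules above concrete, here is a minimal sketch of a function with the required shape: no parameters, a string result, and no external resources. It is not my solution or anyone's entry. The class name SurnameSketch and the four fragment arrays are hypothetical, hand-picked for illustration, and the approach (an opening consonant, one or two vowel sounds, and a consonant-initial ending) is just one plausible way to keep names pronounceable.

```csharp
using System;
using System.Text;

public static class SurnameSketch
{
    // Hypothetical fragment lists, chosen by hand for this sketch only.
    private static readonly string[] Onsets =
        { "B", "D", "F", "H", "L", "M", "S", "T", "Th", "Sh", "Sl", "Gr" };
    private static readonly string[] Vowels =
        { "a", "e", "o", "ea", "ee", "ou" };
    private static readonly string[] Middles =
        { "t", "th", "s", "ss", "l", "ck", "mm", "n", "r" };
    private static readonly string[] Endings =
        { "son", "sen", "ley", "man", "ton", "ner", "th", "t" };

    // One shared Random so rapid calls do not reuse the same seed.
    private static readonly Random Rng = new Random();

    private static string Pick(string[] items)
    {
        // Random.Next(max) excludes max, so every index is reachable.
        return items[Rng.Next(items.Length)];
    }

    // Takes no parameters and returns a string, as the rules ask.
    public static string GenerateSurname()
    {
        StringBuilder name = new StringBuilder();

        // Always start with a capitalized consonant cluster and a vowel.
        name.Append(Pick(Onsets)).Append(Pick(Vowels));

        // About half the time, add a second consonant-vowel pair.
        if (Rng.Next(2) == 0)
        {
            name.Append(Pick(Middles)).Append(Pick(Vowels));
        }

        // Every ending starts with a consonant, so vowels never pile up.
        name.Append(Pick(Endings));
        return name.ToString();
    }
}
```

Calling SurnameSketch.GenerateSurname() in a loop and writing each result to the console produces a list to eyeball. With lists this small the pool of distinct names is modest, so growing the arrays is the obvious first tweak.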

Judging criteria:

This is a fixed contest; my solution will win first place. However, to achieve glory in this contest you must:

  • Generate passable surnames
  • Use concise yet readable code
  • Try to be elegant, but things like magic numbers and hardcoded strings are fine by me

Sample output:

for (int i = 0; i < 50; i++)
    Console.WriteLine(GenerateSurname());

Greta
Slease
Sloje
Seethoson
Sleeslo
Tethe
Heshoez
Lackesm
Sorommees
McTeame
Louwheth
Sheysset
Sakneesm
Lathet
Tosoatesm
Betusen
McTasea
Fata
Thymmet
Degha
Thateyez
Thethatea
McShatte
Tethes

(Yes, these look silly when listed together. The idea is that they shouldn't stand out on a page in the phonebook. Let's see you do better!)

References:

UPDATE: Is this for spammers?
No. I was working on a sample application and wanted to generate a few hundred names to demonstrate paging through data. I thought about it, and I don't believe this will help spammers. It's very easy for spammers to just use lists of actual names (such as the common surnames list in the references above). Spam filtering algorithms can't block e-mail from common surnames, or they'd be blocking most of their legitimate e-mail traffic. It would be a waste of time for spammers to bother with generating random surnames.

14 Comments

  • Is this something for spammers?
    More algorithms, worse for spam filters?

  • My name's not silly, dangit! Stop writing me into your code puzzles!

  • Hmmm - none of the sample names start with a vowel? ;)

  • This is great! And funny too... I had a lot of laughs building my solution, but it's longer than your solution, Jon. Hmmm... time to refactor!

  • @Wim - You're right. I always started with consonants. It might work better if I allow some vowels to start. I'll give it a shot.

    @Keith - Great! I was worried I wouldn't get any entries!

  • That was easy!
    return "McSa" + Regex.Replace(Guid.NewGuid().ToString(),"\\d+|-","e") + "erson";

  • @Wim - Interesting! Your solution was very similar to mine. Mine has a few tweaks to give higher weight to more common vowel and consonant combinations, but they're remarkably similar.

  • Hi Jon - was thinking of weighting, but that way I wouldn't have got my solution down to 20 odd lines. ;-)

    And after all, it was just to generate test surnames. I could probably 'tune' the appropriate arrays a bit more, but hey, it was only a 5 minute job.

  • I've done two versions, but each of them is 140+ lines long, so I guess I'm out. Anyway, it's been a lot of fun!! I'll post them tomorrow.

  • @Carlos - Looking forward to seeing what you came up with!

  • Here's my solution using digraphs.

    using System;
    using System.Text.RegularExpressions;

    namespace SurnameGenerator
    {
        class Program
        {
            public static void Main(string[] args)
            {
                for (int count = 0; count < 25; count++)
                {
                    Console.WriteLine(GenerateName());
                }
                Console.ReadKey();
            }

            // Rejects candidates with a doubled a/i/u/y, a repeated letter pair,
            // three or more consonants or vowels in a row, a "y" between two characters,
            // an awkward opening cluster, or an awkward ending.
            private static Regex invalidNameRegex = new Regex(@"(?:([aiuy])\1)|(?:(\w\w)\2)|([^aeiouy]{3,})|([aeiouy]{3,})|(\wy\w)|(^nd)|(^nt)|(^rt)|(^rs)|(^ht)|([aiou]$)|(tw$)");
            private static string[] digraphs = new string[]
            { "en", "re", "er", "nt", "th", "on", "in", "te", "an", "or", "st",
              "ed", "ne", "ve", "es", "nd", "to", "se", "at", "ti", "ar", "ee",
              "rt", "as", "co", "io", "ty", "fo", "fi", "ra", "et", "le", "ou",
              "ma", "tw", "ea", "is", "si", "de", "hi", "al", "ce", "da", "ec",
              "rs", "ur", "ni", "ri", "el", "la", "ro", "ta"
            };
            private static Random random = new Random();

            public static string GenerateName()
            {
                string potentialName;
                do
                {
                    potentialName = "";
                    // Choose the length once; calling random.Next in the loop
                    // condition would re-roll the limit on every pass.
                    int digraphTotal = random.Next(2, 5);
                    for (int digraphCount = 0; digraphCount < digraphTotal; digraphCount++)
                    {
                        // Random.Next excludes its upper bound, so use Length
                        // to keep the last digraph ("ta") reachable.
                        potentialName += digraphs[random.Next(0, digraphs.Length)];
                    }
                } while (invalidNameRegex.IsMatch(potentialName));
                return potentialName.Substring(0, 1).ToUpper() + potentialName.Substring(1);
            }
        }
    }

    And the names generated:
    Nicolaed
    Iont
    Esnith
    Erelec
    Asraed
    Raerse
    Rost
    Vecoas
    Ceener
    Hisior
    Raetelde
    Dastur
    Toed
    Onende
    Arfind
    Eendel
    Fitouren
    Alraur
    Erenet
    Elce
    Five
    Orisre
    Altirais
    Eerore
    Ator

  • Hmmm, the blog seems to have cut off the end of my regular expression.

    Here's the line that got cut off, in a more readable form:

    private static Regex invalidNameRegex = new Regex(@"(?:([aiuy])\1)|
    (?:(\w\w)\2)|
    ([^aeiouy]{3,})|
    ([aeiouy]{3,})|
    (\wy\w)|
    (^nd)|
    (^nt)|
    (^rt)|
    (^rs)|
    (^ht)|
    ([aiou]$)|
    (tw$)", RegexOptions.IgnorePatternWhitespace);

  • yellow is my favorite color.
    14 is my favorite number
