Very odd OutOfMemoryException issue with GetHashCode(string)

In .NET there's a class called StringComparer. It exposes some handy ready-made comparers, such as StringComparer.InvariantCultureIgnoreCase. These comparers also implement a method called GetHashCode(string), which produces the hash code under the comparer's own rules: if you call it on the InvariantCultureIgnoreCase variant, strings that this comparer considers equal get the same hash code.
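To make "hash code in the scope of the comparer" concrete, here is a minimal sketch. The class and method names (ComparerHashDemo, SameHash) are made up for illustration; only StringComparer and its GetHashCode(string) come from the framework.

```csharp
using System;

// Hypothetical helper, not part of the original repro.
public static class ComparerHashDemo
{
    // Returns true when both strings produce the same hash code
    // under the rules of the given comparer.
    public static bool SameHash(StringComparer comparer, string a, string b)
    {
        return comparer.GetHashCode(a) == comparer.GetHashCode(b);
    }
}
```

Calling ComparerHashDemo.SameHash(StringComparer.InvariantCultureIgnoreCase, "Hello", "HELLO") should return true, because the comparer treats the two strings as equal and equal values must hash alike. With StringComparer.Ordinal the two hash codes will normally differ, although that is not guaranteed.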

This is handy because hash codes are important, for example to find duplicates. We recently ran into an issue with this: passing a large string to this method caused it to throw an OutOfMemoryException, but ... there was plenty of memory left. What was even stranger was that the string length at which it starts failing differs per appdomain and even per machine!

So I wrote a little app; the source code is below. It fiddles with digits to find the maximum string length one can pass to GetHashCode before it throws this exception. Of course, this is of little use in itself, but it illustrates the problem and is a good repro case for Microsoft as well. The code below will crash with an OutOfMemoryException at the end, because it tests the length it found by increasing it by 1. I'll post this to Connect (yes, I'm that naive, but perhaps this time they'll fix it). Tested on .NET 3.5 SP1 with XP SP3 as well as .NET 2.0 with XP SP3 (I'm pretty sure the error is in Win32, so it might even be OS dependent).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace OOMTester
{
    public class Program
    {
        static void Main(string[] args)
        {
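            // Build the maximum length digit by digit, most significant digit first.
            int digitIndex = 0;
            char[] digits = new char[8];
            StringComparer comparer = StringComparer.InvariantCultureIgnoreCase;
            while(digitIndex<digits.Length)
            {
                // Try 9 down to 0; keep the first digit for which hashing succeeds.
                for(int i=9;i>=0;i--)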
                {
                    digits[digitIndex] = i.ToString()[0];
                    for(int j=digitIndex+1;j<digits.Length;j++)
                    {
                        digits[j] = '0';
                    }
                    int length = Convert.ToInt32(new string(digits));
                    bool succeeded = false;
                    try
                    {
                        int hashCode = comparer.GetHashCode(new string('X', length));
                        succeeded = true;
                    }
                    catch(OutOfMemoryException)
                    {
                        // failed.
                    }
                    catch(ArgumentException)
                    {
                        // out of range
                    }
                    if(succeeded)
                    {
                        digitIndex++;
                        Console.WriteLine("Digit index increased: {0}. Full digits: {1}", 
                                    digitIndex, new string(digits));
                        break;
                    }
                }
            }

            Console.WriteLine("MaxLength: {0}", new string(digits));

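            // Hash a string of exactly the length found; this is expected to succeed.
            int maxLength = Convert.ToInt32(new string(digits));
            string xmlData = new string('X', maxLength);
            int hashcode = comparer.GetHashCode(xmlData);
            Console.WriteLine("Length: {0}. Hashcode: {1}", xmlData.Length, hashcode);
            // One character more is expected to throw: this is the actual repro.
            maxLength++;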
            xmlData = new string('X', maxLength);
            hashcode = comparer.GetHashCode(xmlData);
            Console.WriteLine("Length: {0}. Hashcode: {1}", xmlData.Length, hashcode);
        }
    }
}
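
The digit-by-digit search above could also be written as a binary search. The following is only a sketch under one big assumption: that hashing succeeds for every length up to some limit and fails for every length above it. The names MaxLengthSearch, FindMax and CanHash are hypothetical and not part of the original test app.

```csharp
using System;

// Hypothetical alternative to the digit search; not the original repro code.
public static class MaxLengthSearch
{
    // Returns the largest n in [0, upperBound] for which works(n) is true.
    // Assumes works(0) is true and that works is monotonic
    // (true up to some limit, false for everything above it).
    public static int FindMax(Func<int, bool> works, int upperBound)
    {
        int lo = 0;           // largest value known (or assumed) to work
        int hi = upperBound;  // largest value still worth trying
        while (lo < hi)
        {
            // Round the midpoint up so the loop always makes progress;
            // use long arithmetic to avoid overflow for large bounds.
            int mid = (int)(((long)lo + hi + 1) / 2);
            if (works(mid))
            {
                lo = mid;
            }
            else
            {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // True when a string of 'length' characters can be hashed without
    // one of the two exceptions the original loop also catches.
    public static bool CanHash(int length)
    {
        try
        {
            StringComparer.InvariantCultureIgnoreCase.GetHashCode(new string('X', length));
            return true;
        }
        catch (OutOfMemoryException)
        {
            return false;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

MaxLengthSearch.FindMax(MaxLengthSearch.CanHash, 99999999) would cover the same 8-digit range as the loop above. If the failure point moves between attempts, the monotonic assumption does not hold and the result is only an approximation.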


Update: Connect issue.

11 Comments

  • That is strange...

    Using Reflector, the call eventually gets to System.Globalization.TextInfo.nativeGetHashCodeOrdinalIgnoreCase ..

    [MethodImpl(MethodImplOptions.InternalCall)]
    private static extern unsafe int nativeGetHashCodeOrdinalIgnoreCase(void* pNativeTextInfo, string s);

    I checked the .NET source release by MS, and couldn't find the source for this native method..

    So probably something strange with pointers.. good luck @ connect..


  • Well.. it stops on my system with a length of 26336976, which is one contiguous memory allocation of 26336976*2/1024/1024 = 50.23MB.

    The length varies per run.

    What is also interesting is that your sample continues execution until it gets out of scope. You are catching the OutOfMemoryException but aren't throwing it when you are finished handling the exception. But in the end it gets thrown again by .net when out of scope.

  • @Ramon: the OutOfMemoryException catch is there to force the loop to try another digit. The exception is expected at that point, because the value represented by the digits is too high, so there's no use in rethrowing it.

    The exception you get at the end is the one thrown in the last GetHashCode in line 59.

    My guess is that malloc somewhere in the native code isn't allowed to allocate a block of that size. Which is kind of weird considering the limited length of the string compared to the main memory size.

  • Is your memory fragmented, or are you just running the code as is? OOMs can be thrown when there's not enough contiguous memory to allocate. It could be interesting to fire up windbg or vmmap to take a look at the process memory topology.

  • Based on Ramon's comment, I set a breakpoint on line 33 (where you catch the OutOfMemoryException) and I hit that consistently with length = 70,000,000. 32-bit Vista SP1 with 4GB of RAM. Intel Core 2 Duo.

  • @Samper: good catch! It indeed works without a problem with OrdinalIgnoreCase. We can't use that (as the values passed in are entity field values) but nevertheless, it's a step forward for the people who run into this as well. We found out we didn't need a case-insensitive hash code for a string that long, so we implemented a workaround. Not sure if the ordinal variant also works for Turkish text...

    @Yann: no fragmentation, I have more than a gig free (physical mem).

    @Patricksteel: hmm. It might be my little test code is a little buggy, the idea is that it goes from left to right and decreases the digit from 9 to 0 and when a digit succeeds in not throwing OutOfMemory, it moves on to the next. This is rather obscure, but it's an automated routine to find the value.


  • Getting different exceptions with different hardware/software

    Test on VS 2008 3.5 SP1 Windows 7 64 bit. maxLength is 79536432.

    System.ArgumentException was unhandled
    Message="Value does not fall within the expected range."
    Source="mscorlib"
    StackTrace:
    at System.Globalization.CompareInfo.nativeGetGlobalizedHashCode(Void* pSortingFile, String pString, Int32 dwFlags, Int32 win32LCID)
    at OOMTester.Program.Main(String[] args) in C:\Labs\HasCodeBug\HasCodeBug\Program.cs:line 59



    Test on VS 2010 Beta 2 4.0 Windows 7 32 bit. maxLength is 21672998.

    System.ArgumentException was unhandled
    Message=Object must be of type String.
    Source=mscorlib
    StackTrace:
    at System.Globalization.CompareInfo.InternalGetGlobalizedHashCode(IntPtr handle, String localeName, String source, Int32 length, Int32 dwFlags)
    at System.Globalization.CompareInfo.GetHashCodeOfString(String source, CompareOptions options)
    at System.CultureAwareComparer.GetHashCode(String obj)
    at OOMTester.Program.Main(String[] args) in C:\Labs\HasCodeBug10\HasCodeBug\Program.cs:line 59

    Raj

  • @Raj: that range issue is indeed weird as well. It's not documented at all, and I had to add a check for it in my little app to be able to test values > 50 million.

    The .NET 4.0 crash is interesting. The error is different and doesn't make sense.

  • Frans, I get the same exact crash as Raj does on .NET 3.5 SP1 (all updates).

    I'm also on Win7 (x64, 8GB RAM), and I get almost the exact same MaxLength as he does as well (79536431 instead of 79536432).

    At the point where it crashes, the app's commit size (in Task Manager) is well over 4GB.

  • @Eric / Raj: so it's really related to 32-bit, not 64-bit (which is not that weird, considering 64-bit OSes have a bigger addressable space).

    Strange about the commit charge. On 32bit it gets hardly above 80MB.

  • awwww cmon, MS connect isn't _that_ bad. They even eventually re-opened my real odd datetime picker in a contextMenuStrip in Vista bug. It's just a matter of time.
