Very odd OutOfMemoryException issue with GetHashCode(string)

In .NET there's a class called StringComparer. It exposes some handy ready-made comparers, such as StringComparer.InvariantCultureIgnoreCase. These comparers also implement a method called GetHashCode(string), which produces the hash code in the scope of the comparer: if you call it on the InvariantCultureIgnoreCase variant, you get a hash code that follows that comparer's rules, so strings the comparer considers equal get the same hash code.
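To make "in the scope of the comparer" concrete, here is a minimal sketch. The class name ComparerHashDemo, the helper SameHash and the sample strings are made up for illustration; only StringComparer and its members come from the framework.

```csharp
using System;

public static class ComparerHashDemo
{
    // Hypothetical helper: true when both strings get the same hash code
    // from the given comparer.
    public static bool SameHash(StringComparer comparer, string a, string b)
    {
        return comparer.GetHashCode(a) == comparer.GetHashCode(b);
    }

    public static void Main()
    {
        StringComparer comparer = StringComparer.InvariantCultureIgnoreCase;

        // The comparer treats these two strings as equal...
        Console.WriteLine(comparer.Equals("Customer", "CUSTOMER"));

        // ...so it has to produce the same hash code for both of them.
        Console.WriteLine(SameHash(comparer, "Customer", "CUSTOMER"));
    }
}
```

Both lines should print True: a comparer that reports two strings as equal is required to return equal hash codes for them.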

This is handy, as hash codes are important, for example to find duplicates. We recently ran into an issue with this: passing a large string to this method caused it to throw an OutOfMemoryException, but ... there was plenty of memory left. What was even stranger was that the string length at which it fails differs per appdomain and even per machine!

So I wrote a little app; the source code is below. It fiddles with digits to find the maximum string length one can pass to GetHashCode before it throws this exception. Of course, this is of little use in itself, but it illustrates the problem and is a good repro case for Microsoft as well. The code below will crash with an OutOfMemoryException at the end, because it tests the found length by increasing it by 1. I'll post this to Connect (yes, I'm that naive, but perhaps this time they'll fix it). Tested on .NET 3.5 SP1 and XP SP3 as well as .NET 2.0 and XP SP3 (I'm pretty sure the error is in Win32, so it might even be OS dependent).

using System;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace OOMTester
{
    public class Program
    {
        static void Main(string[] args)
        {
            int digitIndex = 0;
            char[] digits = new char[8];
            StringComparer comparer = StringComparer.InvariantCultureIgnoreCase;
            while(digitIndex<digits.Length)
            {
                for(int i=9;i>=0;i--)
                {
                    digits[digitIndex] = i.ToString()[0];
                    for(int j=digitIndex+1;j<digits.Length;j++)
                    {
                        digits[j] = '0';
                    }
                    int length = Convert.ToInt32(new string(digits));
                    bool succeeded = false;
                    try
                    {
                        int hashCode = comparer.GetHashCode(new string('X', length));
                        succeeded = true;
                    }
                    catch(OutOfMemoryException)
                    {
                        // failed: this length is too large, so the loop tries the next lower digit.
                    }
                    catch(ArgumentException)
                    {
                        // out of range: handled like a failure, so the loop tries the next lower digit.
                    }
                    if(succeeded)
                    {
                        digitIndex++;
                        Console.WriteLine("Digit index increased: {0}. Full digits: {1}", 
                                    digitIndex, new string(digits));
                        break;
                    }
                }
            }

            Console.WriteLine("MaxLength: {0}", new string(digits));

            int maxLength = Convert.ToInt32(new string(digits));
            string xmlData = new string('X', maxLength);
            int hashcode = comparer.GetHashCode(xmlData);
            Console.WriteLine("Length: {0}. Hashcode: {1}", xmlData.Length, hashcode);
            maxLength++;
            xmlData = new string('X', maxLength);
            hashcode = comparer.GetHashCode(xmlData);
            Console.WriteLine("Length: {0}. Hashcode: {1}", xmlData.Length, hashcode);
        }
    }
}


Update: Connect issue.

Published Wednesday, December 02, 2009 11:12 AM by FransBouma

Comments

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 5:35 AM by Davy Landman

That is strange...

Using Reflector, the call eventually gets to System.Globalization.TextInfo.nativeGetHashCodeOrdinalIgnoreCase ..

  [MethodImpl(MethodImplOptions.InternalCall)]

  private static extern unsafe int nativeGetHashCodeOrdinalIgnoreCase(void* pNativeTextInfo, string s);

I checked the .NET source release by MS, and couldn't find the source for this native method..

So probably something strange with pointers.. good luck @ connect..

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 7:27 AM by Ramon

Well.. it stops on my system with a length of 26336976, which is one contiguous memory allocation of 26336976*2/1024/1024 = 50.23 MB (2 bytes per character).

The length varies per run.

What is also interesting is that your sample continues execution until it gets out of scope. You are catching the OutOfMemoryException but aren't throwing it when you are finished handling the exception. But in the end it gets thrown again by .net when out of scope.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 7:39 AM by FransBouma

@Ramon: the OutOfMemoryException catch is there to force the loop to try another digit. The exception is expected at that point, as the value represented by the digits is too high, so there's no use in rethrowing it.

The exception you get at the end is the one thrown in the last GetHashCode in line 59.

My guess is that malloc somewhere in the native code isn't allowed to allocate a block of that size. Which is kind of weird considering the limited length of the string compared to the main memory size.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 7:43 AM by Samper Geoffrey

In CLR 1.0, MS advised using the CaseInsensitiveComparer for case-insensitive string comparison. In 2.0 they advised using OrdinalIgnoreCase. You can find it at

msdn.microsoft.com/.../ms973919.aspx

I ran the sample again with StringComparer.OrdinalIgnoreCase and everything works fine.

Maybe this helps?
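A minimal sketch of the swap described in this comment. The class name OrdinalHashSketch, the helper Hash and the length of 1,000,000 characters are illustrative assumptions, not values from the thread. Whether ordinal rules are acceptable depends on the data, since they ignore culture-specific casing.

```csharp
using System;

public static class OrdinalHashSketch
{
    // Hypothetical helper: hashes with the ordinal, case-insensitive comparer
    // instead of the culture-aware InvariantCultureIgnoreCase one.
    public static int Hash(string value)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(value);
    }

    public static void Main()
    {
        // 1,000,000 is an arbitrary length picked for illustration only.
        string lower = new string('x', 1000000);
        string upper = new string('X', 1000000);

        // Strings that differ only in case should hash identically
        // under OrdinalIgnoreCase.
        Console.WriteLine(Hash(lower) == Hash(upper));
    }
}
```

To try it against the repro, the only change needed in the test app is the comparer assigned at the top of Main.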

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 7:45 AM by Yann Schwartz

Is your memory fragmented, or are you just running the code as is? OOMs can be thrown when there's not enough contiguous memory to allocate. It could be interesting to fire up windbg or vmmap to take a look at the process memory topology.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 7:46 AM by PSteele

Based on Ramon's comment, I set a breakpoint on line 33 (where you catch the OutOfMemoryException) and I hit that consistently with length = 70,000,000.  32-bit Vista SP1 with 4GB of RAM.  Intel Core 2 Duo.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 8:00 AM by FransBouma

@Samper: good catch! It indeed works without a problem with OrdinalIgnoreCase. We can't use that (as the values passed in are entity field values), but nevertheless it's a step forward for the people who run into this as well. We found out we didn't need the case-insensitive hash code of a string that long, so we implemented a workaround. Not sure if the ordinal variant also works for Turkish text...

@Yann: no fragmentation, I have more than a gig free (physical mem).

@Patricksteel: hmm. It might be that my little test code is a little buggy. The idea is that it goes from left to right and decreases each digit from 9 to 0; when a digit succeeds in not throwing OutOfMemoryException, it moves on to the next. This is rather obscure, but it's an automated routine to find the value.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 8:25 AM by rajbk

Getting different exceptions with different hardware/software

Test on VS 2008 3.5 SP1 Windows 7 64 bit. maxLength is 79536432.

System.ArgumentException was unhandled

 Message="Value does not fall within the expected range."

 Source="mscorlib"

 StackTrace:

      at System.Globalization.CompareInfo.nativeGetGlobalizedHashCode(Void* pSortingFile, String pString, Int32 dwFlags, Int32 win32LCID)

      at OOMTester.Program.Main(String[] args) in C:\Labs\HasCodeBug\HasCodeBug\Program.cs:line 59

Test on VS 2010 Beta 2 4.0 Windows 7 32 bit. maxLength is 21672998.

System.ArgumentException was unhandled

 Message=Object must be of type String.

 Source=mscorlib

 StackTrace:

      at System.Globalization.CompareInfo.InternalGetGlobalizedHashCode(IntPtr handle, String localeName, String source, Int32 length, Int32 dwFlags)

      at System.Globalization.CompareInfo.GetHashCodeOfString(String source, CompareOptions options)

      at System.CultureAwareComparer.GetHashCode(String obj)

      at OOMTester.Program.Main(String[] args) in C:\Labs\HasCodeBug10\HasCodeBug\Program.cs:line 59

Raj

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 8:41 AM by FransBouma

@Raj: that range issue is indeed weird as well. It's not documented at all; I had to add a check in my little app for that to be able to test values > 50 million.

The .NET 4.0 crash is interesting. The error is different and doesn't make sense.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 9:17 AM by Eric Means

Frans, I get the same exact crash as Raj does on .NET 3.5 SP1 (all updates).

I'm also on Win7 (x64, 8GB RAM), and I get almost the exact same MaxLength as he does as well (79536431 instead of 79536432).

At the point where it crashes, the app's commit size (in Task Manager) is well over 4GB.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Wednesday, December 02, 2009 9:26 AM by FransBouma

@Eric / Raj: so it's really related to 32-bit, not 64-bit (which is not that weird, considering 64-bit OSes have a bigger addressable space).

Strange about the commit charge. On 32-bit it hardly gets above 80MB.

# re: Very odd OutOfMemoryException issue with GetHashCode(string)

Monday, December 21, 2009 8:08 AM by Jax

awwww cmon, MS connect isn't _that_ bad. They even eventually re-opened my real odd datetime picker in a contextMenuStrip in Vista bug. It's just a matter of time.