Kerberos and Web Service Preauthentication

Tags: .NET

My current project has been having various inconsistent, irreproducible 401 Unauthorized errors on our web service calls. We tried various tests, checks, changes and wild guesses, and we think we’ve landed it this time. Let’s recap the problem:

 

Symptoms

Among the several web-service calls the system makes, some consistently succeed, while others receive sporadic 401 (Unauthorized) results from the web-service. When these are caught and retried, they sometimes succeed on the next attempt and sometimes only after several retries.
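For illustration, a catch-and-retry wrapper of this kind might look roughly like the sketch below. It assumes the 401 surfaces as a WebException carrying the HTTP response, and the names (ServiceCall, RetryOn401) are made up for the example rather than taken from our actual code.

```csharp
using System;
using System.Net;

// Hypothetical delegate type wrapping a single web-service call.
public delegate void ServiceCall();

public class RetryOn401
{
    // True only when the exception carries an HTTP 401 response.
    public static bool IsUnauthorized(WebException ex)
    {
        HttpWebResponse response = ex.Response as HttpWebResponse;
        return response != null
            && response.StatusCode == HttpStatusCode.Unauthorized;
    }

    // Runs the call, retrying on 401 until maxAttempts is reached.
    // Any other failure, or the last 401, is rethrown to the caller.
    public static void Invoke(ServiceCall call, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                call();
                return;
            }
            catch (WebException ex)
            {
                if (!IsUnauthorized(ex) || attempt >= maxAttempts)
                {
                    throw;
                }
            }
        }
    }
}
```

A wrapper like this hides the symptom rather than fixing the cause, which is why the rest of this post is about finding the cause.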

 

Scenario

The client is an NT service written in .NET 1.1 running under the LocalSystem account, on a Win2003 domain with Kerberos authentication. This authentication scheme allows the LocalSystem account to authenticate as the computer account when accessing external resources (i.e. the web-service). NTLM authentication does not allow LocalSystem accounts to access network resources.

 

The server exposes several web-services, all with identical permissions and restrictions. We will call them the Config service and the Sessions service. 

All calls to the Config service consistently work. No retries necessary.

Calls to the Sessions service occasionally fail, and only work after several retries. 

The difference between the two calls is in a little property of the web-service proxy called Preauthenticate. The Config service proxy’s property was set to False, while the Sessions service proxy’s was set to True.

 

Interlude – Preauthentication & HTTP calls.

How does an HTTP call authenticate against the server? A web server could demand integrated (NTLM, Kerberos) authentication, might demand a clear-text username and password (Basic) or ignore all credentials altogether (Anonymous). The internet being the chaotic place it is, HTTP authentication follows this pattern:

1)      The client sends an HTTP request with no credentials at all.

2)      If the server refuses anonymous access, it returns a 401 Unauthorized response along with a header (WWW-Authenticate) detailing what authentication schemes it supports. This is the Challenge part of the Challenge-Response handshake.
(RFC link)

3)      The client and server perform the rest of the handshake protocol, which for NTLM involves another client message, another 401 and then a final client message that results in a 200 OK, as detailed in the HTTP NTLM spec.
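Steps 1 and 2 are easy to observe from code. The sketch below sends a request with no credentials and, if the server answers 401, returns the WWW-Authenticate header it sent back. The class and method names are made up for the example, and the URL is whatever you pass on the command line.

```csharp
using System;
using System.Net;

public class ChallengeProbe
{
    // Returns the WWW-Authenticate header of a 401 response, or null
    // if the server accepted the anonymous request.
    public static string GetChallenge(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Step 1: no Credentials are set, so the request is anonymous.
        try
        {
            using (WebResponse response = request.GetResponse())
            {
                return null;
            }
        }
        catch (WebException ex)
        {
            HttpWebResponse response = ex.Response as HttpWebResponse;
            if (response != null
                && response.StatusCode == HttpStatusCode.Unauthorized)
            {
                // Step 2: the challenge names the schemes the server accepts.
                return response.Headers["WWW-Authenticate"];
            }
            throw;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ChallengeProbe <url>");
            return;
        }
        string challenge = GetChallenge(args[0]);
        Console.WriteLine(challenge == null ? "(no challenge)" : challenge);
    }
}
```

Against a site configured for integrated Windows authentication, I would expect the header to list schemes such as Negotiate and NTLM, but that depends on the server configuration.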

 

Since this is a multi-step process, and network latency is a problem in most places, we would like to avoid all this back-and-forth every time we connect. This is where the Preauthenticate flag comes in. If the flag is set, the Web Service proxy that .NET creates will perform the handshake the first time it tries to connect, but for each subsequent call it will send the actual authentication message immediately. This can reduce the overhead of each call, and is generally a Good Thing.
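In code, the flag is a single boolean property, spelled PreAuthenticate. The sketch below uses a raw HttpWebRequest rather than a generated proxy so that it stands on its own; the generated proxy classes expose a property of the same name. The class and method names here are made up for illustration.

```csharp
using System.Net;

public class PreauthExample
{
    // Builds a request that authenticates as the current process identity.
    public static HttpWebRequest CreateRequest(string url, bool preauthenticate)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // Use the credentials of the account the process runs under.
        request.Credentials = CredentialCache.DefaultCredentials;

        // When true, the Authorization header is sent up front on requests
        // made after authentication has already taken place, instead of
        // waiting for the server's 401 challenge each time.
        request.PreAuthenticate = preauthenticate;
        return request;
    }
}
```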

However, this optimization can cause problems under specific scenarios:

 

The Kerberos Conundrum

The default authentication scheme used by Windows starting with Windows 2000 is Kerberos, an open standard for authentication. NT4’s NTLMv2 protocol is still used when connecting to older domains or non-Active Directory networks, as a fallback mechanism if no Kerberos ticket provider is found.

Our system, as we previously mentioned, runs under the LocalSystem credentials on the client machine, and connects to the server via Kerberos authentication, which enables the service to use the machine’s credentials. This isn’t supported by the NTLM protocol, which requires a domain user account in order to authenticate against a domain resource.

 

Now for the problem - it appears that there’s a bug/limitation/problem with the Kerberos implementation on the client, if I understand things correctly. Even when both client and server are configured to use Kerberos and the web service proxy is set to Preauthenticate, the web request will always initially issue an anonymous handshake request.

However, since the code behind the WebClientProtocol in the CLR expects the Preauthenticate behavior, it will treat the first HTTP response as the final, actual status code returned from the request, rather than going through the motions of the handshake. This means that the preliminary 401 is interpreted as a real Unauthorized error from the web-service. This is what caused our service’s intermittent failures. I’m still not sure why it didn’t always fail. Perhaps, once the 401 has been issued, the next request sends the proper credentials and succeeds, and only when the session has expired does the client hit the 401 again. This might explain why a stress-test we ran that attempted to connect to the service every 2 minutes didn’t experience any 401 errors: it kept the session alive, and didn’t need to establish new credentials.

 

The Solution

The solution for our problem was very simple, once the problem was found. We simply set Preauthenticate to False for all our web service calls. We pay a small price in network traffic, but it’s much, much better than the alternative.
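As a rough sketch, the change amounts to one line per proxy. The helper below takes any proxy derived from WebClientProtocol (the base class of the generated web-service proxies) and turns the flag off; the helper class and method names are hypothetical, and it needs a reference to System.Web.Services.

```csharp
using System.Net;
using System.Web.Services.Protocols;

public class ProxySetup
{
    // Configures a proxy to go through the normal challenge-response
    // handshake on each connection instead of preauthenticating.
    public static void UseChallengeResponse(WebClientProtocol proxy)
    {
        // Authenticate as the account the service runs under.
        proxy.Credentials = CredentialCache.DefaultCredentials;

        // The actual fix: do not send credentials ahead of the challenge.
        proxy.PreAuthenticate = false;
    }
}
```

Calling something like ProxySetup.UseChallengeResponse(sessionsProxy) right after constructing each proxy (sessionsProxy being a stand-in for the generated Sessions service proxy) keeps the setting in one place, so a new proxy is less likely to slip through with the flag still on.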

I don’t know what the status of this problem is, whether it’s been fixed in .NET 2.0 or whether a fix is planned. I do know it was one hell of an annoying problem that took us days to figure out, and if I hadn’t been tipped in the direction of this Kerberos bug I would never have quite understood what’s going on (assuming I do now).

 

Sorry this turned out so long, and beware optimizations that might turn around and bite you!

 
