To finish off the DEV404 session Pete and I presented at TechEd NZ, I gave a brief run through of the steps required to get Windows Authentication working in a load balanced environment using kerberos. Given the number of camera phones that appeared for snaps I’m going to assume this is a common problem with a non-intuitive solution…
The product I work on is an on-premise enterprise solution that uses the Windows Identity to provide an authenticated credential against which to authorize user requests. We host our services in IIS/Windows Server AppFabric and take advantage of the Windows Authentication provided by IIS. This allows one of two protocols to be used: kerberos and NTLM, which have quite separate characteristics.
Why Use Kerberos?
There are two main reasons we want to use kerberos over NTLM:
1. Performance: NTLM uses a challenge response pattern for authentication which leads to a high network utilization. During performance testing we saw a high volume of NTLM challenges which ultimately throttled our ability to serve requests. Kerberos uses tickets which can be cached permitted a better performing protocol.
1. Double hops: NTLM does not flow credentials – the canonical example is a user requesting serviceA on server1 to access a secured resource on server2. Server1 cannot flow the users identity to server2.
Kerberos and Load Balancing
We want to run our services within a load balanced cluster to avoid single points of failure and to be able to grow resources to meet demand as required, without having to adopt bigger tin. The default configuration of IIS does not encourage this… the Application Pools run as a local machine account. This is a significant issue for Kerberos because of the manner in which the protocol encrypts the tickets passed between client, TGS and target server. The password of the account running the service is used to encrypt tickets so that only a process running under that account can decrypt the message. The default use of a machine specific account prevents a ticket granting access to serviceX on server A also being used to access serviceX on server B.
The following steps are required to fix this:
1. Use a common domain account for the applications pools.
We use a DOMAIN\service.expert account to run our services. This domain account is granted log on as a service and log on as a batch job rights on each of the application servers.
2. Register an SPN mapping the service class to the account.
We run our services on HTTP and so register the load balancer address with the domain account used to run the services:
>setspn -a HTTP/clusteraddress serviceAccount
We are using the WCF BasicHttpBinding which does not require the client to ensure the service is running as a particular user (to prevent man in the middle attacks). If you are using any other type of binding then the client needs to state who it expects the service to be running as.
3. Configure IIS to use the application pool account rather than a machine account
system.webServer/security/authentication/windowsAuthentication useAppPoolCredentials must be set to true.
4. Configure IIS to allow kerberos authentication tokens to be cached
system.webServer/security/authentication/windowsAuthentication authPersistNonNTLM must be set to true.
See also http://support.microsoft.com/kb/954873
5. Ensure the cluster address is considered to be in the Local Intranet zone
Kerberos tokens are not supported in the Internet zone, therefore the URL for your services must be considered to be trusted. The standard way to implement this is to roll out a group policy that adds your domain to the local intranet zone settings.
The slide deck for the talk is available from http://public.me.com/stefsewell/