Configuration options for Remote PowerShell and WS-Management

Here’s the want list:
• to be able to run WCF and workflow services in IIS that use a basicHttpBinding.
• to scale out services in an application farm using the network load balancing service in Windows Server 2008.
• to authenticate users using Kerberos to flow the Windows Identity.
• to administer servers remotely using PowerShell.

It’s not exactly an exotic set of needs; however, I’ve now spent over three weeks working through various attempts to get this up and running reliably.

The crux of the issue is the use of HTTP with Kerberos. To get the services to work in a load balanced environment with Kerberos, a set of SPNs needed to be added to Active Directory for the domain. The web applications hosting the services needed to run under a domain identity (e.g. MyDomain\service.expert), so they are mapped to an application pool with this identity. SPNs are then added to map the HTTP protocol to this user rather than to the machine account. In our case, four SPNs are added to the service.expert user, one for the network load balancer's virtual host name and one for each server in the application farm:

HTTP/SVNLB301.ap.aderant.com
HTTP/SVEXPGG302.ap.aderant.com
HTTP/SVEXPGG303.ap.aderant.com
HTTP/SVEXPGG304.ap.aderant.com

Doing this breaks the default WinRM service configuration: the WinRM HTTP listener runs under the machine account rather than service.expert, so the HTTP SPN points at the wrong account and Kerberos negotiation fails. This is pretty much where we left off in the last post; since then I have been looking at using HTTPS as the transport for PowerShell remoting calls, as well as other authentication mechanisms.

There are two options for hosting the WinRM service:

1. as a Windows Service (this is the default)
2. in IIS, using a WinRM v2 feature called ‘WinRM IIS Extensions’. This is an optional install in Windows Server 2008 that supports the ‘fan-in’ model for PowerShell remoting, which is targeted at the cloud.

Hosting the WinRM service using HTTPS is meant to be simple so long as you have an appropriate certificate installed on the server for SSL. The command is:

> winrm quickconfig -transport:HTTPS

I have never been able to get this to work. Before explaining how I did get a WinRM HTTPS endpoint working, let’s cover off the certificate.

Windows Server 2008 has a role that allows a server to act as a certificate authority (CA) for a domain. This role includes a self-service website from which any machine on the domain can request a certificate. I used this to request certificates based on the web server template, with the common name (CN) set to the fully qualified domain name of each server in my application farm. The self-service website is pretty straightforward, but note that the certificate is installed in the current user store, not the local machine store, so you need to move it. The easiest way to see this is to use the certificate provider within PowerShell:

> cd cert:\CurrentUser\My
> ls
> cd cert:\LocalMachine\My
> ls

This shows all of the certificates installed in the CurrentUser\My and LocalMachine\My stores. You can also use the Microsoft Management Console (MMC) and add the Certificates snap-in for both the current user and the local computer.

The WSMAN provider allows you to configure the WinRM service from within PowerShell.

> cd WSMAN:\localhost\Listener
> new-item . -Address * -Transport HTTPS -CertificateThumbprint "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

You need the 40-character certificate thumbprint, which can easily be found by listing the certificates in cert:\LocalMachine\My. With the real thumbprint replacing the Xs, the above command creates an HTTPS listener hosted in the WinRM service.
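The thumbprint is the SHA-1 hash of the DER-encoded certificate, which is why it is always 40 hex characters. As a hedged illustration (in Python, since the idea is independent of PowerShell), the sketch below computes a thumbprint from certificate bytes and cleans up a pasted value before it goes into -CertificateThumbprint. The function names are hypothetical, and treating any non-ASCII character in a pasted thumbprint as invisible formatting debris is my assumption.

```python
import hashlib
import re


def thumbprint_from_der(der_bytes: bytes) -> str:
    """SHA-1 thumbprint of a DER-encoded certificate, as 40 upper-case hex characters."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def normalize_thumbprint(text: str) -> str:
    """Clean up a thumbprint pasted from a dialog or listing.

    Removes whitespace and any non-ASCII characters (assumed to be invisible
    formatting characters), then checks that exactly 40 hex digits remain.
    """
    cleaned = "".join(ch for ch in text if ch.isascii() and not ch.isspace()).upper()
    if not re.fullmatch(r"[0-9A-F]{40}", cleaned):
        raise ValueError(f"not a 40-character hex thumbprint: {text!r}")
    return cleaned


if __name__ == "__main__":
    # Space-separated byte pairs, as some certificate dialogs display them.
    pasted = "a9 09 50 2d d8 2a e4 14 33 e6 f8 38 86 b0 0d 42 77 a3 2a 7b"
    print(normalize_thumbprint(pasted))
```

Validating up front means a typo (say, a stray "G") fails loudly instead of producing a confusing listener error.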

To connect to the machine from a remote client, using Kerberos to authenticate as the current user:

> icm -ComputerName targetServer -UseSSL -Authentication NegotiateWithImplicitCredential -ScriptBlock {get-host}

The script block is executed on the remote machine. If a test certificate has been used to set up the HTTPS channel, the remote call will fail: the certificate must have been issued by the domain CA, the CN must match the machine name, and the revocation list is checked. You can switch off these checks by adding the following parameter to the call:

> icm ... -SessionOption (New-PSSessionOption -SkipCNCheck -SkipCACheck -SkipRevocationCheck)

Any combination of the three skips can be used.

This again proved somewhat unreliable for me, due to the use of Kerberos over HTTPS to authenticate the user. Other authentication options are available, such as Basic, which is acceptable over an HTTPS channel because the channel is encrypted.
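To see why Basic is only acceptable over an encrypted channel, note that the credentials are merely base64-encoded into the Authorization header, not encrypted. This minimal Python sketch (hypothetical helper names, assuming UTF-8 encoding of the credentials) builds such a header and then decodes it again without any key or secret:

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (base64 encoding, not encryption)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth_header(header_value: str) -> Tuple[str, str]:
    """Recover the username and password from the header; anyone on the wire can do this."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("MyDomain\\MyUsername", "myPassword")
    print(header)
    print(decode_basic_auth_header(header))
```

Over plain HTTP this header is readable by anyone who can capture the traffic; HTTPS is what protects it.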

The change in identity of the HTTP SPN just kept tripping me up, which made me wonder: why not host the management service in IIS and run it in an application pool with the same identity as our other services? Finding out how to do this took me some time and led me to the fan-in model for PowerShell mentioned earlier.

Fan-In Model
WinRM v2 includes a plug-in model that allows ISVs to supply a module so that their software can be managed via WS-Management. The PowerShell team ships such a module, pwrshplugin.dll, which can be found in %windir%\system32. To host such a module in IIS, you need to ensure that the WinRM IIS Extensions option is installed; I have only seen it available on Windows Server 2008, not on Windows 7.

[ On Windows Server 2008 R2, you can use the ServerManager module to check the installed features:

> Import-Module ServerManager
> Get-WindowsFeature
]

With this option enabled, you can create a new web application and drop in a web.config file similar to the following which is discussed here:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <system.management.wsmanagement.config>
      <PluginModules>
        <OperationsPlugins>
          <Plugin Name="PowerShellplugin" Filename="%windir%\system32\pwrshplugin.dll" SDKVersion="1" XmlRenderingType="text">
            <InitializationParameters>
              <Param Name="PSVersion" Value="2.0" />
            </InitializationParameters>
            <Resources>
              <Resource ResourceUri="http://schemas.microsoft.com/powershell/Microsoft.PowerShell" SupportsOptions="true">
                <Capability Type="Shell" />
              </Resource>
            </Resources>
          </Plugin>
        </OperationsPlugins>
      </PluginModules>
    </system.management.wsmanagement.config>
    <security>
      <access sslFlags="Ssl" />
      <authentication>
        <anonymousAuthentication enabled="false" />
        <basicAuthentication enabled="true" />
        <windowsAuthentication enabled="true" />
      </authentication>
    </security>
    <modules>
      <add name="WSMan" />
    </modules>
  </system.webServer>
</configuration>

The web application is configured to require SSL and to accept either Basic or Windows authentication. You might need to edit your applicationHost.config file to unlock the access and authentication sections so that they can be set in web.config. The web application can be mapped to an application pool that has the same identity as the other services, in our case MyDomain\service.expert, so the SPNs should work.

[Note: do not set up an HTTPS listener in both IIS and WinRM at the same time on the same certificate. If you do, recycling the app pool will drop the HTTPS binding from IIS, because the WinRM Windows Service takes precedence.]

To connect to the machine from a remote client (using basic authentication), the following is required:

> $secpasswd = ConvertTo-SecureString "myPassword" -AsPlainText -Force
> $mycreds = New-Object System.Management.Automation.PSCredential ("MyDomain\MyUsername", $secpasswd)
> icm -ConnectionUri https://svexpgg303.ap.aderant.com/Powershell -Authentication Basic -Credential $mycreds -ScriptBlock {get-host}

The password is captured in a secure string and then a new PSCredential object is created to hold the username and password. This is passed to the Invoke-Command cmdlet (icm is its alias) using the -Credential parameter. Note that we also use the -ConnectionUri parameter, which lets us give the full URL of the IIS-hosted endpoint, including its path.


UPDATE [2nd October 2010]: I finally got to the bottom of the 1300 error I saw in the Windows Remote Management event log thanks to this post: http://blogs.msdn.com/b/wmi/archive/2010/02/25/winrm-hosted-in-iis-fails-to-start-with-error-1300-in-event-log.aspx

The account that the application pool is using must have the ‘Generate security audits’ right granted. Also when testing, it is important to reset IIS after each change to ensure that you are running against the correct set-up.

Retesting with security set-up correctly proved that any app pool can be used and the web application path could contain subfolders.

Having managed to establish a secure connection for remote PowerShell via IIS using basic auth and HTTPS, I’ve pretty much given up on getting it to work over Kerberos. I might try just once more to do Kerberos over HTTP when the management service is hosted in IIS but I’ve already been fighting with this for way too long. I hope the above saves someone the pain I went through…

Unexpected consequences…

Having set up a load balanced environment as per the previous post, I then discovered some knock-on effects…

By changing the HTTP SPNs to be account-specific rather than machine-specific, we broke remote PowerShell calls, and with them our automated deployments. By default, the WinRM connection from the client to the target server is authenticated using Kerberos. The channel is HTTP through a separate listener process, which expects the HTTP SPN to be registered against the machine account. In our case it expected HTTP/LRSRV310.lr.aderant.com to be registered against the machine account LRSRV310. Instead, this SPN was mapped to our application pool identity service.workflow.lr, so authentication failed.

Adding the SPN mapping to the LRSRV310 machine account made remote PowerShell sessions available again; however, this meant duplicate SPNs in AD, which is against the rules. After a little thought and some digging, it turns out there are (at least) two options available to us:
1. use an HTTPS channel rather than HTTP for the WinRM service.
2. add the client machine names to the TrustedHosts list for WinRM.

I’ve tried option 2 and it works, though I think option 1 may be the more secure approach. To get option 2 working, from a PowerShell prompt:

PS> set-item WSMAN:\localhost\Client\TrustedHosts -value "*.aderant.com"

In the command above I’m using a wildcard but you can be more specific and list individual machines that you trust. Note that you need to enable the trusted hosts setting before you set up the SPNs against the application pool identity or else you won’t be able to use the WSMAN provider.
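A pattern like *.aderant.com presumably only covers names that actually end in .aderant.com, so a connection made by NetBIOS name (e.g. LRSRV310) would not match it. The sketch below illustrates this with shell-style wildcards. It is an approximation for reasoning about the list, not a reproduction of WinRM's own matching logic, and is_trusted is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_trusted(host: str, trusted_hosts: str) -> bool:
    """Approximate check of a host name against a comma-separated list of
    wildcard patterns, ignoring case (not WinRM's exact algorithm)."""
    patterns = [p.strip().lower() for p in trusted_hosts.split(",") if p.strip()]
    return any(fnmatchcase(host.lower(), pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ("LRSRV310.lr.aderant.com", "LRSRV310"):
        print(name, is_trusted(name, "*.aderant.com"))
```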

Update…
It turns out the TrustedHosts option is not so great. It appears to work only while the Kerberos ticket is valid, which makes it look as though everything is fine. Local access to the WSMAN settings still works, but remote access runs into Kerberos issues once the ticket expires. So next we will try setting up HTTPS for WinRM.

> winrm quickconfig -transport:https

However, this requires a certificate to be installed to validate the server's identity. Tomorrow we will use the certificate server for our domain to generate one, although not all environments will have a certificate server. I’ll also look at the other authentication options and try turning off Kerberos support [WSMAN:\localhost\Service\Auth].

When we sort this out, I’ll post the solution.

Configuration for Kerberos

This is a summary of the voodoo required to get WCF services hosted in IIS to work with a load balancer and Kerberos. It took me far longer to figure out than I had hoped, so I hope I can save someone else that pain.

We have recently been running load and stress tests against our latest Golden Gate SP1 product, which supports horizontal scale-out of workflow services. This scale-out capability is one of the core features of Windows Server AppFabric. Our software is designed to run in an ‘on premise’ scenario and leverages Windows integrated security for authorization of users. A major performance improvement we discovered during our original Golden Gate testing was ensuring that Kerberos, rather than NTLM, was used for Windows Authentication. Since we had moved some of our services, in particular the workflow services, from being hosted as Windows Services to being hosted in IIS, we wanted to make sure the new services were also using Kerberos.

Note: in addition to the performance advantages, you need to use Kerberos if you want multi-hop delegation of credentials; NTLM does not support this. The resources at the end of this post discuss this further.

In this post I’m going to walk through a worked example and give a checklist to follow. In a later post I may drill down into more of the background; in the meantime I’ve included some additional resources at the end.

Scenario
The scenario involves three application servers that are configured into a network load balanced (NLB) cluster using NLB in Windows Server 2008. The machine names are:
• svexpgg310.ap.aderant.com
• svexpgg311.ap.aderant.com
• svexpgg312.ap.aderant.com

The virtual host name for the NLB is svnlb301.ap.aderant.com.

The NLB is set up to load balance traffic on port 80 for our HTTP-based services and on the port range 18180-18199 for our Windows Services. Each of the servers runs all of the services that support horizontal scale-out, and one of the servers (310) also runs the services that only support a single instance. A typical installation has around 15 services; rather than list them all, I’ll concentrate on two types:
• services hosted in IIS that expose HTTP endpoints
• services hosted as Windows Services that expose net.tcp endpoints

Alongside the three application servers is a database server that hosts the ADERANT Expert database, the AppFabric monitoring database and the AppFabric workflow persistence database.

The basicHttpBinding configuration used to enable Windows authentication is as follows:

      <basicHttpBinding>
        <binding name="expertBasicHttpBinding" maxReceivedMessageSize="2147483647">
          <readerQuotas maxArrayLength="2147483647" maxStringContentLength="2147483647" />
          <security mode="TransportCredentialOnly">
            <transport clientCredentialType="Windows" proxyCredentialType="Windows">
              <extendedProtectionPolicy policyEnforcement="Never" />
            </transport>
          </security>
        </binding>
      </basicHttpBinding>
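When web.config files are deployed across a farm, a quick automated check that each basicHttpBinding still uses TransportCredentialOnly with Windows credentials can catch configuration drift. The following Python sketch assumes the bindings section can be read as plain XML; windows_auth_bindings is a hypothetical helper, not part of WCF.

```python
import xml.etree.ElementTree as ET
from typing import List


def windows_auth_bindings(config_xml: str) -> List[str]:
    """Names of basicHttpBinding bindings that use TransportCredentialOnly
    security with Windows client credentials."""
    root = ET.fromstring(config_xml)
    names: List[str] = []
    for section in root.iter("basicHttpBinding"):  # iter() also visits root itself
        for binding in section.findall("binding"):
            security = binding.find("security")
            transport = security.find("transport") if security is not None else None
            if (
                security is not None
                and transport is not None
                and security.get("mode") == "TransportCredentialOnly"
                and transport.get("clientCredentialType") == "Windows"
            ):
                names.append(binding.get("name", ""))
    return names
```

Bindings missing from the returned list are the ones worth a second look before blaming Kerberos.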

1. The servers must be in the local intranet zone of any calling machines.
As of Windows Server 2003, by default only the local intranet zone allows credentials to be passed between machines for Windows Integrated authentication. This makes sense, as you rarely want to pass your Windows credentials beyond your own domain. At ADERANT we have a group policy that adds any machine whose name matches *.aderant.com to the local intranet zone on every machine.

You can also name the servers explicitly in the zone. Also ensure that the servers are not listed in the Trusted Sites zone.

2. Windows Services exposing WCF net.tcp endpoints must have SPNs registered for both the application server and the network load balancer addresses.

When a non-basicHttpBinding is used, such as net.tcp, the WCF infrastructure checks to ensure that the service is running under the identity that the client expects. This prevents ‘man-in-the-middle’ attacks where someone spoofs the service you want to call with their own for some nefarious purpose. When you generate a service proxy against a net.tcp endpoint you’ll see something similar to the following configuration snippet in the app.config:

<client>
  <endpoint
    address="net.tcp://myserver.mydomain.com:8003/servicemodelsamples/service/spnIdentity"
    binding="netTcpBinding"
    bindingConfiguration="netTcpBinding_ICalculator_Windows"
    contract="ICalculator"
    name="netTcpBinding_ICalculator">
    <identity>
      <servicePrincipalName value="CalculatorSvc/myServer.myDomain.com:8003" />
    </identity>
  </endpoint>
</client>

There is an identity element that specifies the expected identity of the service host. Two options are supported: <userPrincipalName> and <servicePrincipalName>. If your service is published on a domain and you always expect the client calling the service to be online, then userPrincipalName is the easiest to configure. The value attribute contains the identity that the service runs as, e.g. value="ADERANT_AP\service.expert".

Alternatively you can set a servicePrincipalName, as above. The service principal name (SPN) is broken down into three parts:

serviceClassName / address [: portNumber]

The service class name is a token that uniquely identifies the service. Common service classes are HTTP and HOST; the example above uses CalculatorSvc to identify a calculation service. At ADERANT we use class names such as ExpertConfigurationSvc. After the service class name comes the machine name, e.g. SVEXPGG310, optionally followed by the port. Note that the NetBIOS name and the fully qualified domain name are treated as different names, so it is common practice to register both. For example:

ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG310:18180
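The serviceClassName/address[:portNumber] layout can be made concrete with a small parser. This Python sketch is illustrative only: Spn and parse_spn are hypothetical names, and it does not handle the optional trailing service-name component that the full SPN syntax allows.

```python
from typing import NamedTuple, Optional


class Spn(NamedTuple):
    service_class: str
    host: str
    port: Optional[int] = None

    def __str__(self) -> str:
        port = f":{self.port}" if self.port is not None else ""
        return f"{self.service_class}/{self.host}{port}"


def parse_spn(text: str) -> Spn:
    """Split 'serviceClassName/address[:portNumber]' into its parts."""
    service_class, slash, rest = text.partition("/")
    host, colon, port = rest.partition(":")
    if not slash or not service_class or not host:
        raise ValueError(f"not an SPN of the form class/host[:port]: {text!r}")
    return Spn(service_class, host, int(port) if colon else None)


if __name__ == "__main__":
    for text in ("ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180", "HTTP/svnlb301"):
        print(repr(parse_spn(text)))
```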

Once we have an SPN, it must be registered in Active Directory (AD) against the user account used to run the service. We recommend a service account along the lines of myDomain\service.expert to run the ADERANT services. To register an SPN against this account, use the setspn command-line tool:

setspn -A ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180 service.expert

As part of our deployment tooling we automatically generate a batch file containing all the SPNs that need to be registered in AD for a given environment. An SPN must not be registered twice, as duplicates cause errors. To see the SPNs currently registered against an account, use setspn with the -L option and pass the account name:

setspn -L service.expert
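Because one SPN owned by two accounts is exactly what broke remote PowerShell in the machine-account scenario, it is worth checking the generated list before registering it. The Python sketch below assumes you have already collected each account's SPNs (for example from setspn -L output) into a dictionary; find_duplicate_spns is a hypothetical helper, and SPNs are compared case-insensitively. On Windows Server 2008, setspn itself also offers -X to search for duplicates and -S to add an SPN only after checking for an existing one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_duplicate_spns(registrations: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each SPN (lower-cased) held by more than one account to the sorted owners.

    registrations maps an account name to the SPNs registered against it.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for account, spns in registrations.items():
        for spn in spns:
            owners[spn.lower()].add(account)
    return {spn: sorted(accounts) for spn, accounts in owners.items() if len(accounts) > 1}


if __name__ == "__main__":
    regs = {
        "LRSRV310$": ["HTTP/LRSRV310.lr.aderant.com", "HOST/LRSRV310"],
        "service.workflow.lr": ["HTTP/lrsrv310.lr.aderant.com"],
    }
    print(find_duplicate_spns(regs))
```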

If we take our configuration service as an example, we need the following SPNs registered in AD for the scenario environment:

ExpertConfigurationSvc/SVNLB301.ap.aderant.com:18180
ExpertConfigurationSvc/SVNLB301:18180
ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG310:18180
ExpertConfigurationSvc/SVEXPGG311.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG311:18180
ExpertConfigurationSvc/SVEXPGG312.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG312:18180

If you are running on a development workstation, you will often see HOST/localhost as the SPN generated by svcutil for locally hosted WCF services. This indicates that the service is expected to be running on the local machine.

If the service needs to support delegation, then delegation must be enabled on the AD account used to run the service.

The account must also be granted ‘Log on as a service’ rights on the application server hosting the service. This can be configured using the Local Security Policy tool or pushed out via Group Policy.

3. Load balanced WCF Services hosted in IIS, using HTTP bindings, must have HTTP SPNs added for the account of the application pool.

By default an SPN is created in AD for the machine account of a server running IIS, for example HTTP/SVEXPGG310. In a load balanced scenario the machine account SPN cannot be used to issue a Kerberos ticket, because the machine account is different for each machine in the application farm. Instead, the Kerberos ticket needs to be issued using the identity of the application pool that the web service runs under. If you have multiple application pools, they must all run under the same account. The application pool account must have SPNs registered for the HTTP service as follows:

setspn -A HTTP/svnlb301.ap.aderant.com service.expert
setspn -A HTTP/svnlb301 service.expert
setspn -A HTTP/svexpgg310.ap.aderant.com service.expert
setspn -A HTTP/svexpgg310 service.expert
setspn -A HTTP/svexpgg311.ap.aderant.com service.expert
setspn -A HTTP/svexpgg311 service.expert
setspn -A HTTP/svexpgg312.ap.aderant.com service.expert
setspn -A HTTP/svexpgg312 service.expert

Here we have both the NetBIOS and FQDNs for the servers and the load balancer.
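Since every SPN must be registered for the load balancer and for each server, under both the NetBIOS name and the FQDN, lists like the ones above are easy to generate rather than type. This Python sketch (hypothetical function names) derives the NetBIOS name as the first label of the FQDN, an assumption that holds for the names in this scenario, and emits setspn -A lines suitable for a batch file:

```python
from typing import Iterable, List, Optional


def spns_for_farm(service_class: str, fqdns: Iterable[str], port: Optional[int] = None) -> List[str]:
    """FQDN and NetBIOS SPNs for each host, with an optional port suffix."""
    suffix = f":{port}" if port is not None else ""
    spns: List[str] = []
    for fqdn in fqdns:
        netbios = fqdn.split(".", 1)[0]  # assumes the first label is the NetBIOS name
        for host in (fqdn, netbios):
            spn = f"{service_class}/{host}{suffix}"
            if spn not in spns:  # never emit the same SPN twice
                spns.append(spn)
    return spns


def setspn_commands(spns: Iterable[str], account: str) -> List[str]:
    """One 'setspn -A' line per SPN, registered against the given account."""
    return [f"setspn -A {spn} {account}" for spn in spns]


if __name__ == "__main__":
    hosts = [
        "svnlb301.ap.aderant.com",
        "svexpgg310.ap.aderant.com",
        "svexpgg311.ap.aderant.com",
        "svexpgg312.ap.aderant.com",
    ]
    lines = setspn_commands(spns_for_farm("HTTP", hosts), "service.expert")
    lines += setspn_commands(spns_for_farm("ExpertConfigurationSvc", hosts, 18180), "service.expert")
    for line in lines:
        print(line)
```

Generating both the HTTP and the net.tcp SPNs from one host list keeps the load balancer and every farm member in step when a server is added.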

4. Load balanced WCF services hosted in IIS, using HTTP bindings, must use the Application Pool credentials to issue Kerberos tickets.

In addition to adding the SPNs in step 3, change IIS so that it uses the application pool credentials for the Kerberos ticket. This can be done either through the Configuration Editor in IIS Manager or from the command line.

The section path is system.webServer/security/authentication/windowsAuthentication.
From a command line:
appcmd set config /section:windowsAuthentication /useAppPoolCredentials:true

This has to be set on all of the application servers within the application farm.

While in IIS configuration, it is also worth setting authPersistNonNTLM to true, see http://support.microsoft.com/kb/954873 for details.

5. Enable Windows Authentication on the required web applications in IIS.
There are two parts to this, the first of which is to ensure that the Windows Authentication provider for IIS is installed. This can be checked in the Windows features control panel.

The next step is to enable Windows Authentication on the web application itself. From the dashboard for the site in IIS Manager, open Authentication and ensure that Windows Authentication is enabled.

While you are here, it’s worth checking the advanced properties of the Windows Authentication (available from the context menu) to ensure that Kernel-mode authentication is set.

This can also be set programmatically:

appcmd set config "Default Web Site/MyWebService" -section:system.webServer/security/authentication/windowsAuthentication /enabled:true /commit:apphost

Wrap up & Testing
Those are the key steps required to get Kerberos working in a load balanced environment:
1. ensure the servers are in the local intranet zone.
2. create and register SPNs for net.tcp services for all app servers and the load balancer.
3. create and register HTTP SPNs for all app servers and the load balancer.
4. take care to avoid duplicate SPNs.
5. understand that NetBIOS and FQDNs require separate SPNs.
6. set useAppPoolCredentials to true on all IIS servers in the app farm.
7. run all application pools using a common domain service account, give this account permission to delegate and log on as a service.
8. ensure the web applications for the services have Windows authentication enabled.

It’s mostly straightforward once you’ve been through the steps once.

The easiest tools to test with are a browser and Fiddler. Within Fiddler you can inspect the Authorization headers of the HTTP requests, which show whether Kerberos or NTLM is being used. We expose an OData service that requires Windows authentication, and it was very easy to trace the authentication negotiation for this site within Fiddler.
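What Fiddler shows can be approximated by decoding the Negotiate token. An NTLM message carries the NTLMSSP signature (its base64 form starts with "TlRMTVNTUA"), while a SPNEGO/GSS-API initial token starts with the ASN.1 byte 0x60, often visible as base64 "YII...", and from a Windows client without an embedded NTLM message that usually means Kerberos. The Python sketch below is a heuristic only; auth_scheme is a hypothetical helper and expects just the header value, without the "Authorization:" prefix.

```python
import base64

NTLM_SIGNATURE = b"NTLMSSP\x00"


def auth_scheme(header_value: str) -> str:
    """Heuristically classify an HTTP Authorization header value."""
    scheme, _, token = header_value.strip().partition(" ")
    scheme = scheme.lower()
    if scheme == "ntlm":
        return "NTLM"
    if scheme != "negotiate":
        return scheme or "unknown"
    try:
        raw = base64.b64decode(token.strip(), validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "unknown"
    if NTLM_SIGNATURE in raw:
        return "NTLM (via Negotiate)"
    if raw[:1] == b"\x60":
        # GSS-API/SPNEGO initial token with no NTLM message inside: most likely Kerberos.
        return "Kerberos (probably, via Negotiate)"
    return "unknown"
```

A Negotiate header that decodes to NTLM is the usual sign that an SPN is missing or duplicated.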

Resources
Security in WCF (MSDN Magazine): http://msdn.microsoft.com/en-us/magazine/cc163570.aspx

Patterns & Practices Kerberos Overview: http://msdn.microsoft.com/en-us/library/ff649429.aspx

Patterns & Practices WCF Security Guide: http://msdn.microsoft.com/en-us/library/ff650794.aspx