The case of the Fiddler heisenbug

There is a presentation I give to our graduates during their first week with us, the second slide is:

EverythingYouKnowIsWrong

This is taken from the multi-media overload that was U2's Zoo TV tour. I use it to try to get our graduates to accept that they are really back at the start of their learning process. This is pretty much how I felt a week or two back when one of our consultants said that they were seeing lots of HTTP 401 authentication traffic while our application was running. I'd personally spent a lot of time over the years trying to make sure that we were as efficient as possible, so I was sceptical to say the least…

Background

The services architecture for the product I work on follows the Command Query Responsibility Segregation (CQRS) approach which I've talked about before. In summary, we fetch data from an OData service provided by WCF Data Services and then make updates via a suite of services implemented using regular SOAPy WCF. We closely monitor the message exchange between our applications and services to ensure that we aren't too chatty, messages aren't too big and so on – we do this using the excellent Fiddler. Many moons ago, I spent quite some time getting my head around how to correctly configure IIS and WCF to use Kerberos to allow the services to be scaled out over a web farm. By now I've run through this on numerous test environments and real world environments, so I was pretty confident I knew how it worked.

The Problem

Our software runs on-premise within the walled garden of the corporate network. We support some of the largest law firms in the world and so on occasion have to deal with some very wide area networks. The connection from desktop to server can take place over long distances with the characteristics of high latency and low bandwidth; any messaging overhead can be painful. For years now we’ve used Fiddler to look at our services as all the call activated services use HTTP. At one client, Fiddler was not working [which turned out to be a conflict with the McAfee software they used] and so they used Wireshark instead. When observing the HTTP traffic in Wireshark, our consultants and the client saw many HTTP 401 authentication responses, far more than we expected. Each 401 response results in additional latency delay and requires additional messages to be exchanged between the client and the server. In our testing to date, we believed we had tuned the services to require only a single 401 authentication response and then to cache and present the credentials on each subsequent request.

 

TL;DR

To stop a WCF Data Services request, secured using Windows Authentication, from requiring authentication on every call, you need to set the PreAuthenticate flag to true on the HttpWebRequest via the SendingRequest2 event on the generated context. Fiddler (and the Web Proxy filter in the Microsoft Message Analyzer) hides this from you because it maintains a connection pool of Keep-Alive connections.

 

Reproducing the issue

The first task was to reproduce the behaviour inside one of our test environments. I'm fortunate to have a very well spec'd HP Z420 on my desk which makes a great Hyper-V server. Inside Hyper-V I have a private domain set up which has a couple of load balanced application servers running our software. First off, I ran the client software on both Windows 7 and Windows 8.1 with Fiddler running in the background: no sign of the additional 401s. I then switched to lower-level network monitoring, but rather than using Wireshark I decided to try out the Microsoft Message Analyzer. This is Microsoft's replacement for the Network Monitor tool; it provides a number of different filters, two of which were of interest:

  • web proxy – same deal as Fiddler, looking at HTTP
  • local link layer – all traffic on the NIC

Using the web proxy filter produced the same results as Fiddler; however, using the local link layer filter showed lots of additional 401 responses. When I ran the Message Analyzer with both the web proxy and local link layer filters, there were no additional 401s. We had hit a heisenbug: when observing the HTTP traffic through a web proxy, the proxy was changing the behaviour of the traffic.

Confirm our current understanding

My faith in our current collective understanding of what was happening was pretty shaken so I ran through the various settings that I previously thought would avoid these 401s:

1. Is the URL of the service trusted? Windows must consider the service URL to be trusted to pass Kerberos tickets. An easy way to check the zone of any URL is the following code snippet:

var zone = System.Security.Policy.Zone.CreateFromUrl("http://wsakl001013.ap.aderant.com/Expert_Local");
Console.WriteLine(zone.SecurityZone);

If necessary, add the service host URL or a matching pattern to the Local Intranet Zone via IE:

In this example, *.aderant.com has been added to the local intranet zone.

 

2. Are the load balanced services running as a domain account? Does this account have an appropriate HTTP SPN registered against it?
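If the SPNs are missing, they can be registered against the service account with setspn. The host and account names below are placeholders for illustration only:

> setspn -a HTTP/expertfarm.domain.com domain\service.expert

> setspn -a HTTP/expertfarm domain\service.expert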

 

3. Do the various IIS web applications have the useAppPoolCredentials flag set in configuration? This instructs IIS to expect the Kerberos service ticket to be encrypted using the credentials of the account used by the mapped application pool, rather than the default machine account.

 

4. Is Kerberos configured to use a transport session rather than a connection per call for authentication? This is set in IIS against the web application using the authPersistNonNTLM setting.

This adds a Persistent-Auth header to the HTTP response (seen here using Message Analyzer):

[Screenshot: the Persistent-Auth header in the HTTP response, viewed in Message Analyzer]

These settings are available from within the IIS Manager using the Configuration Editor:

[Screenshot: the Configuration Editor in IIS Manager]

Navigate to the system.webServer/security/authentication/windowsAuthentication settings:

[Screenshot: the windowsAuthentication settings in the Configuration Editor]

Set the properties as required. If you want to programmatically set these values via script, IIS will helpfully generate the scripts for you. Look over on the right hand side of the Configuration Editor and you’ll see a ’Generate Script’ option.

[Screenshot: the Generate Script option in the Configuration Editor]

Clicking on this will generate a change script for you in a number of technologies, I tend to favour PowerShell:

[Screenshot: the generated PowerShell change script]
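For reference, a script for the two settings discussed above looks roughly like the following. This is a sketch only – the site path is a placeholder and the exact script produced by the Configuration Editor will differ:

Import-Module WebAdministration

# Keep the authenticated Kerberos session on the connection rather than re-authenticating per request
Set-WebConfigurationProperty -PSPath 'MACHINE/WEBROOT/APPHOST' -Location 'Default Web Site/Expert_Local' -Filter 'system.webServer/security/authentication/windowsAuthentication' -Name 'authPersistNonNTLM' -Value 'True'

# Decrypt service tickets using the application pool identity rather than the machine account
Set-WebConfigurationProperty -PSPath 'MACHINE/WEBROOT/APPHOST' -Location 'Default Web Site/Expert_Local' -Filter 'system.webServer/security/authentication/windowsAuthentication' -Name 'useAppPoolCredentials' -Value 'True'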

All this checked out on my environment but I wanted to ensure that NTLM was not in play (here). To do this I enabled NTLM logging on the domain controller using group policy. Using gpedit.msc, I enabled the ‘Network Security: Restrict NTLM: Audit Incoming NTLM Traffic’ and  ‘Network Security: Restrict NTLM: Audit NTLM authentication in this domain’ policies [under Windows Settings, Security Settings, Local Policies, Security Options]:

[Screenshot: the NTLM audit policies in the Group Policy editor]

Interestingly, it showed that there was unexpected NTLM traffic – from the AppFabric services to the SQL Server. The MSSQLService was set up to run as a domain account, service.sql, but the appropriate SPN had not been mapped to that account:

> setspn -a MSSQLSvc/SqlServer2012.expert.local:1433 service.sql

> setspn -a MSSQLSvc/SqlServer2012:1433 service.sql

I mapped both the FQDN and the NETBIOS name formats just to be sure. This resolved the issue and I no longer saw NTLM traffic.

[Screenshot]

What Next?

At this point I thought the environment was configured as it should be but I was still seeing the additional 401s. After a lot of searching and head scratching I came across this post from Fiddler author, Eric Lawrence. The rub being:

Keep-Alive

In some cases, the time required to open a new network connection to the server is greater than the time required to send the request and download the response. Therefore, if the client opens a new connection for every request, the application’s performance is greatly degraded. The practice of reusing a single TCP/IP connection for multiple requests is called “keep-alive” and it’s the default behaviour in HTTP/1.1. However, clients or servers may choose to disable keep-alive by either sending a Connection: close header or by abruptly closing the connection after each transaction.

Fiddler maintains a "connection pool" of idle keep-alive connections to the server. When a client request comes in, this pool is first checked to determine if an existing connection is available on which the request can be sent. Even if the client specifies a Connection: close request header, that only causes Fiddler to close the client's connection after the response is sent—the server connection is returned to the pool (unless it too disabled keep-alive).

What this means is that if your client isn't using Keep-Alive connections, its performance can be severely impacted. However, when Fiddler is introduced, performance is improved because "expensive" server connections are reused. (Since Fiddler and the client are (typically) running on the same computer, establishing a new connection from the client to Fiddler is very fast.)

The fix for this problem is simple: Ensure that your client is using KeepAlive connections. That’s as simple as:

  1. Ensure that you’re using HTTP/1.1
  2. Ensure that you haven’t disabled Keep-Alive (e.g. set the KeepAlive property of the HTTPWebRequest object to true)
  3. Don’t send Connection: Close headers

Note that creating connections to servers can be even more expensive than the simple TCP/IP establishment cost. First, there’s TCP/IP Slow-Start, a congestion-management feature of the protocol that means that new connections have a slower transfer rate than longer-lived connections. Next, if you’re using HTTPS, there’s an expensive cryptographic handshake which must be performed on each new connection. Lastly, if your connections use either the NTLM or Negotiate authentication protocols, you may find that each new connection requires a 3-step handshake (e.g. the server sends a HTTP/401 challenge, the client resends the request, the server sends another HTTP/401 challenge, the client resends the request with a challenge-response, and the server finally sends a HTTP/200). Because these are “connection-oriented” authentication protocols, subsequent requests over an existing connection may be able to avoid these extra round-trips.

Here is the heisenbug: Fiddler was maintaining a Keep-Alive connection to the server even though my client may not have been.

So how does this relate to the WCF service calls? For the basicHttpBinding, the Keep-Alive behaviour is enabled by default; it can optionally be turned off via a custom binding, see here.
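As a rough illustration (this is not code from our product), keep-alive can be switched off by wrapping the binding in a CustomBinding and clearing the flag on the HTTP transport binding element:

// Sketch: disabling HTTP Keep-Alive for a basicHttpBinding
// (types come from System.ServiceModel and System.ServiceModel.Channels)
var basic = new BasicHttpBinding(BasicHttpSecurityMode.TransportCredentialOnly);
basic.Security.Transport.ClientCredentialType = HttpClientCredentialType.Windows;

var custom = new CustomBinding(basic);
custom.Elements.Find<HttpTransportBindingElement>().KeepAliveEnabled = false;
// use 'custom' in place of 'basic' when creating the endpoint or ChannelFactory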

Back to Basics

At this point I was still convinced I should not be seeing those additional 401s, so I decided to build a very simple secured WCF service and generate a proxy to the standard OData service we use.

Here is a WCF Service that simply says Hello to the calling Windows user.

[Screenshot: the test WCF service implementation]
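The code in the screenshot isn't reproduced here, but a minimal sketch of such a service looks something like this (the contract and type names are illustrative):

[ServiceContract]
public interface IHelloService {
    [OperationContract]
    string SayHello();
}

public class HelloService : IHelloService {
    public string SayHello() {
        // Greet the authenticated Windows identity making the call
        return "Hello " + ServiceSecurityContext.Current.WindowsIdentity.Name;
    }
}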

WCF Configuration as follows:

[Screenshot: the WCF configuration]
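Again, the actual configuration is in the screenshot; a minimal equivalent that secures the endpoint with Windows authentication over HTTP might look like this (names are placeholders):

<system.serviceModel>
  <bindings>
    <basicHttpBinding>
      <binding name="windowsAuth">
        <security mode="TransportCredentialOnly">
          <transport clientCredentialType="Windows" />
        </security>
      </binding>
    </basicHttpBinding>
  </bindings>
  <services>
    <service name="HelloService">
      <endpoint address="" binding="basicHttpBinding" bindingConfiguration="windowsAuth" contract="IHelloService" />
    </service>
  </services>
</system.serviceModel>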

Visual Studio created a service reference for me and I simply called the service a number of times: both reusing the proxy as well as closing the proxy and recreating it:

[Screenshot: the test client code]
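In essence the test did something like the following, using the generated proxy (the proxy class name is hypothetical):

// Reuse a single proxy for several calls...
var client = new HelloServiceClient();
for (int i = 0; i < 3; i++) {
    Console.WriteLine(client.SayHello());
}
client.Close();

// ...then create and close a proxy per call
for (int i = 0; i < 3; i++) {
    var perCallClient = new HelloServiceClient();
    Console.WriteLine(perCallClient.SayHello());
    perCallClient.Close();
}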

The link layer trace was as follows:

[Screenshot: the link layer trace]

This was as expected, a single 401 but then 200s on subsequent calls. Kerberos was being used successfully and a transport level session was established! Just for completeness I could see the HTTP Keep-Alive header in the POST:

[Screenshot: the Keep-Alive header in the POST request]

 

OK, on to the WCF Data Service. Again in Visual Studio I generated a service reference then:

[Screenshot: the test code calling the WCF Data Service]
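The gist of the test was to create the generated context (shown later in this post as ExpertDbContext) with default Windows credentials and issue a handful of queries; a sketch only – the service URL and entity set below are placeholders, not the real ones:

// Sketch only – URL and entity set are illustrative
var context = new ExpertDbContext(new Uri("http://server/ExpertQuery/"));
context.Credentials = CredentialCache.DefaultNetworkCredentials;

for (int i = 0; i < 3; i++) {
    var firstFew = context.SomeEntitySet.Take(5).ToList();
    Console.WriteLine(firstFew.Count);
}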

This resulted in:

[Screenshot]

And the following trace:

[Screenshot: the trace showing repeated 401/200 responses]

At last here was the repeated 401/200 behaviour.

I checked for the Keep-Alive header in the request:

[Screenshot: the Keep-Alive header in the request]

And looked for the Persistent-Auth header in the response:

[Screenshot: the Persistent-Auth header in the response]

Both present.

More head scratching.

More searching.

Then I posted this question to the Microsoft WCF Data Services forum.

While waiting for an answer, a colleague and I took a look at the System.Data.Services.Client.DataServiceContext base class for the generated context object. Working through that code, I came across the HttpWebRequest class, which has a PreAuthenticate property that looked like exactly what I wanted. A little more digging and then I found I could do this:

var context = new ExpertDbContext(…

context.Credentials = CredentialCache.DefaultNetworkCredentials;
context.SendingRequest2 += context_SendingRequest2;

static void context_SendingRequest2(object sender, SendingRequest2EventArgs e) {
    ((HttpWebRequestMessage)e.RequestMessage).HttpWebRequest.PreAuthenticate = true;
}

 

This was it!

Testing the code with this small change, the 401s were gone from the WCF Data Service traffic. Just as I was grabbing a celebratory cup of coffee, a colleague asked if I had seen the response to my question on the forum. I had not; it validated the above approach – thank you, Fred Bao.

 

Wrapping Up

This took about a week of elapsed time to work through. We've now updated our query service (OData) proxy to set the PreAuthenticate flag and can see improved system performance, particularly over constrained WAN connections. That Fiddler hid this really threw me; heisenbugs are really hard to deal to.

 

WiXing Lyrical (Part 2)

Picking up from where we left off previously with the product.wxs file, next we come to the Media element.

<Media Id="1" Cabinet="media1.cab" EmbedCab="yes" />

This is the default media entry created by Votive. The files to be installed are constructed into a single cabinet file which is embedded within the MSI.

Specifying the Install Location

Following the media is the directory structure we want to install the application into. A set of directory elements are nested to describe the required structure.

<Directory Id="TARGETDIR" Name="SourceDir">
  <Directory Id="dirAderant" Name="AderantExpert">
    <Directory Id="dirEnvironmentFolder" Name="[EXPERTENVIRONMENTNAME]" >
      <Directory Id="INSTALLLOCATION" Name="ExpertAssistantInstaller">
      <!-- additional components go here -->

      </Directory>
      <Directory Id="ProgramMenuFolder">
        <Directory Id="ApplicationProgramsFolder" Name="Aderant">
        </Directory>
      </Directory>
    </Directory>
  </Directory>
</Directory>

The TARGETDIR is the root directory for the installation and by default is set to the SourceDir. We then set-up the directory structure underneath \AderantExpert\Environment\ExpertAssistantInstaller. The environment folder is set to the value in the EXPERTENVIRONMENTNAME property. The INSTALLLOCATION Id specifies where the files will be installed to. If you want to install into the Program Files folder, see here.

In addition to specifying the target location for the install files, a folder is added to the Program Menu folder for the current user, the ApplicationProgramsFolder reference is used later in the script when setting up the start menu items.

Updating an XML File

It is possible to use components to action additional steps and tie the KeyPath to the containing directory structure. The KeyPath is used by the installer to determine if an item exists, so if the containing directory structure exists the actions do not run. In my sample, the comment placeholder above stands in for a couple of components similar to the following:

<Component Id="cmpConfigEnvironmentName" Guid="????" KeyPath="yes">
  <util:XmlFile Id="xmlConfigEnvironmentName"
                Action="setValue"
                ElementPath="/configuration/appSettings/add[\[]@key='EnvironmentName'[\]]/@value"
                File="[INSTALLLOCATION]\ExpertAssistantCO.exe.config"
                Value="[EXPERTENVIRONMENTNAME]"
                />
</Component>

The component is responsible for updating the ExpertAssistantCO.exe.config XML file with a property from the MSI. The util extension library provides an XmlFile element which can read and write to a specified XML file. The element path is a formatted field and therefore square brackets in the XPath must be escaped. We have three updates to the exe.config to make and so end up with three components; for ease of management these are then wrapped in a ComponentGroup:

<ComponentGroup Id="ExpertAssistantCO.ConfigSettings">
  <ComponentRef Id="cmpConfigEnvironmentName"/>
  <ComponentRef Id="cmpConfigExpertSharePath"/>
  <ComponentRef Id="cmpConfigLocalInstallationPath"/>
</ComponentGroup>

Adding a Start Menu Item

A common requirement is to add a shortcut for the installed application to the Start Menu. There is an odd twist here as we are using a perUser install. The StartMenu item is unique to each user installing the software and therefore a registry key is required to track the KeyPath. The registry key must be in the HKEY_CURRENT_USER hive and a logical location is Software\VendorName\ApplicationName\.

<DirectoryRef Id="ApplicationProgramsFolder">
  <Component Id="ApplicationShortcut" Guid="????" >
    <Shortcut Id="ApplicationStartMenuShortcut"
       Name="Expert Assistant"
       Description="Expert Assistant"
       Target="[INSTALLLOCATION]\ExpertAssistantCO.exe"
       />
    <RegistryValue Root="HKCU" Key="Software\Aderant\ExpertAssistant_[EXPERTENVIRONMENTNAME]" Name="installed" Type="integer" Value="1" KeyPath="yes"/>
    <RemoveFolder Id="ApplicationProgramsFolder" On="uninstall"/>
  </Component>
</DirectoryRef>

Along with the Shortcut, we also tie a RemoveFolder action to the registry KeyPath so that the folder containing the shortcut is removed during an uninstall.

Remove Custom Registry Key On Uninstall

It is possible to have specific actions occur only during an uninstall to clean up; we use this to remove a registry key that may be set up by the application. To achieve this we schedule a Registry action of 'removeKeyOnUninstall'. Again, this action is per user and therefore tied to a KeyPath in the HKEY_CURRENT_USER registry hive.

<DirectoryRef Id="ApplicationProgramsFolder">
  <Component Id="RemoveRunRegistryEntry" Guid="????">
    <RegistryValue Root="HKCU" Key="Software\Microsoft\Windows\CurrentVersion\Run" Name="ADERANT.ExpertAssistant" KeyPath="yes" Type="string" Value=""/>
    <Registry Action="removeKeyOnUninstall" Root="HKCU" Key="Software\Microsoft\Windows\CurrentVersion\Run\ADERANT.ExpertAssistant" />
  </Component>
</DirectoryRef>

Launch an Exe after Installation

After the installation or repair of the MSI is complete, we’d like the MSI to run the executable that we’ve just installed on to the machine. To do this we need to invoke a custom action.

<CustomAction Id="LaunchApplication" BinaryKey="WixCA" DllEntry="WixShellExec" Impersonate="yes" />

The custom action executes the command held in the WixShellExecTarget property that we specified near the beginning of the wxs file. The custom action then needs to be scheduled to run after InstallFinalize:

<InstallExecuteSequence>
  <Custom Action="LaunchApplication" After="InstallFinalize" >NOT(REMOVE ~="ALL")</Custom>
</InstallExecuteSequence>

In our case we don't want the action to execute if we are removing the software, only on install and repair. Therefore we specify the condition 'NOT(REMOVE ~="ALL")', more details can be found here.

NOTE: Before setting this condition, the entry in the Add Remove Programs control panel would not automatically be deleted on uninstall; it would only disappear after a manual refresh. If the uninstall process returns a code other than 0, an error has occurred and so the refresh is not triggered. To see that this was the case, I enabled verbose logging via msiexec and removed the MSI using the command line. The log showed that the custom action was failing because the path didn't exist – because we had just removed it. The non-zero return code was logged.
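For reference, uninstalling with verbose logging from the command line looks something like this (the MSI name is illustrative):

msiexec /x ExpertAssistant.msi /l*v uninstall.log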

Pulling It All Together

The final part of the script declares a feature – an installable unit. In our case we have a single feature which installs everything.

    <Feature Id="ExpertAssistant"
             Title="Expert Assistant"
             Level="1">
      <ComponentGroupRef Id="ExpertAssistantCO.Binaries" />
      <ComponentGroupRef Id="ExpertAssistantCO.Content" />
      <ComponentGroupRef Id="ExpertAssistantCO.ConfigSettings" />
      <ComponentRef Id="ApplicationShortcut" />
      <ComponentRef Id="RemoveRunRegistryEntry" />
    </Feature>
  </Product>
</Wix>

Here the various component groups we've declared come into play: rather than listing out every component individually, we can reference the logical component group.

And we are done. We don’t require a UI for our installer and so I’ve not looked into that in any depth. WiX does fully support defining an installation wizard UI and even supports custom UI.

The last piece of the WiX tooling is the Deployment Foundation Toolkit which I’ll save for the next post.

Full product.wxs script for reference (with the GUIDs taken out) to close…

<?xml version="1.0" encoding="UTF-8"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi" xmlns:util="http://schemas.microsoft.com/wix/UtilExtension" RequiredVersion="3.5.0.0">
  <Product Id="?"
           Name="ExpertAssistant"
           Language="1033"
           Codepage="1252"
           Version="8.0.0.0"
           Manufacturer="Aderant"
           UpgradeCode="?">
    <Package InstallerVersion="200"
             Compressed="yes"
             Manufacturer="Aderant"
             Description="Expert Assistant Installer"
             InstallScope="perUser"
    />

    <Property Id="EXPERTSHAREPATH" Value="\\MyShare\ExpertShare" />
    <Property Id="EXPERTENVIRONMENTNAME" Value="MyEnvironment" />
    <Property Id="EXPERTLOCALINSTALLPATH" Value="C:\AderantExpert\Environment\Applications" />
    <Property Id="ARPPRODUCTICON" Value="icon.ico" />
    <Property Id="ARPNOMODIFY" Value="1" />
    <Property Id="WixShellExecTarget" Value="[#filExpertAssistantCOexe]"/>

    <SetProperty Id="dirEnvironmentFolder" Value="C:\AderantExpert\[EXPERTENVIRONMENTNAME]" After="CostInitialize"/>
    <SetProperty Id="EXPERTLOCALINSTALLPATH" Value="C:\AderantExpert\[EXPERTENVIRONMENTNAME]\Applications" After="CostInitialize" />

    <Property Id="EnableUserControl" Value="1" />

    <Icon Id="icon.ico" SourceFile="$(var.ExpertAssistantCO.TargetDir)\Expert_Assistant_Icon.ico"/>

    <Media Id="1" Cabinet="media1.cab" EmbedCab="yes" />

    <Directory Id="TARGETDIR" Name="SourceDir">
      <Directory Id="dirAderant" Name="AderantExpert">
        <Directory Id="dirEnvironmentFolder" Name="[EXPERTENVIRONMENTNAME]" >
          <Directory Id="INSTALLLOCATION" Name="ExpertAssistantInstaller">
            <Component Id="cmpConfigEnvironmentName" Guid="?" KeyPath="yes">
              <util:XmlFile Id="xmlConfigEnvironmentName"
                            Action="setValue"
                            ElementPath="/configuration/appSettings/add[\[]@key='EnvironmentName'[\]]/@value"
                            File="[INSTALLLOCATION]\ExpertAssistantCO.exe.config"
                            Value="[EXPERTENVIRONMENTNAME]"
                            />
            </Component>
            <Component Id="cmpConfigExpertSharePath" Guid="?" KeyPath="yes">
              <util:XmlFile Id="xmlConfigExpertSharePath"
                            Action="setValue"
                            ElementPath="/configuration/appSettings/add[\[]@key='ExpertSharePath'[\]]/@value"
                            File="[INSTALLLOCATION]\ExpertAssistantCO.exe.config"
                            Value="[EXPERTSHAREPATH]"
                            />
            </Component>
            <Component Id="cmpConfigLocalInstallationPath" Guid="?" KeyPath="yes">
              <util:XmlFile Id="xmlConfigLocalInstallationPath"
                            Action="setValue"
                            ElementPath="/configuration/appSettings/add[\[]@key='LocalInstallationPath'[\]]/@value"
                            File="[INSTALLLOCATION]\ExpertAssistantCO.exe.config"
                            Value="[EXPERTLOCALINSTALLPATH]"
                            />
            </Component>
          </Directory>
          <Directory Id="ProgramMenuFolder">
            <Directory Id="ApplicationProgramsFolder" Name="Aderant">
            </Directory>
          </Directory>
        </Directory>
      </Directory>
    </Directory>

    <ComponentGroup Id="ExpertAssistantCO.ConfigSettings">
      <ComponentRef Id="cmpConfigEnvironmentName"/>
      <ComponentRef Id="cmpConfigExpertSharePath"/>
      <ComponentRef Id="cmpConfigLocalInstallationPath"/>
    </ComponentGroup>

    <DirectoryRef Id="ApplicationProgramsFolder">
      <Component Id="ApplicationShortcut" Guid="?" >
        <Shortcut Id="ApplicationStartMenuShortcut"
           Name="Expert Assistant"
           Description="Expert Assistant"
           Target="[INSTALLLOCATION]\ExpertAssistantCO.exe"
           />
        <RegistryValue Root="HKCU" Key="Software\Aderant\ExpertAssistant_[EXPERTENVIRONMENTNAME]" Name="installed" Type="integer" Value="1" KeyPath="yes"/>
        <RemoveFolder Id="ApplicationProgramsFolder" On="uninstall"/>
      </Component>
    </DirectoryRef>

    <DirectoryRef Id="ApplicationProgramsFolder">
      <Component Id="RemoveRunRegistryEntry" Guid="?">
        <RegistryValue Root="HKCU" Key="Software\Microsoft\Windows\CurrentVersion\Run" Name="ADERANT.ExpertAssistant" KeyPath="yes" Type="string" Value=""/>
        <Registry Action="removeKeyOnUninstall" Root="HKCU" Key="Software\Microsoft\Windows\CurrentVersion\Run\ADERANT.ExpertAssistant" />
      </Component>
    </DirectoryRef>

    <CustomAction Id="LaunchApplication" BinaryKey="WixCA" DllEntry="WixShellExec" Impersonate="yes" />
    <InstallExecuteSequence>
      <Custom Action="LaunchApplication" After="InstallFinalize" >NOT(REMOVE ~="ALL")</Custom>
    </InstallExecuteSequence>

    <Feature Id="ExpertAssistant"
             Title="Expert Assistant"
             Level="1">
      <ComponentGroupRef Id="ExpertAssistantCO.Binaries" />
      <ComponentGroupRef Id="ExpertAssistantCO.Content" />
      <ComponentGroupRef Id="ExpertAssistantCO.ConfigSettings" />
      <ComponentRef Id="ApplicationShortcut" />
      <ComponentRef Id="RemoveRunRegistryEntry" />
    </Feature>
  </Product>
</Wix>

WiXing Lyrical (part 1)

Continuing on from the previous post, it’s time to take a look at the customization requirements that brought about the creation of a custom bootstrap process for our desktop installations.

The customization capabilities within the Expert product are extensive: they support changes to the domain model, business process and user interface. Many of these changes result in the need to deploy custom assemblies to the workstations running the Expert applications. If the out-of-the-box ClickOnce manifests were used to manage these changes, they would need to be updated by the customization process. Instead of doing this, we chose a solution similar to Google Chrome and created our own bootstrap mechanism to manage updating the client software. The ClickOnce infrastructure is used to ‘install an installer’. I’m not going to drill into the bootstrapper; Pete has already discussed some of the performance aspects here. Instead we’ll walk through the process of creating an MSI to replace the ClickOnce based installation.

This was not my first MSI authoring, in the past I’d been exposed to InstallShield, Wise and WiX, but I hadn’t done anything in the area for around 4-5 years. The last installer I wrote was using Windows Installer Xml (WiX) when it had just been publicly released from Microsoft and the memory was not a pleasant one. The good news is that in the intervening years, my biggest issue with WiX has been resolved – there is now good documentation and a healthy community supporting it. Rather than waiting until the end to list a couple of resources, here are the main references I used:

The first thing to note is that WiX is a free, open source toolkit that is fully supported by Microsoft. It does not ship with any Visual Studio version and must be downloaded. The current version, and the version I used, is v3.5 though there is a v3.6 RC0 available. The actual download of the bits is available from CodePlex here, the SourceForge site links to this.

There are three components in the installer:

  1. WiX – command line tools, Xml schemas, extensions
  2. Votive – a Visual Studio plug-in
  3. Deployment Tools Foundation (DTF) – a managed library for programming against MSIs.

In addition to the WiX Toolkit, another tool to have is Orca which is available in the Windows SDK. Orca is an editor for the MSI database format, allowing an MSI to be easily inspected and edited.

The WiX toolset is summarized concisely by the following diagram (taken from http://wix.sourceforge.net/coretoolset.html)

[Diagram: the WiX core toolset]

While it is possible to work directly from the command line, the Visual Studio integration is a compelling option. The Extension Manager Online gallery contains a WiX download:

[Screenshot: WiX in the Extension Manager Online gallery]

This did not work for me. Instead I downloaded and installed the MSI from the CodePlex site.

Votive, the Visual Studio plug-in, encourages making the installer part of your solution. It adds a number of project types to the product:

[Screenshot: the WiX project types added to Visual Studio]

You can add the set-up project alongside the project containing the source code and maintain the whole solution together – installation should not be a last minute scramble.

[Screenshot: the set-up project alongside the application projects in the solution]

Getting started is straightforward, you just add a reference in your set-up project to the project containing the application that you want to install. In our case, we want to install the bootstrapper contained in the ExpertAssistantCO project:

[Screenshot: the project reference to ExpertAssistantCO]

The set-up project is configured out of the box to generate a couple of WXS files for you based on the VS project file. The command line tool HEAT can create a WXS file from a number of different sources including a directory, VS project or an existing MSI. To enable HEAT, set the Harvest property to true and this will re-create the WXS files on each build based upon the project file. By simply adding a project reference to the set-up project you will have an MSI on the next solution compile. The intermediary files can be found by choosing to show all files in the Solution Explorer:

[Screenshot: the generated files shown in Solution Explorer]

The obj\Debug subfolder will contain the generated wxs files and the wixobj files compiled from them. The bin\Debug subfolder is the default location for the MSI. The two files of interest are the ExpertAssistantCO.wxs, which contains the files from the referenced ExpertAssistantCO project, and the Product.wxs, which contains configuration information for the installation process. Building the project will invoke the WiX tooling: candle, the compiler that transforms .wxs into .wixobj, and light, the linker that processes the .wixobj files to create an MSI.

Of course, the out-of-the-box experience can only go so far and so the generated WXS likely needs to be augmented. Our approach was to use HEAT (via the Harvest option) to generate the initial WXS files and then the generation was disabled. The ExpertAssistantCO.wxs file was moved into the project for manual editing and the product.wxs file contains the bulk of the custom code.

The ExpertAssistantCO.wxs contains the list of files involved in the project, this includes source files, built files and documentation. Each file is wrapped in a separate Fragment and given a unique component Id:

<Fragment>
  <DirectoryRef Id="INSTALLLOCATION">
    <Component Id="cmpExpertAssistantCOexe"
               Guid="????????-????-????-????-????????????" KeyPath="yes">
      <File Id="filExpertAssistantCOexe" Source="$(var.ExpertAssistantCO.TargetDir)\ExpertAssistantCO.exe" />
    </Component>
  </DirectoryRef>
</Fragment>

The generated code has been changed to specify an explicit GUID and to set the KeyPath attribute. The actual GUID has been replaced with ?s, the KeyPath attribute is used to determine if the component already exists – more here. The shouting INSTALLLOCATION is an example of a public property, in the MSI world public properties are declared in full uppercase. The directory the file will be installed into is declared in the product.wxs file and referenced here. The $(var.ExpertAssistantCO.TargetDir) demonstrates a pre-processor directive that allows VS solution properties to be accessed, in this case to determine the source location of the file.

Components can be grouped together to provide more manageable units, for example:

<Fragment>
  <ComponentGroup Id="ExpertAssistantCO.Binaries">
    <ComponentRef Id="cmpExpertAssistantCOexe" />
    <ComponentRef Id="cmpICSharpCodeZipLib" />
    <ComponentRef Id="cmpAderantDeploymentClient"/>
  </ComponentGroup>
</Fragment>

The generated referencedProject.wxs file contains each file declared within a component and then logical groupings for the components. The more interesting aspects of WiX belong to the product.wxs file. This is where registry keys, shortcuts, remove actions and other items are set up.

On to the product.wxs then, and the first line:

<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi"
     xmlns:util="http://schemas.microsoft.com/wix/UtilExtension"
     RequiredVersion="3.5.0.0">

Here we are referencing two namespaces, the default namespace is the standard WiX schema and the second util namespace enables the use of a WiX extension library. Extension libraries contain additional functionality usually grouped by a common thread. The library needs to be added as a project reference:

[Screenshot: the extension library added as a project reference]

Available extension libraries can be found in C:\Program Files (x86)\Windows Installer XML v3.5\bin :

[Screenshot: the extension libraries in the WiX bin folder]

A drill-down into the various extension schemas is provided here. The utility extension referenced above allows Xml file manipulation which we will see later. The requiredVersion sets the version of WiX we are depending on to compile the file.

Next we define the product:

<Product Id="????????-????-????-????-????????????"
         Name="ExpertAssistant"
         Language="1033"
         Codepage="1252"
         Version="8.0.0.0"
         Manufacturer="Aderant"
         UpgradeCode="????????-????-????-????-????????????">

The GUIDs are used to uniquely identify the product installer and so you want to ensure you create a valid GUID using tooling such as GuidGen.exe. Following the product, we define the package:

<Package InstallerVersion="200"
         Compressed="yes"
         Manufacturer="Aderant"
         Description="Expert Assistant Installer"
         InstallScope="perUser"
         />

The attribute worth calling out here is the InstallScope. This can be set to perMachine or perUser, setting to perMachine requires elevated privileges to install. We want to offer an install to all users without requiring elevated privilege and so have a per user install. This has implications later on when we have to set-up user specific items such as Start Menu shortcuts.

Next we move onto properties:

<!-- Properties -->
<Property Id="EXPERTSHAREPATH" Value="\\MyShare\ExpertShare" />
<Property Id="EXPERTENVIRONMENTNAME" Value="MyEnvironment" />
<Property Id="EXPERTLOCALINSTALLPATH" Value="C:\AderantExpert\" />

<Property Id="ARPPRODUCTICON" Value="icon.ico" />
<Property Id="ARPNOMODIFY" Value="1" />
<Property Id="WixShellExecTarget" Value="[#filExpertAssistantCOexe]"/>

The first three properties are custom public properties defined for this installer; public properties must be declared all upper case. A public property can be provided on the msiexec command line or via an MST file to customize the property value.
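For example, overriding a public property from the command line might look like this (the MSI name and value are illustrative):

msiexec /i ExpertAssistant.msi EXPERTENVIRONMENTNAME=Production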

The properties with a prefix of ARP relate to the Add Remove Programs control panel, now Programs and Features in Windows 7. The ARPPRODUCTICON is used to set the icon that appears in the installed programs list. The ARPNOMODIFY property removes the Change option from the control panel options:

[Screenshot: the Programs and Features entry without the Change option]

In contrast, Visual Studio SP1 supports the Change option:

[Screenshot: a Visual Studio SP1 entry with the Change option]

Other ARP properties can be found here.

The final property WixShellExecTarget specifies the file to be executed when the install completes. This is a required parameter of a custom action that we will come to later. The [#filExpertAssistantCOexe] is a reference to a file declared in the ExpertAssistantCO.wxs file.

<File Id="filExpertAssistantCOexe" Source="$(var.ExpertAssistantCO.TargetDir)\ExpertAssistantCO.exe" />

Next up we come to one of the areas that stumped me for a while, how to set a property value. Some attributes can directly reference a property by surrounding the property name in [] and have the value swapped in, e.g.

<Directory Id="dirEnvironmentFolder" Name="[EXPERTENVIRONMENTNAME]" >

However this is not supported when setting properties, therefore the following does not result in substitution:

<Property Id="Composite" Value="[PROPERTY1] and [PROPERTY2]" />

Instead the SetProperty element is used:

<!-- Set-up environment specific properties -->
<SetProperty Id="dirEnvironmentFolder" Value="C:\AderantExpert\[EXPERTENVIRONMENTNAME]" After="CostInitialize"/>
<SetProperty Id="EXPERTLOCALINSTALLPATH" Value="C:\AderantExpert\[EXPERTENVIRONMENTNAME]\Applications" After="CostInitialize" />

When setting the property, the appropriate time to perform the action needs to be set using either the Before or After attribute. This injects the action into the appropriate place in the list of actions to perform. To determine the order, I used Orca to view the InstallExecuteSequence:

[Screenshot: the InstallExecuteSequence viewed in Orca]

A final property that is set is the EnableUserControl, which allows the installer to pass all public properties to the server side during a managed install.

<Property Id="EnableUserControl" Value="1" />

Note: the preferred approach is to set the Secure attribute individually to Yes on each property declaration that supports user control (I have only just learnt this while writing up the posting).

A final element for this post is Icon which specifies an icon file.

<Icon Id="icon.ico" SourceFile="$(var.ExpertAssistantCO.TargetDir)\Expert_Assistant_Icon.ico"/>

The icon Id was used by the ARPPRODUCTICON to set the icon seen in the install programs control panel.

There’s more to come, however this post has become long enough in its own right. In the next post I’ll drill into setting registry keys, determining the install location, Start Menu settings and more.

So far we’ve walked through the following WiX code:

<?xml version="1.0" encoding="UTF-8"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi" xmlns:util="http://schemas.microsoft.com/wix/UtilExtension" RequiredVersion="3.5.0.0">
  <Product Id="????????-????-????-????-????????????"
           Name="ExpertAssistant"
           Language="1033"
           Codepage="1252"
           Version="8.0.0.0"
           Manufacturer="Aderant"
           UpgradeCode="????????-????-????-????-????????????">
    <Package InstallerVersion="200"
             Compressed="yes"
             Manufacturer="Aderant"
             Description="Expert Assistant Installer"
             InstallScope="perUser"
    />

    <!-- Public Properties -->
    <Property Id="EXPERTSHAREPATH" Value="\\MyShare\ExpertShare" />
    <Property Id="EXPERTENVIRONMENTNAME" Value="MyEnvironment" />
    <Property Id="EXPERTLOCALINSTALLPATH" Value="C:\AderantExpert\Environment\Applications" />
    <Property Id="ARPPRODUCTICON" Value="icon.ico" />
    <Property Id="ARPNOMODIFY" Value="1" />
    <Property Id="WixShellExecTarget" Value="[#filExpertAssistantCOexe]"/>

    <!-- Set-up environment specific properties -->
    <SetProperty Id="dirEnvironmentFolder" Value="C:\AderantExpert\[EXPERTENVIRONMENTNAME]" After="CostInitialize"/>
    <SetProperty Id="EXPERTLOCALINSTALLPATH" Value="C:\AderantExpert\[EXPERTENVIRONMENTNAME]\Applications" After="CostInitialize" />

    <!-- All users to access the properties, not just elevated users -->
    <Property Id="EnableUserControl" Value="1" />

    <!-- Icons -->
    <Icon Id="icon.ico" SourceFile="$(var.ExpertAssistantCO.TargetDir)\Expert_Assistant_Icon.ico"/>

Deploying Desktop Applications

Over the last couple of weeks I’ve been re-examining the desktop deployment strategy we have for Aderant applications. This installer navel gazing has resulted in augmenting our existing ClickOnce based approach with an MSI. The next couple of blog posts will drill into my ClickOnce headaches and my embracing of WiX to create MSIs.

Desktop Deployment Challenges

During the first steps into the new millennium, rich/smart/fat client applications which install binary files onto the local host fell out of fashion in the face of the centralized web deployment model. Deploying desktop applications was a delicate process: prior to the world of managed code, COM applications roamed the world and brought with them the plague of ‘DLL Hell’. Installing a new piece of software into a Windows environment could very well result in previously stable applications breaking, registries full of COM ClassIDs could become corrupt, and regedit and regsvr32 became almost accepted mainstream tools. The Microsoft .NET framework worked very hard to overcome the fragility of the COM world but introduced its own complexities: managing versions, the global assembly cache, shadow copies and so on. Regardless of the technology, the software must be rolled out to each individual desktop and managed on each desktop. Installer technology supported the concept of repair but did not support automatic upgrades; additionally the installer often required administrator privileges to install.

A Mexican stand-off ensued: desktop applications allowed richer user experiences, a simpler programming model and the ability to run independently of a network connection; web applications have a much simpler deployment model and updates.

Of course, it is quite natural to want the best of both worlds and so various approaches have evolved to try to satisfy this, one of these approaches is ClickOnce.

Microsoft ClickOnce

As part of the .NET v2 framework, Microsoft introduced a new deployment option for desktop applications called ClickOnce. The goals of ClickOnce are to solve the following three major issues:

  1. Difficulty updating a deployed application
  2. Impact of the application on the users’ desktop
  3. Administrator privileges required to install apps

Conceptually, ClickOnce provides a central deployment model for desktop applications which does not require the installing user to have elevated privileges. It supports both the automatic repair and upgrading of applications and supports the use of an HTTP channel to distribute the installer. This last point is a game changer – desktop applications could now be installed and updated over HTTP on port 80, not requiring any additional firewall exceptions. This allows ClickOnce to be used to deploy applications over the internet as well as a corporate network. A couple of high profile, non-Microsoft examples of ClickOnce adoption demonstrate an industry acceptance: Google Chrome and Join.Me.

Corporate vs. Internet Installation

The ability to distribute software over the internet is something of a double edged sword should you choose to use ClickOnce to distribute software within a corporate network. The internet is untrustworthy, therefore mechanisms are required within ClickOnce to establish trust. These mechanisms are on by default and some cannot be switched off. To discuss some of these first requires an understanding of how an application is deployed via ClickOnce.

A ClickOnce deployment is basically managed by two XML files called manifests. There is an application manifest (.manifest) and a publisher manifest (.application). There’s the first annoyance – the application manifest does not have the .application file extension, which is so confusing. The application manifest contains a description of the files to be deployed and other settings that are controlled by the author of the application. The publisher manifest points to the application manifest, and contains information that is pertinent to the firm installing the application such as the publisher URL, update policy and the minimum required version. Each manifest is by default tamper proof: the manifests include a hash so that they cannot be altered without the use of appropriate tools such as the Manifest Generation and Editing Tool (mage.exe) – note: changing the application manifest invalidates the publisher manifest. It is possible to remove the hash from individual files within the manifest. The tamper proofing is only part of the trust story: you also need to be able to trust that the install comes from a trusted application vendor and that it was installed from a trusted publisher.

Time for the second annoyance… to install a ClickOnce application silently requires that both the manifests are signed using a certificate that is trusted as a software publisher by the machine undertaking the installation. This is a serious pain for ISVs. In our case, we can sign the application manifest with our company certificate and indeed should do so. However, if one of the manifests is signed then the other must also be signed, therefore the publisher manifest must be signed. The publisher of the application is the firm installing our software and so they must have a code signing certificate and ensure that the certificate is trusted by each desktop machine. If you choose to sign neither manifest, the installing user will see a security dialog window asking them if they trust the publisher and location – most corporate users will not know what they are being asked.
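For what it's worth, signing (or re-signing after an edit) is done with mage.exe; a sketch with placeholder file, certificate and password values:

mage -Sign ExpertAssistant.exe.manifest -CertFile publisher.pfx -Password secret
mage -Update ExpertAssistant.application -AppManifest ExpertAssistant.exe.manifest
mage -Sign ExpertAssistant.application -CertFile publisher.pfx -Password secret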

An additional step in the trust process is ensuring that the publisher URL belongs to the Trusted Site  (or Local Intranet Site) security zone of IE.

All of this protection makes sense for internet deployments but less so for intranet deployments within a corporate network which is implicitly trusted.

While the silent install is possible, if somewhat complex, a silent remove is not supported; the user will always be prompted to confirm the removal of the application. This single anomaly causes a great deal of frustration within the firms we provide software to. It is possible to achieve an automated uninstall but it requires the use of some nasty automation code:

On Error Resume Next
Set objShell = WScript.CreateObject("WScript.Shell")
objShell.Run "taskkill /f /im [your app process name]*"
objShell.Run "[your app uninstall key]"
Do Until Success = True
    Success = objShell.AppActivate("[your window title]")
    Wscript.Sleep 200
Loop
objShell.SendKeys "OK"
This VBScript snippet was taken from: http://social.msdn.microsoft.com/Forums/en-US/winformssetup/thread/51a44139-2477-4ebb-8567-9189063cf340/. The app uninstall key will be found in the registry:

[Screenshot: the app uninstall key in the registry]

Why am I so down on this? ClickOnce is a pull installation, it is expected that a user will choose to install the software. This is not so often the case within a corporate environment with a locked down desktop – applications are pushed to users using management software such as System Center Configuration Manager or via Active Directory group policies.

Additional Headaches

Beyond the heartache to perform a silent install and remove of the application, there are some other pain points of note.

ClickOnce and Citrix don’t mix well. ClickOnce permits installs by a non-privileged user; to do so the application is installed into the user’s local profile, which means that in a Citrix environment each user has a separate install of the software taking up disk space.

The path of the application is a monster too, for example the Join.Me installer placed the files into:

C:\Users\username\AppData\Local\Apps\2.0\1A6OWNZM.ETC\EEWH8699.6PL\join..tion_43a0dbe7f0f75062_0001.0000_9871fcdc8aa605d7

If you have either long filenames or a deep file hierarchy, you can run out of characters for your file paths (260 character limit).

ClickOnce does not provide any extensibility points to run custom pre or post deployment actions.

In our particular case, we need to deploy assemblies to the application after the initial installation due to on-site firm customizations. ClickOnce does have a self-updating capability but we needed something more dynamic – more on this later…

In Defence of ClickOnce

This has been a rather negative look at ClickOnce so far so I do feel the need to provide some positive balance. ClickOnce is great at deploying apps over HTTP, it provides a self-repair and self-updating installer, it allows non-admin users to install software safely, it goes to great lengths to ensure the software you install can be trusted.

The case for ClickOnce was strong enough that we’ve used it for the Aderant Golden Gate desktop applications. The negatives are from unusual customization scenarios and field experience within corporate networks, which has led to the investigation into an MSI-centric approach.

If you are pursuing a ClickOnce installer and need to dig a little deeper than the VS2010 template, I  strongly recommend getting Brian Noyes book: http://www.amazon.com/Smart-Client-Deployment-ClickOnce-Applications/dp/0321197690.

In the next post, I’ll discuss our customization requirements that lead to using ClickOnce to install a bootstrapper.

Co-ordinating deployments using the Parallel class in .NET 4.0

It’s been a long time since the last entry, the new year brings with it a fresh post based on some of the deployment work I’ve been looking at recently. This work has opened my eyes to the support for parallel co-ordination of work within .NET 4…

Recently I’ve been looking at the deployment approach we have for our services with an eye to reducing the time it takes for a full deployment. There are two simple concepts that leapt out: the first is to use a pull rather than a push model; the second is to deploy to all of the servers in parallel. This second point becomes increasingly important as more servers get involved in hosting the services.

Pull versus Push
One of the most basic operations performed by the deployment engine is the copying of files to the application servers that host the various services within our product. The file copying was originally implemented as a push: the deployment agent performs the copy to the target server using an administration share, e.g. \\appserver01.domain.com\d$\AderantExpert\Live\ . This requires the deployment engine to run with administrator privilege on the remote machines which is not ideal.

An alternative is to send a script to the target server containing the copy commands; the target server is then responsible for pulling the files to its local storage from a network share (which can be secured appropriately). The deployment engine is responsible for creating the script from the deployment model and co-ordinating the execution of the scripts across the various application servers.

PowerShell remoting is a great option for the remote execution of scripts and it’s quite straight forward to transform an object model into a PowerShell script using LINQ. I created a small script library class that provides common functions, for example:

internal class PowerShellScriptLibrary {
    internal static void ImportModules(StringBuilder script) {
        script.AppendLine("import-module WebAdministration");
        script.AppendLine("import-module ApplicationServer");
    }

    internal static void StopWindowsServices(string filter, StringBuilder script) {
        script.AppendLine("# Stop Windows Services");
        script.AppendLine(string.Format("Stop-Service {0}", filter));
    }

    internal static void CreateTargetDirectories(string rootPath, IEnumerable fileSpecifications, StringBuilder script) {
        script.AppendLine("# Create the required folder structure");
        fileSpecifications
            .Where(spec => !string.IsNullOrWhiteSpace(spec.TargetFile.TargetRelativePath))
            .Select(x => x.TargetFile)
            .Distinct()
            .ToList()
            .ForEach(targetFile => {
                string path = Path.Combine(rootPath, targetFile.TargetRelativePath);
                script.AppendLine(string.Format("if(-not(Test-Path '{0}'))", path));
                script.AppendLine("{");
                script.AppendLine(string.Format("\tNew-Item '{0}' -ItemType directory", path));
                script.AppendLine("}");
            });
    }
}


The library is then used to create the required script by calling the various functions, the examples below are for the patching approach that allows updates to be installed without requiring a full remove and redeploy:

private string GenerateInstallScriptForPatch(Server server, IEnumerable filesToDeploy, Environment environment, string patchFolder) {
    StringBuilder powershellScript = new StringBuilder();

    PowerShellScriptLibrary.ImportModules(powershellScript);
    PowerShellScriptLibrary.StopWindowsServices("ADERANT*", powershellScript);
    PowerShellScriptLibrary.StopAppFabricServices(environment, powershellScript);
    PowerShellScriptLibrary.CreateTargetDirectories(server.ExpertPath, filesToDeploy, powershellScript);
    PowerShellScriptLibrary.CreatePatchRollback(server, patchFolder, filesToDeploy, powershellScript);
    PowerShellScriptLibrary.CopyFilesFromSourceToServer(environment, server, filesToDeploy, powershellScript);
    PowerShellScriptLibrary.UpdateFactoryBinFromExpertShare(server, environment.NetworkSharePath, powershellScript);
    PowerShellScriptLibrary.StartAppFabricServices(environment, powershellScript);
    PowerShellScriptLibrary.StartWindowsServices("ADERANT*", powershellScript);

    return powershellScript.ToString();
}

Though it is possible to treat NTFS as a transactional system (see http://msdn.microsoft.com/en-us/library/bb968806(v=VS.85).aspx ), and therefore have it participate in atomic actions, I didn’t walk this path. Instead I chose the compensation route and so when the model is transformed into a script I create both an install script and a compensate script which is executed in the event of anything going wrong.

private string GenerateRollbackScriptForPatch(Server server, IEnumerable filesToDeploy, Environment environment, string patchFolder) {
    StringBuilder powershellScript = new StringBuilder();

    PowerShellScriptLibrary.ImportModules(powershellScript);
    PowerShellScriptLibrary.StopWindowsServices("ADERANT*", powershellScript);
    PowerShellScriptLibrary.StopAppFabricServices(environment, powershellScript);
    PowerShellScriptLibrary.RollbackPatchedFiles(server, patchFolder, filesToDeploy, powershellScript);
    PowerShellScriptLibrary.StartAppFabricServices(environment, powershellScript);
    PowerShellScriptLibrary.StartWindowsServices("ADERANT*", powershellScript);

    return powershellScript.ToString();
}

The scripts simply take a copy of the existing files that will be replaced before replacing them with the new versions. If anything goes wrong during the patch install, the compensating script is executed to restore the previous files.

Given that a server-specific script is now generated per application server (different servers host different roles and therefore require different files), the deployment engine has the opportunity to pass the script to each server, ask it to execute it, and then wait for the OK from each server. If one server has an error then all can have the compensation script executed as required.
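A sketch of that co-ordination using PowerShell remoting follows; the server name and script variables are illustrative rather than the actual deployment engine code:

# Run the generated install script on a target application server,
# falling back to the compensating script if anything fails
$session = New-PSSession -ComputerName 'appserver01.domain.com'
try {
    Invoke-Command -Session $session -ScriptBlock ([ScriptBlock]::Create($installScript))
} catch {
    Invoke-Command -Session $session -ScriptBlock ([ScriptBlock]::Create($rollbackScript))
} finally {
    Remove-PSSession $session
}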

Parallelizing a deployment
Before looking at some co-ordination code for the deployment engine, I want to explicitly note that there are two different and often confused concepts:
• Asynchronous execution
• Parallel execution

An asynchronous execution involves a call to begin a method and then a callback from that method when the work is complete. IO operations are natural candidates for asynchronous calls to ensure that the calling thread is not blocked waiting on the IO to complete. Single threaded frameworks such as UI are the most common place to see a push for asynchronous programming. In .NET 3, the Windows Workflow Foundation provided an excellent asynchronous programming model where asynchronous activities are co-ordinated by a single scheduler thread. It is bad practice to have this scheduler thread block or perform long running operations as it stalls the workflow progress when in a parallel activity. It is better to schedule multiple asynchronous activities in parallel when possible and have these execute on separate worker threads.

Parallel execution involves breaking a problem into small parts that can be executed in parallel due to the multi-core nature of today’s CPUs. Rather than having a single core work towards an answer, many cores can participate in the calculation. To reduce the elapsed time of a calculation – the time experienced by the end user – it may be possible to execute a LINQ query over all available cores (typically 2, 4 or 8). LINQ now has the .AsParallel() extension method which can be applied to queries to enable parallel execution of the query. Of course, profiling is required to determine if the query performs better in parallel for typical data sets.
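For example, opting a CPU-bound query into parallel execution is a one-method change; the collection and the ComputeDeploymentPlan method here are purely illustrative:

// Sequential
var plans = servers.Select(s => ComputeDeploymentPlan(s)).ToList();

// Parallel – the same query spread across the available cores
var plansInParallel = servers.AsParallel().Select(s => ComputeDeploymentPlan(s)).ToList();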

.NET 4 added the Task Parallel Library into the core runtime. This library adds numerous classes to the BCL to make parallel programming and the writing of co-ordination logic much simpler. In particular the Parallel class can be used to easily schedule multiple threads of work. For example:

Parallel.Invoke(
    () => Parallel.ForEach(updateMap, server =>
        serverInstallationScripts.Add(server.Key, GenerateInstallScriptForPatch(server.Key, server.Value, environment, patchFolder))),
    () => Parallel.ForEach(updateMap, server =>
        serverRollbackScripts.Add(server.Key, GenerateRollbackScriptForPatch(server.Key, server.Value, environment, patchFolder)))
);

The above code is responsible for creating the install and compensate PowerShell scripts from the deployment model discussed above. There are two levels of parallelism going on here. First the generation of the install and compensate scripts is scheduled at the same time using a Parallel.Invoke() call. Then a Parallel.ForEach() is used to generate the required script for each application server defined in the environment in parallel. The runtime is responsible for figuring out how best to achieve this; as programmers we simply declare what we want to happen. In the above code the updateMap is an IDictionary<Server, IList>, a list of files to deploy to each server keyed on the server. (Note that if the script dictionaries are plain Dictionary instances then the concurrent Add calls need synchronization, or a ConcurrentDictionary, as Dictionary is not thread-safe for writes.)

I was simply blown away by how simple and yet how powerful this programming model is.

PowerShell Part 2 – Installing a new service

Following on from the brief introduction to PowerShell, let’s walk through the installation script…

The script installs a simple Magic Eight Ball service that will return a pseudo-random answer to any question it's given. The service is written as a WCF service in C#; the files to deploy are available from http://public.me.com/stefsewell/ , have a look in TechEd2010/DEV306-WindowsServerAppFabric/InstallationSource. The folder contains a web.config to set up the service activation and a bin folder with the service implementation. The PowerShell scripts are also available from the file share, in the Powershell folder in DEV306…

Pre-requisite Checking

The script begins by checking a couple of pre-requisites. If any of these checks fail then we do not attempt to install the service, instead the installing admin is told of the failed checks. There are a number of different checks we can make, in this script we check the OS version, that dependent services are installed and that the correct version of the .NET framework is available.

First we need a variable to hold whether or not we have a failure:

$failedPrereqs = $false

Next we move on to our first check: that the correct version of Windows is being used:

$OSVersion = Get-WmiObject Win32_OperatingSystem
if(-not $OSVersion.Version.StartsWith('6.1')) {
    Write-Host "The operating system version is not supported, Windows 7 or Windows Server 2008 required."
    $failedPrereqs = $true
    # See http://msdn.microsoft.com/en-us/library/aa394239(v=VS.85).aspx for other properties of Win32_OperatingSystem
    # See http://msdn.microsoft.com/en-us/library/aa394084(VS.85).aspx for additional WMI classes
}

The script fetches the Win32_OperatingSystem WMI object for interrogation using Get-WmiObject. This object contains a good deal of useful information; links are provided above to let you drill down into other properties. The script checks the Version to ensure that we are working with either Windows 7 or Windows Server 2008 R2, both of which report a version number starting with "6.1".

Next we look for a couple of installed services:

# IIS is installed
$IISService = Get-Service -Name 'W3SVC' -ErrorAction SilentlyContinue
if(-not $IISService) {
    Write-Host "IIS is not installed on" $env:computername
    $FailedPrereqs = $true
}

# AppFabric is installed
$AppFabricMonitoringService = Get-Service -Name 'AppFabricEventCollectionService' -ErrorAction SilentlyContinue
if(-not $AppFabricMonitoringService) {
    Write-Host "AppFabric Monitoring Service is not installed on" $env:computername
    $FailedPrereqs = $true
}

$AppFabricWorkflowService = Get-Service -Name 'AppFabricWorkflowManagementService' -ErrorAction SilentlyContinue
if(-not $AppFabricWorkflowService) {
    Write-Host "AppFabric Workflow Management Service is not installed on" $env:computername
    $FailedPrereqs = $true
}

A basic pattern is repeated here using the Get-Service command to determine if a particular Windows Service is installed on the machine.
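The same pattern could be wrapped in a small helper in the spirit of the Check-FeatureSet function shown later. This is only a sketch and not part of the original script; the function name and the use of script scope for the failure flag are my own:

function Test-RequiredService {
    param(
        [Parameter(Mandatory=$true)]
        [string[]] $serviceNames
    )
    foreach($name in $serviceNames) {
        # Get-Service returns nothing (with the error suppressed) if the service is not installed
        if(-not (Get-Service -Name $name -ErrorAction SilentlyContinue)) {
            Write-Host "The service $name is not installed on" $env:computername
            $script:failedPrereqs = $true
        }
    }
}

Test-RequiredService @('W3SVC','AppFabricEventCollectionService','AppFabricWorkflowManagementService')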

With the service requirements checked, we look to see if we have the correct version of the .NET framework installed. In our case we want the RTM of version 4 and go to the registry to validate this.

$frameworkVersion = get-itemProperty -Path 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' -ErrorAction SilentlyContinue
if(-not($frameworkVersion) -or (-not($frameworkVersion.Version -eq '4.0.30319'))){
    Write-Host "The RTM version of the full .NET 4 framework is not installed."
    $FailedPrereqs = $true
}

The registry provider, HKLM: [HKEY_LOCAL_MACHINE], is used to look up a path in the registry that should contain the version. If the key is not found or the value is incorrect we fail the test.

Those are all the checks made in the original script from the DEV306 session, however there is a great feature in Windows Server 2008 R2 that allows very simple querying of the installed Windows features. I found this by accident:

>Get-Module -ListAvailable

This command lists all of the available modules on a system; the ServerManager module looked interesting:

>Get-Command -Module ServerManager

CommandType  Name                   Definition
-----------  ----                   ----------
Cmdlet       Add-WindowsFeature     Add-WindowsFeature [-Name] [-IncludeAllSubFeature] [-LogPath ] [-...
Cmdlet       Get-WindowsFeature     Get-WindowsFeature [[-Name] ] [-LogPath ] [-Verbose] [-Debug] [-Err...
Cmdlet       Remove-WindowsFeature  Remove-WindowsFeature [-Name] [-LogPath ] [-Concurrent] [-Restart...

A simple add/remove/get interface which allows you to easily determine which Windows roles and features are installed – then add or remove as required. This is ideal for pre-requisite checking as we can now explicitly check to see if the WinRM IIS Extensions are installed for example:

import-module ServerManager

if(-not (Get-WindowsFeature 'WinRM-IIS-Ext').Installed) {
    Write-Host "The WinRM IIS Extension is not installed"
}

Simply calling Get-WindowsFeature lists all features and marks those that are installed with an [X]:

PS C:\Windows\system32> Get-WindowsFeature

Display Name                                                Name
------------                                                ----
[ ] Active Directory Certificate Services                   AD-Certificate
    [ ] Certification Authority                             ADCS-Cert-Authority
    [ ] Certification Authority Web Enrollment              ADCS-Web-Enrollment
    [ ] Certificate Enrollment Web Service                  ADCS-Enroll-Web-Svc
    [ ] Certificate Enrollment Policy Web Service           ADCS-Enroll-Web-Pol
[ ] Active Directory Domain Services                        AD-Domain-Services
    [ ] Active Directory Domain Controller                  ADDS-Domain-Controller
    [ ] Identity Management for UNIX                        ADDS-Identity-Mgmt
        [ ] Server for Network Information Services         ADDS-NIS
        [ ] Password Synchronization                        ADDS-Password-Sync
        [ ] Administration Tools                            ADDS-IDMU-Tools
[ ] Active Directory Federation Services                    AD-Federation-Services
    [ ] Federation Service                                  ADFS-Federation
    [ ] Federation Service Proxy                            ADFS-Proxy
    [ ] AD FS Web Agents                                    ADFS-Web-Agents
        [ ] Claims-aware Agent                              ADFS-Claims
        [ ] Windows Token-based Agent                       ADFS-Windows-Token
[ ] Active Directory Lightweight Directory Services         ADLDS
[ ] Active Directory Rights Management Services             ADRMS
    [ ] Active Directory Rights Management Server           ADRMS-Server
    [ ] Identity Federation Support                         ADRMS-Identity
[X] Application Server                                      Application-Server
    [X] .NET Framework 3.5.1                                AS-NET-Framework
    [X] AppFabric                                           AS-AppServer-Ext
    [X] Web Server (IIS) Support                            AS-Web-Support
    [X] COM+ Network Access                                 AS-Ent-Services
    [X] TCP Port Sharing                                    AS-TCP-Port-Sharing
    [X] Windows Process Activation Service Support          AS-WAS-Support
        [X] HTTP Activation                                 AS-HTTP-Activation
        [X] Message Queuing Activation                      AS-MSMQ-Activation
        [X] TCP Activation                                  AS-TCP-Activation
...

The right-hand column contains the feature name to pass to the Add-WindowsFeature, Get-WindowsFeature and Remove-WindowsFeature commands.
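A missing feature can then be installed directly from the same console. A quick sketch, assuming an elevated session on Windows Server 2008 R2:

import-module ServerManager

# Install the WinRM IIS Extension if the earlier check reported it missing
Add-WindowsFeature 'WinRM-IIS-Ext'

# Or install a parent feature together with all of its sub-features
Add-WindowsFeature 'Application-Server' -IncludeAllSubFeature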

I ended up writing a simple function to check for a list of features:

<#
.SYNOPSIS
Checks to see if a given set of Windows features are installed.    

.DESCRIPTION
Checks to see if a given set of Windows features are installed.

.PARAMETER featureSetArray
An array of strings containing the Windows features to check for.

.PARAMETER featuresName
A description of the feature set being tested for.

.EXAMPLE
Check that a couple of web server features are installed.

Check-FeatureSet -featureSetArray @('Web-Server','Web-WebServer','Web-Common-Http') -featuresName 'Required Web Features'

#>
function Check-FeatureSet{
    param(
        [Parameter(Mandatory=$true)]
        [array] $featureSetArray,
        [Parameter(Mandatory=$true)]
        [string]$featuresName
    )
    Write-Host "Checking $featuresName for missing features..."

    foreach($feature in $featureSetArray){
        if(-not (Get-WindowsFeature $feature).Installed){
            Write-Host "The feature $feature is not installed"
        }
    }
}

The function introduces a number of PowerShell features such as comment documentation, functions, parameters and parameter attributes. I don’t intend to dwell on any as I hope the code is quite readable.

Then to use this:

# array of strings containing .NET related features
$dotNetFeatureSet = @('NET-Framework','NET-Framework-Core','NET-Win-CFAC','NET-HTTP-Activation','NET-Non-HTTP-Activ')

# array of strings containing MSMQ related features
$messageQueueFeatureSet = @('MSMQ','MSMQ-Services','MSMQ-Server')

Check-FeatureSet $dotNetFeatureSet '.NET'
Check-FeatureSet $messageQueueFeatureSet 'Message Queuing'

To complete the pre-requisite check, after making each individual test the failure variable is evaluated. If true then the script ends with a suitable message, otherwise we go ahead with the install.
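A minimal sketch of that final guard, rather than the exact code from the session script:

if($failedPrereqs) {
    # One or more checks failed - report and stop before making any changes to the machine
    Write-Host "Pre-requisite checks failed, aborting the installation."
    return
}

# ...otherwise carry on with the installation steps below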

Installing the Service

The first step in the installation is to copy the required files from a known location. This is a pull model – the target server pulls the files across the network, rather than having the files pushed on to the server via an administration share or such like [e.g. \\myMachine\c$\Services\].

$sourcePath = '\\SomeMachine\MagicEightBallInstaller\'
$installPath = 'C:\Services\MagicEightBall'

if(-not (Test-Path $sourcePath)) {
    Write-Host 'Cannot find the source path ' $sourcePath
    Throw (New-Object System.IO.FileNotFoundException)
}

if(-not (Test-Path $installPath)) {
    New-Item -type directory -path $installPath
    Write-Host 'Created service directory at ' $installPath
}

Copy-Item -Path (Join-Path $sourcePath "*") -Destination $installPath -Recurse

Write-Host 'Copied the required service files to ' $installPath

The file structure is copied from a network share onto the machine the script is running on. The Test-Path command determines whether a path exists and allows appropriate action to be taken. To perform a recursive copy the Copy-Item command is called, using the Join-Path command to establish the source path. These path commands can be used with any provider, not just the file system.

With the files and directories in place, we now need to host the service in IIS. To do this we need to use the PowerShell module for IIS:

import-module WebAdministration # requires admin-level privileges

Next…

$found = Get-ChildItem IIS:\AppPools | Where-Object {$_.Name -eq "NewAppPool"}
if(-not $found){
    New-WebAppPool 'NewAppPool'
}

We want to isolate our service into its own pool so we check to see if NewAppPool exists and if not we create it. We are using the IIS: provider to treat the web server as if it were a file system; again we just use standard commands to query the path.

Set-ItemProperty IIS:\AppPools\NewAppPool -Name ProcessModel -Value @{IdentityType=3;Username="MyServer\Service.EightBall";Password="p@ssw0rd"} # 3 = SpecificUser (a custom account)

Set-ItemProperty IIS:\AppPools\NewAppPool -Name ManagedRuntimeVersion -Value v4.0

Write-Host 'Created application pool NewAppPool'

Having created the application pool we set some properties. In particular we ensure that .NET v4 is used and that a custom identity is used. The @{} syntax allows us to construct new object instances – in this case a new process model object.

New-WebApplication -Site 'Default Web Site' -Name 'MagicEightBall' -PhysicalPath $installPath -ApplicationPool 'NewAppPool' -Force

With the application pool in place and configured, we next set-up the web application itself. The New-WebApplication command is all we need, giving it the site, application name, physical file system path and application pool.

Set-ItemProperty 'IIS:/Sites/Default Web Site/MagicEightBall' -Name EnabledProtocols -Value 'http,net.tcp' # do not include spaces in the list!

Write-Host 'Created web application MagicEightBall'

To enable both HTTP and net.tcp endpoints, we simply update the EnabledProtocols property of the web application. Thanks to default endpoints in WCF4, this is all we need to do to get both protocols supported. Note: do not put spaces into the list of protocols.

Configuring AppFabric Monitoring

We now have enough script to create the service host, but we want to add AppFabric monitoring. Windows Server AppFabric has a rich PowerShell API, to access it we need to import the module:

import-module ApplicationServer

Next we need to create our monitoring database:

[Reflection.Assembly]::LoadWithPartialName("System.Data")

$monitoringDatabase = 'MagicEightBallMonitoring'
$monitoringConnection = New-Object System.Data.SqlClient.SqlConnectionStringBuilder -argumentList "Server=localhost;Database=$monitoringDatabase;Integrated Security=true"
$monitoringConnection.Pooling = $true

We need a couple of variables: a database name and a connection string. We use the SqlConnectionStringBuilder out of the System.Data assembly to get our connection string. This demonstrates the deep integration between PowerShell and .NET.

Add-WebConfiguration -Filter connectionStrings -PSPath "MACHINE/WEBROOT/APPHOST/Default Web Site/MagicEightBall" -Value @{name="MagicEightBallMonitoringConnection"; connectionString=$monitoringConnection.ToString()}

We add the connection string to our web application configuration.

Initialize-ASMonitoringSqlDatabase -Admins 'Domain\AS_Admins' -Readers 'DOMAIN\AS_Observers' -Writers 'DOMAIN\AS_MonitoringWriters' -ConnectionString $monitoringConnection.ToString() -Force

And then we create the actual database, passing in the security groups. While local machine groups can be used, in this case I’m mocking a domain group which is more appropriate for load balanced scenarios.

Set-ASAppMonitoring -SiteName 'Default Web Site' -VirtualPath 'MagicEightBall' -MonitoringLevel 'HealthMonitoring' -ConnectionStringName 'MagicEightBallMonitoringConnection'

The last step is to enable monitoring for the web application, above we are setting a ‘health monitoring’ level which is enough to populate the AppFabric dashboard inside the IIS manager.

Set-ASAppServiceMetadata -SiteName 'Default Web Site' -VirtualPath 'MagicEightBall' -HttpGetEnabled $True

Last of all we ensure that metadata publishing is available for our service. This allows us to test the service using the WCFTestClient application.

Configuration for Kerberos

This is a summary of the voodoo required to get WCF services hosted in IIS to work with a load balancer and Kerberos. This took me way longer than I had hoped to figure out so I hope I can save someone else that pain.

We have recently been running some load and stress tests against our latest Golden Gate SP1 product which supports the horizontal scale out of workflow services. This scale out capability is one of the core features of Windows Server AppFabric. Our software is designed to run in an ‘on premise’ scenario and leverages Windows integrated security for authorization of users. A major performance improvement we discovered during our original Golden Gate testing was to ensure Kerberos was used rather than NTLM when performing Windows Authentication. We wanted to ensure that our new services were using Kerberos for Windows authentication since we had moved some of our services, in particular the workflow services, from being hosted as a Windows Service to being hosted in IIS.

Note: in addition to performance advantages, you need to use Kerberos if you want to achieve multi-hop delegation of credentials, NTLM does not support this. The resources at the end of this post discuss this further.

In this post I’m going to walk through a worked example and give a checklist to follow. In a later post I may drill down into a little more of the background, in the meantime I’ll include some additional resources at the end.

Scenario
The scenario involves three application servers that are configured into a network load balanced (NLB) cluster using NLB in Windows Server 2008. The machine names are:
• svexpgg310.ap.aderant.com
• svexpgg311.ap.aderant.com
• svexpgg312.ap.aderant.com

The virtual host name for the NLB is svnlb301.ap.aderant.com.

The NLB is set up to load balance traffic on port 80 for our HTTP-based services, and the port range 18180-18199 for our Windows Services. Each of the servers runs all of the services that we support horizontal scale out for, and one of the servers (310) runs the services that only support a single instance. In a typical installation we have around 15 services; rather than list out all of these I'll concentrate on two types:
• services hosted in IIS that expose HTTP endpoints
• services hosted as Windows Services that expose net.tcp endpoints

Alongside the three application servers is a database server that hosts the ADERANT Expert database, the AppFabric monitoring database and the AppFabric workflow persistence database.

The basicHttpBinding configuration used to enable Windows authentication is as follows:

      <basicHttpBinding>
        <binding name="expertBasicHttpBinding" maxReceivedMessageSize="2147483647">
          <readerQuotas maxArrayLength="2147483647" maxStringContentLength="2147483647" />
          <security mode="TransportCredentialOnly">
            <transport clientCredentialType="Windows" proxyCredentialType="Windows">
              <extendedProtectionPolicy policyEnforcement="Never" />
            </transport>
          </security>
        </binding>
      </basicHttpBinding>

1. The servers must be in the local intranet zone of any calling machines.
As of Windows Server 2003, by default only the local intranet zone supports the passing of credentials for Windows Integrated authentication between machines. This makes sense as you rarely want to pass your Windows credentials beyond your own domain. At ADERANT we have a group policy set up so that on every machine, any server with a name matching *.aderant.com is registered in the local intranet zone.

You can explicitly name the servers for the zone; also ensure that the servers are not listed in the Trusted Sites zone.

2. Windows Services exposing WCF net.tcp endpoints must have SPNs registered for both the application server and the network load balancer addresses.

When a non-basicHttpBinding is used, such as net.tcp, the WCF infrastructure checks to ensure that the service is running under the identity that the client expects. This prevents ‘man-in-the-middle’ attacks where someone spoofs the service you want to call with their own for some nefarious purpose. When you generate a service proxy against a net.tcp endpoint you’ll see something similar to the following configuration snippet in the app.config:

<client>
  <endpoint
    address="net.tcp://myserver.mydomain.com:8003/servicemodelsamples/service/spnIdentity"
    binding="netTcpBinding"
    bindingConfiguration="netTcpBinding_ICalculator_Windows"
    contract="ICalculator"
    name="netTcpBinding_ICalculator">
    <identity>
      <servicePrincipalName value="CalculatorSvc/myServer.myDomain.com:8003" />
    </identity>
  </endpoint>
</client>

There is an identity element that specifies the expected identity of the service host. There are two different options supported: <userPrincipalName> and <servicePrincipalName>. If your service is published on a domain and you always expect the client calling the service to be online, then the userPrincipalName is easiest to configure. The value attribute contains the identity that the service is running as, e.g. value="ADERANT_AP\service.expert".

Alternatively you can set a servicePrincipalName, as above. The service principal name (SPN) is broken down into three parts:

serviceClassName / address [: portNumber]

The service class name is a token that uniquely represents the service. Common service classes are HTTP and HOST; the example above is using CalculatorSvc to uniquely identify a calculation service. At ADERANT we use class names such as ExpertConfigurationSvc. After the service class name comes the machine name, e.g. SVEXPGG310. Note that the NetBIOS name and the fully qualified domain name are considered to be different; it is commonplace to register both. For example:

ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG310:18180

Once we have an SPN, it must be registered in Active Directory (AD) against the user account used to run the service. We recommend a service account along the lines of myDomain\service.expert to run the ADERANT services. To register this account with an SPN there is a command line tool setspn:

setspn -A ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180 service.expert

As part of our deployment tooling we automatically generate a batch file containing all the SPNs that need to be registered in AD for a given environment. An SPN must not be registered twice; duplicate registrations cause errors. To see the SPNs currently registered against a user you can use the setspn tool with the -L option, passing the account name:

setspn -L service.expert
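On Windows Server 2008 and later the setspn tool can also search the directory for duplicate registrations, which is a quick way to track down the duplicate-SPN errors mentioned above:

setspn -X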

If we take our configuration service as an example, we need the following SPNs registered in AD for the scenario environment:

ExpertConfigurationSvc/SVNLB301.ap.aderant.com:18180
ExpertConfigurationSvc/SVNLB301:18180
ExpertConfigurationSvc/SVEXPGG310.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG310:18180
ExpertConfigurationSvc/SVEXPGG311.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG311:18180
ExpertConfigurationSvc/SVEXPGG312.ap.aderant.com:18180
ExpertConfigurationSvc/SVEXPGG312:18180

If you are running a development workstation, you will often see HOST/localhost as the SPN generated by the svcutil for locally hosted WCF services. This indicates that the service is expected to be running on the local machine.

If the service needs to support delegation then the AD account used to run the service must have delegation enabled, via the Delegation tab of the account in Active Directory Users and Computers.

The account must also be granted ‘Log on as a service’ rights on the application server hosting the service. This can be set-up using the local machine policies admin tool or pushed out via group policy.

3. Load balanced WCF Services hosted in IIS, using HTTP bindings, must have HTTP SPNs added for the account of the application pool.

By default an SPN is created in AD for the machine account of a server running IIS, for example HTTP/SVEXPGG310. In a load balanced scenario the machine account SPN cannot be used to issue a Kerberos ticket because it is different for each machine in the application farm. Instead the Kerberos ticket needs to be issued using the identity of the application pool that the web service is running under. If you have multiple application pools, these must all be running under the same account. The application pool account must have SPNs registered for the HTTP service as follows:

setspn -A HTTP/svnlb301.ap.aderant.com service.expert
setspn -A HTTP/svnlb301 service.expert
setspn -A HTTP/svexpgg310.ap.aderant.com service.expert
setspn -A HTTP/svexpgg310 service.expert
setspn -A HTTP/svexpgg311.ap.aderant.com service.expert
setspn -A HTTP/svexpgg311 service.expert
setspn -A HTTP/svexpgg312.ap.aderant.com service.expert
setspn -A HTTP/svexpgg312 service.expert

Here we have both the NetBIOS and FQDNs for the servers and the load balancer.

4. Load balanced WCF services hosted in IIS, using HTTP bindings, must use the application pool credentials to issue Kerberos tickets.

In addition to adding the SPNs in step 3, now change IIS so that it uses the app pool credentials for the Kerberos ticket. This can be done either through the Configuration Editor in IIS Manager or from the command line.

The configuration section path is system.webServer/security/authentication/windowsAuthentication.
From the command line:
appcmd set config /section:windowsAuthentication /useAppPoolCredentials:true

This has to be set on all of the application servers within the application farm.

While you are in the IIS configuration, it is also worth setting authPersistNonNTLM to true; see http://support.microsoft.com/kb/954873 for details.
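If you prefer to script this, the same setting can be applied with the WebAdministration module. A sketch, applied here at the server level (adjust the -PSPath if you only want it on a specific site):

import-module WebAdministration

# Persist the authenticated connection rather than re-authenticating on every request
Set-WebConfigurationProperty -PSPath 'MACHINE/WEBROOT/APPHOST' `
    -Filter 'system.webServer/security/authentication/windowsAuthentication' `
    -Name 'authPersistNonNTLM' -Value $true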

5. Enable Windows Authentication on the required web applications in IIS.
There are two parts to this, the first of which is to ensure that the Windows Authentication provider for IIS is installed. This can be checked in the Windows features control panel.

The next step is to enable Windows Authentication on the website itself. From the dashboard for the site, open the Authentication manager and then ensure that Windows Authentication is enabled:

While you are here, it’s worth checking the advanced properties of the Windows Authentication (available from the context menu) to ensure that Kernel-mode authentication is set.

This can also be set programmatically:

appcmd set config "Default Web Site/MyWebService" -section:system.webServer/security/authentication/windowsAuthentication /enabled:true /commit:apphost

Wrap up & Testing
Those are the key steps required to get Kerberos working in a load balanced environment:
1. ensure the servers are in the local intranet zone.
2. create and register SPNs for net.tcp services for all app servers and the load balancer.
3. create and register HTTP SPNs for all app servers and the load balancer.
4. take care to avoid duplicate SPNs.
5. understand that NetBIOS and FQDNs require separate SPNs.
6. set useAppPoolCredentials to true on all IIS servers in the app farm.
7. run all application pools using a common domain service account, give this account permission to delegate and log on as a service.
8. ensure the web applications for the services have Windows authentication enabled.

It’s mostly straightforward once you’ve been through the steps once.

The easiest way to test is with a browser and Fiddler. From within Fiddler you can look at the authorization headers for the HTTP requests, which will show you whether Kerberos or NTLM is used. We expose an OData service which requires Windows authentication; it was very easy to trace the authentication negotiation going on for this site within Fiddler.
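As a rough rule of thumb when reading those headers (a heuristic rather than a formal check): an NTLM handshake carries a base64 blob starting with ‘TlRMTVNT’, which is the encoded ‘NTLMSSP’ signature, whereas a Kerberos/SPNEGO ticket typically starts with ‘YII’. So you want to see something like the second header below rather than the first:

Authorization: Negotiate TlRMTVNT...   (NTLM)
Authorization: Negotiate YII...        (Kerberos)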

Resources
Security in WCF (MSDN Magazine): http://msdn.microsoft.com/en-us/magazine/cc163570.aspx

Patterns & Practices Kerberos Overview: http://msdn.microsoft.com/en-us/library/ff649429.aspx

Patterns & Practices WCF Security Guide: http://msdn.microsoft.com/en-us/library/ff650794.aspx

Service Deployment

Deployment is one of those tasks that can often be left until late in the development lifecycle, though it is a non-trivial problem. The adoption of continuous integration as part of an agile approach encourages the deployment aspects to be undertaken alongside the development so that at the end of each sprint, the stakeholder has an installable piece of software delivered. When creating a service-oriented architecture the deployment problem increases in complexity. Gone are the days of a SQL script for the database server and an installer for the client machine. Now there are often tens of servers interacting in a medium scale solution, often in a web or application server farm to provide both resilience and scale out capabilities. Almost two years ago I took a step back and looked at how we were deploying software and saw that there had to be a better way. We were installing early versions of the Golden Gate software onto customer sites and experiencing a lot of teething problems getting the system running. Often the problems were due to the servers not having the required pre-requisites installed, such as the .NET framework, or not having the correct services running and so on. In an attempt to document the installation process we ended up with an installation guide that was rapidly approaching 100 pages. There had to be a better way…

Environment and Role Manifests
I’m on occasion reminded that I’m primarily paid to think so I took a deep breath and started to think about the problem. What would the ideal situation be? The first, and in many ways the biggest, realization is that we wanted to treat the deployment of the whole system as a unit of work. We wanted to allow an administrator to define where they wanted our software to be deployed into their site and then they simply click ‘go’.

The definition of the system would include a list of the servers they wanted to use and the roles they wanted each server to perform. Windows Server has the concept of a role: when setting up a new installation you choose what you want the server to do; is it the Active Directory controller, is it an application server, is it a file server, is it a web server? Depending upon which roles you allocate, different features are available. Some roles are incompatible on the same server, some roles are dependent upon other roles being satisfied by other servers. The role concept was something we also required as we had a number of different server components: configuration, security, workflow, messaging and application services. Each component was a unit of deployment; a server could be allocated the workflow role, for example, which contained a number of services such as instance management and task management. We did not want to have to walk up to or remote onto each server and perform an installation; we wanted a central process to co-ordinate and manage the installation across all of the servers.

We needed a collective term for the definition of a complete deployment and in the end I chose the term environment. This came from my days working for an internet bank where we had a strictly defined set of staging platforms (environments) that code had to work its way through on the way to production; integration test, system test, user acceptance test, pre production. The environment is the root level object in a system deployment and contains information such as the environment name, the list of servers to install to, common file locations such as the install directory and others. A firm is expected to have multiple environments, as a minimum: development, test and production/live.

The concepts of the environment and the role are similar to the two manifests that ClickOnce uses to control client installations: the publisher manifest and the application manifest. The publisher manifest is owned by the company that is running the software and it includes information specific to them such as the installation URL. The application manifest is owned by the company that authored the software and includes all of the files required on the client to run the software (amongst other details). In fact I drew a lot of inspiration from ClickOnce; what we wanted was a ClickOnce mechanism for server deployment. ClickOnce is driven from the two XML manifest files that declare what is required, these are given to the ClickOnce engine to action and the deployment takes place. I’m a big fan of both declarative programming and modeling so I wanted a deployment model that could be actioned. This was 12 months before all the excitement around Oslo and DSLs flared up (and then died down again). We had seen that both WPF and WF worked well as XAML-driven runtimes (in .NET 3.X) and so the basic concepts of a deployment model and runtime took shape.

In summary an environment contains a mapping of servers to roles. A role represents an installable server component. Both the environment and role details are captured as manifest files which can be described in XML.

Environment Manifest
The environment manifest is quite simple and most easily explained with an example:

<environment    name="Local" 
                networkSharePath="C:\ExpertShare\Beaker" 
                sourcePath="C:\ExpertSource"
                createClickOnceDeployments="true" 
                expertServiceUser="Domain\service.expert"
                expertServicePassword="SOrtabXXXXX5GF3SDKIEw==">
  <expertDatabaseServer serverName="dbserver.domain.com" serverInstance="">
    <databaseConnection     databaseName="Expert" 
                            username="cmsdbo" 
                            password="eo4G3S2KLO05EzgQb3Q==" />
  </expertDatabaseServer>
  <servers>
    <server name="appserver.domain.com" 
            expertPath="C:\AderantExpert\{{Name}}" 
            skipPrerequisitesCheck="false" servicesWebsite="Default Web Site">
      <roles>
        <role type="configuration"/>
        <role type="customworkflows"/>
        <role type="employeeIntake"/>
        <role type="fileopening"/>
        <role type="identity"/>
        <role type="messaging"/>
        <role type="queryservice"/>
        <role type="security"/>
        <role type="workflow">
          <roleParameters>
            <roleParameter name="defaultSmtpHost" value="smtp.dev.domain.com" />
            <roleParameter name="defaultSmtpPort" value="25" />
            <roleParameter name="defaultFromEmailAddress" value="wfadmin@domain.com" />
          </roleParameters>
        </role>
      </roles>
    </server>
  </servers>
</environment>

This example manifest captures the environment details specific to the installing firm such as the server names, database details, installation source and so on. In this simple example only one application server is specified for brevity, which runs all of the roles. In reality there would be multiple servers listed each running the roles in a load balanced configuration.

Role Manifest
A role manifest defines the pre-requisites, the files and the services deployed as a unit.

Prerequisite Checking
As mentioned, the first problem we hit during a deployment was pre-requisites. How could we be sure that a server was capable of running our software? There were a number of aspects to this:
• was a supported OS installed
• were the correct operating system components installed
• were third party dependencies met
• were the correct supporting services running
• were the components correctly configured

The pre-requisites vary by component so in the role definition we have a section of checks that must all pass before the deployment can proceed. One of the first examples we saw was that the Microsoft Distributed Transaction Co-ordinator (MSDTC) was not enabled on many of the servers. Where it was enabled, the configuration was often incorrect and the machine would not accept remote transactions. For Windows Services, the service control manager (SCM) can be queried to find the state of a service, and the registry contains the configuration keys for the component settings.

The big problem here was the poor support for remote processes in Windows; coming from a UNIX background this has always frustrated me. At the time PowerShell v1 was full of promise but it did not support remote sessions, that was coming in v2. PowerShell v2 was a CTP and did not look like it would be ready in time. While a number of shell commands have built-in support for running against a remote machine, there were enough gaps, version incompatibilities between Windows Server 2003 and 2008, and performance issues that in the end I wrote a Windows service to perform the checking. Using an xcopy deployment and the SC command it is possible to remotely deploy, register and start a Windows service (a sketch of this bootstrap follows the prerequisite examples below). This service accepts a list of pre-requisites to check and returns a list of results: pass or fail. The pre-requisites required by a role are defined within the role manifest, examples are:

<registryPrerequisite
    path="HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\Security\NetworkDtcAccess"
    value="1"
    description="MSDTC configured to allow remote access." />

<servicePrerequisite
    serviceName="WinRM"
    description="Ensure Windows Remote Management (WS-Management) service is available" />

Required Files
A role contains a list of the files required to be installed on the server and where the files need to go. An installation of Expert has a root directory specified by the installing administrator, and the structure is fixed under that.

Each file to be copied is captured in a files section in the role manifest, an example is:

    <file   filename="Aderant.Framework.Notes.dll"
            deploymentLocation="Local"
            targetRelativePath="LegacyServices" />
    <file   filename="Aderant.Framework.Notes.Presentation.dll"
            deploymentLocation="Local"
            targetRelativePath="LegacyServices" />
    <file   filename="Aderant.Framework.Notes.Services.dll"
            deploymentLocation="Local"
            targetRelativePath="LegacyServices" />

In order to be flexible, the file specification allows the source and target paths to be specified as well as the source and target filenames. This allows us to perform any manipulation of the file structure that we need to.

Services
In Golden Gate SP1 we support hosting services either as Windows Services under the SCM or in IIS under AppFabric. We are in the process of moving all of our services to AppFabric/IIS; however, this is not yet complete. Therefore a role manifest may contain a section for Windows Services:

  <serviceHost exeName="Expert.Notes.Service"
             serviceName="Aderant.Framework.Services.NotesService:{{Name}}"
             displayName="ADERANT Notes Services ({{Name}} instance)"
             description="Host for Notes Services for the {{Name}} environment."
             watchFiles="Aderant.Framework.*.dll"
             dependencies="MSMQ">
    <services>
      <service name="Notes"
               assemblyName="Aderant.Framework.Notes.Services.dll"
               entryPoint="Aderant.Framework.Notes.Service.Host.NotesService"
               requiresThread="true"
               serviceName="ADERANT Notes Service"
               proxyInterface="Aderant.Framework.Notes.Service.INotesService"
               serviceClass="ExpertNotesSvc"
               port="[[notesServicePort]]" />
    </services>
  </serviceHost>

and AppFabric hosted services:

  <appFabricServiceHost>
    <applicationPools>
      <applicationPool name="[[workflowApplicationPool]]"
                       netVersion="V4.0" />
      <applicationPool name="[[workflowApplicationPool]]"
                       netVersion="V4.0" />
    </applicationPools>
    <services>
      <service
        name="TaskManagement"
        proxyInterface="Aderant.Tasks.Interfaces.Service.ITaskManagementService"
        applicationPool="[[workflowApplicationPool]]"
        serviceType="FrameworkServices"
        supportedProtocols="http"
        allowAnonymousAuthentication="true"
        allowWindowsAuthentication="true" />
    </services>
  </appFabricServiceHost>

In both cases, the information required to create and host a service is provided. For Windows-based services we have a reusable service host exe; AppFabric extends IIS and WAS to provide the hosting.

Deployment Engine
Up to this point we have really been looking at the deployment model and how it is captured in the two manifests. These manifests are just an XML serialization of a deployment model: when we load an environment we simply map from the XML into an in-memory object graph of the environment. We now need something to action the model, and this is the deployment engine.

The deployment engine itself is the coordinator that executes a number of deployment actions. A deployment action performs a piece of work required in a deployment, its interface is as follows:

namespace Aderant.Framework.Deployment.Actions {
    public interface IDeploymentAction: IDeploymentMessage {
        void Deploy(Environment environment);
        void Clean(Environment environment);
        void Validate(Environment environment);
    }
}

The deployment engine supports a set of actions that can be performed to an environment. The three key actions are: deploy, remove (clean in the interface) and validate. When the deployment engine is asked to perform a ‘deploy’, it asks each of the deployment actions in turn to ‘deploy’. We have a library of around 30 deployment actions, examples are:

• AppFabricHostingAction
• FileDeploymentAction
• LoadBalancingConfigurationAction
• ServiceHostBuilderAction
• SQLScriptRunnerAction

Each action in turn knows how to deploy, remove and validate its part of a deployment. The validate action is very important: it allows an administrator to check whether a pre-installed environment still meets the pre-requisites, still has the required files in place and has the required services up and running. For example, it allows an administrator to easily see that a registry setting is no longer correctly set. The deployment actions in turn rely on a set of controller classes that interact with external components such as AppFabric, the file system, the Windows service manager, MSMQ and others. The separation of the controllers from the deployment actions also allows a high degree of code re-use as well as better unit testing.

While the deployment engine is currently C# code, it would be relatively easy to move it to a workflow. The deployment engine is a coordinator and therefore the control flow would be quite naturally captured as a workflow. The deployment actions would become an activity library.

As it stands the deployment engine is a command line utility, however it does have a WPF UI that calls through to it (in a very similar model to AppFabric calling the Powershell API from the IIS Manager add-in).

The environment manifest in the screenshot above shows a small load balanced environment being used to host multiple instances of our services.

The declarative deployment model and runtime is a good candidate for a DSL. In fact we prototyped a visual DSL using the Visual DSL toolkit for Visual Studio. This allowed an administrator to literally draw out the deployment diagram for an environment, which was then transformed via a T4 template into an environment XML file. This could then be executed via the deployment engine and used to deploy a full system.

Hunting Zombies (orphaned IIS Web Applications)

Following on from the previous post, it’s time to look at one of the more sensitive areas of AppFabric… the IIS configuration.

When you run many of the AppFabric configuration commands via Powershell or the IIS Manager, the result is a change to a web.config file. IIS configuration is hierarchical with settings being inherited from parent nodes as we saw with connection strings. The implication of this is that when determining the correct settings for a web application, a series of configuration files are parsed. An error in any one of these configuration files can lead to a broken system. The event logs mentioned in the previous post are a good place to look for these errors, the offending configuration files will often be named in the log entry.

[Update: AppFabric has a one time inheritance model for its configuration, if you choose to provide a configuration setting at a node then this overrides the configuration set at a parent node. The scope / granularity of this is all AppFabric config. Microsoft tried to provide a merged inheritance model but it is a non-trivial problem and did not make v1.]

A common issue on a development workstation is configuration getting left behind due to poor housekeeping. For example, you map a folder into IIS as a web application; this folder contains other subfolders which in turn are also mapped as web applications. If you remove the parent web application without first removing the child applications then the child configuration remains. It cannot be seen via IIS Manager as there is no way to reach it; however, you can easily see it through PowerShell. One of the many awesome features in PowerShell is the provider model, which allows any hierarchical system to be navigated in a consistent way. The canonical example is the file system: we are all used to cd, dir, etc. to navigate around. Well, these same commands (which are actually aliases in PowerShell for standard verb-noun commands) can be used to navigate other hierarchies, for example IIS.

From a Powershell console running with elevated status (run as Admin), you can do the following:

First you need to add the IIS Management module to the session:

PS> import-module WebAdministration

You can then navigate the IIS structure by changing the ‘drive’ to IIS:

> cd IIS:
> ls

Both the dir and ls commands are mapped to the Get-ChildItem PowerShell command via aliases, providing a standard Windows console or UNIX console experience. Listing the children at the root level gives us access to the application pools, web sites and SSL bindings. Following through the example above, we navigate to the default web site and then list all of its children. In my case this maps exactly to what is shown in IIS:

Hunting Zombies
So, let’s makes some zombies…

I created a new folder C:\ZombieParent and added two sub folders, ZombieChild1 and ZombieChild2. I then mapped the parent folder to a web application called Zombies and also converted the two sub folders to web applications. Re-running the get-childitem commands now shows:

You can see the three web applications at the end of the list; in IIS Manager we have:

Let’s now remove the parent Zombies web application:


In IIS Manager we no longer see the ZombieChild1 or ZombieChild2 web applications, yet they are still visible via PowerShell.

This can be the source of many weird and wonderful errors when working with AppFabric as it tries to parse configuration for zombie web applications. If you are getting strange behavior it is well worth launching a Powershell console and going on a zombie hunt. The web applications left behind can be removed via the console:
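The exact commands were shown as a screenshot in the original post; a sketch of the clean-up using the WebAdministration cmdlets, assuming the application names from the example above:

# Remove the orphaned child web applications that IIS Manager can no longer reach
Remove-WebApplication -Site 'Default Web Site' -Name 'Zombies/ZombieChild1'
Remove-WebApplication -Site 'Default Web Site' -Name 'Zombies/ZombieChild2'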


Powershell can be a sensitive soul…

I’ll mention another gotcha that tripped me up… case sensitivity. IIS allows you to promote a physical path to a virtual directory, and a virtual directory to a web application. E.g.

> cd \inetpub\wwwroot\
> mkdir test
> cd IIS:
> cd '\Sites\Default Web Site'
> dir

directory        test   C:\inetpub\wwwroot\test

> new-webvirtualdirectory test -physicalpath 'c:\inetpub\wwwroot\test'
> dir

virtualDirectory test   C:\inetpub\wwwroot\test

> remove-webvirtualdirectory test
> dir

directory        test   C:\inetpub\wwwroot\test

However if the case of the directory/virtual directory/web application does not match exactly then you get the following behavior:

> import-module WebAdministration
> cd \inetpub\wwwroot\
> mkdir test
> cd IIS:
> cd '\Sites\Default Web Site'
> dir

directory        test   C:\inetpub\wwwroot\test

> new-webvirtualdirectory Test -physicalpath 'c:\inetpub\wwwroot\test'
> dir

directory        test   C:\inetpub\wwwroot\test
virtualDirectory Test   C:\inetpub\wwwroot\test

> remove-webvirtualdirectory Test
> dir

directory        test   C:\inetpub\wwwroot\test
virtualDirectory Test   C:\inetpub\wwwroot\test

Here we created a new physical directory under the wwwroot folder and then mapped a virtual directory to this location, but used a name of Test rather than test. When we run Get-ChildItem we see two entries: ‘test’ for the physical path and ‘Test’ for the virtual directory. Then we remove the virtual directory, but it is not deleted and no error is reported.

This caused a heap of confusion for me when automating our deployments so beware of case! This has been raised with Microsoft as an issue. I found that the ConvertTo-WebApplication cmdlet worked for my needs without the case issues.

How to diagnose errors in AppFabric monitoring configuration

It wasn’t the best Friday, my external hard drive died taking my work iTunes library with it and I wasn’t having much fun with AppFabric either. The dashboard showed no data and the Windows application event log kept filling up with login errors. Looking back, the afternoon was useful since I learned that little bit more about AppFabric though I didn’t get any ‘real’ work done.

I started off reading this: http://social.technet.microsoft.com/wiki/contents/articles/appfabric-items-to-check-when-configuring-appfabric-monitoring.aspx before getting stuck in.

AppFabric has two data stores: a monitoring store and a workflow persistence store. These stores are paired with two Windows services, an event collection service paired with the monitoring store and a workflow management service paired with the workflow persistence store.

Let’s start with the event collection service and monitoring store. This service is responsible for capturing the WF and WCF events emitted by services hosted in IIS/WAS and storing them in the monitoring store. These events are used to populate the dashboard that is integrated into IIS Manager. To enable capture of events you can use the ‘Manage WF and WCF Services | Configure…’ option in the web application context menu or the PowerShell commands Set-ASAppMonitoring and Start-ASAppMonitoring. For help on these commands call get-help, e.g. ‘get-help Set-ASAppMonitoring’, from a PowerShell command line.

When you set up monitoring you need to provide a connection string name and set the monitoring level. As a minimum, the level needs to be set to Health Monitoring to populate the AppFabric dashboard. Below this are the levels Off and Errors Only, which are self explanatory. Above this level are End-to-End Monitoring and Troubleshooting, both of which capture additional information. End-to-End Monitoring adds a header into WCF traffic to allow a logical call sequence to be followed: when a WCF service calls another WCF service the header is flowed across the call, providing a correlation token to query by. Note that the capture levels are cumulative, the higher level setting includes all of the events from the settings below. The higher the setting, the greater the impact on the performance of the system as more resources are required to capture and log the monitored events. For day-to-day operations health monitoring is recommended, with the more verbose options used when required to aid troubleshooting.

The connection string is a named connection string value, set as a property of the web application (or one of its ancestors). The Connection Strings page is available from the ASP.NET section of the Features View for the web application.

Clicking on the Connection Strings option brings up the following:


Note that IIS configuration is hierarchical; the connection strings available to the Magic8Ball web application are both inherited, which means they are defined at a higher node in the tree. In this case the strings are defined in the machine web.config found at %SystemDrive%\Windows\Microsoft.NET\Framework64\v4.0.30128\Config (I’m using 64-bit Windows and .NET 4.0 RC). When installing AppFabric the default connection strings are written into the machine level web.config. In my case, both connection strings are set up to use integrated security.
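The same inherited values can be read from PowerShell rather than the IIS Manager UI. A sketch, using the application path from the example earlier in this post:

import-module WebAdministration

# List the connection strings visible to the web application, including inherited ones
Get-WebConfiguration -Filter 'connectionStrings/add' `
    -PSPath 'MACHINE/WEBROOT/APPHOST/Default Web Site/MagicEightBall' |
    Select-Object name, connectionString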

The event collection service is a Windows Service and so managed through the services administration snap-in, services.msc. To help set up integrated security from Windows through to SQL Server, I run the services under a domain account. Note that if you plan to use a machine that is not always on a domain, you need to use a local machine account.
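A quick way to confirm which account the event collection service is actually running under is to ask WMI, since Get-Service does not expose the log-on account. A sketch:

Get-WmiObject Win32_Service -Filter "Name='AppFabricEventCollectionService'" |
    Select-Object Name, StartName, State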


This account needs to have login rights to the SQL Server and should be mapped to the ASMonitoringDbWriter role. In my case I’ve mapped the user to all three roles set up in the monitoring store.

There are four Jobs managed by the SQL Agent that are used to populate and manage the tables in the monitoring database. These are:

The SQL Server Agent must be running for the tables to be populated. The Import*Events jobs run every 10 seconds by default; if they are not correctly set up your application event log soon fills up with errors and warnings (as I found). These jobs call stored procedures defined in the monitoring database: ASImportTransferEvents, ASImportWcfEvents, ASImportWFEvents, and run as the AS_MonitoringDbJobsAdmin. The AutoPurge job is scheduled to run once every minute and calls the ASAutoPurge stored procedure. These stored procedures in turn call ASInternal_* versions of themselves and you can drill into the SQL to see exactly what they do. To housekeep the monitoring database you can use the Clear-ASMonitoringSqlDatabase command. Another option is to move the events to an archive database so that the queries feeding the dashboard remain responsive; see Set-ASMonitoringSqlDatabaseArchiveConfiguration. The archive database can then be managed as per any audit requirements you may have.

To monitor the SQL Agent jobs, you can use the Job Activity Monitor:

The Windows Event Viewer is a great help when tracking down the cause of issues, and AppFabric sets up a couple of custom logs.

To see the Debug and Analytic logs you need to enable ‘Show Analytic and Debug Logs’ from the View menu in Event Viewer.

Right click on a debug or analytic log and enable it. Make sure you disable it when you are finished to prevent performance degradation due to high volume event capture.

From these logs I could determine that my IIS configuration had invalid entries, the SQL Server login was failing for the Event Collector and so on. I’ll talk more about diagnosing IIS configuration issues and the workflow persistence store in the next post…