Service Deployment
May 30, 2010
Deployment is one of those tasks that is often left late in the development lifecycle, even though it is a non-trivial problem. The adoption of continuous integration as part of an agile approach encourages the deployment aspects to be undertaken alongside the development, so that at the end of each sprint the stakeholder has an installable piece of software delivered. When creating a service-oriented architecture the deployment problem increases in complexity. Gone are the days of a SQL script for the database server and an installer for the client machine. Now there are often tens of servers interacting in a medium-scale solution, often in a web or application server farm to provide both resilience and scale-out capabilities.

Almost two years ago I took a step back, looked at how we were deploying software, and saw that there had to be a better way. We were installing early versions of the Golden Gate software onto customer sites and experiencing a lot of teething problems getting the system running. Often the problems were due to the servers not having the required prerequisites installed, such as the .NET framework, or not having the correct services running, and so on. In an attempt to document the installation process we ended up with an installation guide that was rapidly approaching 100 pages. There had to be a better way…
Environment and Role Manifests
I’m on occasion reminded that I’m primarily paid to think, so I took a deep breath and started to think about the problem. What would the ideal situation be? The first, and in many ways the biggest, realization was that we wanted to treat the deployment of the whole system as a single unit of work. We wanted to allow an administrator to define where they wanted our software deployed within their site, and then simply click ‘go’.
The definition of the system would include a list of the servers they wanted to use and the roles they wanted each server to perform. Windows Server has the concept of a role: when setting up a new installation you choose what you want the server to do. Is it the Active Directory controller, an application server, a file server, a web server? Depending upon which roles you allocate, different features are available. Some roles are incompatible on the same server; some roles are dependent upon other roles being satisfied by other servers. The role concept was something we also required, as we had a number of different server components: configuration, security, workflow, messaging and application services. Each component was a unit of deployment; a server could be allocated the workflow role, for example, which contained a number of services such as instance management and task management. We did not want to have to walk up to, or remote onto, each server and perform an installation; we wanted a central process to co-ordinate and manage the installation across all of the servers.
We needed a collective term for the definition of a complete deployment, and in the end I chose the term environment. This came from my days working for an internet bank, where we had a strictly defined set of staging platforms (environments) that code had to work its way through on the way to production: integration test, system test, user acceptance test, pre-production. The environment is the root-level object in a system deployment and contains information such as the environment name, the list of servers to install to, and common file locations such as the install directory. A firm is expected to have multiple environments, as a minimum: development, test and production/live.
The concepts of the environment and the role are similar to the two manifests that ClickOnce uses to control client installations: the publisher manifest and the application manifest. The publisher manifest is owned by the company that is running the software and includes information specific to them, such as the installation URL. The application manifest is owned by the company who authored the software and includes, amongst other details, all of the files required on the client to run the software. In fact I drew a lot of inspiration from ClickOnce; what we wanted was a ClickOnce mechanism for server deployment. ClickOnce is driven from the two XML manifest files that declare what is required; these are given to the ClickOnce engine to action and the deployment takes place. I’m a big fan of both declarative programming and modeling, so I wanted a deployment model that could be actioned. This was 12 months before all the excitement around Oslo and DSLs flared up (and then died down again). We had seen that both WPF and WF worked well as XAML-driven runtimes (in .NET 3.X), and so the basic concepts of a deployment model and runtime took shape.
In summary, an environment contains a mapping of servers to roles. A role represents an installable server component. Both the environment and role details are captured as manifest files described in XML.
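To make the model concrete, here is a minimal sketch of the kind of object graph involved, in C#. The type and member names are illustrative only; they mirror the manifest XML shown below rather than the actual product types.

using System.Collections.Generic;

namespace Deployment.Sketch {
    // Root of a deployment: which servers take part and which roles they run.
    public class Environment {
        public Environment() { Servers = new List<Server>(); }
        public string Name { get; set; }             // e.g. "Local", "Test", "Production"
        public string SourcePath { get; set; }       // where the installation files come from
        public string NetworkSharePath { get; set; } // common file locations
        public IList<Server> Servers { get; private set; }
    }

    // A machine taking part in the deployment and the roles allocated to it.
    public class Server {
        public Server() { Roles = new List<Role>(); }
        public string Name { get; set; }             // e.g. "appserver.domain.com"
        public string ExpertPath { get; set; }       // install directory on that machine
        public IList<Role> Roles { get; private set; }
    }

    // An installable server component, e.g. workflow or messaging.
    public class Role {
        public Role() { Parameters = new Dictionary<string, string>(); }
        public string Type { get; set; }
        public IDictionary<string, string> Parameters { get; private set; }
    }
}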
Environment Manifest
The environment manifest is quite simple and most easily explained with an example:
<environment name="Local"
             networkSharePath="C:\ExpertShare\Beaker"
             sourcePath="C:\ExpertSource"
             createClickOnceDeployments="true"
             expertServiceUser="Domain\service.expert"
             expertServicePassword="SOrtabXXXXX5GF3SDKIEw==">
  <expertDatabaseServer serverName="dbserver.domain.com" serverInstance="">
    <databaseConnection databaseName="Expert" username="cmsdbo" password="eo4G3S2KLO05EzgQb3Q==" />
  </expertDatabaseServer>
  <servers>
    <server name="appserver.domain.com"
            expertPath="C:\AderantExpert\{{Name}}"
            skipPrerequisitesCheck="false"
            servicesWebsite="Default Web Site">
      <roles>
        <role type="configuration" />
        <role type="customworkflows" />
        <role type="employeeIntake" />
        <role type="fileopening" />
        <role type="identity" />
        <role type="messaging" />
        <role type="queryservice" />
        <role type="security" />
        <role type="workflow">
          <roleParameters>
            <roleParameter name="defaultSmtpHost" value="smtp.dev.domain.com" />
            <roleParameter name="defaultSmtpPort" value="25" />
            <roleParameter name="defaultFromEmailAddress" value="wfadmin@domain.com" />
          </roleParameters>
        </role>
      </roles>
    </server>
  </servers>
</environment>
This example manifest captures the environment details specific to the installing firm, such as the server names, database details, installation source and so on. In this simple example only one application server is specified, for brevity, and it runs all of the roles. In reality there would be multiple servers listed, each running roles in a load-balanced configuration.
Role Manifest
A role manifest defines the prerequisites, the files and the services deployed as a unit.
Prerequisite Checking
As mentioned, the first problem we hit during a deployment was prerequisites. How could we be sure that a server was capable of running our software? There were a number of aspects to this:
• was a supported OS installed
• were the correct operating system components installed
• were third party dependencies met
• were the correct supporting services running
• were the components correctly configured
The prerequisites vary by component, so in the role definition we have a section of checks that must all pass before the deployment can proceed. One of the first examples we saw was that the Microsoft Distributed Transaction Co-ordinator (MSDTC) was not enabled on many of the servers. Even when it was enabled, the configuration was often incorrect and the machine would not accept remote transactions. For Windows services, the service control manager (SCM) can be queried to find the state of a service, and the registry contains the configuration keys for the component settings.

The big problem here was the poor support for remote processes in Windows; coming from a UNIX background, this has always frustrated me. At the time PowerShell v1 was full of promise but it did not support remote sessions; that was coming in v2. PowerShell v2 was a CTP and did not look like it would be ready in time. While a number of shell commands have built-in support for running against a remote machine, there were enough gaps, version incompatibilities between Windows Server 2003 and 2008, and performance issues that in the end I wrote a Windows service to perform the checking. Using an xcopy deployment and the SC command it is possible to remotely deploy, register and start a Windows service. This service accepts a list of prerequisites to check and returns a list of results: pass or fail. The prerequisites required by a role are defined within the role manifest; examples are:
<registryPrerequisite
    path="HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\Security\NetworkDtcAccess"
    value="1"
    description="MSDTC configured to allow remote access." />

<servicePrerequisite
    serviceName="WinRM"
    description="Ensure Windows Remote Management (WS-Management) service is available" />
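For illustration, both kinds of check map onto straightforward Windows APIs once the checking service is running on the target machine. A minimal local sketch of the two checks above might look like this in C# (the real service adds remote invocation, batching and reporting; the key/value split for the registry path is my own):

using System;
using System.ServiceProcess;
using Microsoft.Win32;

public static class PrerequisiteChecks {
    // Registry prerequisite: does the named value under the key match what we expect?
    // e.g. RegistryValueEquals(@"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\Security",
    //                          "NetworkDtcAccess", "1")
    public static bool RegistryValueEquals(string keyPath, string valueName, string expected) {
        object actual = Registry.GetValue(keyPath, valueName, null);
        return actual != null && actual.ToString() == expected;
    }

    // Service prerequisite: is the Windows service installed and running?
    public static bool ServiceIsRunning(string serviceName) {
        try {
            using (ServiceController controller = new ServiceController(serviceName)) {
                return controller.Status == ServiceControllerStatus.Running;
            }
        } catch (InvalidOperationException) {
            return false; // the service is not installed on this machine
        }
    }
}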
Required Files
A role contains a list of the files required to be installed on the server and where each file needs to go. An installation of Expert has a root directory specified by the installing administrator, and the directory structure is fixed beneath that root.
Each file to be copied is captured in a files section in the role manifest; an example is:
<file filename="Aderant.Framework.Notes.dll" deploymentLocation="Local" targetRelativePath="LegacyServices" />
<file filename="Aderant.Framework.Notes.Presentation.dll" deploymentLocation="Local" targetRelativePath="LegacyServices" />
<file filename="Aderant.Framework.Notes.Services.dll" deploymentLocation="Local" targetRelativePath="LegacyServices" />
In order to be flexible, the file specification allows the source and target paths to be specified, as well as the source and target filenames. This allows us to perform any manipulation of the file structure that we need.
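The manifests also show two styles of substitution token: {{Name}} for environment values (as in the expertPath above) and [[param]] for role parameters (seen in the service definitions below). A minimal sketch of the {{…}} expansion, assuming a simple dictionary of environment values (the actual token grammar and resolution rules are the product’s own):

using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenExpander {
    // Expands {{token}} placeholders from a dictionary of environment values.
    // e.g. Expand(@"C:\AderantExpert\{{Name}}", values) with Name=Local
    //      returns @"C:\AderantExpert\Local"
    public static string Expand(string template, IDictionary<string, string> values) {
        return Regex.Replace(template, @"\{\{(\w+)\}\}", match => {
            string value;
            return values.TryGetValue(match.Groups[1].Value, out value) ? value : match.Value;
        });
    }
}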
Services
In Golden Gate SP1 we support hosting services either as Windows services under the SCM, or in IIS under AppFabric. We are in the process of moving all of our services to AppFabric/IIS, however this is not yet complete. Therefore a role manifest may contain a section for Windows services:
<serviceHost exeName="Expert.Notes.Service"
             serviceName="Aderant.Framework.Services.NotesService:{{Name}}"
             displayName="ADERANT Notes Services ({{Name}} instance)"
             description="Host for Notes Services for the {{Name}} environment."
             watchFiles="Aderant.Framework.*.dll"
             dependencies="MSMQ">
  <services>
    <service name="Notes"
             assemblyName="Aderant.Framework.Notes.Services.dll"
             entryPoint="Aderant.Framework.Notes.Service.Host.NotesService"
             requiresThread="true"
             serviceName="ADERANT Notes Service"
             proxyInterface="Aderant.Framework.Notes.Service.INotesService"
             serviceClass="ExpertNotesSvc"
             port="[[notesServicePort]]" />
  </services>
</serviceHost>
and AppFabric hosted services:
<appFabricServiceHost>
  <applicationPools>
    <applicationPool name="[[workflowApplicationPool]]" netVersion="V4.0" />
  </applicationPools>
  <services>
    <service name="TaskManagement"
             proxyInterface="Aderant.Tasks.Interfaces.Service.ITaskManagementService"
             applicationPool="[[workflowApplicationPool]]"
             serviceType="FrameworkServices"
             supportedProtocols="http"
             allowAnonymousAuthentication="true"
             allowWindowsAuthentication="true" />
  </services>
</appFabricServiceHost>
In both cases, the information required to create and host a service is provided. For Windows-based services we have a reusable service host executable; AppFabric extends IIS and WAS to provide the hosting.
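As a sketch of what such a reusable host does for a WCF-based service: given a service type and address resolved from the manifest, it creates and opens a host. This is a bare-bones illustration assuming endpoints and bindings come from configuration; the real host adds file watching, instance naming and SCM integration.

using System;
using System.ServiceModel;

public static class ServiceHostingSketch {
    // Creates and opens a WCF host for a service type named in a role manifest.
    // typeName would be resolved from the manifest's assemblyName/entryPoint pair.
    public static ServiceHost Start(string typeName, Uri baseAddress) {
        Type serviceType = Type.GetType(typeName, true); // throws if the type cannot be found
        ServiceHost host = new ServiceHost(serviceType, baseAddress);
        host.Open(); // endpoints and bindings are picked up from configuration
        return host;
    }
}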
Deployment Engine
Up to this point we have really been looking at the deployment model and how it is captured in the two manifests. These manifests are just an XML serialization of a deployment model; when we load an environment we simply map from the XML into an in-memory object graph of the environment.
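That loading step is little more than deserialization. A sketch of the idea, assuming classes like those outlined earlier have been decorated with the appropriate XmlRoot/XmlAttribute mappings:

using System.IO;
using System.Xml.Serialization;

public static class EnvironmentLoader {
    // Maps an environment manifest on disk into the in-memory object graph.
    public static Environment Load(string manifestPath) {
        XmlSerializer serializer = new XmlSerializer(typeof(Environment));
        using (FileStream stream = File.OpenRead(manifestPath)) {
            return (Environment)serializer.Deserialize(stream);
        }
    }
}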
We now need something to action the model, and this is the deployment engine. The deployment engine itself is the coordinator that executes a number of deployment actions. A deployment action performs a piece of work required in a deployment; its interface is as follows:
namespace Aderant.Framework.Deployment.Actions {
    public interface IDeploymentAction : IDeploymentMessage {
        // Install this action's part of the environment.
        void Deploy(Environment environment);

        // Remove anything this action previously deployed.
        void Clean(Environment environment);

        // Check that this action's part of the environment is correctly in place.
        void Validate(Environment environment);
    }
}
The deployment engine supports a set of actions that can be performed against an environment. The three key actions are: deploy, remove (‘clean’ in the interface) and validate. When the deployment engine is asked to perform a ‘deploy’, it asks each of the deployment actions in turn to ‘deploy’. We have a library of around 30 deployment actions, examples of which are:
• AppFabricHostingAction
• FileDeploymentAction
• LoadBalancingConfigurationAction
• ServiceHostBuilderAction
• SQLScriptRunnerAction
Each action in turn knows how to deploy, remove and validate its part of a deployment. The validate action is very important: it allows an administrator to check whether a pre-installed environment still meets the prerequisites, still has the required files in place, and still has the required services up and running. For example, it allows an administrator to easily see that a registry setting is no longer correctly set. The deployment actions in turn rely on a set of controller classes that interact with external components such as AppFabric, the file system, the Windows service manager, MSMQ and others. The separation of the controllers from the deployment actions allows a high degree of code re-use, as well as better unit testing.
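To make the pattern concrete, here is a simplified sketch of what a file deployment action could look like. The class body is illustrative only: the real FileDeploymentAction works through a file system controller, resolves per-role file lists, relative paths and UNC shares, and reports progress.

using System.IO;

namespace Aderant.Framework.Deployment.Actions {
    // Illustrative sketch only; assumes the model classes sketched earlier.
    public class FileDeploymentActionSketch {
        public void Deploy(Environment environment) {
            foreach (Server server in environment.Servers) {
                // Copy every source file to the server's install directory
                // (the real action selects files per role and targets remote paths).
                foreach (string source in Directory.GetFiles(environment.SourcePath)) {
                    string target = Path.Combine(server.ExpertPath, Path.GetFileName(source));
                    File.Copy(source, target, true);
                }
            }
        }

        public void Clean(Environment environment) {
            // Remove previously deployed files (omitted in this sketch).
        }

        public void Validate(Environment environment) {
            // Check that each required file exists at its deployment location.
        }
    }
}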
While the deployment engine is currently C# code, it would be relatively easy to move it to a workflow. The deployment engine is a coordinator and therefore the control flow would be quite naturally captured as a workflow. The deployment actions would become an activity library.
As it stands the deployment engine is a command-line utility; however, it does have a WPF UI that calls through to it (in a very similar model to AppFabric calling the PowerShell API from the IIS Manager add-in).
The environment manifest in the screenshot above shows a small load-balanced environment being used to host multiple instances of our services.
The declarative deployment model and runtime is a good candidate for a DSL. In fact we prototyped a visual DSL using the Visual DSL toolkit for Visual Studio. This allowed an administrator to literally draw out the deployment diagram for an environment, which was then transformed via a T4 template into an environment XML file. This could then be actioned by the deployment engine to deploy a full system.