TestComplete is Kind of Amazing

January 20. 2015 0 Comments

I’ve been doing a lot of Test Automation lately. In a weird co-incidence, a lot of swearing and gnashing of teeth as well. Strange that.

One of the automation tools that I’ve been using is TestComplete, and I’ve been pleasantly surprised to find that it doesn’t get in my way nearly as much as I would have expected. In comparison, I’ve had far more problems with the behaviour of the application under test rather than the execution environment, which says a lot for TestComplete’s ability to interact with horrifying VB6/WinForms chimera applications.

Its not all puppies and roses though. TestComplete is very much a tabs and text boxes kind of application and while parts of the tool have some nice Intellisense (more than I would have expected) its certainly no Visual Studio. In addition, binary files make meaningful diffs impossible, its licensing model is draconian at best and the price is insane.

But aside from those gripes, it certainly does get the job done.

When I decided that in order to test a piece of functionality involving the database, I would need to leverage some AWS EC2 images, I was unsurprised to find that it had support for executing Powershell scripts. I seems to be able to do just about anything really.

Well, I suppose support is a strong word.

Scripted Interactions

TestComplete supports a number of scripting languages for programmatically writing tests. There is also a very useful record-replay GUI which actually creates a programmatically accessible map of your application, which can then be used in scripts if you want.

When you create a TestComplete project, you can choose from VBScript, JScript and two other scripting varieties that slip my mind. After you’ve created the project, you don’t seem to be able to change this choice, so make sure you pick one you want to use.

I did not create the project.

The person who did picked JScript, which I have never written before. But at least it wasn't VBScript I suppose.

My understanding of JScript is limited, but I believe it is a Microsoft implementation of some Javascript standard. I have no idea which one, and its certainly not recent.

Anyway, I’m still not entirely sure how it all fits together, but from the JScript code you can access some of the .NET Framework libraries, like System.Management.Automation.

Using this namespace, you can create an object that represents a Powershell process, configure it with some commands, and then execute it.

Hence Powershell support.

I Have the Power

I’ve written about my growing usage of Powershell before, but I’ve found it to be a fantastically powerful tool. It really can do just about anything. In this case, I gravitated towards it because I’d already written a bunch of scripts for initialising AWS EC2 instances, and I wanted to reuse them to decrease the time it would take me to put together this functional test. For this test I would need to create an AMI containing a database in the state that I wanted it to be in, and then spin up instances of that AMI whenever I wanted to run the test.

In the end, as it is with these sorts of things, it wasn’t setting up the EC2 instances that took all of my time. That yak had already been shaved.

The first problem I had to solve was actually being able to run a Powershell script file using the component above.

Anyone who has ever used Powershell is probably very familiar with the Execution Policy of the system in question. Typically this defaults to Restricted, which means you are not allowed to execute scripts.

Luckily you can override this purely for the process you are about to execute, without impacting on the system at large. This, of course, means that you can change the Execution Policy without needing Administrative privileges.

Set-ExecutionPolicy RemoteSigned -Scope Process -Force

The second problem I had to solve was getting results out of the Powershell component in TestComplete. In my case I needed to get the Instance ID that was returned by the script, and then store it in a TestComplete variable for use later on. I’d need to do the same thing with the IP address, obtained after waiting for the instance to be ready.

Waiting for an instance to be ready is actually kind of annoying. There are two things you need to check. The first is whether or not the instance is considered running. The second is whether or not the instance is actually contactable through its network interface. Waiting for an instance to be running only takes about 15-30 seconds. Waiting for an instance to be contactable takes 5-10 minutes depending on how capricious AWS is feeling. As you can imagine, this can make testing your “automatically spin up an instance” script a very frustrating experience. Long feedback loops suck.

When you execute a series of commands through the Powershell component, the return value is a collection of PSObjects. These objects are what would have normally been returned via the output stream. In my case I was returning a single string, so I needed to get the first entry in the collection, and then get its ImmediateBaseObject property. To get the string value I then had to get the OleValue property.

Tying both of the above comments together, here is the collection of functions that I created to launch Powershell from TestComplete.

function CreatePowershellExecutionObject()
{
  var powershell = dotNET.System_Management_Automation.Powershell.Create();
  // This is set before every script because otherwise we can't execute script files,
  // Thats where most of our powershell lives.
  powershell.AddScript("Set-ExecutionPolicy RemoteSigned -Scope Process -Force");
  
  // This redirects the write-host function to nothingness, because otherwise any
  // script executed through this componenent will fail if it has the audacity to
  // write-host. This is because its non-interactive and write-host is like a special
  // host channel thing that needs to be implemented.
  powershell.AddScript("function write-host { }");
  
  return powershell;
}

function InvokePowershellAndThrowIfErrors(powershell)
{
  var result = powershell.Invoke();
  
  if (powershell.HadErrors)
  {
    var firstError = powershell.Streams.Error.Item(0);
    if (firstError.ErrorDetails != null)
    {
      throw new Error(firstError.ErrorDetails);
    }
    else
    {
      throw new Error(firstError.Exception.ToString());
    }
  }
  
  return result;
}

function GetCommonPowershellAwsScriptArguments()
{
  var awsKey = Project.Variables.AwsKey;
  var awsSecret = Project.Variables.AwsSecret;
  var awsRegion = Project.Variables.AwsRegion;
  return "-SuppliedAwsKey " + awsKey + " -SuppliedAwsSecret " + awsSecret + " -SuppliedAwsRegion " + awsRegion + " "
}

function StartAwsVirtualMachineWithFullDatabaseAndSetInstanceId()
{
  var gatewayVersion = GetVersionOfGatewayInstalled();
  var scriptsPath = GetScriptsDirectoryPath();
  var executeInstanceCreationScript = "& \"" + 
    scriptsPath + 
    "\\functional-tests\\create-new-max-size-db-ec2-instance.ps1\" " + 
    GetCommonPowershellAwsScriptArguments() +
    "-BuildIdentifier " + gatewayVersion;

  var powershell = CreatePowershellExecutionObject();
  powershell.AddScript(executeInstanceCreationScript);
  
  Indicator.PushText("Spinning up AWS EC2 Instance containing Test Database");
  var result = InvokePowershellAndThrowIfErrors(powershell);
  Indicator.PopText();
  
  var instanceId = result.Item(0).ImmediateBaseObject.OleValue; 
  
  KeywordTests.WhenADatabaseIsAtMaximumSizeForExpress_ThenTheFilestreamConversionStillWorks.Variables.FullDbEc2InstanceId = instanceId;
}

Notice that I split the instance creation from the waiting for the instance to be ready. This is an optimisation. I create the instance right at the start of the functional test suite, and then execute other tests while that instance is being setup in AWS. By the time I get to it, I don’t have to wait for it to be ready at all. Less useful when testing the database dependent test by itself, but it shaves 6+ minutes off the test suite when run together. Every little bit counts.

Application Shmaplication

Now that the test database was being setup as per my wishes, it was a simple matter to record the actions I wanted to take in the application and make some checkpoints for verification purposes.

Checkpoints in TestComplete are basically asserts. Record a value (of many varied types, ranging from Onscreen Object property values to images) and then check that the application matches those values when it is run.

After recording the test steps, I broke them down into reusable components (as is good practice) and made sure the script was robust in the face of failures and unexpected windows (and other things).

The steps for the executing the test using the application itself were easy enough, thanks to TestComplete.

Tidbits

I did encounter a few other things while setting this test up that I think are worth mentioning.

The first was that the test needed to be able to be run from inside our VPC (Virtual Private Cloud) in AWS as well as from our local development machines. Actually running the test was already a solved problem (solved when I automated the execution of the functional tests late last year), but making a connection to the virtual machine hosting the database in AWS was a new problem.

Our AWS VPC is fairly locked down (for good reason) so machines in there generally can’t access machines in the outside word except over a few authorised channels (HTTP and HTTPS through a proxy for example). Even though the database machine was sitting in the same VPC as the functional tests worker, I had planned to only access it through its public IP address (for simplicity). This wouldn’t work without additional changes to our security model, which would have been a pain (I have no control over those policies).

This meant that in one case I needed to use the Public IP Address of the instance (when it was being run from our development machines) and in the other I needed to use the Private IP Address.

Code to select the correct IP address fit nicely into my Powershell script to wait for the instance to be ready, which already returned an IP address. All I had to do was test the public IP over a specific port and depending on whether or not it worked, return the appropriate value. I did this using the TCPClient class in the .NET framework.

The second thing I had to deal with was a dependency change.

Previously our TestComplete project was not dependent on anything except itself. It could be very easily compressed into a single file and tossed around the internet as necessary. Now that I had added a dependency on a series of Powershell scripts I had to change the execution script for our Functional Tests to pack up and distribute additional directories. Nothing too painful luckily, as it was as simple enough matter to include more things in the compressed archive.

The final problem I ran into was with the application under test.

Part of the automated Functional Test is to open the database. When you ask the application to do that, it will pop up a fairly standard dialog with server & database name.

Being the helpful VB6 application that it is, it also does a search of the network to find any database servers that are available, so you can quickly select them from a dropdown. Being a single threaded application without a great amount of thought being put into user experience, the dialog freezes for a few moments while it does the search.

If you try to find that Window or any of its components using TestComplete while its frozen, the mapping for the dialog box changes from what you would expected (Aliases.Application.DialogName) to something completely different (Process(“ApplicationName”).Window(“DialogName”).Control(“Name”)). Since the test was recorded using the first alias, it then times out when looking for the control it expects, and fails.

I got around this by introducing a delay before even attempting to look for any controls on that dialog.

Depressingly, this is a common solution to solving automation problems like that.

Conclusion

If you were expecting this section to be where I tie everything together and impart on you some TL;DR lessons, then prepare yourself for disappointment.

The one thing I will say though is that setting up good automated Functional Tests takes a hell of a lot of effort. it gets easier as you spend more time doing it, but sometimes I question the benefits. Certainly its an interesting exercise and you learn a lot about the system under test, but the tests you create are usually fragile and overly complex.

Functional Tests that automate an application definitely shouldn’t be your only variety of test that’s for sure.

Regardless of the adventure above, TestComplete is pretty great, and it certainly makes the whole automated testing process much easier than it would be if you used something else.

Like CodedUI.

Which I have done before.

I don’t recommend it.

Crystal Clear as Mud

January 13. 2015 0 Comments

You may have noticed the Crystal Reports tag attached to this post.

Yes.

Crystal Reports.

The much maligned, overly complex and generally poorly regarded report generation software that I’m sure all software developers have had to touch at some point or another.

Well, it was finally my turn to touch it. I’ve managed to dodge it for the last 10 years of my career, but it looks like my number finally came up and I needed to delve into the belly of the beast to try and figure out why it was generating incorrect output.

Unluckily for me it wasn’t even a newer version of Crystal Reports, it was an old version. For all I know the component might be amazing now, but I suspect it has its reputation as a result of the older versions anyway, so I at least knew what I was getting myself into.

Reporting Issues

I’m currently working on a legacy line-of-business application written primarily in VB6. We are in the process of (slowly) moving it into .NET (which is good), but being that this is a gradual process, we still need to maintain and support the older code, ensuring that it continues to work as it always has.

This particular application, like many line-of-business applications, has the ability to generate many different varieties of reports. Suffice to say though, they are important to the users of the software.

The reports are generated using an old ActiveX version of Crystal Reports, for use with VB6. I don’t actually know what version it is.

There was a bug logged that a particular section of one of the reports generated by the application was accumulating values between rows for a particular field.

That’s right, values were accumulating, meaning that a column in the first row would have some text in it like “Discombobulation Widget”, and the second row would have “Discombobulation Widget Frazwozzer Widget”, when it should have just said “Frazwozzer Widget”. The next row was worse, containing 3 different values jammed together. The weirdest part was, it was only happening for some of the rows, not all of them. Only rows representing data of a particular type would exhibit this behaviour, others would be fine.

This was a new thing, as apparently the report had worked correctly in the past.

Of course, the obvious cause would be that the dataset behind the report was wrong, and contained the wrong values for those fields. Something that could be dealt with inside SQL or maybe VB6 if you’re unlucky.

That was not the cause. The data was fine.

It was coming from inside the report…

Maw of Madness

The reports in the application are defined as .rpt files, which seem to be the Crystal Reports file format. The files are binary of course, so diff tools are useless for tracking changes,

Anyway, after some digging I figured out that Crystal Reports has the concept of formulas, which seem to define the calculations behind a field. Fields are what you use to show actual data in the report, and seem to be derived from the dataset supplied to the report. In the case of this report, the dataset was being supplied via an SQL query on a temporary table that the application had constructed prior to running the report.

The field that was accumulating values between rows was called “details”, which isn’t all that important.

It had a formula though, which looked like this:

stringVar sDetails;

if {@TableType} = 2 and ({Transactions.TransType} = "RE" or {Transactions.TransType} = "LE") then
(
    if IsNull({Transactions.FromText}) or {Transactions.FromText} = "" then
        if IsNull({Transactions.FromRef}) or {Transactions.FromRef} = "" then
            sDetails:={Transactions.Details} 
        else
            sDetails:={Transactions.Details} + " (" + {Transactions.FromRef}+")"
    else
        sDetails:={Transactions.Details} + " (" + {Transactions.FromText} + ")";
)
else if {@TableType} = 0 then
(
     //( //Bank Ledger
    if not isNull({Owners2.FileAs}) and {Owners2.FileAs} <> "" then
        sDetails:={Owners2.FileAs} + " - " + {Transactions.Details}
    else if isNull({Owners2.ID}) and not isNull({Creditors2.FileAs}) and {Creditors2.FileAs} <> "" then
        sDetails:={Creditors2.FileAs} + " - " + {Transactions.Details}        
    else                
        sDetails:={Transactions.Details}
)
else if {@TableType} = 4 then //Sale
(
   if {Transactions.LedgerType}="C" then
        if isnull({Transactions.FromText}) or {Transactions.FromText}="" then
            sDetails:={Transactions.Details}
        else
            sDetails:={Transactions.Details}+" ("+{Transactions.FromText}+")"
)
else if {@TableType} = 5 then //Tenant
(
    if not isNull({ID.ActionID}) then
        sDetails:={Actions.Subject}
    else if not isNull({ID.LogID}) then
        sDetails:={Log.Details}
    else
        sDetails:={Transactions.Details}
)
else
    sDetails:={Transactions.Details};


if {Transactions.LedgerType} ="D" then
    if isnull({Transactions.Paidby}) or {Transactions.Paidby}="" then
        sDetails:=sDetails
    else
        sDetails:=sDetails+"("+{Transactions.Paidby}+")";


if {ID.TextField2}="EFT" or {ID.TextField2}="JNL" then
    sDetails:=sDetails + " " + {ID.TextField3};

sDetails;

By the way, that’s not any normal language to my knowledge. Its some Crystal Reports specific script.

Regardless of what it is, it’s not good code, but it doesn’t look like it has any obvious bugs that would be causing it to carry over data for previous rows.

I’ll cut right to the chase, even though it took me quite a while to figure it out.

See that variable declaration at the top of the code block above?

stringVar sDetails;

That’s not locally scoped.

I know. How could that possibly not be locally scoped. Its right there. Its literally declared inside the script block.

As a result of it not being locally scoped, the values held within it can bleed over when the formula is evaluated for each row. That means there is a code path somewhere in those nested if statements that does not set the variable directly, and instead uses the value already there, which is the root cause of the accumulating values.

That piece of code is at the bottom, executed when the LedgerType is D. It takes the current value and just appends some additional information. When the TableTable is 4 (Sale) and the LedgerType is D, there is no code path that initialises the sDetails variable, meaning it keeps whatever value it had in it from the previous run.

The fix?

Initialise the variable to the empty string at the top and run away as quickly as possible.

Protect Yourself

Well, maybe not run away as quickly as possible.

I didn’t mention it earlier, but there were no tests for this report. I think you probably saw that coming though.

Nothing was being run that would automatically check that is contents were approximately what you would expect. Being that the report had worked in the past, this sort of test would have caught the error (which was introduced when some “improvements” were being made) very close to when the code was actually written, making it much easier to fix.

I would like to rectify the lack of test at some stage, but I know putting that sort of test in place, for the very first time, will not be easy, and will be extremely time consuming.

Testing the code in an isolated way seems to be impossible, as parts of it live in VB6 and other parts live in Crystal Reports, which seems to have a hard dependency on an actual database (actually a temporary table in the database that the application puts together to make the report work, but same idea). That means no easily stubbed out unit test.

The only way I can see to test it end-to-end would be to create a functional test that runs the application itself to create the entries needed for the report, generates the report and then checks it against a screenshot or something similarly simple and robust.

I have not yet done this, but I hope to soon.

Conclusion

Typically I like to close out my blog posts with some sort of lesson learned, or something that I have taken away from the situation that may help me (and maybe you) in the future.

I’m not entirely sure what I can take away from this experience.

Don’t trust third party components?

Sometimes variables that appear local aren’t?

Maybe the lesson is that there isn’t always a lesson, and sometimes software is just super depressing.

You’re Mocking Me Aren’t You

January 6. 2015 0 Comments

Object Relational Mappers (ORMs) are useful tools. If you don’t want to have to worry about writing the interactions to a persistence layer yourself, they are generally a good idea. Extremely powerful, they let you focus on describing the data that you want, rather than manually hacking out (and then maintaining!) the queries yourself, and help with change tracking, transactional constructs and other things.

Testing code that uses an ORM, however, is typically a pain. At least in my experience.

People typically respond to this pain by abstracting the usage of their ORM away, by introducing repositories or some other persistence strategy pattern. They use the ORM inside the repository, and then use the more easily mocked repository everywhere else, where it can be substituted with a much smaller amount of effort. There are other benefits to this approach, including the ability to model the domain more accurately (can abstract away the persistence structure) and the ability to switch out the ORM you use for some other persistence strategy without having to make lots of changes. Possibly.

The downside of creating an abstraction like the above is that you lose a lot of ORM specific functionality, which can be quite powerful. One of the most useful feature of ORMs in C# is to be able to write Linq queries directly against the persistence layer. Doing this allows for all sorts of great things, like only selecting the properties you want and farming out as much of the work as possible to the persistence layer (maybe an SQL database), rather than doing primitive queries and putting it all together in memory. If you do want to leverage that power, you are forced to either make your abstraction leaky, exposing bits of the ORM through it (which makes mocking it harder) or you have to write the needed functionality again yourself, except into your interface, which is duplicate work.

Both of those approaches (expose the ORM, write an abstraction layer) have their upsides and downsides so like everything in software it comes down to picking the best solution for your specific environment.

In the past, I’ve advocated creating an abstraction to help isolate the persistence strategy, usually using the Repository pattern. These days I’m not so sure that is the best way to go about it though, as I like to keep my code as simple as possible and the downsides of introducing another layer (with home grown functionality similar to but not quite the same as the ORM) have started to wear on me more and more.

EF You

I’ve recently started working on an application that uses Entity Framework 6, which is a new experience for me, as all of my prior experience with ORM’s was via NHibernate, and to be brutally honest, there wasn’t much of it.

Alas, this application does not have very many tests, which is something that I hate, so I have been trying to introduce tests into the codebase as I add functionality and fix bugs.

I’m going to assume at this point that everyone who has ever done and programming and writes tests has tried to add tests into a codebase after the fact. Its hard. Its really hard. You have to try and resist the urge to rebuild everything and try and find ways to add testability to an architecture that was never intended to be testable without making too many changes or people start to get uncomfortable.

I understand that discomfort. I mean that's one of the biggest reasons you have tests in the first place, so you can make changes without having to worry about breaking stuff. Without those tests, refactoring to introduce tests is viewed as a risky activity, especially when you first start doing it.

Anyway, I wanted to write an integration tests for a particular piece of new functionality, to verify that everything worked end to end. I’ve written about what I consider an integration test before, but in essence it is any test that involves multiple components working together. These sorts of tests are usually executed with as many things configured and setup the same as the actual application, with some difficult or slow components that sit right at the boundaries being substituted. Persistence layers (i.e. databases) are a good thing to substitute, as well as non-local services, because they are slow (compared to in memory) and usually hard to setup or configure.

In my case I needed to find a way to remove the dependency on an external database, as well as a number of services. The services would be easy, because its relatively trivial to introduce an interface to encapsulate the required behaviour from a service, and then provide an implementation just for testing.

The persistence layer though…

This particular application does NOT use the abstraction strategy that I mentioned earlier. It simply exposes the ability to get a DbContext whenever something needs to access to a persistent store.

A for Effort

Being that the application in question used EF6, I thought that it would be easy enough to leverage the Effort library.

Effort provides an in-memory provider for Entity Framework, allowing you to easily switch between whatever your normal provider is (probably SQL Server) for one that runs entirely in memory.

Notice that I said I thought that it would be easy to leverage Effort…

As is always the case with this sort of thing, the devil is truly in the details.

It was easy enough to introduce a factory to create the DbContext that the application used instead of using its constructor. This allowed me to supply a different factory for the tests, one that leveraged Effort’s in-memory provider. You accomplish this by making sure that there is a constructor for the DbContext that takes a DbConnection, and then use Effort to create one of its fancy in-memory connections.

On the first run of the test with the new in-memory provider, I got one of the least helpful errors I have ever encountered:

System.InvalidOperationException occurred: Sequence contains no matching element
  StackTrace:
       at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source, Func`2 predicate)
       at System.Data.Entity.Utilities.DbProviderManifestExtensions.GetStoreTypeFromName(DbProviderManifest providerManifest, String name)     at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.PrimitivePropertyConfiguration.Configure(EdmProperty column, EntityType table, DbProviderManifest providerManifest, Boolean allowOverride, Boolean fillFromExistingConfiguration)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.PrimitivePropertyConfiguration.<>c__DisplayClass1.<Configure>b__0(Tuple`2 pm)
       at System.Data.Entity.Utilities.IEnumerableExtensions.Each[T](IEnumerable`1 ts, Action`1 action)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.PrimitivePropertyConfiguration.Configure(IEnumerable`1 propertyMappings, DbProviderManifest providerManifest, Boolean allowOverride, Boolean fillFromExistingConfiguration)
       at System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive.BinaryPropertyConfiguration.Configure(IEnumerable`1 propertyMappings, DbProviderManifest providerManifest, Boolean allowOverride, Boolean fillFromExistingConfiguration)
       at System.Data.Entity.ModelConfiguration.Configuration.Types.StructuralTypeConfiguration.ConfigurePropertyMappings(IList`1 propertyMappings, DbProviderManifest providerManifest, Boolean allowOverride)
       at System.Data.Entity.ModelConfiguration.Configuration.Types.EntityTypeConfiguration.ConfigurePropertyMappings(DbDatabaseMapping databaseMapping, EntityType entityType, DbProviderManifest providerManifest, Boolean allowOverride)
       at System.Data.Entity.ModelConfiguration.Configuration.Types.EntityTypeConfiguration.Configure(EntityType entityType, DbDatabaseMapping databaseMapping, DbProviderManifest providerManifest)
       at System.Data.Entity.ModelConfiguration.Configuration.ModelConfiguration.ConfigureEntityTypes(DbDatabaseMapping databaseMapping, DbProviderManifest providerManifest)
       at System.Data.Entity.ModelConfiguration.Configuration.ModelConfiguration.Configure(DbDatabaseMapping databaseMapping, DbProviderManifest providerManifest)
       at System.Data.Entity.DbModelBuilder.Build(DbProviderManifest providerManifest, DbProviderInfo providerInfo)
       at System.Data.Entity.DbModelBuilder.Build(DbConnection providerConnection)
       at System.Data.Entity.Internal.LazyInternalContext.CreateModel(LazyInternalContext internalContext)
       at System.Data.Entity.Internal.RetryLazy`2.GetValue(TInput input)

Keep in mind, I got this error when attempting to begin a transaction on the DbContext. So the context had successfully been constructed, but it was doing…something…during the begin transaction that was going wrong. Probably initialization.

After a significant amount of reading, I managed to find some references to the fact that Effort doesn’t support certain SQL Server specific column types. Makes sense in retrospect, although at the time I didn’t even know you could specify provider specific information like that. I assume it was all based around automatic translation between CLR types and the underlying types of the provider.

There is a lot of entities in this application and the all have a large amount of properties. I couldn’t read through all of the classes to find what the problem was, and I didn’t even know exactly what I was looking for. Annoyingly, the error message didn’t say anything about what the actual problem was, as you can see above. So, back to first principles. Take everything out, and start reintroducing things until it breaks.

I turns out quite a lot of the already existing entities were specifying the types (using strings!) in the Column attribute of their properties. The main offenders were the “timestamp” and “money” data types, which Effort did not seem to understand.

Weirdly enough, Effort had no problems with the Timestamp attribute when specified on a property. It was only when the type “timestamp” was specified as a string in the Column attribute that errors occurred.

The issue here was of course that the type was string based, so the only checking that occurred, occurred at run-time. Because I had introduced a completely different provider to the mix, and the code was written assuming SQL Server, it would get to the column type in the initialisation (which is lazy, because it doesn’t happen until you try to use the DbContext) and when there was no matching column type returned by the provider, it would throw the exception above.

Be Specific

Following some advice on the Effort discussion board, I found some code that moved the SQL Server specific column types into their own attributes. These attributes would then only be interrogated when the connection of the DbContext was actually an SQL Server connection. Not the best solution, but it left the current behaviour intact, while allowing me to use an in-memory database for testing purposes.

Here is the attribute, it just stores a column type as a string.

public class SqlColumnTypeAttribute : Attribute
{
    public SqlColumnTypeAttribute(string columnType = null)
    {
        ColumnType = columnType;
    }

    public string ColumnType { get; private set; }
}

Here is the attribute convention, which EF uses to define a rule that will interpret attributes and change the underlying configuration.

public class SqlColumnTypeAttributeConvention : PrimitivePropertyAttributeConfigurationConvention<SqlColumnTypeAttribute>
{
    public override void Apply(ConventionPrimitivePropertyConfiguration configuration, SqlColumnTypeAttribute attribute)
    {
        if (!string.IsNullOrWhiteSpace(attribute.ColumnType))
        {
            configuration.HasColumnType(attribute.ColumnType);
        }
    }
}

Here is a demo DbContext showing how I used the attribute convention. Note that the code only gets executed if the connection is an SqlConnection.

public partial class DemoDbContext : DbContext
{
    public DemoDbContext(DbConnection connection, bool contextOwnsConnection = true)
        : base(connection, contextOwnsConnection)
    {
    
    }
    
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        if (Database.Connection is SqlConnection)
        {
            modelBuilder.Conventions.Add<SqlColumnTypeAttributeConvention>();
        }
    }
}

Finally, here is the attribute being used in an entity. Previously this entity would have simply had a [Column(TypeName = “timestamp”)] attribute on the RowVersion property, which causes issue with Effort.

public partial class Entity
{
    [Key]
    public int Id { get; set; }

    [SqlColumnType("timestamp")]
    [MaxLength(8)]
    [Timestamp]
    public byte[] RowVersion { get; set; }
}

Even though there was a lot of entities with a lot of properties, this was an easy change to make, as I could leverage a regular expression and find and replace.

Of course, it still didn’t work.

I was still getting the same error after making the changes above. I was incredibly confused for a while, until I did a search for “timestamp” and found an instance of the Column attribute where it supplied both the data type and the order. Of course, my regular expression wasn’t smart enough to pick this up, so I had to manually go through and split those two components (Type which Effort didn’t support and Order which it did) manually wherever they occurred. Luckily it was only about 20 places, so it was easy enough to fix.

And then it worked!

No more SQL Server dependency for the integration tests, which means they are now faster and more controlled, with less hard to manage dependencies.

Of course, the trade-off for this is that the integration tests are no longer testing as close to the application as they could be, but that’s why we have functional tests as well, which run through the instaled application, on top of a real SQL Server instance. You can still choose to run the integration tests with an SQL Server connection if you want, but now you can use the much faster and easier to manage in-memory database as well.

Conclusion

Effort is awesome. Apart from the problems caused by using SQL Server specific annotations on common entities, Effort was extremely easy to setup and configure.

I can’t really hold the usage of SQL Server specific types against the original developers though, as I can’t imagine they saw the code ever being run on a non-SQL Server provider. Granted, it would have been nice if they had of isolated the SQL Server specific stuff from the core functionality, but that would have been unnecessary for their needs at the time, so I understand.

The biggest problem I ran into was the incredibly unhelpful error message coming from EF6 with regards to the unsupported types. If the exception had stated what the type was that couldn’t be found and for which property in which class, I wouldn’t have had to go to so much trouble to find out what the actual problem was.

Its never good being confronted with an entirely useless exception message, and we have to always be careful to make sure that our exceptions fully communicate the problem so that they help future developers, instead of just getting in the way.

A little Effort goes a long way after all.

I’m In Your Objects, Intercepting Your Calls

December 30. 2014 0 Comments

I have an interface.

I’m sure you have some interfaces as well. If you don’t use interfaces you’re probably doing it wrong anyway.

My interface is a bit of a monolith, which happens sometimes. It’s not so big that I can justify investing the time in splitting it apart, but it’s large enough that its hard to implement and just kind of feels wrong. I don’t need to implement this interface manually very often (yay for mocking frameworks like NSubstitute) so I can live with it for now. Not everything can be perfect alas.

This particular interface allows a user to access a RESTful web service, and comes with some supporting data transfer objects.

I recently had the desire/need to see what would happen to the user experience of an application using this interface (and its default implementation) if the internet connection was slow, i.e. the calls to the service were delayed or timed out.

Obviously I could have implemented the interface with a wrapper and manually added slowdown/timeout functionality. As I mentioned previously though, there were enough methods in this interface that that sounded like a time consuming proposition. Not only that, but it would mean I would be tightly coupled to the interface, just to introduce some trivial slowdown code. If the interface changed, I would need to change my slowdown code. That’s bad, as the functionality of my code is distinctly separate from the functionality of the interface, and should be able to reuse that (admittedly trivial) code anywhere I like.

Plus I’m a lazy programmer. I’ll always go out of my way to write as little code as possible.

Aspect Oriented Programming

What I wanted to do was to be able to describe some behaviour about all of the calls on my interface, without actually having to manually write the code myself.

Luckily this concept has already been invented by people much smarter than me. Its generally referred to as Aspect Oriented Programming (AOP). There’s a lot more to AOP than just adding functionality unobtrusively to calls through an interface, but fundamentally it is about supporting cross cutting concerns (logging, security, throttling, auditing, etc) without having to rewrite the same code over and over again.

In this particular case I leveraged the IInterceptor interface supplied by the Castle.DynamicProxy framework. Castle.DynamicProxy is included in the Castle.Core package, and is part of the overarching Castle Project. It is a utility library for generating proxies for abstract classes and interfaces and is used by Ninject and NSubstitute, as well as other Dependency Injection and mocking/substitution frameworks.

Castle.DynamicProxy provides an interface called IInterceptor.

public interface IInterceptor
{
    void Intercept(IInvocation invocation);
}

Of course, that definition doesn’t make a lot of sense without the IInvocation interface (trimmed of all comments for brevity).

public interface IInvocation
{
    object[] Arguments { get; }
    Type[] GenericArguments { get; }
    object InvocationTarget { get; }
    MethodInfo Method { get; }
    MethodInfo MethodInvocationTarget { get; }
    object Proxy { get; }
    object ReturnValue { get; set; }
    Type TargetType { get; }
    object GetArgumentValue(int index);
    MethodInfo GetConcreteMethod();
    MethodInfo GetConcreteMethodInvocationTarget();
    void Proceed();
    void SetArgumentValue(int index, object value);
}

You can see from the above definition that the IInvocation provides information about the method that is being called, along with a mechanism to actually call the method (Proceed).

You can implement this interface to do whatever you want, so I implemented an interceptor that slowed down all method calls by some configurable amount. You can then use your interceptor whenever you create a proxy class.

public class DelayingInterceptor : IInterceptor
{
    private static readonly TimeSpan _Delay = TimeSpan.FromSeconds(5);

    public DelayingInterceptor(ILog log)
    {
        _Log = log;
    }

    private readonly ILog _Log;

    public void Intercept(IInvocation invocation)
    {
        _Log.DebugFormat("Slowing down invocation of [{0}] by [{1}] milliseconds.", invocation.Method.Name, _Delay.TotalMilliseconds);
        Thread.Sleep(_Delay);
        invocation.Proceed();
    }
}

Proxy classes are fantastic. Essentially they are automatically implemented wrappers for an interface, often with a concrete implementation to wrap supplied during construction. When you create a proxy you can choose to supply an interceptor that will be automatically executed whenever a method call is made on the interface.

This example code shows how easy it is to setup a proxy for the fictitious IFoo class, and delay all calls made to its methods by the amount described above.

IFoo toWrap = new Foo();

var generator = new ProxyGenerator();
var interceptor = new DelayingInterceptor(log4net.LogManager.GetLogger("TEST"));
var proxy = generator.CreateInterfaceProxyWithTarget(toWrap, interceptor);

As long as you are talking in interfaces (or at the very least abstract classes) you can do just about anything!

Stealthy Interception

If you use Ninject, it offers the ability to add interceptors to any binding automatically using the optional Ninject.Extensions.Interception library.

You still have to implement IInterceptor, but you don’t have to manually create a proxy yourself.

In my case, I wasn’t able to leverage Ninject (even though the application was already using it), as I was already using a Factory that had some logic in it. This stopped me from simply using Ninject bindings for the places where I was using the interface. I can see Ninjects support for interception being very useful though, now that I understand how interceptors work. In fact, since my slowdown interceptor is very generic, I could conceivably experiment with slowdowns at various levels in the application, from disk writes to background processes, just to see what happens. Its always nice to have that sort of power to see how your application will actually react when things are going wrong.

Other Ways

I’m honestly not entirely sure if Interceptors fit the classic definition of Aspect Oriented Programming. They do allow you to implement cross cutting concerns (like my slowdown), but I generally see AOP referred to in the context of code-weaving.

Code-weaving is where code is automatically added into your classes at compile time. You can use this to automatically add boilerplate code like null checking on constructor arguments as whatnot without having to write the code yourself. Just describe that you want the parameters to be null checked and the code will be added at compile time. I’m not overly fond of this approach personally, as I like having the code in source control represent reality. I can imagine using code-weaving might lead to situations where it is more difficult to debug the code because the source doesn’t line up with the compiled artefacts.

I don’t have any experience here, I’m just mentioning it for completeness.

Conclusion

In cases where you need to be able to describe some generic piece of code that occurs for all method calls of an interface, Interceptors are fantastic. They raise the level that you are coding at in my opinion, moving beyond writing code that directly tells the computer what to do and into describing the behaviour that you want. This leads to less code that needs to be maintained and less hard couplings the interfaces (as you would get if you implemented the wrapper yourself). Kind of like using an IoC container with your tests (enabling you to freely change your constructor without getting compile errors) you can freely change your interface and not have it impact your interceptors.

I’m already thinking of other ways in which I can leverage interceptors. One that immediately comes to mind is logging calls to the service, and timing how long they take, which is invaluable when investigating issues at the users end and for monitoring performance.

Interceptors provide great modular and decoupled way to accomplish certain cross cutting concerns, and I’m going to try and find more ways to leverage them in the future now that I know they exist.

You should to.

Unless you aren’t using interfaces.

Then you’ve got bigger problems.

Automating Functional Tests, Part 02

December 23. 2014 0 Comments

So, we decided to automate the execution of our Functional tests. After all, tests that aren't being run are effectively worthless.

In Part 1, I gave a little background, mentioned the scripting language we were going to use (Powershell), talked about programmatically spinning up the appropriate virtual machine in AWS for testing purposes and then how to communicate with it.

This time I will be talking about automating the installation of the software under test, running the actual tests and then reporting the results.

Just like last time, here is a link to the GitHub repository with sanitized versions of all the scripts (which have been updated significantly), so you don't have to piece them together from the snippets I’m posting throughout this post.

Installed into Power

Now that I could create an appropriate virtual machine as necessary and execute arbitrary code on it remotely, I needed to install the actual software under test.

First I needed to get the installer onto the machine though, as we do not make all of our build artefacts publically available on every single build (so no just downloading it from the internet). Essentially I just needed a mechanism to transfer files from one machine to another. Something that wouldn’t be too difficult to setup and maintain.

I tossed up a number of options:

FTP. I could setup an FTP server on the virtual machine and transfer the files that way. The downside of this is that I would have to setup an FTP server on the virtual machine, and make sure it was secure; and configured correctly. I haven’t setup a lot of FTP servers before, so I decided not to do this one.
SSH + File Transfer. Similar to the FTP option, I could install an SSH server on the virtual machine and then use something like SCP to securely copy the files to the machine. This would have been the easiest option if the machine was Linux based, but being a Windows machine it was more effort than it was worth.
Use an intermediary location, like an Amazon S3 bucket. This is the option I ended up going with.

Programmatically copying files to an Amazon S3 bucket using Powershell is fairly straightforward, although I did run into two issues.

Folders? What Folders?

Even though its common for GUI tools that sit on top of Amazon S3 to present the information as a familiar folder/file directory structure, that is entirely not how it actually works. In fact, thinking of the information that you put in S3 in that way will just get you into trouble.

Instead, its much more accurate to think of the information you upload to S3 to be key/value pairs, where the key tends to look like a fully qualified file path.

I made an interesting error at one point and uploaded 3 things to S3 with the following keys, X, X\Y and X\Z. The S3 website interpreted the X as a folder, which meant that I was no longer able to access the file that I had actually stored at X, at least through the GUI anyway. This is one example of why thinking about S3 as folders/files can get you in trouble.

Actually uploading files to S3 using Powershell is easy enough. Amazon supply a set of cmdlets that allow you to interact with S3, and those cmdlets are pre-installed on machines originally created using an Amazon supplied AMI.

With regards to credentials, you can choose to store the credentials in configuration, allowing you to avoid having to enter them for every call, or you can supply then whenever you call the cmdlets. Because this was a script, I chose to supply them on each call, so that the script would be self contained. I’m not a fan of global settings in general, they make me uncomfortable. I feel that it makes the code harder to understand, and in this case, it would have obfuscated how the cmdlets were authenticating to the service.

The function that uploads things to S3 is as follows:

function UploadFileToS3
{
    param
    (
        [string]$awsKey,
        [string]$awsSecret,
        [string]$awsRegion,
        [string]$awsBucket,
        [System.IO.FileInfo]$file,
        [string]$S3FileKey
    )

    write-host "Uploading [$($file.FullName)] to [$($awsRegion):$($awsBucket):$S3FileKey]."
    Write-S3Object -BucketName $awsBucket -Key $S3FileKey -File "$($file.FullName)" -Region $awsRegion -AccessKey $awsKey -SecretKey $awsSecret

    return $S3FileKey
}

The function that downloads things is:

function DownloadFileFromS3
{
    param
    (
        [string]$awsKey,
        [string]$awsSecret,
        [string]$awsRegion,
        [string]$awsBucket,
        [string]$S3FileKey,
        [string]$destinationPath
    )

    $destinationFile = new-object System.IO.FileInfo($destinationPath)
    if ($destinationFile.Exists)
    {
        write-host "Destination for S3 download of [$S3FileKey] ([$($destinationFile.FullName)]) already exists. Deleting."
        $destinationFile.Delete()
    }

    write-host "Downloading [$($awsRegion):$($awsBucket):$S3FileKey] to [$($destinationFile.FullName)]."
    Read-S3Object -BucketName $awsBucket -Key $S3FileKey -File "$($destinationFile.FullName)" -Region $awsRegion -AccessKey $awsKey -SecretKey $awsSecret | write-host

    $destinationFile.Refresh()

    return $destinationFile
}

Once the installer was downloaded on the virtual machine, it was straightforward to install it silently.

if (!$installerFile.Exists)
{
    throw "The Installer was supposed to be located at [$($installerFile.FullName)] but could not be found."
}

write-host "Installing Application (silently) from the installer [$($installerFile.FullName)]"
# Piping the results of the installer to the output stream forces it to wait until its done before continuing on
# with the remainder of the script. No useful output comes out of it anyway, all we really care about
# is the return code.
& "$($installerFile.FullName)" /exenoui /qn /norestart | write-host
if ($LASTEXITCODE -ne 0)
{
    throw "Failed to Install Application."
}

We use Advanced Installer, which in turn uses an MSI, so you’ll notice that there are actually a number of switches being used above to get the whole thing to install without human interaction. Also note the the piping to write-host, which ensures that Powershell actually waits for the installer process to finish, instead of just starting it and then continuing on its merry way. I would have piped to write-output, but then the uncontrolled information from the installer would go to the output stream and mess up my return value.

Permissions Denied

God. Damn. Permissions.

I had so much trouble with permissions on the files that I uploaded to S3. I actually had a point where I was able to upload files, but I couldn’t download them using the same credentials that I used to upload them! That makes no goddamn sense.

To frame the situation somewhat, I created a user within our AWS account specifically for all of the scripted interactions with the service. I then created a bucket to contain the temporary files that are uploaded and downloaded as part of the functional test execution.

The way that S3 defines permissions on buckets is a little strange in my opinion. I would expect to be able to define permissions on a per user basis for the bucket. Like, user X can read from and write to this bucket, user Y can only read, etc. I would also expect files that are uploaded to this bucket to then inherit those permissions, like a folder, unless I went out of my way to change them. That’s the mental model of permissions that I have, likely as a result of using Windows for many years.

This is not how it works.

Yes you can define permissions on a bucket, but not for users within your AWS account. It doesn’t even seem to be able to define permissions for other specific AWS accounts either. There’s a number of groups available, one of which is Authenticated Users, which I originally set with the understanding that it would give authenticated users belonging to my AWS account the permissions I specified. Plot twist, Authenticated Users means any AWS user. Ever. As long as they are authenticating. Obviously not what I wanted. At least I could upload files though (but not download them).

Permissions are not inherited when set through the simple options I mentioned above, so any file I uploaded had no permissions set on it.

The only way to set permissions with the granularity necessary and to have them inherited automatically is to use a Bucket Policy.

Setting up a Bucket Policy is not straight forward, at least to a novice like myself.

After some wrangling with the helper page, and some reading of the documentation, here is the bucket policy I ended up using, with details obscured to protect the guilty.

{
    "Version": "2008-10-17",
    "Id": "Policy1417656003414",
    "Statement": [
        {
            "Sid": "Stmt1417656002326",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[USER IDENTIFIER]"
            },
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::[BUCKET NAME]/*"
        },
        {
            "Sid": "Stmt14176560023267",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::[USER IDENTIFIER]"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::[BUCKET NAME]"
        }
    ]
}

The policy actually reads okay once you have it, but I’ll be honest, I still don’t quite understand it. I know that I’ve specifically given the listed permissions on the contents of the bucket to my user, and also given List permissions on the bucket itself. This allowed me to upload and download files with the user I created, which is what I wanted. I’ll probably never need to touch Bucket Policies again, but if I do, I’ll make more of an effort to understand them.

Testing My Patience

Just like the installer, before I can run the tests I need to actually have the tests to run.

We currently use TestComplete as our Functional test framework. I honestly haven’t looked into TestComplete all that much, apart from just running the tests, but it seems to be solid enough. TestComplete stores your functional tests in a structure similar to a Visual Studio project, with a project file and a number of files under that that define the actual tests and the order to run them in.

For us, our functional tests are stored in the same Git repository as our code. We use feature branches, so it makes sense to run the functional tests that line up with the code that the build was made from. The Build Agent that builds the installer has access to the source (obviously), so its a fairly simple matter to just zip up the definitions and any dependent files, and push that archive to S3 in the exact same manner as the installer, ready for the virtual machine to download.

Actually running the tests though? That was a bastard.

As I mentioned in Part 1, after a virtual machine is spun up as part of this testing process, I use Powershell remote execution to push a script to the machine for execution.

As long as you don’t want to do anything with a user interface, this works fantastically.

Functional tests being based entirely around interaction with the user interface therefore prove problematic.

The Powershell remote session that is created when executing the script remotely does not have any ability for user-interactivity, and cannot to my knowledge. Its just something that Powershell can’t do.

However, you can remotely execute another tool, PSExec, and specify that it run a process locally in an interactive session.

$testExecute = 'C:\Program Files (x86)\SmartBear\TestExecute 10\Bin\TestExecute.exe'
$testExecuteProjectFolderPath = "$functionalTestDefinitionsDirectoryPath\Application"
$testExecuteProject = "$testExecuteProjectFolderPath\ApplicationTests.pjs"
$testExecuteResultsFilePath = "$functionalTestDefinitionsDirectoryPath\TestResults.mht"

write-host "Running tests at [$testExecuteProject] using TestExecute at [$testExecute]. Results going to [$testExecuteResultsFilePath]."
# Psexec does a really annoying thing where it writes information to STDERR, which Powershell detects as an error
# and then throws an exception. The 2>&1 redirects all STDERR to STDOUT to get around this.
# Bit of a dirty hack here. The -i 2 parameter executes the application in interactive mode specifying
# a pre-existing session with ID 2. This is the session that was setup by creating a remote desktop
# session before this script was executed. Sorry.
& "C:\Tools\sysinternals\psexec.exe" -accepteula -i 2 -h -u $remoteUser -p $remotePassword "$testExecute" "$testExecuteProject" /run /SilentMode /exit /DoNotShowLog /ExportLog:$testExecuteResultsFilePath 2>&1 | write-host
[int]$testExecuteExitCode = $LASTEXITCODE

The –i [NUMBER] in the command above tells PSExec to execute the process in an interactive user session, specifically the session with the ID specified. I’ve hardcoded mine to 2, which isn’t great, but works reliably in this environment because the remote desktop session I create after spinning up the instance always ends up with ID 2. Hacky.

Remote desktop session you may ask? In order for TestComplete (well TestExecute technically) to execute the tests correctly you need to actually have a desktop session setup. I assume this is related to it hooking into the UI user mouse and keyboard hooks or something, I don’t really know. All I know is that it didn't work without a remote desktop session of some sort.

On the upside, you can automate the creation of a remote desktop session with a little bit of effort, although there are two hard bits to be aware of.

Who Are You?

There is no obvious way to supply credentials to the Remote Desktop client (mstsc.exe). You can supply the machine that you want to make the connection to (thank god), but not credentials. I think there might be support for storing this information in an RDP file though, which seem to be fairly straightforward. As you can probably guess from my lack of knowledge about that particular approach, that’s not what I ended up doing.

I still don’t fully understand the solution, but you can use the built in windows utility cmdkey to store credentials for things. If you store credentials for the remote address that you are connecting to, the Remote Desktop client will happily use them.

There is one thing you need to be careful with when using this utility to automate Remote Desktop connections. If you clean up after yourself (by deleting the stored credentials after you use them) make sure you wait until the remote session is established. If you delete the credentials before the client actually uses them you will end up thinking that the stored credentials didn’t work, and waste a day investigating and trialling VNC solutions which ultimately don’t work as well as Remote Desktop before you realise the stupidity of your mistake. This totally happened to a friend of mine. Not me at all.

Anyway, the entirety of the remote desktop script (from start-remote-session.ps1):

param (
    [Parameter(Mandatory=$true,Position=0)]
    [Alias("CN")]
    [string]$ComputerNameOrIp,
    [Parameter(Mandatory=$true,Position=1)]
    [Alias("U")] 
    [string]$User,
    [Parameter(Mandatory=$true,Position=2)]
    [Alias("P")] 
    [string]$Password
)

& "$($env:SystemRoot)\system32\cmdkey.exe" /generic:$ComputerNameOrIp /user:$User /pass:$Password | write-host

$ProcessInfo = new-object System.Diagnostics.ProcessStartInfo

$ProcessInfo.FileName = "$($env:SystemRoot)\system32\mstsc.exe" 
$ProcessInfo.Arguments = "/v $ComputerNameOrIp"

$Process = new-object System.Diagnostics.Process
$Process.StartInfo = $ProcessInfo
$startResult = $Process.Start()

Start-Sleep -s 15

& "$($env:SystemRoot)\system32\cmdkey.exe" /delete:$ComputerNameOrIp | write-host

return $Process.Id

Do You Trust Me?

The second hard thing with the Remote Desktop client is that it will ask you if you trust the remote computer if you don’t have certificates setup. Now I would typically advise that you setup certificates for this sort of thing, especially if communicating over the internet, but in this case I was remoting into a machine that was isolated from the internet within the same Amazon virtual network, so it wasn’t necessary.

Typically, clicking “yes” on an identity verification dialog isn't a problem, even if it is annoying. Of course, in a fully automated environment, where I am only using remote desktop because I need a desktop to actually be rendered to run my tests, it’s yet another annoying thing I need to deal with without human interaction.

Luckily, you can use a registry script to disable the identity verification in Remote Desktop. This had to be done on the build agent instance (the component that actually executes the functional tests).

Windows Registry Editor Version 5.00
[HKEY_CURRENT_USER\Software\Microsoft\Terminal Server Client]
    "AuthenticationLevelOverride"=dword:00000000

Report Soldier!

With the tests actually running reliably (after all the tricks and traps mentioned above), all that was left was to report the results to TeamCity.

It was trivial to simply get an exit code from TestExecute (0 = Good, 1 = Warnings, 2 = Tests Failed, 3 = Tests Didn’t Run). You can then use this exit code to indicate to TeamCity whether or not the tests succeeded.

if ($testResult -eq $null)
{
    throw "No result returned from remote execution."
}

if ($testResult.Code -ne 0)
{
    write-host "##teamcity[testFailed name='$teamCityFunctionalTestsId' message='TestExecute returned error code $($testResult.Code).' details='See artifacts for TestExecute result files']"
}
else
{
    write-host "##teamcity[testFinished name='$teamCityFunctionalTestsId'"
}

That's enough to pass or fail a build.

Of course, if you actually have failing functional tests you want a hell of a lot more information in order to find out whythey failed. Considering the virtual machine on which the tests were executed will have been terminated at the end of the test run, we needed to extract the maximum amount of information possible.

TestComplete (TestExecute) has two reporting mechanisms, not including the exit code from above.

The first is an output file, which you can specify when running the process. I chose an MHT file, which is a nice HTML document showing the tests that ran (and failed), and which has embedded screenshots taken on the failing steps. Very useful.

The second is the actual execution log, which is attached to the TestComplete project. This is a little harder to use, as you need to take the entire project and its log file and open it in TestComplete, but is great for in depth digging, as it gives a lot of information about which steps are failing and has screenshots as well.

Both of these components are zipped up on the functional tests worker and then placed into S3, so the original script can download them and attach them the TeamCity build artefacts. This is essentially the same process as for getting the test definitions and installer to the functional tests worker, but in reverse, so I won’t go into any more detail about it.

Summary

So, after all is said and done, I had automated:

The creation of a virtual machine for testing.
The installation of the latest build on that virtual machine.
The execution of the functional tests.
The reporting of the results, in both a programmatic (build pass/fail) and human readable way.

There were obviously other supporting bits and pieces around the core processes above, but there is little point in mentioning them here in the summary.

Conclusion

All up, I spent about 2 weeks of actual time on automating the functional tests.

A lot of the time was spent familiarising myself with a set of tools that I’d never (or barely) used before, like TestComplete (and TestExecute), the software under test, Powershell, programmatic access to Amazon EC2 and programmatic access to Amazon S3.

As with all things technical, I frequently ran into roadblocks and things not working the way that I would have expected them too out of the box. These things are vicious time sinks, involving scouring the internet for other people who’ve had the same issue and hoping to all that is holy that they solved their problem and then remembered to come back and share their solution.

Like all things involving software, I fought complexity every step of way. The two biggest offenders were the complexity in handling errors in Powershell in a robust way (so I could clean up my EC2 instances) and actually getting TestExecute to run the damn tests because of its interactivity requirements.

When all was said and done though, the functional tests are now an integral part of our build process, which means there is far more incentive to adding to them and maintaining them. I do have some concerns about their reliability (UI focused tests are a bit like that), but that can be improved over time.