
No blog posts for the next 3 weeks because I’m on holidays.

Frankly, I would be surprised if anyone noticed.

I’ll be back with a new post on the 21st of September, 2016.


I’ve mentioned the issues we’ve had with RavenDB and performance a number of times.

My most recent post on the subject talked about optimizing some of our most common queries by storing data in the index, to avoid having to retrieve full documents only to extract a few fields. We implemented this solution as a result of guidance from Hibernating Rhinos, who believed that it would decrease the amount of memory churn on the server, because it would no longer need to constantly switch the set of documents currently in memory in order to answer incoming queries. Made sense to me, and they did some internal tests that showed that indexes that stored the indexed fields dramatically reduced the memory footprint of their test.

Alas, it didn’t really help.

Well, that might not be fair. It didn’t seem to help, but that may have been because it only removed one bottleneck in favour of another.

I deployed the changes to our production environment a few days ago, but the server is still using a huge amount of memory, still regularly dumping large chunks of memory and still causing periods of high latency while recovering from the memory dump.

Unfortunately, we’re on our own as far as additional optimizations go, as Hibernating Rhinos have withdrawn from the investigation. Considering the amount of time and money they’ve put into this particular issue, I completely understand their decision to politely request that we engage their consulting services in order to move forward with additional improvements. Their entire support team was extremely helpful and engaged whenever we talked to them, and they did manage to fix a few bugs on the way through, even if it was eventually decided that our issues were not the result of a bug, but simply because of our document and load profile.

What Now?

It’s not all doom and gloom though, as the work on removing orphan/abandoned data is going well.

I ran the first round of cleanup a few weeks ago, but the results were disappointing. The removal of the entities that I was targeting (accounts) and their related data only managed to remove around 8000 useless documents out of a total of approximately 400 000.

The second round is looking much more promising, with current tests indicating that we might be able to remove something like 300 000 of those 400 000 documents, which is a pretty huge reduction. Why this service, whose entire purpose is to be a temporary data store, is accumulating documents at all is currently unknown. I’ll have to get to the bottom of that once I’ve dealt with the immediate performance issues.

The testing of the second round of abandoned document removal is time consuming. I’ve just finished the development of the tool and the first round of testing that validated that the system behaved a lot better with a much smaller set of documents in it (using a partial clone of our production environment), even when hit with real traffic replicated from production, which was something of a relief.

Now I have to test that the cleanup scripts and endpoints work as expected when they have to remove data from the server that might have large amounts of binary data linked to it.

This requires a full clone of the production environment, which is a lot more time consuming than the partial clone, because it also has to copy the huge amount of binary data in S3. On the upside, we have scripts that we can use to do the clone (as a result of the migration work), and with some tweaks I should be able to limit the resulting downtime to less than an hour.

Once I’ve verified that the cleanup works on as real of an environment as I can replicate, I’ll deploy it to production and everything will be puppies and roses.

Obviously.

Conclusion

This whole RavenDB journey has been a massive drain on my mental capacity for quite a while now, and that doesn’t even take into account the drain on the business. From the initial failure to load test the service properly (which I now believe was a result of having an insufficient number of documents in the database combined with unrealistic test data), to the performance issues that occurred during the first few months after release (dealt with by scaling the underlying hardware), all the way through to the recent time-consuming investigative efforts, the usage of RavenDB for this particular service was a massive mistake.

The disappointing part is that it all could have easily gone the other way.

RavenDB might very well have been an amazing choice of technology for a service focused around being a temporary data store, which may have led to it being used in other software solutions in my company. The developer experience is fantastic: it’s amazingly easy to use once you’ve got the concepts firmly in hand, and it’s very well supported and polished. It’s complicated and very different from the standard way of thinking (especially regarding eventual consistency), so you really do need to know what you’re getting into, and that level of experience and understanding was just not present in our team at the time the choice was made.

Because of all the issues we’ve had, any additional usage of RavenDB would be met with a lot of resistance. It’s pretty much poison now, at least as far as the organisation is concerned.

Software development is often about choosing the right tool for the job, and the far-reaching impact of those choices should not be underestimated.


As part of the work I did recently to optimize our usage of RavenDB, I had to spin up a brand new environment for the service in question.

Thanks to our commitment to infrastructure as code (and our versioned environment creation scripts), this is normally a trivial process. Use TeamCity to trigger the execution of the latest environment package and then play the waiting game. To set up my test environment, all I had to do was supply the ID of the EC2 volume containing a snapshot of production data as an extra parameter.

Unfortunately, it failed.

Sometimes that happens (an unpleasant side effect of the nature of AWS), so I started it up again in the hopes that it would succeed this time.

It failed again.

Repeated failures are highly unusual, so I started digging. It looked as though the problem was that the environments were simply taking too long to “finish” provisioning. We allocate a fixed amount of time for our environments to complete, which includes the time required to create the AWS resources, deploy the necessary software and then wait until the resulting service can actually be hit from the expected URL.

Part of the environment setup is the execution of the CloudFormation template (and all of the AWS resources that go with that). The other part is deploying the software that needs to be present on those resources, which is where we use Octopus Deploy. At startup, each instance registers itself with the Octopus server, applying tags as appropriate and then a series of Octopus projects (i.e. the software) is deployed to the instance.

Everything seemed to be working fine from an AWS point of view, but the initialisation of the machines was taking too long. They were only getting through a few of their deployments before running out of time. It wasn’t a tight time limit either; we’d allocated a very generous 50 minutes for everything to be up and running, so it was disappointing to see that it was taking so long.

The cfn-init logs were very useful here, because they show start/end timestamps for each step specified. Each of the deployments was taking somewhere between 12 and 15 minutes, which is way too slow, because there are at least 8 of them on the most popular machine.

This is one of those cases where I’m glad I put the effort in to remotely extract log files as part of the reporting that happens whenever an environment fails. It’s especially valuable when the environment creation is running in a completely automated fashion, like it does when executed through TeamCity, but it’s still very useful when debugging an environment locally. The last thing I want to do is manually remote into each machine in the environment and grab its log files in order to try and determine where a failure is occurring, when I can just have a computer do it for me.

Speed Deprived Octopus

When using Octopus Deploy (at least the version we have, which is 2.6.4), you can choose to either deploy a specific version of a project or deploy the “latest” version, leaving the decision of what exactly is the latest version up to Octopus. I was somewhat uncomfortable with the concept of using “latest” for anything other than a brand new environment that had never existed before, because initial testing showed that “latest” really did mean the most recent release.

The last thing I wanted to do was to cede control over exactly what was getting deployed to a new production API instance when I had to scale up due to increased usage.

In order to alleviate this, we wrote a fairly simple Powershell script that uses the Octopus .NET Client library to find what the best version to deploy is, making sure to only use those releases that have actually been deployed to the environment, and discounting deployments that failed.

# Find all deployments of this project to the target environment via the Octopus .NET Client library.
$deployments = $repository.Deployments.FindMany({ param($x) $x.EnvironmentId -eq $env.Id -and $x.ProjectId -eq $project.Id })

if ($deployments | Any)
{
    Write-Verbose "Deployments of project [$projectName] to environment [$environmentName] were found. Selecting the most recent successful deployment."
    # Work backwards in time until we find a deployment whose task finished successfully.
    $latestDeployment = $deployments |
        Sort -Descending -Property Created |
        First -Predicate { $repository.Tasks.Get($_.TaskId).FinishedSuccessfully -eq $true } -Default "latest"

    $release = $repository.Releases.Get($latestDeployment.ReleaseId)
}
else
{
    Write-Verbose "No deployments of project [$projectName] to environment [$environmentName] were found."
}

# If nothing suitable was found, fall back to the special "latest" marker.
$version = if ($release -eq $null) { "latest" } else { $release.Version }

Write-Verbose "The version of the most recent successful deployment of project [$projectName] to environment [$environmentName] was [$version]. 'latest' indicates no successful deployments, and will mean the very latest release version is used."

return $version

We use this script during the deployment of the Octopus projects that I mentioned above.

When it was first written, it was fast.

That was almost two years ago though, and now it was super slow, taking upwards of 10 minutes just to find out what the most recent release was. As you can imagine, this was problematic for our environment provisioning, because each one had multiple projects being deployed during initialization and all of those delays added up very quickly.

Age Often Leads to Slowdown

The slowdown didn’t just happen all of a sudden though. We had noticed the environment provisioning getting somewhat slower from time to time, but we’d assumed it was a result of there simply being more for our Octopus server to do (more projects, more deployments, more things happening at the same time). It wasn’t until it reached the critical point where I couldn’t even spin up one of our environments for testing purposes that it had to be addressed.

Looking at the statistics of the Octopus server during an environment provisioning, I noticed that it was using 100% CPU for the entire time that it was deploying projects, up until the point where the environment timed out.

Our Octopus server isn’t exactly a powerhouse, so I thought maybe it had just reached a point where it simply did not have enough power to do what we wanted it to do. This intuition was compounded by the knowledge that Octopus 2.6 uses RavenDB as its backend, and they’ve moved to SQL Server for Octopus 3, citing performance problems.

I dialed up the power (good old AWS), and then tried to provision the environment again.

Still timed out, even though the CPU was no longer maxing out on the Octopus server.

My next suspicion was that I had made an error of some sort in the script, but there was nothing obviously slow about it. It makes a bounded query to Octopus via the client library (.FindMany with a Func specifying the environment and project, to narrow down the result set) and then iterates through the resulting deployments (by date, so working backwards in time) until it finds an acceptable deployment for identifying the best release.

Debugging the automated tests around the script, the slowest part seemed to be when it was getting the deployments from Octopus. Using Fiddler, the reason for this became obvious.

The call to .FindMany with a Func was not optimizing the resulting API calls using the information provided in the Func. The script had been getting slower and slower over time because every single run had to retrieve every single previous deployment from the server (including those unrelated to the current environment and project). This meant hundreds of queries to the Octopus API, just to work out which version needed to be deployed.

The script was initially fast because I wrote it when we only had a few hundred deployments recorded in Octopus. We have thousands now and it grows every day.

What I thought was a bounded query wasn’t bounded in any way.

But Age Also Means Wisdom

On the upside, the problem wasn’t particularly hard to fix.

The Octopus HTTP API is pretty good. In fact, it actually has specific query parameters for specifying environment and project when you’re using the deployments endpoint (which is another reason why I assumed the client would be optimised to use them based on the incoming parameters). All I had to do was rewrite the “last release to environment” script to use the API directly instead of relying on the client.

$environmentId = $env.Id;

$headers = @{"X-Octopus-ApiKey" = $octopusApiKey}
$uri = "$octopusServerUrl/api/deployments?environments=$environmentId&projects=$projectId"
while ($true) {
    Write-Verbose "Getting the next set of deployments from Octopus using the URI [$uri]"
    $deployments = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -Verbose:$false
    if (-not ($deployments.Items | Any))
    {
        $version = "latest";
        Write-Verbose "No deployments could be found for project [$projectName ($projectId)] in environment [$environmentName ($environmentId)]. Returning the value [$version], which will indicate that the most recent release is to be used"
        return $version
    }

    Write-Verbose "Finding the first successful deployment in the set of deployments returned. There were [$($deployments.TotalResults)] total deployments, and we're currently searching through a set of [$($deployments.Items.Length)]"
    $successful = $deployments.Items | First -Predicate { 
        $uri = "$octopusServerUrl$($_.Links.Task)"; 
        Write-Verbose "Getting the task the deployment [$($_.Id)] was linked to using URI [$uri] to determine whether or not the deployment was successful"
        $task = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -Verbose:$false; 
        $task.FinishedSuccessfully; 
    } -Default "NONE"
    
    if ($successful -ne "NONE")
    {
        Write-Verbose "Finding the release associated with the successful deployment [$($successful.Id)]"
        $release = Invoke-RestMethod "$octopusServerUrl$($successful.Links.Release)" -Headers $headers -Method Get -Verbose:$false
        $version = $release.Version
        Write-Verbose "A successful deployment of project [$projectName ($projectId)] was found in environment [$environmentName ($environmentId)]. Returning the version of the release attached to that deployment, which was [$version]"
        return $version
    }

    Write-Verbose "Finished searching through the current page of deployments for project [$projectName ($projectId)] in environment [$environmentName ($environmentId)] without finding a successful one. Trying the next page"
    $next = $deployments.Links."Page.Next"
    if ($next -eq $null)
    {
        Write-Verbose "There are no more deployments available for project [$projectName ($projectId)] in environment [$environmentName ($environmentId)]. We're just going to return the string [latest] and hope for the best"
        return "latest"
    }
    else
    {
        $uri = "$octopusServerUrl$next"
    }
}

It’s a little more code than the old one, but it’s much, much faster: 200 seconds (for our test project) down to something like 6 seconds, almost all of which was the result of optimizing the number of deployments that need to be interrogated to find the best release. The full test suite for our deployment scripts (which actually provisions test environments) halved its execution time too (from 60 minutes to 30), which was nice, because the tests were starting to get a bit too slow for my liking.

Summary

This was one of those cases where an issue had been present for a while, but we’d chalked it up to growth issues rather than an actual inefficiency in the code.

The most annoying part is that I did think of this specific problem when I originally wrote the script, which is why I included the winnowing parameters for environment and project. It’s pretty disappointing to see that the client did not optimize the resulting API calls by making use of those parameters, although I have a suspicion about that.

I think that if I were to run the same sort of code in .NET directly (rather than through PowerShell), the resulting call to .FindMany would actually be optimized in the way that I thought it was. I have absolutely no idea how PowerShell handles the equivalent of lambda expressions, so if it was just passing a pure Func through to the function instead of an Expression<Func>, there would be no obvious way for the client library to determine if it could use the optimized query string parameters on the API.
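To illustrate what I mean (this is generic .NET, not the actual Octopus client code, and the types and signatures here are hypothetical), a method that receives a compiled Func can only treat the predicate as a black box and filter locally, whereas one that receives an Expression<Func> can inspect the expression tree and lift the environment and project comparisons out into query string parameters:

using System;
using System.Linq.Expressions;

// Hypothetical example types, purely to show the distinction.
public class Deployment
{
    public string EnvironmentId { get; set; }
    public string ProjectId { get; set; }
}

public static class FilterExamples
{
    // A compiled delegate can't be inspected; the only option is to fetch every
    // deployment (page by page) and then apply the predicate in memory.
    public static void FindManyWithFunc(Func<Deployment, bool> predicate)
    {
        // GET /api/deployments, then the next page, and so on, filtering locally.
    }

    // An expression tree can be walked, so the EnvironmentId/ProjectId comparisons
    // could be extracted and turned into a single targeted request.
    public static void FindManyWithExpression(Expression<Func<Deployment, bool>> predicate)
    {
        // Inspect predicate.Body and build /api/deployments?environments=...&projects=...
    }
}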

Still, I’m pretty happy with the end result, especially considering how easy it was to adapt the existing code to just use the HTTP API directly.

It helps that it’s a damn good API.


Progress!

With what? With our performance issues with RavenDB, that’s what.

Before I get into that in too much detail, I’ll try to summarise the issue at hand to give some background information. If you’re interested, the full history is available by reading the posts in the RavenDB category.

Essentially, we are using RavenDB as the persistence layer for a service that acts as a temporary waypoint for data, allowing for the connection of two disparate systems that are only intermittently connected. Unfortunately, mostly due to inexperience with the product and possibly as a result of our workload not being a good fit for the technology, we’ve had a lot of performance problems as usage has grown.

We’re not talking about a huge uptick in usage or anything like that either. Usage growth has been slow, and we’re only now dealing with around 1100 unique customer accounts and less than 120 requests per second (on average).

In the time that it’s taken the service usage to grow that far, we’ve had to scale the underlying infrastructure used for the RavenDB database from an m3.medium all the way through to an r3.8xlarge, as well as incrementally providing it with higher and higher provisioned IOPS (6000 at last count). This felt somewhat ridiculous to us, but we did it in order to deal with periods of high request latency, the worst of which seemed to align with when the database process dropped large amounts of memory, and then thrashed the disk in order to refill it.

As far as we could see, we weren’t doing anything special, and we’ve pretty much reached the end of the line as far as throwing infrastructure at the problem, so we had to engage Hibernating Rhinos (the company behind RavenDB) directly to get to the root of the issue.

To their credit, they were constantly engaged and interested in solving the problem for us. We had a few false starts (upgrading from RavenDB 2.5 to RavenDB 3 surfaced an entirely different issue), but once we got a test environment up that they could access, with real traffic being replicated from our production environment, they started to make some real progress.

The current hypothesis? Because our documents are relatively large (50-100KB) and our queries frequent and distributed across the entire set of documents (because it’s a polling-based system), RavenDB has to constantly read documents into memory in order to respond to a particular query, and then throw those documents out in order to deal with subsequent queries.

The solution? The majority of the queries causing the problem don’t actually need the whole document to return a result, just a few fields. If we use a static index that stores those fields, we take a large amount of memory churn out of the equation.

Index Fingers Are For Pointing. Coincidence?

The post I wrote about the complications of testing RavenDB due to its eventual consistency model touched on indexes a little, so if you want more information, go have a read of that.

Long story short, indexes in RavenDB are the only way to query data if you don’t know the document ID. With an index you are essentially pre-planning how you want to query your documents by marking certain fields as indexed. You can have many indexes, each using different fields to provide different search capabilities.

However, based on my current understanding, all an index does is point you towards the set of documents that are the result of the query. It can’t actually return any data itself, unless you set it up specifically for that.

In our case, we were using a special kind of index called an auto index, which just means that we didn’t write any indexes ourselves; we just let RavenDB handle it (auto indexes, if enabled, are created whenever a query is made that can’t be answered by an existing index, making the system easier to use in exchange for a reduction in control).
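To give a rough idea of what that looks like (the field names are placeholders, matching the snippets later in this post), a dynamic query like the following, with no index specified, is the sort of thing that causes RavenDB to generate an auto index over the queried fields, named something like Auto/Entities/ByFieldAAndFieldB:

// No index is specified here, so RavenDB picks (or creates) an auto index
// covering FieldA and FieldB, and then loads the matching documents into
// memory to build the response.
var entities = _session
    .Query<Entity>()
    .Where(x => x.FieldA == "value" && x.FieldB == "other value")
    .ToList();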

As I wrote above, this meant that while the query to find the matching documents was efficient, those documents still needed to be read from disk (if not already in memory), which caused a high amount of churn with memory utilization.

The suggestion from Hibernating Rhinos was to create a static index and to store the values of the fields needed for the response in the index itself.

I didn’t even know that was a thing you had to do! I assumed that a query on an index that only projected fields already in the index would be pretty close to maximum efficiency, but field values aren’t stored directly in indexes unless specifically requested. Instead the field values are tokenized in various ways to allow for efficient searching.

Creating an index for a RavenDB database using C# is trivial. All you have to do is derive from a supplied class.

public class Entity_ByFieldAAndFieldB_StoringFieldAAndFieldC : AbstractIndexCreationTask<Entity>
{
    public Entity_ByFieldAAndFieldB_StoringFieldAAndFieldC()
    {
        Map = entities =>
            from e in entities
            select new
            {
                FieldA = e.FieldA,
                FieldB = e.FieldB,
                // FieldC has to be part of the map output so that its value can be stored in the index.
                FieldC = e.FieldC
            };

        // Store the raw field values in the index so queries can be answered without touching the documents.
        Stores.Add(i => i.FieldA, FieldStorage.Yes);
        Stores.Add(i => i.FieldC, FieldStorage.Yes);

        // FieldC is only stored for projections; there's no need to index it for querying.
        Indexes.Add(i => i.FieldC, FieldIndexing.No);
    }
}

Using the index is just as simple. Just add another generic parameter to the query that specifies which index to use.

RavenQueryStatistics stats;

var query = _session
    .Query<Entity, Entity_ByFieldAAndFieldB_StoringFieldAAndFieldC>()
    .Statistics(out stats)
    .Where(x => x.FieldA.CompareTo("value") < 0 && x.FieldB == "other value")
    .OrderBy(x => x.FieldA)
    .Select(e => new ProjectedModel { FieldA = e.FieldA, FieldC = e.FieldC });

As long as all of the fields that you’re using in the projection are stored in the index, no documents need to be read into memory in order to answer the query.
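As a quick sanity check (reusing the query and stats variables from the snippet above), the query statistics can confirm which index actually answered the query; if an Auto/* name shows up here, the static index isn’t being hit:

// After the query has executed, stats.IndexName reveals which index answered it.
// Seeing an "Auto/..." name here would mean the static index wasn't being used.
var results = query.ToList();
Console.WriteLine("Query was answered by index: {0}", stats.IndexName);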

The last thing you need to do is actually install the index, so I created a simple RavenDbInitializer class that gets instantiated and executed during the startup of our service (Nancy + Ninject, so it gets executed when the RavenDB document store singleton is initialized).

public class RavenDbInitializer
{
    public void Initialize(IDocumentStore store)
    {
        store.ExecuteIndex(new Entity_ByFieldAAndFieldB_StoringFieldAAndFieldC());
    }
}
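The actual registration code isn’t part of this post, but the wiring amounts to something like the following hypothetical Ninject module, assuming a CreateDocumentStore helper (the URL and database name are placeholders):

public class RavenDbModule : NinjectModule
{
    public override void Load()
    {
        // Bind the document store as a singleton and run the index initializer
        // once, when the store is first activated.
        Bind<IDocumentStore>()
            .ToMethod(context => CreateDocumentStore())
            .InSingletonScope()
            .OnActivation(store => new RavenDbInitializer().Initialize(store));
    }

    private static IDocumentStore CreateDocumentStore()
    {
        // Placeholder values; these would normally come from configuration.
        var store = new DocumentStore { Url = "http://localhost:8080", DefaultDatabase = "Entities" };
        return store.Initialize();
    }
}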

All the automated tests worked as expected so the only thing left was to validate any performance improvements.

Unfortunately, that’s where it got hard.

Performance Problems

I mentioned earlier that the way in which we got Hibernating Rhinos to eventually propose this hypothesis was to give them access to a production replica with production traffic being mirrored to it. The traffic replication is set up using a neat little tool called Gor, and when we set this replication up, we installed Gor on the database machine directly, because we were only interested in the traffic going to the database.

When it came time to test the optimizations outlined above, I initially thought that I would be able to use the same traffic replication to directly compare the old approach with the new static index based one.

This was not the case.

Because we were only replicating the database traffic (not the API traffic), all of the queries being replicated to the test database had no reference to the new index (when you make a query to Raven, part of the query is specifying which index should be used).

Normally I would use some set of load tests to compare the two approaches, but we’ve been completely unable to replicate enough traffic with the right shape to cause performance problems on the Raven database. It’s something we need to return to eventually, but I don’t have time to design and create a full load profile at this point (and considering I was the one who wrote the first one, the one that failed miserably to detect or predict any of these issues, I know how complicated and time consuming it can be).

I could see two options to test the new approach at this point:

  1. Add traffic replication to the API instances. This would allow me to directly compare the results of two environments receiving the same traffic, and the new environment would be hitting the index correctly (because it would have a new version of the API).
  2. Just ship it at a time when we have some traffic (which is all the time), but when, if anything goes wrong, users are unlikely to notice.

I’m in the process of doing the first option, with the understanding that if I don’t get it done before this weekend that I’m just going to deploy the new API on Sunday and watch what happens.

I’ll let you know how it goes.

Conclusion

The sort of problem that Hibernating Rhinos identified is a perfect example of unintentionally creating issues with a technology simply due to lack of experience.

I had no idea that queries hitting an index and only asking for fields in the index would have to load the entire document before projecting those fields, but now that I know, I can understand why such queries could cause all manner of problems when dealing with relatively large documents.

The upside of this whole adventure is that the process of creating and managing indexes for RavenDB in C# is amazingly developer friendly. It’s easily the best experience I’ve had working with a database-level optimization in a long time, which was a nice change.

As I’ve repeatedly said, there is nothing inherently wrong with RavenDB as a product. It’s easy to develop on and work with (especially with the latest version and its excellent built-in management studio). The problem is that someone made an arbitrary decision to use it without really taking all of the factors into consideration, and the people that followed that person (who left before even seeing the project come to fruition) had no idea what was going on.

This has left me in an awkward position, because even if I fix all the problems we’ve had, reduce the amount of infrastructure required and get the whole thing humming perfectly, no-one in the organisation will ever trust the product again, especially at a business level.

Which is a shame.


I like to think that I consistently seek feedback from my colleagues, but the truth is that I’m more likely to focus on technical work. This is probably because I’m more inclined to work on solving an already acknowledged problem rather than try to find some new problems to work on, which is commonly the outcome of seeking feedback from your peers. It doesn’t help that it requires a specific mindset to really get good value from a peer review.

That’s why this time of year, when my organisation schedules its performance reviews, is good for me. Part of our review process is to gather feedback from a subset of our peers, so I get poked to do what I should be doing anyway, encouraging me to put down the tools for a little while and improve myself professionally with the help of the people around me.

When it comes to peer reviews, personally, I want brutal, unadulterated honesty. If I’ve gone out of my way to seek someone out to get a sense of how they feel about me professionally (or personally, as the two often overlap quite a lot), the last thing I want is for them to hide that because they are afraid of hurting my feelings or some similar sentiment. You really do need to have a good handle on your own emotions and ego before you embark on a review with this in mind though.

In order to actually engage with people, you want to make sure that you’ve given them something to rally around and get the conversation started. In my case, I try to treat the peer review situation a lot like I would treat a retrospective for an agile team, with two relatively simple questions:

  • What should I keep doing from your point of view?
  • What should I stop doing from your point of view?

When targeted at personal and professional interactions with a colleague, these questions can lead to quite a lot of discussion and result in opportunities for improvement and growth.

Nuance

I’m a pretty blunt sort of person, as you might be able to tell from reading any of my previous posts.

For me, I want to get right at any issues that might need to be dealt with as soon as possible, taking emotion and ego out of the equation.

Not everyone is like that, and that sort of confrontational attitude can make people uncomfortable, which is the last thing you want on the table when seeking feedback. The truth of the matter is that each person is very different, and it is as much the responsibility of the person being reviewed to encourage conversation and discussion as it is the reviewer’s. You have to tailor your responses and comments to whoever you are interacting with, making sure not to accidentally (or on purpose) get them to close off and shut down. Nothing good will come out of that situation.

It might seem like a lot of work, when all you really want is some simple, straightforward feedback, but making sure that everyone is comfortable and open with their thoughts will get a much better result than trying to force your way through a conversation to get at the bits you deem most useful.

Hierarchy

Personally, I’m not the sort of person that puts a lot of stock in a strictly hierarchical view (organisation charts in particular bug me), mostly because I don’t like that archaic way of thinking. Bob reports to Mary reports to Jon reports to Susan just feels very out-dated, whereas I prefer a much more collaborative approach where people rally around goals or projects rather than roles and titles. That’s not to say that you can escape from there being some people who have the responsibility (and power) to mandate a decision if the situation requires it, but this should be the exception rather than the norm.

How does this relate to peer reviews?

Well, not everyone thinks like that. Some people are quite fond of the hierarchical model for whatever reason and that can bring a whole host of issues to the process of getting honest feedback. These people are not wrong, they just have a different preference, and you need to be aware of those factors in order to ensure that you get the most out of the review process.

If someone considers themselves above you for example, you can probably count on getting somewhat honest feedback from them, because they feel like they can speak freely with no real fear of repercussion. Depending on the sort of person this can be both good and bad. If you’re really unlucky, you’ll have to engage with a psychopath who is looking to “win” the conversation and use it as some sort of powerplay. If you’re lucky, you’ll just get a good, unfiltered sense of how that person views you and your professional conduct.

The more difficult situation is when someone considers themselves to be below you, which can evoke in them a sense of fear around speaking their mind. This is a bad situation to be in, because it can be very hard to get that person to open up to you, regardless of how many times you make promises that this is a purely informal chat and that you’re interested in improving yourself, not punishing them for telling you how they feel. Again this is one of those situations where if you’ve established a positive and judgement-free relationship, you’ll probably be fine, but if the person has some deep seated issues around a hierarchical structure and your position in it, you’re going to have a bad time.

If you do identify a situation where a peer is afraid of speaking their mind due to perceived positional differences during a peer review, it’s probably too late. Pick someone else to get feedback from this time, and work on establishing a better relationship with that person so that you can get their feedback next time.

The Flip Side

Up to this point I’ve mostly just talked about how to get feedback from other people to improve yourself, which is only half of the story.

In order to be fair, you need to be able to be the reviewer as well, offering constructive feedback to those who request it, and, if necessary, encouraging people to seek feedback for personal growth as well. This sort of concept isn’t limited to just those in “managerial” positions either, as every team member should be working towards an environment where all members of the team are constantly looking to improve themselves via peer review.

Giving feedback requires a somewhat different set of skills to seeking it, but both have a lot of common elements (which I’ve gone through above). Really, it’s all about understanding the emotional and intellectual needs of the other person involved, and tailoring the communications based on that. In regards to specific feedback elements, you want to be honest, but you also want to provide a mixture of specifically addressable points alongside more general observations.

Remember, someone seeking feedback has already decided to grow as a person, so you’re just helping them zero in on the parts that are most interesting from your point of view. They are likely seeking the same thing from multiple people, so they might not even change the things you talk about, but it’s still a good opportunity to help them understand your point of view.

Conclusion

Seeking and giving feedback require an entirely different set of skills to solving problems with software. Both sides of the coin can very quickly end up in an emotional place, even when that’s not intended.

From a reviewee’s point of view, you need to be able to put ego and emotion aside and listen to what the other person has to say, while also encouraging them to say it.

From the reviewer’s point of view, you need to be able to speak your mind, but not alienate the person seeking feedback.

It’s a delicate balancing act, but one that can pay significant dividends in your professional life.

If you’re lucky, it might even make you a better person.