Do They Know It’s Christmas
No post today.
Holidays.
Over the last few weeks, I’ve been sharing the bits and pieces that went into our construction of an ELB logs processing pipeline using AWS Lambda.
As I said in the introduction, the body of work around the processor can be broken down into three pieces:

- The Javascript that actually runs inside the Lambda function, processing ELB log files from S3 and pushing their contents to Logstash
- The environment setup, incorporating the Lambda function and everything it needs into our CloudFormation templates
- The build and deployment pipeline, packaging the function code and pushing it into AWS via Octopus Deploy
I’ve gone through each of these components in detail in the posts I’ve linked above, so this last post is really just to tie it all together and reflect on the process, as well as provide a place to mention a few things that didn’t neatly fit into any of the buckets above.
Also, I’ll obviously be continuing the pattern of naming every sub-header after a chapter title in the original Half-Life, almost all of which have proven to be surprisingly apt for whatever topic is being discussed. I mean seriously, look at this next one. How is that not perfectly fitting for a conclusion/summary post?
It took a fair amount of effort for us to get to the solution we have in place now, and a decent amount of time. The whole thing was put together over the course of a few weeks by one of the people I work with, with some guidance and feedback from other members of the team from time to time. That timeframe covered developing, testing and then deploying the solution into a real production environment, by a person with little to no working knowledge of the AWS toolset, so I think it was a damn good effort.
The most time consuming part was the long turnaround on environment builds, because each build needs to run a suite of tests which involve creating and destroying at least one environment, sometimes more. In reality, this means a wait time of something like 30-60 minutes per build, which is so close to eternity as to be effectively indistinguishable from it. I’ll definitely have to come up with some sort of way to tighten this feedback loop, but given that most of it is actually just waiting for AWS resources, I’m not really sure what I can do.
The hardest part of the whole process was probably just working with Lambda for the first time outside of the AWS management website.
As a team, we’d used Lambda before (back when I tried to make something to clone S3 buckets more quickly), but we’d never tried to manage the various Lambda bits and pieces through CloudFormation.
It turns out that the AWS website does a hell of a lot of things in order to make sure that your Lambda function runs, including dealing with profiles and permissions, network interfaces, listeners and so on. Having to do all of that explicitly through CloudFormation was something of a learning process.
Speaking of CloudFormation and Lambda, we ran into a nasty bug with Elastic Network Interfaces and VPC hosted Lambda functions created through CloudFormation, where the CloudFormation stack doesn’t delete cleanly because the ENI is still in use. It looks like it’s a known issue, so I assume it will be fixed at some point in the future, but as a result we had to include some additional cleanup in the Powershell that wraps our environment management, checking the stack for Lambda functions and manually detaching and deleting the associated ENIs before we try to delete the stack.
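We never found a particularly elegant answer. As a rough sketch of what that cleanup amounts to (this is not our production script, it just uses the AWS CLI directly, and the description filter relies on the convention that Lambda-created ENIs are described as "AWS Lambda VPC ENI...", which is an assumption):

# Rough sketch only: find ENIs left behind by VPC-hosted Lambda functions and remove them
# so that the CloudFormation stack can delete cleanly. In practice you would scope this
# to the ENIs belonging to the stack being deleted (e.g. by security group).
$stackName = "CI-ELBLogProcessor"    # hypothetical stack name

$json = & aws ec2 describe-network-interfaces --filters "Name=description,Values=AWS Lambda VPC ENI*"
$response = $json -join "`n" | ConvertFrom-Json

foreach ($eni in $response.NetworkInterfaces)
{
    if ($eni.Attachment -and $eni.Attachment.AttachmentId)
    {
        # The detach is asynchronous, so a real script needs to wait/retry before deleting.
        & aws ec2 detach-network-interface --attachment-id $eni.Attachment.AttachmentId --force
    }
    & aws ec2 delete-network-interface --network-interface-id $eni.NetworkInterfaceId
}

# With the ENIs gone, the stack will delete cleanly.
& aws cloudformation delete-stack --stack-name $stackName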
This isn’t the first time we’ve had to manually clean up resources “managed” by CloudFormation. We do the same thing with S3 buckets, because CloudFormation won’t delete a bucket with anything in it (and some of our buckets, like the ELB logs ones, are constantly being written to by other AWS services).
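The bucket cleanup is far simpler; a minimal sketch (with a hypothetical bucket name, and assuming the AWS CLI is available on the path) is just:

# Empty the bucket so that CloudFormation can delete it as part of deleting the stack.
# The bucket name here is hypothetical; our real scripts pull it out of the stack's resources.
$bucketName = "ci-elb-logs"
& aws s3 rm "s3://$bucketName" --recursive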
The only other difficult part of the whole thing, which I’ve already mentioned in the deployment post, was figuring out how we could incorporate non-machine based Octopus deployments into our environments. For now they just happen after the actual AWS stack is created (as part of the Powershell scripts wrapping the entire process) and rely on an Octopus Tentacle registered in each environment on the Octopus Server machine itself, used as a script execution point.
Having put this whole system in place, the obvious question is “Was it worth it?”.
For me, the answer is “definitely”.
We’ve managed to retire a few hacky components (a Windows service running Powershell scripts via NSSM to download files from an S3 bucket, for example) and remove an entire machine from every environment that needs to process ELB logs. It’s not often that you get to reduce both running and maintenance costs in one blow, so it was nice to get that accomplished.
Ignoring the reduced costs to the business for a second, we’ve also decreased the latency for receiving our ELB logs for analysis because rather than relying on a polling system, we’re now triggering the processing directly when the ELB writes the log file into S3.
Finally, we’ve gained some more experience with systems and services that we haven’t really had a chance to look into, allowing us to leverage that knowledge and tooling for other, potentially more valuable purposes.
All in all, I consider this exercise a resounding success, and I’m happy I was able to dedicate some time to improving an existing process, even though it was already “working”.
Improving existing engineering like this is incredibly valuable to the morale of a development team, which is an important and limited resource.
Another week, another piece of the puzzle.
This time around, I’ll go through how we’ve set up the build and deployment pipeline for the Lambda function code that processes the ELB log files. It’s not a particularly complex system, but it was something completely different compared to things we’ve done in the past.
In general terms, building software is relatively straightforward, as long as you have two things:

- Source control, so there is a single, authoritative place where the software is defined
- An automated mechanism for turning that source into a versioned, deployable package
With at least those two pieces in place (preferably fully automated), your builds are taken care of. Granted, accomplishing those things is not as simple as two dot points would lead you to believe, but conceptually there is not a lot going on.
Once you have versioned packages that can be reasoned about, all you need to do is figure out how to deliver them to the appropriate places. Again, ideally the whole process is automated. You shouldn’t have to remote onto a machine and manually copy files around, as that hardly ever ends well. This can get quite complicated depending on exactly what it is that you are deploying: on-premises software can be notoriously painful to deploy without some supporting system, but deploying static page websites to S3 is ridiculously easy.
My team uses TeamCity, Nuget and Powershell to accomplish the build part and Octopus Deploy to deploy almost all of our code, and I don’t plan on changing any of that particularly soon.
Some people seem to think that because it’s so easy to manage Lambda functions from the AWS management website, they don’t need a build and deployment pipeline. After all, just paste the code into the website and you’re good to go, right?
I disagree vehemently.
Our ELB logs processor Lambda function follows our normal repository structure, just like any other piece of software we write.
The code for the Lambda function goes into the /src folder, along with a Nuspec file that describes how to construct the resulting versioned package at build time.
Inside a /scripts folder is a build script, written in Powershell, containing all of the logic necessary to build and verify a deployable package. It mostly just leverages a library of common functions (so our builds are consistent), and its goal is to facilitate all of the parts of the pipeline, like compilation (ha Javascript), versioning, testing, packaging and deployment.
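Roughly speaking, the layout looks something like the following (the names are illustrative rather than the literal contents of our repository):

scripts\
    build.ps1                  (the entry point shown below)
    _Find-RootDirectory.ps1
    Invoke-Bootstrap.ps1       (pulls down the library of common build functions)
    common\
src\
    function\
        index.js               (the Lambda function itself)
    ELBLogProcessor.nuspec     (describes the package produced at build time)
deploy.ps1                     (runs during the Octopus deployment)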
Some build systems are completely encapsulated inside the software that does the build, like Jenkins or TeamCity. I don’t like this approach, because you can’t easily run/debug the build on a different machine for any reason. I much prefer the model where the repository has all of the knowledge necessary to do the entire build and deployment process, barring the bits that it needs to accomplish via an external system.
The build script for the ELB logs processor function is included below, but keep in mind that this is just the entry point; a lot of the bits and pieces are inside the common functions that you can see referenced.
[CmdletBinding()]
param
(
    [switch]$deploy,
    [string]$octopusServerUrl,
    [string]$octopusApiKey,
    [string]$component,
    [string]$commaSeparatedDeploymentEnvironments,
    [string[]]$projects,
    [int]$buildNumber,
    [switch]$prerelease,
    [string]$prereleaseTag
)

$error.Clear()
$ErrorActionPreference = "Stop"

$here = Split-Path $script:MyInvocation.MyCommand.Path

. "$here\_Find-RootDirectory.ps1"

$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

. "$rootDirectoryPath\scripts\Invoke-Bootstrap.ps1"
. "$rootDirectoryPath\scripts\common\Functions-Build.ps1"

$arguments = @{}
$arguments.Add("Deploy", $deploy)
$arguments.Add("CommaSeparatedDeploymentEnvironments", $commaSeparatedDeploymentEnvironments)
$arguments.Add("OctopusServerUrl", $octopusServerUrl)
$arguments.Add("OctopusServerApiKey", $octopusApiKey)
$arguments.Add("Projects", $projects)
$arguments.Add("VersionStrategy", "SemVerWithPatchFilledAutomaticallyWithBuildNumber")
$arguments.Add("buildNumber", $buildNumber)
$arguments.Add("Prerelease", $prerelease)
$arguments.Add("PrereleaseTag", $prereleaseTag)
$arguments.Add("BuildEngineName", "nuget")

Build-DeployableComponent @arguments
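For reference, an invocation of that script from the build server looks something like this; every value below is a placeholder, and I’m assuming the script lives at scripts\build.ps1:

# Hypothetical TeamCity build step. Parameter names come straight from the param block above.
.\scripts\build.ps1 `
    -Deploy `
    -OctopusServerUrl "https://octopus.internal.example.com" `
    -OctopusApiKey "API-XXXXXXXXXXXXXXXX" `
    -CommaSeparatedDeploymentEnvironments "CI" `
    -Projects @("ELBLogProcessor") `
    -BuildNumber 123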
I’m pretty sure I’ve talked about our build process and common scripts before, so I’m not going to go into any more detail.
Prior to deployment, the only interesting output is the versioned Nuget file, containing all of the dependencies necessary to run the Lambda function (which in our case is just the file in my previous post).
When it comes to deploying the Lambda function code, it’s a little bit more complicated than our standard software deployments via Octopus Deploy.
In most cases, we create a versioned package containing all of the necessary logic to deploy the software, so in the case of an API it contains a deploy.ps1 script that gets run automatically during deployment, responsible for creating a website and configuring IIS on the local machine. The most important thing Octopus does for us is provide the mechanism to get this package to the place where it needs to be.
It usually does this via an Octopus Tentacle, a service that runs on the deployment target and allows for communication between the Octopus server and the machine in question.
This system kind of falls apart when you’re trying to deploy to an AWS Lambda function, which cannot have a tentacle installed on it.
Instead, we rely on the AWS API and what amounts to a worker machine sitting in each environment.
When we do a deployment of our Lambda function project, it gets copied to the worker machine (which is actually just the Octopus server) and it runs the deployment script baked into the package. This script then uses Octopus variables to package the code again (in a way that AWS likes, a simple zip file) and uses the AWS API to upload the changed code to the appropriate Lambda function (by following a naming convention).
The deployment script is pretty straightforward:
function Get-OctopusParameter
{
    [CmdletBinding()]
    param
    (
        [string]$key
    )

    if ($OctopusParameters -eq $null)
    {
        throw "No variable called OctopusParameters is available. This script should be executed as part of an Octopus deployment."
    }

    if (-not($OctopusParameters.ContainsKey($key)))
    {
        throw "The key [$key] could not be found in the set of OctopusParameters."
    }

    return $OctopusParameters[$key]
}

$VerbosePreference = "Continue"
$ErrorActionPreference = "Stop"

$here = Split-Path -Parent $MyInvocation.MyCommand.Path
. "$here\_Find-RootDirectory.ps1"

$rootDirectory = Find-RootDirectory $here
$rootDirectoryPath = $rootDirectory.FullName

$awsKey = $OctopusParameters["AWS.Deployment.Key"]
$awsSecret = $OctopusParameters["AWS.Deployment.Secret"]
$awsRegion = $OctopusParameters["AWS.Deployment.Region"]

$environment = $OctopusParameters["Octopus.Environment.Name"]
$version = $OctopusParameters["Octopus.Release.Number"]

. "$rootDirectoryPath\scripts\common\Functions-Aws.ps1"
$aws = Get-AwsCliExecutablePath

$env:AWS_ACCESS_KEY_ID = $awsKey
$env:AWS_SECRET_ACCESS_KEY = $awsSecret
$env:AWS_DEFAULT_REGION = $awsRegion

$functionPath = "$here\src\function"

Write-Verbose "Compressing lambda code file"
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::CreateFromDirectory($functionPath, "index.zip")

Write-Verbose "Updating Log Processor lambda function to version [$environment/$version]"
(& $aws lambda update-function-code --function-name $environment-ELBLogProcessorFunction --zip-file fileb://index.zip) | Write-Verbose
Nothing fancy, just using the AWS CLI to deploy code to a known function.
AWS Lambda does provide some other mechanisms to deploy code, and we probably could have used them to accomplish the same sort of thing, but I like our patterns to stay consistent and I’m a big fan of the functionality that Octopus Deploy provides, so I didn’t want to give that up.
We had to make a few changes to our environment pattern to allow for non-machine based deployment, like:

- Registering the Tentacle on the Octopus Server itself into each environment (with an appropriate role) so there is always a script execution point available
- Creating the Lambda function in CloudFormation with placeholder code, so there is something for the deployment to overwrite
- Extending the Powershell that wraps environment creation to trigger the Octopus deployment of the function code once the CloudFormation stack has finished
Once all of that was in place, it was pretty easy to deploy code to a Lambda function as part of setting up the environment, just like we would deploy code to an EC2 instance. It was one of those cases where I’m glad we wrap our usage of CloudFormation in Powershell, because if we were just using raw CloudFormation, I’m not sure how we would have integrated the usage of Octopus Deploy.
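To give a feel for how that integration looks, the relevant step in the wrapping Powershell amounts to something like the following sketch; octo.exe is the Octopus command line tool, and the project name, environment name and URL are all placeholders:

# Sketch only: once the CloudFormation stack reports CREATE_COMPLETE, trigger the Octopus
# project that owns the Lambda function code, targeting the environment we just created.
$octopusServerUrl = "https://octopus.internal.example.com"
$octopusApiKey = "API-XXXXXXXXXXXXXXXX"
$environmentName = "CI"

& .\tools\octo.exe create-release `
    --project "ELB Log Processor" `
    --deployto $environmentName `
    --server $octopusServerUrl `
    --apiKey $octopusApiKey `
    --waitfordeployment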
I’ve only got one more post left in this series, which will summarise the entire thing and link back to all the constituent parts.
Until then, I don’t really have anything else to say.
It’s that time again kids: time to continue the series of posts about how we improved the processing of our ELB logs into our ELK stack using AWS Lambda.
You can find the introduction to this whole adventure here, but last time I wrote about the Javascript content of the Lambda function that does the bulk of the work.
This time I’m going to write about how we incorporated the creation of that Lambda function into our environment management strategy and some of the tricks and traps therein.
On a completely unrelated note, it would be funny if this blog post turned up in search results for Half-Life 3.
I’ve pushed hard to codify our environment setup where I work. The main reason for this is reproducibility, but the desire comes from a long history of interacting with manually set up environments, lorded over by the one guy who happened to know the guy who originally built them, where everyone is terrified of changing or otherwise touching said environments.
It’s a nightmare.
As far as environment management goes, I’ve written a couple of times about environment related things on this blog, one of the most recent being the way in which we version our environments. To give some more context for this post, I recommend you go and read at least the versioning post in order to get a better understanding of how we do environment management. Our strategy is still a work in progress, but it’s getting better all the time.
Regardless of whether or not you followed my suggestion, we use a combination of versioned Nuget packages, Powershell, CloudFormation and Octopus Deploy to create an environment, where an environment is a self-contained chunk of infrastructure and code that performs some sort of role, the most common of which is acting as an API. We work primarily with EC2 instances (managed via Auto Scaling Groups, behind Elastic Load Balancers), and historically, we’ve deployed Logstash to each instance alongside the code to provide log aggregation (IIS, Application, System Stats, etc). When it comes to capturing and aggregating ELB logs, we include a standalone EC2 instance in the environment, also using Logstash. This standalone instance is the part of the system that we are aiming to replace with the Lambda function.
Because we make extensive use of CloudFormation, incorporating the creation of a Lambda function into an environment that needs to have ELB logs processed is a relatively simple affair.
Simple in that it fits nicely with our current approach. Getting it all to work as expected was still a massive pain.
Below is a fragment of a completed CloudFormation template for reference purposes.
In the interests of full disclosure, I did not write most of the following fragment, another member of my team was responsible. I just helped.
{ "Description": "This template is a fragment of a larger template that creates an environment. This fragment in particular contains all of the necessary bits and pieces for a Lambda function that processes ELB logs from S3.", "Parameters": { "ComponentName": { "Description": "The name of the component that this stack makes up. This is already part of the stack name, but is here so it can be used for naming/tagging purposes.", "Type": "String" }, "OctopusEnvironment": { "Description": "Octopus Environment", "Type": "String" }, "PrivateSubnets": { "Type": "List<AWS::EC2::Subnet::Id>", "Description": "Public subnets (i.e. ones that are automatically assigned public IP addresses) spread across availability zones, intended to contain load balancers and other externally accessible components.", "ConstraintDescription": "must be a list of an existing subnets in the selected Virtual Private Cloud." }, "LogsS3BucketName": { "Description": "The name of the bucket where log files for the ELB and other things will be placed.", "Type": "String" } }, "Resources": { "LogsBucket" : { "Type" : "AWS::S3::Bucket", "Properties" : { "BucketName" : { "Ref": "LogsS3BucketName" }, "LifecycleConfiguration": { "Rules": [ { "Id": 1, "ExpirationInDays": 7, "Status": "Enabled" } ] }, "Tags" : [ { "Key": "function", "Value": "log-storage" } ], "NotificationConfiguration" : { "LambdaConfigurations": [ { "Event" : "s3:ObjectCreated:*", "Function" : { "Fn::GetAtt" : [ "ELBLogProcessorFunction", "Arn" ] } } ] } } }, "ELBLogProcessorFunctionPermission": { "Type" : "AWS::Lambda::Permission", "Properties" : { "Action":"lambda:invokeFunction", "FunctionName": { "Fn::GetAtt": [ "ELBLogProcessorFunction", "Arn" ]}, "Principal": "s3.amazonaws.com", "SourceAccount": {"Ref" : "AWS::AccountId" }, "SourceArn": { "Fn::Join": [":", [ "arn","aws","s3","", "" ,{"Ref" : "LogsS3BucketName"}]] } } }, "LambdaSecurityGroup": { "Type": "AWS::EC2::SecurityGroup", "Properties": { "GroupDescription": "Enabling all outbound communications", "VpcId": { "Ref": "VpcId" }, "SecurityGroupEgress": [ { "IpProtocol": "tcp", "FromPort": "0", "ToPort": "65535", "CidrIp": "0.0.0.0/0" } ] } }, "ELBLogProcessorFunction": { "Type": "AWS::Lambda::Function", "Properties": { "FunctionName": { "Fn::Join": [ "", [ { "Ref" : "ComponentName" }, "-", { "Ref" : "OctopusEnvironment" }, "-ELBLogProcessorFunction" ] ] }, "Description": "ELB Log Processor", "Handler": "index.handler", "Runtime": "nodejs4.3", "Code": { "ZipFile": "console.log('placeholder for lambda code')" }, "Role": { "Fn::GetAtt" : ["LogsBucketAccessorRole", "Arn"]}, "VpcConfig": { "SecurityGroupIds": [{"Fn::GetAtt": ["LambdaSecurityGroup", "GroupId"]}], "SubnetIds": { "Ref": "PrivateSubnets" } } } }, "LogsBucketAccessorRole": { "Type": "AWS::IAM::Role", "Properties": { "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service" : ["lambda.amazonaws.com"]}, "Action": [ "sts:AssumeRole" ] } ] }, "Path": "/", "Policies": [{ "PolicyName": "access-s3-read", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject" ], "Resource": { "Fn::Join": [":", [ "arn","aws","s3","", "" ,{"Ref" : "LogsS3BucketName"}, "/*"]] } } ] } }, { "PolicyName": "access-logs-write", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", "logs:DescribeLogStreams" ], "Resource": { "Fn::Join": [":", [ "arn","aws","logs", { 
"Ref": "AWS::Region" }, {"Ref": "AWS::AccountId"} , "*", "/aws/lambda/*"]] } } ] } }, { "PolicyName": "access-ec2", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:*" ], "Resource": "arn:aws:ec2:::*" } ] } }, { "PolicyName": "access-ec2-networkinterface", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:DescribeInstances", "ec2:CreateNetworkInterface", "ec2:AttachNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface", "ec2:DetachNetworkInterface", "ec2:ModifyNetworkInterfaceAttribute", "ec2:ResetNetworkInterfaceAttribute", "autoscaling:CompleteLifecycleAction" ], "Resource": "*" } ] } } ] } } } }
The most important part of the template above is the ELBLogProcessorFunction. This is where the actual Lambda function is specified, although you might notice that it does not actually have the code from the previous post attached to it in any way. The reason for this is that we create the Lambda function with placeholder code, and then use Octopus Deploy afterwards to deploy a versioned package containing the actual code to the Lambda function, like we do for everything else. Packaging and deploying the Lambda function code is a topic for another blog post though (the next one, hopefully).
Other things to note in the template fragment:

- The NotificationConfiguration on the LogsBucket is what actually hooks the bucket up to the Lambda function, firing on every s3:ObjectCreated:* event
- The ELBLogProcessorFunctionPermission is required so that S3 is actually allowed to invoke the function
- The function itself is created with placeholder code (a single console.log statement), which gets replaced during deployment
- Because the function lives inside our VPC, it needs a security group and subnets (via VpcConfig), and its role needs the various ec2 network interface permissions so that Lambda can create and attach ENIs
- The role also grants read access to the logs bucket and write access to CloudWatch Logs, so the function’s own console.log output has somewhere to go
- Being a fragment, it references a few things (like VpcId) that are declared elsewhere in the full template
So that’s how we set up the Lambda function as part of any environment that needs to process ELB logs. Remember, the template fragment above is incomplete, and is missing Auto Scaling Groups, Launch Configurations, Load Balancers, Host Records and a multitude of other things that make up an actual environment. What I’ve shown above is enough to get the pipeline up and running, where any object introduced into the LogsBucket will trigger an execution of the Lambda function, so it’s enough to illustrate our approach.
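For completeness, creating the full environment stack from a template like this is just a CloudFormation API call; a sketch using the AWS CLI (with placeholder values throughout, and remembering that the fragment above would need the rest of the template around it) might look like:

# Placeholder values throughout. The --capabilities flag is required because the template
# creates an IAM role. A real call would pass multiple subnets, which need escaped commas
# in the CLI's shorthand parameter syntax.
& aws cloudformation create-stack `
    --stack-name "CI-ELBLogProcessor" `
    --template-body file://environment.template.json `
    --capabilities CAPABILITY_IAM `
    --parameters `
        ParameterKey=ComponentName,ParameterValue=ELBLogProcessor `
        ParameterKey=OctopusEnvironment,ParameterValue=CI `
        ParameterKey=PrivateSubnets,ParameterValue=subnet-aaaaaaaa `
        ParameterKey=LogsS3BucketName,ParameterValue=ci-elb-logs

# Wait for the stack before kicking off the Octopus deployment of the function code.
& aws cloudformation wait stack-create-complete --stack-name "CI-ELBLogProcessor"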
Of course, the function doesn’t do anything yet, which ties in with the next post.
How we get the code into Lambda.
Until then, may all your Lambda function executions be swift and free of errors.
Now that I am feeling less like dying horribly from a terrible plague, it’s time to continue to talk about processing ELB logs into an ELK stack via AWS Lambda.
To summarise the plan, the new log processor system needs 3 main parts:

- The Lambda function itself (written in Javascript), which reads ELB log files from S3 and pushes their contents to our log aggregation stack
- The environment infrastructure to support the function, defined and managed via CloudFormation
- A build and deployment pipeline for the function code, so it gets treated like any other piece of software we write
I’m basically going to do a short blog post on each of those pieces of work, and maybe one at the end to tie it all together.
With that in mind, let’s talk Javascript.
When you’re writing Lambda functions, Javascript (via Node.js) is probably your best bet. Sure, you can run Java or Python (and maybe one day C# using .NET Core), but Javascript is almost certainly going to be the easiest. It’s what we chose when we put together the faster S3 clone prototype, and while the fundamentally asynchronous nature of Node.js took some getting used to, it worked well enough.
When it comes to Javascript, I don’t personally write a lot of it. If I’m writing a server side component, I’m probably going to pick C# as my language of choice (with all its fancy features like “a compiler” and “type safety”), and I don’t find myself writing things like websites or small Javascript applications very often, if at all. My team definitely writes websites though, and we use React.js to do it, so it’s not like Javascript is an entirely foreign concept.
For the purposes of reading in and parsing an ELB log file via a Lambda function, we didn’t need a particularly complex piece of Javascript: something that reads the specified file from S3 when the Lambda function triggers, something to process the contents of that file line by line, something to parse and format those lines in a way that a Logstash input will accept, and something to push that JSON payload to the Logstash listener over raw TCP.
Without further ado, I give you the completed script:
'use strict';

let aws = require('aws-sdk');
let s3 = new aws.S3({ apiVersion: '2006-03-01' });
let readline = require('readline');
let net = require('net');

const _type = 'logs';
const _sourceModuleName = 'ELB';
const _logHost = '#{LogHost}';
const _logPort = #{LogPort};
const _environment = '#{Octopus.Environment.Name}';
const _component = '#{Component}';
const _application = '#{Application}';

function postToLogstash(entry) {
    console.log("INFO: Posting event to logstash...");

    var socket = net.createConnection(_logPort, _logHost);
    var message = JSON.stringify(entry) + "\n";
    socket.write(message);
    socket.end();

    console.log("INFO: Posting to logstash...done");
}

exports.handler = (event, context, callback) => {
    console.log('INFO: Retrieving log from S3 bucket...');

    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const params = {
        Bucket: bucket,
        Key: key
    };

    const reader = readline.createInterface({ input: s3.getObject(params).createReadStream() });

    const expectedColumns = 12;

    reader.on('line', function(line) {
        console.log("INFO: Parsing S3 line entry...");

        const columns = line.split(/ (?=(?:(?:[^"]*"){2})*[^"]*$)/);

        if (columns.length >= expectedColumns) {
            var entry = {
                EventReceivedTime: columns[0],
                LoadBalancerName: columns[1],
                PublicIpAndPort: columns[2],
                InternalIpAndPort: columns[3],
                TimeToForwardRequest: parseFloat(columns[4]) * 1000,
                TimeTaken: parseFloat(columns[5]) * 1000,
                TimeToForwardResponse: parseFloat(columns[6]) * 1000,
                Status: columns[7],
                BackendStatus: columns[8],
                BytesUploadedFromClient: parseInt(columns[9]),
                BytesDownloadedByClient: parseInt(columns[10]),
                FullRequest: columns[11],
                Component: _component,
                SourceModuleName: _sourceModuleName,
                Environment: _environment,
                Application: _application,
                Type: _type
            };
            postToLogstash(entry);
        } else {
            console.log("ERROR: Invalid record length, was expecting " + expectedColumns + " columns but found " + columns.length);
            console.log('ERROR: -------');
            console.log(line);
            console.log('ERROR: -------');
        }
    });
};
Nothing too fancy.
In the interest of full disclosure, I did not write the script above. It was written by a few guys from my team initially as a proof of concept, then improved/hardened as a more controlled piece of work.
You might notice some strange variable names at the top of the script (i.e. #{Octopus.Environment.Name}).
We use Octopus Deploy for all of our deployments, and this includes the deployment of Lambda functions (which we package via Nuget and then deploy via the AWS Powershell Cmdlets/CLI inside Octopus). The #{NAME} notation is a way for Octopus to substitute variable values into files during deployment. This substitution is very useful, and can be scoped via a variety of things (like Environment, Machine, Role, etc), so by the time the script actually gets into AWS those variables are filled in with actual values.
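Octopus does the substitution for us as part of the deployment, but purely to illustrate the mechanism, it amounts to something like this (the variable names and values below are hypothetical):

# Illustrative only: Octopus Deploy performs this substitution itself via its variable
# substitution feature; we never run anything like this by hand.
$substitutions = @{
    "LogHost" = "10.0.1.100";
    "LogPort" = "6379";
    "Octopus.Environment.Name" = "CI";
    "Component" = "ELBLogProcessor";
    "Application" = "LogAggregation";
}

$content = Get-Content -Path "index.js" -Raw
foreach ($key in $substitutions.Keys)
{
    $content = $content.Replace("#{$key}", $substitutions[$key])
}
Set-Content -Path "index.js" -Value $content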
Other than our use of Octopus variables, other things to note in this piece of Javascript are:

- The log file is streamed out of S3 and processed line by line (via readline), rather than being downloaded and read into memory in one go
- The somewhat terrifying regex used to split each line into columns exists so that spaces inside quoted fields (like the request itself) don’t break the parsing
- The various time values are multiplied by 1000, converting the ELB’s seconds into the milliseconds we prefer in our log events
- Each parsed line is pushed to the Logstash TCP input as a single JSON payload, one connection per event
- Lines that don’t contain the expected number of columns are logged as errors and otherwise ignored
That’s it for this week. The Javascript I’ve included above is pretty generic (apart from the specific set of fields that we like to have in our log events) and will successfully process an ELB log file from S3 to a Logstash instance listening on a port of your choosing (probably 6379) when used in a Lambda function. Feel free to reuse it for your own purposes.
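If you want to check the whole thing end to end once it’s deployed, you can also poke the function manually with a synthetic S3 event; a sketch using the AWS CLI (where the function, bucket and key names are examples only) looks like:

# Hypothetical smoke test: invoke the deployed function with a hand-crafted S3 event
# pointing at a log file that already exists in the bucket.
$payload = @{
    Records = @(
        @{
            s3 = @{
                bucket = @{ name = "ci-elb-logs" };
                object = @{ key = "AWSLogs/123456789012/elasticloadbalancing/example.log" }
            }
        }
    )
} | ConvertTo-Json -Depth 10

Set-Content -Path "payload.json" -Value $payload

& aws lambda invoke --function-name "CI-ELBLogProcessorFunction" --payload file://payload.json output.json
Get-Content -Path "output.json"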
Next week I’ll continue this series of posts with information about how we use CloudFormation to setup our Lambda function as part of one of our environment definitions.
CloudFormation and Lambda aren’t the best of friends yet, so there is some interesting stuff that you have to be aware of.