That Neat Little Symbol From Half-Life, Part 3
It’s that time again, kids: time to continue the series of posts about how we improved the processing of our ELB logs into our ELK stack using AWS Lambda.
You can find the introduction to this whole adventure here. Last time I wrote about the JavaScript content of the Lambda function that does the bulk of the work.
This time I’m going to write about how we incorporated the creation of that Lambda function into our environment management strategy and some of the tricks and traps therein.
On a completely unrelated note, it would be funny if this blog post turned up in search results for Half-Life 3.
We’ve Got Hostiles!
I’ve pushed hard to codify our environment setup where I work. The main reason for this is reproducibility, but the desire comes from a long history of interacting with manually set up environments that are lorded over by one guy who just happened to know the guy who originally built them, and that everyone else is terrified of changing or otherwise touching.
It’s a nightmare.
As far as environment management goes, I’ve written a couple of times about environment-related things on this blog, one of the most recent posts being about the way in which we version our environments. To give some more context for this post, I recommend you go and read at least the versioning post in order to get a better understanding of how we do environment management. Our strategy is still a work in progress, but it’s getting better all the time.
Regardless of whether or not you followed my suggestion: we use a combination of versioned NuGet packages, PowerShell, CloudFormation and Octopus Deploy to create an environment, where an environment is a self-contained chunk of infrastructure and code that performs some sort of role, the most common of which is acting as an API. We work primarily with EC2 instances (behind Elastic Load Balancers, managed via Auto Scaling Groups), and historically we’ve deployed Logstash to each instance alongside the code to provide log aggregation (IIS, application, system stats, etc.). When it comes to capturing and aggregating ELB logs, we include a standalone EC2 instance in the environment, also running Logstash. This standalone instance is the part of the system that we are aiming to replace with the Lambda function.
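To make that toolchain a little more concrete, here is roughly what kicking off the creation of an environment looks like from PowerShell. This is a minimal sketch rather than our actual tooling (the real scripts, delivered via versioned NuGet packages, also handle credentials, template packaging and tagging), and all of the names and IDs here are hypothetical; the parameter keys just mirror the template fragment shown later in this post.

```powershell
# A minimal sketch of creating an environment stack with the AWS Tools
# for PowerShell. Names, IDs and paths are hypothetical placeholders.
Import-Module AWSPowerShell

$stackName = "MyService-CI-Environment"

New-CFNStack `
    -StackName $stackName `
    -TemplateBody (Get-Content -Raw ".\environment.cloudformation.template") `
    -Capability CAPABILITY_IAM `
    -Parameter @(
        @{ ParameterKey = "ComponentName"; ParameterValue = "MyService" },
        @{ ParameterKey = "OctopusEnvironment"; ParameterValue = "CI" },
        @{ ParameterKey = "LogsS3BucketName"; ParameterValue = "myservice-ci-logs" },
        @{ ParameterKey = "VpcId"; ParameterValue = "vpc-12345678" },
        @{ ParameterKey = "PrivateSubnets"; ParameterValue = "subnet-aaaa1111,subnet-bbbb2222" }
    )

# Block until CloudFormation reports success (or give up after 30 minutes).
Wait-CFNStack -StackName $stackName -Status CREATE_COMPLETE -Timeout 1800
```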
Because we make extensive use of CloudFormation, incorporating the creation of a Lambda function into an environment that needs to have ELB logs processed is a relatively simple affair.
Simple in that it fits nicely with our current approach. Getting it all to work as expected was still a massive pain.
Blast Pit
Below is a fragment of a completed CloudFormation template for reference purposes.
In the interests of full disclosure, I did not write most of the following fragment; another member of my team was responsible. I just helped.
{ "Description": "This template is a fragment of a larger template that creates an environment. This fragment in particular contains all of the necessary bits and pieces for a Lambda function that processes ELB logs from S3.", "Parameters": { "ComponentName": { "Description": "The name of the component that this stack makes up. This is already part of the stack name, but is here so it can be used for naming/tagging purposes.", "Type": "String" }, "OctopusEnvironment": { "Description": "Octopus Environment", "Type": "String" }, "PrivateSubnets": { "Type": "List<AWS::EC2::Subnet::Id>", "Description": "Public subnets (i.e. ones that are automatically assigned public IP addresses) spread across availability zones, intended to contain load balancers and other externally accessible components.", "ConstraintDescription": "must be a list of an existing subnets in the selected Virtual Private Cloud." }, "LogsS3BucketName": { "Description": "The name of the bucket where log files for the ELB and other things will be placed.", "Type": "String" } }, "Resources": { "LogsBucket" : { "Type" : "AWS::S3::Bucket", "Properties" : { "BucketName" : { "Ref": "LogsS3BucketName" }, "LifecycleConfiguration": { "Rules": [ { "Id": 1, "ExpirationInDays": 7, "Status": "Enabled" } ] }, "Tags" : [ { "Key": "function", "Value": "log-storage" } ], "NotificationConfiguration" : { "LambdaConfigurations": [ { "Event" : "s3:ObjectCreated:*", "Function" : { "Fn::GetAtt" : [ "ELBLogProcessorFunction", "Arn" ] } } ] } } }, "ELBLogProcessorFunctionPermission": { "Type" : "AWS::Lambda::Permission", "Properties" : { "Action":"lambda:invokeFunction", "FunctionName": { "Fn::GetAtt": [ "ELBLogProcessorFunction", "Arn" ]}, "Principal": "s3.amazonaws.com", "SourceAccount": {"Ref" : "AWS::AccountId" }, "SourceArn": { "Fn::Join": [":", [ "arn","aws","s3","", "" ,{"Ref" : "LogsS3BucketName"}]] } } }, "LambdaSecurityGroup": { "Type": "AWS::EC2::SecurityGroup", "Properties": { "GroupDescription": "Enabling all outbound communications", "VpcId": { "Ref": "VpcId" }, "SecurityGroupEgress": [ { "IpProtocol": "tcp", "FromPort": "0", "ToPort": "65535", "CidrIp": "0.0.0.0/0" } ] } }, "ELBLogProcessorFunction": { "Type": "AWS::Lambda::Function", "Properties": { "FunctionName": { "Fn::Join": [ "", [ { "Ref" : "ComponentName" }, "-", { "Ref" : "OctopusEnvironment" }, "-ELBLogProcessorFunction" ] ] }, "Description": "ELB Log Processor", "Handler": "index.handler", "Runtime": "nodejs4.3", "Code": { "ZipFile": "console.log('placeholder for lambda code')" }, "Role": { "Fn::GetAtt" : ["LogsBucketAccessorRole", "Arn"]}, "VpcConfig": { "SecurityGroupIds": [{"Fn::GetAtt": ["LambdaSecurityGroup", "GroupId"]}], "SubnetIds": { "Ref": "PrivateSubnets" } } } }, "LogsBucketAccessorRole": { "Type": "AWS::IAM::Role", "Properties": { "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service" : ["lambda.amazonaws.com"]}, "Action": [ "sts:AssumeRole" ] } ] }, "Path": "/", "Policies": [{ "PolicyName": "access-s3-read", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject" ], "Resource": { "Fn::Join": [":", [ "arn","aws","s3","", "" ,{"Ref" : "LogsS3BucketName"}, "/*"]] } } ] } }, { "PolicyName": "access-logs-write", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", "logs:DescribeLogStreams" ], "Resource": { "Fn::Join": [":", [ "arn","aws","logs", { 
"Ref": "AWS::Region" }, {"Ref": "AWS::AccountId"} , "*", "/aws/lambda/*"]] } } ] } }, { "PolicyName": "access-ec2", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:*" ], "Resource": "arn:aws:ec2:::*" } ] } }, { "PolicyName": "access-ec2-networkinterface", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "ec2:DescribeInstances", "ec2:CreateNetworkInterface", "ec2:AttachNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface", "ec2:DetachNetworkInterface", "ec2:ModifyNetworkInterfaceAttribute", "ec2:ResetNetworkInterfaceAttribute", "autoscaling:CompleteLifecycleAction" ], "Resource": "*" } ] } } ] } } } }
The most important part of the template above is the ELBLogProcessorFunction resource. This is where the Lambda function itself is specified, although you might notice that it does not have the code from the previous post attached to it in any way. The reason for this is that we create the Lambda function with placeholder code, and then use Octopus Deploy afterwards to push a versioned package containing the real code to it, like we do for everything else. Packaging and deploying the Lambda function code is a topic for another blog post though (the next one, hopefully).
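I won’t spoil the next post, but swapping the placeholder for real code ultimately boils down to a single API call. Here is a heavily simplified sketch; the function name and package path are hypothetical, and in reality this happens inside an Octopus Deploy step against a versioned package.

```powershell
# A heavily simplified sketch of replacing the placeholder code in the
# Lambda function after the stack has been created. Names are hypothetical.
Import-Module AWSPowerShell

# Ours is derived from ComponentName and OctopusEnvironment, as per the template.
$functionName = "MyService-CI-ELBLogProcessorFunction"
$packagePath = ".\elb-log-processor.zip"

# Upload the zipped package as the new function code.
Update-LMFunctionCode -FunctionName $functionName -ZipFilename $packagePath
```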
Other things to note in the template fragment:
- Lambda functions require a surprising number of permissions to do what they do. When creating a function via the AWS console, most of this complexity is dealt with for you. When using CloudFormation, however, you have to be aware of exactly what the Lambda function needs and grant it the appropriate permissions. You could just give the Lambda function as many permissions as possible, but that would be stupid in the face of “least privilege”, and would represent a significant security risk (compromised Lambda code being able to do all sorts of crazy things, for example).
- Logging is a particularly important example of Lambda permissions. Without the capability to create log groups and log streams and write log events, your function is going to be damn near impossible to debug.
- If you’re using S3 as the trigger for your Lambda function, you need to make sure that S3 has permission to execute the function. This is the ELBLogProcessorFunctionPermission logical resource in the template fragment. Without it, your Lambda function will never trigger, even if you have set up the NotificationConfiguration on the bucket itself (there’s a cheap smoke test for this in the sketch after this list).
- If your Lambda function needs to access external resources (like S3), you will likely have to use private subnets plus a NAT Gateway to give it that ability. Technically you could also use a proxy, but god, why would you do that to yourself. If you put your Lambda function into a public subnet, I’m pretty sure it doesn’t automatically get access to the greater internet like an EC2 instance does, and you will be intensely confused as to why all your external calls time out.
- Make sure you apply an appropriate Security Group to your Lambda function so that it can communicate externally, or you’ll get mysterious timeouts that look exactly the same as the ones you get when you haven’t set up general internet access correctly.
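The cheapest way I know of to verify that whole chain (bucket notification, invoke permission, VPC networking) is to drop a test file into the bucket and then read the function’s CloudWatch log group. The following is a rough sketch, not a proper test; the bucket and function names are the hypothetical ones from earlier, and depending on your module version you may need to fiddle with how the log events come back.

```powershell
# A rough smoke test for the S3 -> Lambda trigger chain: upload a test
# object, wait a little, then read the most recent CloudWatch log events.
# Bucket and function names are hypothetical.
Import-Module AWSPowerShell

$bucket = "myservice-ci-logs"
$logGroup = "/aws/lambda/MyService-CI-ELBLogProcessorFunction"

# Creating an object fires the bucket's s3:ObjectCreated:* notification.
Write-S3Object -BucketName $bucket -Key "smoke-test/test.log" -Content "test"

Start-Sleep -Seconds 30

# Find the most recently written log stream and dump its messages.
$stream = Get-CWLLogStream -LogGroupName $logGroup -OrderBy LastEventTime -Descending $true |
    Select-Object -First 1
$response = Get-CWLLogEvent -LogGroupName $logGroup -LogStreamName $stream.LogStreamName
$response.Events | ForEach-Object { $_.Message }
```

If nothing shows up at all (not even the placeholder’s console.log output), suspect the invoke permission; if the function starts but its external calls time out, suspect the subnets and security group.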
To Be Continued
So that’s how we set up the Lambda function as part of any environment that needs to process ELB logs. Remember, the template fragment above is incomplete; it is missing the Auto Scaling Groups, Launch Configurations, Load Balancers, Host Records and the multitude of other things that make up an actual environment. What I’ve shown above is enough to get the pipeline up and running, where any object introduced into the LogsBucket will trigger an execution of the Lambda function, so it’s enough to illustrate our approach.
Of course, the function doesn’t do anything yet, which ties in with the next post.
How we get the code into Lambda.
Until then, may all your Lambda function executions be swift and free of errors.