Real Time Processing of Log File with Kinesis

This is a cool exercise to understand how Kinesis works. It'll have you create a Kinesis stream and populate it with data from a CloudWatch Logs group. You'll learn how to read from and write to a Kinesis stream and get some CLI exposure.

The idea is that if you have a lot of servers sending different logs to CWLogs, you can funnel them all into a single Kinesis data stream. That makes for easy analysis. You can also back the data up to S3, and you can have multiple consumers of the stream.

    • Use an EC2 instance that is sending logs to CloudWatch Logs.
    • Make sure it has a role with Admin access, because we'll be running CLI commands from this instance to create resources.
    • Set the CLI to your region: aws configure

 

  • Manually create a JSON policy that allows CWLogs to write to Kinesis. This will be added to the role as its access policy.

 

[root@ip-172-31-60-70 awslogs]# cat permissions.json 

{
 "Statement": [
  {
    "Effect":"Allow",
    "Action":"kinesis:PutRecord",
    "Resource":""
  },
  {
    "Effect":"Allow",
    "Action":"iam:PassRole",
    "Resource":""
  }
 ]
}

NOTE: We'll have to fill in the blanks after we get the ARNs.
  • Manually create the JSON for the role. This just defines the trust policy. After the role is created we'll attach the access policy above.
[root@ip-172-31-60-70 awslogs]# cat role.json 
{
  "Statement": {
    "Effect": "Allow",
    "Principal": { "Service": "logs.us-east-1.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }
}

NOTE: This service address is the CWLogs endpoint for us-east-1. See http://docs.aws.amazon.com/general/latest/gr/rande.html#cwl_region

 

  • Create a Kinesis stream using the CLI, then describe it to get its ARN.

# aws kinesis create-stream --stream-name MyStream --shard-count 1

# aws kinesis describe-stream --stream-name MyStream

 

NOTE: Once the stream goes into the ACTIVE state you'll see the shard ID. Each shard can handle 1 MB/s of input (writes) and 2 MB/s of output (reads).
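
Rather than polling describe-stream by hand, you can have the CLI block until the stream is ACTIVE:

# aws kinesis wait stream-exists --stream-name MyStream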

 

  • Create the Role using the CLI and the JSON role file from above

# aws iam create-role --role-name "cloudwatch-to-kinesis" --assume-role-policy-document file://role.json

 

  • Add the access policy to the role, but first remember to fill in the resource values that were left blank (the stream ARN and the role ARN).

# aws iam put-role-policy --role-name cloudwatch-to-kinesis --policy-name permissions-for-kinesis --policy-document file://permissions.json
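
For reference, here's what permissions.json looks like with the blanks filled in, using the stream and role ARNs from this walkthrough:

{
 "Statement": [
  {
    "Effect":"Allow",
    "Action":"kinesis:PutRecord",
    "Resource":"arn:aws:kinesis:us-east-1:971975854771:stream/MyStream"
  },
  {
    "Effect":"Allow",
    "Action":"iam:PassRole",
    "Resource":"arn:aws:iam::971975854771:role/cloudwatch-to-kinesis"
  }
 ]
}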

 

  • Add a subscription filter on the log group to forward logs to Kinesis.  Then list it.
    # aws logs put-subscription-filter --log-group-name instance --filter-name "central-logs" --filter-pattern "" --destination-arn arn:aws:kinesis:us-east-1:971975854771:stream/MyStream --role-arn arn:aws:iam::971975854771:role/cloudwatch-to-kinesis

 

# aws logs describe-subscription-filters --log-group-name instance

 

At this point you should be receiving data in Kinesis.  You can confirm by viewing the stream in the console.

 

  • Use crontab to generate HTTP logs. These will go to CWLogs and then to our Kinesis stream. (A one-liner to install this entry follows below.)
    */1 * * * * curl <ip>
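
One way to install that entry without opening an editor (replace <ip> with your web server's address):

# (crontab -l 2>/dev/null; echo "*/1 * * * * curl <ip>") | crontab -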

 

  • Confirm that data is being added. You can see it in the dashboard, and you can run commands to read data from the shard. This process consists of three steps: first, get a shard iterator, which tells Kinesis where in the shard to start reading; second, submit a get-records request using the shard iterator; third, base64-decode and decompress the result. (A scripted version of all three steps is shown further below.)

 

# aws kinesis get-shard-iterator --stream-name MyStream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON

Output

{

    "ShardIterator": "AAAAAAAAAAGQUDQQxos+xAd9kfRf/iznqZPiyS7ESPf7+6HB4xyT3G9KC3lC/HLE3ETPypn7DpkqdpVtmUBWd+Vz+eRTRP83+fk8uM8FfzhfntKw3scpBnJf/fHWusB47Z2Ha5whjKu44HeBsacM0mSvhoC37tF1Ib0RsyQkr+S7k+Q3SL10xiHKbx7gZu8vGYn8SGZG9A4SQDgQsNXNe6VRz42Evhb2"

}

 

NOTE: This shard iterator is a base64 encoded ID.

 

# aws kinesis get-records --shard-iterator "AAAAAAAAAAEhLMyQ7F0DbVkjfedDrkerDKinMnriGcgfMYEv19+egOsCUrF0WIagSnIdN/MRvz+CerfGHYav/AcvjJeZi0VUQwHxOpGAQlH164egQlwVXc9iZpRtUWv7mWldKiMnvQIF+ltV9p47DW19xrFuczLCqkPw1cCXkXiDjXRHU6Us+vCkGyQPMlhWV81SRlniFy7UfW60SicprAn8SsVZkjQt"

Output

{
    "Records": [
        {
            "Data": "H4sIAAAAAAAAALWTTYvbMBCG/4rwqYXYlmR9WIYewm66hVLabtJTCEFrK4moIxlJSXZZ8t93kqbQnpM9Cd6ZeWeeEfOabU2Mem1mL4PJmux+PBsvv02m0/HDJBtl/uBMAFlJoiSvOZOSgNz79UPwuwEi1sWkXWv+qNMUjN6CXO51KEEoL/YR4nH3FNtgh2S9+2z7ZELMmnnWGpeC7nPIjtnibDPZg3YKvma2A7eqElzySiiqhKipYrLiglEiGeeMciJrgjnmhFWUU6ZgUIW5EAyaJgsTJL2FWQnHinJBOMMYj/6Cg/33NiGqEOYNUw1hyA45kTSvSC5wLjHqNm1vYaI5xUItGnT/5e7H4+Tnr8l0hrxDJm0wSh5BTVGRgtUFQYMPCQmJPjzb7hN+rlgNvVf0Y3YcXcfE35FpfPcVrYLf/kdycwLxTgRPfue6f/5B4AKy8hwF48xB98g6RLgiKJrWuy4WV5PIG5GYljqTGjRfm7TcmqQXaBZerFufaEBDm5SGpiyJUAXlrLi8Za8T9C5PFXmnky7B5eDD79I6OK6Vbk0st7qNJaHNCjdd29CqqWUjoNS3cHJ22LN49RrqW68hmEOwySx1b3U0cYEez8JpIRcJ+dX57LLj4vgGBXKeRsMEAAA=",
            "PartitionKey": "64621dbbbe9cf76cb5f0b930c2a974c7",
            "ApproximateArrivalTimestamp": 1509256160.443,
            "SequenceNumber": "49578368888725264053659947269742594258597963255083171842"
        },
        ...

 

Now you can decode and decompress one of the base64 records to read its contents:

# echo -n "H4sIAAAAAAAAADWPy2rDMBBFf0VoHdcjy3p5Z6ibVejCWRRCKIojEoEtG0lJKSH/3knaMqDFueLOmRudXEr25Lbfi6MNfW237eem6/t23dEVnb+Ci4iNYkYJLWqlGOJxPq3jfFkw8SFlGwb3S/scnZ0Ql1cbSwTlX33CPF0OaYh+yX4Ob37MLiba7OjgQo52LPB3ovtnTXdF9ghv1B+xjXMplOCsrrnU9cMEhUBrzWr+fDiT0qhKSgGghXmMBg24NHs0yHZCVybAVEIYDABW/4dj/fuQSWUIiKYWDWjil4KpquCskFAoIMfzMHo02lVMqX1DPjbbhvTz6AefyRyIy2dYER/woqsdCeMMKpjSC73v7z8JrwqRYQEAAA==" | base64 -d | zcat
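
The decoded payload is the standard CloudWatch Logs subscription envelope. The values below are illustrative, but it's shaped roughly like this:

{
    "messageType": "DATA_MESSAGE",
    "owner": "971975854771",
    "logGroup": "instance",
    "logStream": "i-...",
    "subscriptionFilters": ["central-logs"],
    "logEvents": [
        { "id": "...", "timestamp": 1509256160000, "message": "..." }
    ]
}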

 

NOTE: This is a simple read example. We don't need to know how to build a full consumer application; that's a big job in itself.
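
If you want to script the whole read, here is a minimal sketch that chains the three steps together; it assumes jq is installed (yum install jq):

ITER=$(aws kinesis get-shard-iterator \
        --stream-name MyStream \
        --shard-id shardId-000000000000 \
        --shard-iterator-type TRIM_HORIZON \
        --query 'ShardIterator' --output text)

# Fetch one batch of records, then decode and decompress each Data blob
aws kinesis get-records --shard-iterator "$ITER" | jq -r '.Records[].Data' | while read -r data; do
    echo "$data" | base64 -d | zcat
done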

 

  • You can view the ‘Put Records Request’ graph in Kinesis to confirm how the data is being written as time goes on.

 

  • Clean up (CLI equivalents are shown below).
    • Terminate the instance
    • Remove the subscription filter
    • Remove the log group
    • Remove the Kinesis stream
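
The CLI equivalents, roughly (the instance ID is a placeholder):

# aws ec2 terminate-instances --instance-ids <instance-id>
# aws logs delete-subscription-filter --log-group-name instance --filter-name central-logs
# aws logs delete-log-group --log-group-name instance
# aws kinesis delete-stream --stream-name MyStream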

 

Based on: http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#DestinationKinesisExample

Integrating CloudWatch Logging with EC2

This exercise shows you how to configure an EC2 instance to forward logs to CloudWatch.  This will be done by installing the awslogs agent on the instance.

  • Create an EC2-type role for CWLogs using the standard CloudWatch Logs full-access policy
  • Launch an EC2 instance using this role
  • Install the AWS logging agent: yum install awslogs
  • View the agent's CLI config file, /etc/awslogs/awscli.conf. This holds the region you're pointing to (a sample is shown below).
  • Edit the agent's log config file, /etc/awslogs/awslogs.conf, to use an appropriate log group name:
[/var/log/messages]
datetime_format = %b %d %H:%M:%S
file = /var/log/messages
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = hostname
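
For reference, the region mentioned earlier is set in /etc/awslogs/awscli.conf, which looks something like this on Amazon Linux:

[plugins]
cwlogs = cwlogs
[default]
region = us-east-1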

 

    • Start the agent: service awslogs start
    • Open CWLogs dashboard and view the log group created by this agent.
    • Install Apache: yum install httpd
    • Start apache: service httpd start
  • Edit the agent's log config file to forward httpd logs by adding this entry:
[/var/log/httpd]
datetime_format = %b %d %H:%M:%S
file = /var/log/httpd/access_log
buffer_duration = 5000
log_stream_name = /var/log/httpd/access_log
initial_position = start_of_file
log_group_name = hostname
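
Restart the agent so it picks up the new entry:

service awslogs restart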

 

Open a web browser to the instance's IP. This will generate access logs.

  • Go check out the log group in CWL 🙂

Auto Scaling based on SQS Message Queue Size


This exercise helps you learn how to keep up with demand on an SQS queue. If your queue is growing too big because the worker instances aren't able to complete tasks fast enough, you can add more instances. You can leverage an Auto Scaling group to do this automatically based on the size of your queue.

  • Log in to the AWS console and choose a region to work in. I typically select us-east-1
  • Create an SQS queue
  • Launch an EC2 instance with a Role that will allow it to send messages to SQS
  • Login to the EC2 instance and configure the AWS CLI client to point to your selected region
  • Send a message to the SQS queue, this populates metrics in CloudWatch :
    • $ aws sqs send-message --queue-url xxx --message-body message1234
  • Go to CloudWatch and create an alarm for when the SQS metric "ApproximateNumberOfMessagesVisible" reaches 40. Note that this metric is created automatically for the SQS queue; you may have to wait a few minutes for it to show up. Set the period to 1 minute and remove any default actions. (A CLI sketch follows this list.)
  • Create a Launch Configuration and Auto Scaling Group in EC2.
    • Policy: Scale up by 1 when our alarm is triggered
  • Now proceed to send 40 messages to the SQS queue (a quick loop for this is shown below)
  • You should see the alarm trigger and a new EC2 instance get launched
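
If you prefer the CLI for the alarm, put-metric-alarm can do it; this is a rough sketch where the alarm name and queue name (MyQueue) are placeholders:

$ aws cloudwatch put-metric-alarm --alarm-name sqs-queue-depth --namespace AWS/SQS --metric-name ApproximateNumberOfMessagesVisible --dimensions Name=QueueName,Value=MyQueue --statistic Sum --period 60 --evaluation-periods 1 --threshold 40 --comparison-operator GreaterThanOrEqualToThreshold

And a quick loop to send the 40 messages (substitute your own queue URL for xxx):

$ for i in $(seq 1 40); do aws sqs send-message --queue-url xxx --message-body "message$i"; done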