Real Time Processing of Log File with Kinesis

This is a cool exercise to understand how Kinesis works. It'll have you create a Kinesis stream and populate it with data from a CloudWatch Logs group. You'll learn how to read from and write to a Kinesis stream, and you'll get some CLI exposure.

The idea is that if you have many servers sending different logs to CloudWatch Logs, you can funnel them all into a single Kinesis data stream. That makes for easy analysis, you can back the data up to S3, and you can attach multiple consumers to the stream.

    • Use an EC2 instance that is sending logs to CloudWatch Logs.
    • Make sure it has a role with admin access, because we'll be running CLI commands from this instance to create resources.
    • Set the CLI to your region: aws configure

 

  • Manually create a JSON policy document that allows CloudWatch Logs to write to Kinesis. This will be attached to the role as its access policy.

 

[root@ip-172-31-60-70 awslogs]# cat permissions.json 

{
 "Statement": [
  {
    "Effect":"Allow",
    "Action":"kinesis:PutRecord",
    "Resource":""
  },
  {
    "Effect":"Allow",
    "Action":"iam:PassRole",
    "Resource":""
  }
 ]
}

NOTE: We'll fill in the blank Resource values after we get the ARNs.
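
For reference, a filled-in version might look like this sketch, using the stream and role ARNs that appear later in this walkthrough (your account ID and names will differ):

{
 "Statement": [
  {
    "Effect":"Allow",
    "Action":"kinesis:PutRecord",
    "Resource":"arn:aws:kinesis:us-east-1:971975854771:stream/MyStream"
  },
  {
    "Effect":"Allow",
    "Action":"iam:PassRole",
    "Resource":"arn:aws:iam::971975854771:role/cloudwatch-to-kinesis"
  }
 ]
}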
  • Manually create the JSON for the role we'll create. This just defines the trust policy; after the role is created we'll attach the access policy above.
[root@ip-172-31-60-70 awslogs]# cat role.json 
{
"Statement":{
"Effect":"Allow",
"Principal":{"Service":"logs.us-east-1.amazonaws.com"},
"Action":"sts:AssumeRole"
}
}

NOTE: This service principal corresponds to the CloudWatch Logs endpoint for US East. http://docs.aws.amazon.com/general/latest/gr/rande.html#cwl_region

 

  • Create a stream in Kinesis using the CLI, then describe it to record its ARN

# aws kinesis create-stream --stream-name MyStream --shard-count 1

# aws kinesis describe-stream --stream-name MyStream

 

NOTE: Once the stream goes into the ACTIVE state you'll see the shard ID. Each shard can handle 1 MB/s of write throughput and 2 MB/s of read throughput.
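
If you only want the ARN, the CLI can filter it out for you. A minimal sketch, assuming the stream name above:

# aws kinesis describe-stream --stream-name MyStream --query 'StreamDescription.StreamARN' --output text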

 

  • Create the Role using the CLI and the JSON role file from above

# aws iam create-role --role-name "cloudwatch-to-kinesis" --assume-role-policy-document file://role.json

 

  • Add the access policy to the role, but first remember to fill in the Resource values that were left blank.

# aws iam put-role-policy --role-name cloudwatch-to-kinesis --policy-name permissions-for-kinesis --policy-document file://permissions.json

 

  • Add a subscription filter on the log group to forward logs to Kinesis.  Then list it.
    # aws logs put-subscription-filter --log-group-name instance --filter-name "central-logs" --filter-pattern "" --destination-arn arn:aws:kinesis:us-east-1:971975854771:stream/MyStream --role-arn arn:aws:iam::971975854771:role/cloudwatch-to-kinesis

 

# aws logs describe-subscription-filters --log-group-name instance

 

At this point you should be receiving data in Kinesis.  You can confirm by viewing the stream in the console.

 

  • Use crontab to generate HTTP logs. These will go to CloudWatch Logs and then to our Kinesis stream.
    */1 * * * * curl <ip>

 

  • Confirm that data is being added. You can see it in the dashboard, or you can run commands to read data from the shard. That process has three steps: first, get a shard iterator, which tells Kinesis where in the shard to start reading; second, submit a get-records request using that shard iterator; third, base64-decode and decompress the result.

 

# aws kinesis get-shard-iterator --stream-name MyStream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON

Output

{

    "ShardIterator": "AAAAAAAAAAGQUDQQxos+xAd9kfRf/iznqZPiyS7ESPf7+6HB4xyT3G9KC3lC/HLE3ETPypn7DpkqdpVtmUBWd+Vz+eRTRP83+fk8uM8FfzhfntKw3scpBnJf/fHWusB47Z2Ha5whjKu44HeBsacM0mSvhoC37tF1Ib0RsyQkr+S7k+Q3SL10xiHKbx7gZu8vGYn8SGZG9A4SQDgQsNXNe6VRz42Evhb2"

}

 

NOTE: This shard iterator is a base64 encoded ID.

 

# aws kinesis get-records --shard-iterator "AAAAAAAAAAEhLMyQ7F0DbVkjfedDrkerDKinMnriGcgfMYEv19+egOsCUrF0WIagSnIdN/MRvz+CerfGHYav/AcvjJeZi0VUQwHxOpGAQlH164egQlwVXc9iZpRtUWv7mWldKiMnvQIF+ltV9p47DW19xrFuczLCqkPw1cCXkXiDjXRHU6Us+vCkGyQPMlhWV81SRlniFy7UfW60SicprAn8SsVZkjQt"

Output

{
    "Records": [
            "Data": "H4sIAAAAAAAAALWTTYvbMBCG/4rwqYXYlmR9WIYewm66hVLabtJTCEFrK4moIxlJSXZZ8t93kqbQnpM9Cd6ZeWeeEfOabU2Mem1mL4PJmux+PBsvv02m0/HDJBtl/uBMAFlJoiSvOZOSgNz79UPwuwEi1sWkXWv+qNMUjN6CXO51KEEoL/YR4nH3FNtgh2S9+2z7ZELMmnnWGpeC7nPIjtnibDPZg3YKvma2A7eqElzySiiqhKipYrLiglEiGeeMciJrgjnmhFWUU6ZgUIW5EAyaJgsTJL2FWQnHinJBOMMYj/6Cg/33NiGqEOYNUw1hyA45kTSvSC5wLjHqNm1vYaI5xUItGnT/5e7H4+Tnr8l0hrxDJm0wSh5BTVGRgtUFQYMPCQmJPjzb7hN+rlgNvVf0Y3YcXcfE35FpfPcVrYLf/kdycwLxTgRPfue6f/5B4AKy8hwF48xB98g6RLgiKJrWuy4WV5PIG5GYljqTGjRfm7TcmqQXaBZerFufaEBDm5SGpiyJUAXlrLi8Za8T9C5PFXmnky7B5eDD79I6OK6Vbk0st7qNJaHNCjdd29CqqWUjoNS3cHJ22LN49RrqW68hmEOwySx1b3U0cYEez8JpIRcJ+dX57LLj4vgGBXKeRsMEAAA=", 

            "PartitionKey": "64621dbbbe9cf76cb5f0b930c2a974c7", 
            "ApproximateArrivalTimestamp": 1509256160.443, 
            "SequenceNumber": "49578368888725264053659947269742594258597963255083171842"
        }, 
        ...
    ]
}

 

Now you can decode and decompress one of the base64 records to read its contents:

# echo -n "H4sIAAAAAAAAADWPy2rDMBBFf0VoHdcjy3p5Z6ibVejCWRRCKIojEoEtG0lJKSH/3knaMqDFueLOmRudXEr25Lbfi6MNfW237eem6/t23dEVnb+Ci4iNYkYJLWqlGOJxPq3jfFkw8SFlGwb3S/scnZ0Ql1cbSwTlX33CPF0OaYh+yX4Ob37MLiba7OjgQo52LPB3ovtnTXdF9ghv1B+xjXMplOCsrrnU9cMEhUBrzWr+fDiT0qhKSgGghXmMBg24NHs0yHZCVybAVEIYDABW/4dj/fuQSWUIiKYWDWjil4KpquCskFAoIMfzMHo02lVMqX1DPjbbhvTz6AefyRyIy2dYER/woqsdCeMMKpjSC73v7z8JrwqRYQEAAA=="|base64 -d | zcat

 

NOTE: This is a simple reading example. We don't need to build a full consumer application here; that is a much bigger job.
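
If you want to chain the three steps together, a rough sketch along these lines works, assuming the stream name and shard ID from above; it fetches a TRIM_HORIZON iterator, reads a batch of records, and decodes each one:

# ITER=$(aws kinesis get-shard-iterator --stream-name MyStream \
    --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON \
    --query 'ShardIterator' --output text)
# aws kinesis get-records --shard-iterator "$ITER" --query 'Records[*].Data' --output text \
    | tr '\t' '\n' | while read -r d; do echo "$d" | base64 -d | zcat; done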

 

  • You can view the ‘Put Records Request’ graph in Kinesis to confirm how the data is being written as time goes on.

 

  • Clean up.
    • Remove the instance
    • Remove the subscription filter
    • Remove the log group
    • Remove the Kinesis stream

 

Based on: http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#DestinationKinesisExample

Integrating CloudWatch Logging with EC2

This exercise shows you how to configure an EC2 instance to forward logs to CloudWatch.  This will be done by installing the awslogs agent on the instance.

  • Create an EC2 role for CloudWatch Logs using the standard CloudWatchLogsFullAccess managed policy
  • Launch an EC2 instance using this role
  • Install the AWS logging agent: yum install awslogs
  • View the agent's CLI config file, /etc/awslogs/awscli.conf; this holds the region you're pointing to
  • Edit the agent's log config file (/etc/awslogs/awslogs.conf) to use an appropriate log group name
[/var/log/messages]
datetime_format = %b %d %H:%M:%S
file = /var/log/messages
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = hostname

 

    • Start the agent: service awslogs start
    • Open CWLogs dashboard and view the log group created by this agent.
    • Install Apache: yum install httpd
    • Start apache: service httpd start
  • Edit the agent's config file to forward httpd logs. Add this entry, then restart the agent afterwards (service awslogs restart)
[/var/log/httpd]
datetime_format = %b %d %H:%M:%S
file = /var/log/httpd/access_log
buffer_duration = 5000
log_stream_name = /var/log/httpd/access_log
initial_position = start_of_file
log_group_name = hostname

 

Open a web browser to the instance's IP. This will generate access log entries.

  • Go check out the log group in CloudWatch Logs 🙂
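
If you prefer the CLI over the console, something like this should pull recent events — a sketch, assuming the log group name "hostname" from the config above:

# aws logs filter-log-events --log-group-name hostname --limit 10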

Auto Scaling based on SQS Message Queue Size


This exercise helps you learn how to keep up with demands on an SQS queue.  If your queue is growing too big because the worker instances aren’t able to complete the tasks fast enough then you can add more instances.  You can leverage an AutoScaling group to do this automatically based on the size of your queue.

  • Log in to the AWS console and choose a region to work in. I typically select us-east-1
  • Create an SQS queue
  • Launch an EC2 instance with a Role that will allow it to send messages to SQS
  • Login to the EC2 instance and configure the AWS CLI client to point to your selected region
  • Send a message to the SQS queue, this populates metrics in CloudWatch :
    • $ aws sqs send-message --queue-url xxx --message-body message1234
  • Go to CloudWatch and create an alarm for when the SQS metric "ApproximateNumberOfMessagesVisible" reaches 40. Note that this metric is created automatically for the SQS queue; you may have to wait a few minutes for it to show up. Set the period to 1 minute and remove any default actions
  • Create a Launch Configuration and Auto Scaling Group in EC2.
    • Policy: Scale up by 1 when our alarm is triggered
  • Now proceed to send 40 messages to the SQS queue (a quick loop like the sketch after this list works)
  • You should see the alarm trigger and an EC2 instance get launched
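
A minimal sketch for flooding the queue from the instance, keeping the xxx placeholder for your real queue URL as above, plus a quick way to watch for the alarm firing:

$ for i in $(seq 1 40); do aws sqs send-message --queue-url xxx --message-body "message$i"; done
$ aws cloudwatch describe-alarms --state-value ALARM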

WordPress Pages vs Posts

I use WordPress to run this blog.  I got started by following Pat Flynn’s tutorial on How to start a Podcast.   Aside from that I haven’t done much research on the features of WordPress or best practices.

I have noticed that when I'm ready to publish some content I have the option of creating a Page or writing a Post. Sometimes I've decided to use Pages; I've mostly been using Posts when I'm publishing a podcast episode. Eventually I wondered what the difference between the two is and which I should use on a regular basis.

A few Google searches provided the answer: use Posts for publishing your content!

The purpose of Pages is more for static content such as your About page or Home page.  But that explanation didn’t help me much.  What did make sense to me is that Posts get sent out via an RSS feed and readers can comment on Posts.  You don’t expect readers to comment on your Home page.  So if you’re using Pages for content you’re missing out on that interaction with the user.

This makes sense to me.  So from now on I’ll be using Posts and will probably convert some of my old pages to Posts.

This article helped me understand this.

http://www.wpbeginner.com/beginners-guide/what-is-the-difference-between-posts-vs-pages-in-wordpress/

-Raf

Learning Cloud 016: Slack Chat Bot

Are you a slacker? Have you been slacking recently? If not, you should be slacking more often.

This is just a little play on words, because today we're obviously talking about Slack. And of course I don't mean being lazy or putting things off; I mean the social media tool Slack.

What is Slack?

Slack is a cloud-based team collaboration tool. In simpler terms, it is a chat program; think of Skype or Twitter.

Slack was created by Daniel Stewart Butterfield. He's a Canadian entrepreneur, best known for co-founding the photo-sharing website Flickr. By the way, he's worth about $1.69 billion.

How does Slack work? Basically, you create a private group so that you can chat with only the members of that group. Also, there's no charge to start using Slack.

Slack differs from Twitter in that your tweets are generally visible to everyone, unless it's a direct message. Slack focuses on private interaction and allows you to embed pictures and video in your messages.

Slack uses the following concepts to organize groups:

  • Team and Workspaces
  • Channels
  • Messages

In this episode I want to introduce you to the Qwiklab where you will build a chat bot for Slack.

Chat bots have the ability to interact with teams and users, respond to commands, and post notifications, giving all conversation participants visibility into team activities.

You will build a bot that posts CloudWatch alarms to your Slack channel.  The high level steps are

  • Create a Slack account
  • Create an incoming webhook in Slack (you can smoke-test it with curl; see the sketch after this list)
  • Create a Lambda function
  • Select the Slack blueprint
  • Enter the webhook URL
  • Create the Python code
  • Test it
  • Create a CloudWatch alarm
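
A minimal sketch for smoke-testing the incoming webhook from a shell; the URL here is a made-up placeholder, and yours will have its own path:

$ curl -X POST -H 'Content-type: application/json' \
    --data '{"text":"Hello from my CloudWatch bot"}' \
    https://hooks.slack.com/services/T0000000/B0000000/XXXXXXXXXXXXXXXXXXXXXXXX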

You can find the detailed steps at the link below. You'll have to sign up to be able to see the instructions.

https://qwiklabs.com/focuses/3444 

Remember: "If you're not willing to learn, no one can help you; but if you're determined to learn, no one can stop you."

Bye bye for now, until next time.

 

-Raf

Entropy on a Linux Box

 

"In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. This randomness is often collected from hardware sources (variance in fan noise or HDD), either pre-existing ones such as mouse movements or specially provided randomness generators. A lack of entropy can have a negative impact on performance and security." — Wikipedia

 

On RHEL 6, you can read the available entropy value from /proc/sys/kernel/random/entropy_avail. This entropy is gathered by the kernel from sources such as a physical mouse and keyboard, so the value you read in this file changes constantly. A system may therefore accumulate little entropy, depending on what hardware it has and what is drawing on the random pool.
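
A quick way to check it from a shell (watch is optional, but it makes the fluctuation obvious):

# cat /proc/sys/kernel/random/entropy_avail
# watch -n 1 cat /proc/sys/kernel/random/entropy_avail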

If you require more entropy than the system is generating, you can install a physical entropy-generating card. This is a device that generates randomness from physical sensors (thermal, for example) rather than from a computer program.

I have seen GoCD agents not produce enough entropy, and this affected the application.

Typing over an SSH session makes no difference to entropy (you can test this), since the kernel doesn't harvest randomness from pseudo-terminal input, only from locally attached keyboard/mouse devices.

/dev/random is the device that exposes this randomness. It blocks and won't return data if the available entropy has been used up. /dev/urandom, on the other hand, will return data even when the entropy pool is depleted; that is not ideal for security-sensitive uses such as certificates, SSL, and other cryptographic functions.
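
You can see the difference from a shell. A rough sketch: on an idle box the first command may hang waiting for entropy, while the second returns immediately:

# head -c 64 /dev/random | od -An -tx1
# head -c 64 /dev/urandom | od -An -tx1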

 

 

https://www.chrissearle.org/2008/10/13/Increase_entropy_on_a_2_6_kernel_linux_box/

Learning Cloud 015: Generating thumbnail via Lambda

“AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you making it easy to build applications that respond quickly to new information.”

Gartner presented at work, and one of the things they said is that serverless will be the future of cloud.

Potential of Lambda: it can handle almost any task related to your infrastructure or application configuration, for example:

  • Launch instances
  • Load data to a DB
  • Read and process messages from a queuing system

The beauty is that you only have to write the code.

Netflix explains how the company can use event-based triggers to

  • automate the encoding process of media files,
  • validate backup completions
  • instance deployments at scale

Main Purpose:

The use case I'll be showing you today is using Lambda to manipulate images, in particular how to create thumbnails.

The process of uploading images to S3 is convenient. Downloading them and using them on a website isn't as simple: we need thumbnails (in multiple sizes, mind you!) and probably a standard size in which to showcase the uploaded images.

I’ll be referencing a Qwiklab

Steps

  1. Log in to your account and select the Lambda service
  2. Create a Lambda function from a blueprint or "Author from Scratch"

       Select a language. Languages supported by Lambda:

      • Python, Node.js, C#, Java
      • Everything I've read suggests that Python provides the fastest performance
      • Lambda runs your code within milliseconds of an event

       Paste the code. The makeup of a Lambda function:

      • Library definitions
        • boto3 is the AWS SDK for Python
      • Functions
      • Handler
  3. Connect to S3 and create a bucket
  4. Upload an image to the bucket; this will trigger the Lambda function (see the upload sketch below)
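
A minimal sketch of that upload via the CLI, using a made-up bucket name and local file (substitute your own):

$ aws s3 cp my-photo.jpg s3://my-thumbnail-source-bucket/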

https://qwiklabs.com/focuses/2966

http://karlcode.owtelse.com/blog/2017/03/15/python-is-faster-than-node-dot-js/

https://aws.amazon.com/solutions/case-studies/netflix-and-aws-lambda/
