AWS Certifications and Mindset
Certifications
AWS offers a number of certifications: a foundational level, an associate level, a professional level and several specialities. At the associate level there are Solutions Architect, SysOps Administrator and Developer.
While preparing for the Architect exam I found quite in-depth questions on the other subjects and considerable overlap between the certificates. So rather than take one certificate at a time, I practiced for all the associate-level certificates as well as the Advanced Networking and Security specialities before taking the Architect exam. I relied on the many books of practice multiple-choice questions.
On the day of the test, I spent two hours cycling to the test center only to be told it had just been closed due to COVID restrictions; at least I got five hours of exercise. When I did take the Architect test some six weeks later I was a bit over-prepared and got 93%.
A motivation for doing Cloud certifications is that there is so much Cloud that one could never realistically hope to cover everything in one's day job. Even with the certifications I still come across useful AWS products that are new or that I had never heard of. For finding out what is going on in the wider AWS world while getting into shape, I got a lot out of the AWS Podcasts while working on a cross-trainer. These podcasts have a nice atmosphere and are very easy to listen to, a little like listening to BBC Radio 2.
Mindset
One of my first experiences with Azure was noticing that I could look at our development environment on my phone. Coming from a native mindset I had unconsciously assumed a firewall that would protect the whole of the internal organization network. The development environment is not inside the organization's network; it is a Cloud-provided network connected directly to the Internet, a virtual network (VNet) in Azure or a virtual private cloud (VPC) in AWS.
There is also no network administrator to turn to, but having a distant background in network management it was good to use old skills, and liberating to do it with a mouse rather than physical hardware. There is also no system administrator to help or hinder, so even though I came from a time when the command line was the norm, I have written more bash in the Cloud than in previous decades.
The Cloud gives you pretty much 100% availability. Auto-scaling virtual machines means there is always something running, even during a DDoS attack. One can place instances in physically separated Availability Zones (AZs) or even separate regions, so a disaster in one place does not bring down the whole system.
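As a sketch of what that looks like in practice, the boto3 call below creates an Auto Scaling group whose instances are spread across subnets in two Availability Zones. The launch template name, subnet IDs and sizes are made-up placeholders, not a recommended configuration.

```python
import boto3

# Hypothetical example: an Auto Scaling group spanning two Availability Zones,
# so the loss of one zone does not take the whole application down.
# The launch template and subnet IDs are placeholders, not real resources.
autoscaling = boto3.client("autoscaling", region_name="eu-west-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-tier",
    LaunchTemplate={"LaunchTemplateName": "web-server", "Version": "$Latest"},
    MinSize=2,                                        # at least one instance per zone
    MaxSize=10,                                       # room to scale out under load
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",  # subnets in different AZs
)
```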
A price of auto-scaling is no shared memory, which impacts caching and consistency. If you update an object on one virtual machine the others won't know anything about that update, and unless action is taken, data stored on local disk is typically lost when a virtual machine fails or restarts. This leads to the notion of stateless servers, which just compute and get their data from a distributed cache, a database or a shared file system. (One can also use sticky sessions in some cases.)
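A minimal sketch of that stateless style, assuming an ElastiCache Redis endpoint and the redis-py client (both the host name and key layout are invented): session state lives in the shared cache, so any instance can serve any request and nothing is lost when an instance disappears.

```python
import json
import redis

# Assumed ElastiCache (Redis) endpoint; any of the auto-scaled instances can
# read and write this shared state, so the web servers themselves stay stateless.
cache = redis.Redis(host="my-cache.abc123.euw1.cache.amazonaws.com",
                    port=6379, decode_responses=True)

def save_session(session_id: str, data: dict) -> None:
    # Expire after 30 minutes rather than relying on local memory or local disk.
    cache.setex(f"session:{session_id}", 1800, json.dumps(data))

def load_session(session_id: str) -> dict:
    raw = cache.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}
```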
My native mindset when a machine goes wrong is to log in, look at the log file and see what went wrong. With stateless servers, when a machine dies it is gone, so you can't log in. And while it is practical to monitor one log file on a server, it is not practical to monitor 100 virtual machines. So logs from all virtual machines are sent to a central service, for example AWS CloudWatch Logs or an Elasticsearch (ELK) stack. Since each incoming client request may produce multiple log messages, the volume of log messages can be greater than the volume of client requests, and so the load on the logging system can be higher than the load on the application. When one rolls out a new release one can double that load, and if a developer inserts a careless log statement in the wrong place the load can go up again dramatically. So logging becomes a major engineering challenge.
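As an illustration (and only one of several ways to do it), the snippet below pushes a log line straight to CloudWatch Logs with boto3; the log group and stream names are placeholders and are assumed to exist already. In practice an agent on the instance usually ships the logs for you, but the shape of the data is the same.

```python
import time
import boto3

# Hypothetical names; one log group per application and one stream per instance
# is a common layout, so logs from 100 virtual machines end up in one place.
logs = boto3.client("logs", region_name="eu-west-1")
GROUP, STREAM = "/myapp/web", "i-0abc123-example"   # assumed to exist already

def log_event(message: str) -> None:
    logs.put_log_events(
        logGroupName=GROUP,
        logStreamName=STREAM,
        logEvents=[{"timestamp": int(time.time() * 1000), "message": message}],
    )

log_event("order-service: created order 42 in 12 ms")
```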
Tyranny of choice
The Cloud provides a number of ways to do the same thing. One can run an application on one or more EC2 instances as a conventional Linux process, which gives you control over everything. One can use AWS Elastic Beanstalk, which is meant as an easier-to-use layer above EC2 that hides some of the complexity. One can run the application as a set of Lambda functions, i.e. serverless computing, which saves a great deal of effort setting up and maintaining infrastructure. One can run the application as a set of Docker containers, using any one of raw EC2 instances, AWS Elastic Container Service, AWS Fargate or Kubernetes.
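To give a feel for the serverless end of that spectrum, here is a minimal Lambda handler in Python. The event shape assumes an API Gateway proxy integration, which is one common way of putting an HTTP front on a Lambda function; there are no servers to provision or patch.

```python
import json

# Minimal Lambda handler sketch: AWS runs this function on demand and scales it
# automatically, so there is no instance to set up, patch or monitor.
def handler(event, context):
    # With an API Gateway proxy integration the HTTP body arrives as a string.
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),
    }
```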
One can run a database on an EC2 instance with the data stored on an EBS volume. One can use a managed service like AWS RDS to manage the database instance and storage for you. One can choose between the conventional relational databases: Postgres, Sybase, Oracle, MS SQL Server, or one can use a column-based data warehouse like AWS Redshift. One can use a distributed NoSQL database like Azure Cosmos DB or AWS DynamoDB. One can even use the Redis distributed cache to persist data. For many problems AWS S3 or Azure Blob Storage is sufficient, and delivery can be optimized using AWS CloudFront. Each has its own use cases and trade-offs.
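For the cases where S3 really is sufficient, the usage is pleasantly small. A sketch with boto3 and a made-up bucket name:

```python
import boto3

# Hypothetical bucket; S3 gives durable object storage with no database to run,
# and CloudFront can sit in front of it to cache objects close to the users.
s3 = boto3.client("s3")

s3.put_object(Bucket="my-example-bucket",
              Key="reports/2021/summary.json",
              Body=b'{"total": 42}',
              ContentType="application/json")

obj = s3.get_object(Bucket="my-example-bucket", Key="reports/2021/summary.json")
print(obj["Body"].read().decode())
```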
Database hardware has changed since the noughties. In the past most data was kept on a hard drive with an in-memory cache; nowadays most databases fit entirely into high-speed memory, and seek times on hard drives are very slow compared to solid-state disks.
In choosing a data store I feel it is important to understand the CAP Theorem. If you have a really high number of transactions then it becomes impossible to use a conventional RDBMS with pessimistic transactions, because the volume is too high to keep everything consistent. At that point one is forced to look into optimistic concurrency. The point I would like to make here is that modern NoSQL databases are not necessarily intrinsically better than older RDBMSs; they are there for a particular use case, and if you don't have that use case you are likely better off with an RDBMS.
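As a concrete sketch of optimistic concurrency, a DynamoDB conditional write can carry a version number: the update only succeeds if nobody else changed the item since it was read, and the caller retries otherwise. The table and attribute names here are hypothetical.

```python
import boto3
from botocore.exceptions import ClientError

# Hypothetical table with items like {"id": "acct-1", "balance": 100, "version": 7}.
table = boto3.resource("dynamodb").Table("accounts")

def add_to_balance(account_id: str, amount: int) -> bool:
    item = table.get_item(Key={"id": account_id})["Item"]
    try:
        # Optimistic concurrency: write only if the version we read is unchanged.
        table.update_item(
            Key={"id": account_id},
            UpdateExpression="SET balance = :b, version = :new",
            ConditionExpression="version = :seen",
            ExpressionAttributeValues={
                ":b": item["balance"] + amount,
                ":new": item["version"] + 1,
                ":seen": item["version"],
            },
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else updated first; the caller should retry
        raise
```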
Examples of asynchronous messaging systems include Amazon Simple Queue Service (SQS), Kafka and Amazon Kinesis. Kinesis is actually very similar to Kafka in design and is suited to huge volumes of data that can be processed in parallel. They sit, in my mind, on the spectrum between batch systems and messaging systems, in that consumers can process a window of messages, which is like a small batch rather than a collection of individual messages; this is more efficient than processing one message at a time but not as responsive. If you just need point-to-point messaging, SQS is a lot simpler.
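A minimal point-to-point SQS sketch with boto3 (the queue URL is a placeholder): one side sends a message, the other polls for a small batch, processes each message and deletes it.

```python
import boto3

# Placeholder queue URL; SQS is the simple option when you just need
# point-to-point messaging rather than a replayable stream like Kinesis or Kafka.
sqs = boto3.client("sqs", region_name="eu-west-1")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/orders"

sqs.send_message(QueueUrl=QUEUE_URL, MessageBody="order 42 created")

# Long polling: wait up to 10 seconds for up to 10 messages (a small "batch").
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                           WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    print("processing:", msg["Body"])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```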
Summary
In summary, as a developer coming to the Cloud: think about security and about system and network administration, and get used to bash. Be aware of the CAP Theorem; you can have close to 100% availability but not strict consistency at the same time. Make sure you really understand consistency, distributed transactions, and pessimistic versus optimistic transactions. For operations, invest in logging and dashboards. There is so much choice in the Cloud that one can very easily over-engineer.