
AWS Auto Scaling

A conceptual overview of Auto-Scaling in AWS, with simple examples and diagrams, suitable for senior-level interview answers.


What is Auto-Scaling in AWS?

Auto-Scaling in AWS automatically adds or removes servers (EC2 instances/containers) based on real-time load, so your application always has the right amount of compute.


Two Main Auto-Scaling Options

1️⃣ EC2 Auto Scaling

Used for scaling Virtual Machines (EC2 instances).

2️⃣ AWS Application Auto Scaling

Used for scaling (a boto3 sketch for ECS follows this list):

  • ECS (containers)

  • DynamoDB RCU/WCU

  • Lambda concurrency

  • SQS processing

  • Aurora replicas
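
For non-EC2 targets you use the Application Auto Scaling API instead of an ASG. Below is a minimal boto3 sketch for an ECS service; the cluster and service names ("web-cluster"/"web-service") and the capacity numbers are placeholders, not a definitive setup.

```python
import boto3

aas = boto3.client("application-autoscaling")

# Register the ECS service's DesiredCount as a scalable target
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/web-cluster/web-service",   # placeholder names
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Keep the service's average CPU near 60% by adding/removing tasks
aas.put_scaling_policy(
    PolicyName="web-service-cpu-60",
    ServiceNamespace="ecs",
    ResourceId="service/web-cluster/web-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```

The same register-then-attach-a-policy pattern applies to DynamoDB, Lambda provisioned concurrency, and Aurora replicas; only the ServiceNamespace, ResourceId and ScalableDimension change.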


🏗 How Auto-Scaling Works (Simple Flow)

  1. You define a Launch Template (AMI, instance type, security groups, etc.)

  2. You define an Auto Scaling Group (ASG)

  3. You set:

    • Minimum instances

    • Maximum instances

    • Desired capacity

  4. Then you attach Scaling Policies (see the sketch after this list), such as:

    • CPU > 70% → Add 1 instance

    • CPU < 30% → Remove 1 instance
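
Here is a minimal boto3 sketch of that flow. The AMI ID, security group, and subnet IDs are placeholders, and the "CPU > 70% / CPU < 30%" rules would still need CloudWatch alarms wired to the two policies.

```python
import boto3

ec2 = boto3.client("ec2")
asg = boto3.client("autoscaling")

# 1. Launch Template: what every new instance looks like
ec2.create_launch_template(
    LaunchTemplateName="web-lt",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",            # placeholder AMI
        "InstanceType": "t3.micro",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder SG
    },
)

# 2 + 3. Auto Scaling Group with min / max / desired capacity
asg.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-lt", "Version": "$Latest"},
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=3,
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",   # placeholder subnets
)

# 4. Simple scaling policies: +1 instance on scale-out, -1 on scale-in
asg.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-high-add-1",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)
asg.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-low-remove-1",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=-1,
    Cooldown=300,
)
```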


🎯 Types of Scaling in AWS Auto-Scaling

1️⃣ Dynamic Scaling (Most common)

Automatically reacts to load (a step-scaling sketch follows this list).

  • Target tracking
    e.g., Maintain CPU at 60%

  • Simple scaling
    e.g., CPU > 70% → Add 1 instance

  • Step scaling
    e.g., CPU 70–90% → Add 1 instance
    CPU > 90% → Add 2 instances
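
A minimal boto3 sketch of such a step-scaling policy for the "web-asg" group above. The step bounds are offsets from the alarm threshold, so this assumes a CloudWatch alarm that fires at CPU > 70%.

```python
import boto3

asg = boto3.client("autoscaling")

asg.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    EstimatedInstanceWarmup=120,
    StepAdjustments=[
        # 0-20 above the 70% threshold -> 70-90% CPU -> add 1 instance
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20,
         "ScalingAdjustment": 1},
        # more than 20 above threshold -> > 90% CPU  -> add 2 instances
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 2},
    ],
)
```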

2️⃣ Scheduled Scaling

Scale based on predictable, time-based patterns (sketch below).
Example:

  • Add 3 servers every day at 9 AM

  • Remove servers at 10 PM
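
A minimal boto3 sketch of those two scheduled actions, assuming a baseline of 2 instances and adding 3 more for the daytime peak. Recurrence uses cron syntax and is evaluated in UTC unless a TimeZone is supplied.

```python
import boto3

asg = boto3.client("autoscaling")

# Every day at 9 AM: raise capacity for the daytime peak
asg.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="scale-up-9am",
    Recurrence="0 9 * * *",
    MinSize=5,
    DesiredCapacity=5,
)

# Every day at 10 PM: shrink back to the baseline
asg.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="scale-down-10pm",
    Recurrence="0 22 * * *",
    MinSize=2,
    DesiredCapacity=2,
)
```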

3️⃣ Predictive Scaling

Uses machine learning on historical traffic patterns to forecast load and provision capacity ahead of time
(available for EC2 and ECS).
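
A minimal boto3 sketch of a predictive scaling policy on the same ASG, forecasting from historical CPU usage; the exact configuration fields should be checked against the current boto3/AWS documentation.

```python
import boto3

asg = boto3.client("autoscaling")

asg.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="predictive-cpu-50",
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [
            {
                "TargetValue": 50.0,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization"
                },
            }
        ],
        # "ForecastOnly" lets you review forecasts before they act on capacity
        "Mode": "ForecastAndScale",
    },
)
```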


💡 When Auto-Scaling Decides to Scale UP

Scale UP happens when (an SQS-based sketch follows this list):

  • CPU > threshold (like 70%)

  • Memory high (requires a CloudWatch custom metric, e.g. via the CloudWatch agent)

  • Network In/Out high

  • Application latency increases

  • SQS queue length > X messages

  • Request count per target high (ALB metric)
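
As an example of a non-CPU trigger, here is a minimal boto3 sketch that scales out when an SQS queue backs up. The queue name is hypothetical, and the CloudWatch alarm fires the scaling policy through its returned PolicyARN.

```python
import boto3

asg = boto3.client("autoscaling")
cw = boto3.client("cloudwatch")

# Policy: add 2 instances when the backlog alarm fires
policy = asg.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="queue-backlog-scale-out",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)

# Alarm: more than 100 visible messages for 2 minutes
cw.put_metric_alarm(
    AlarmName="orders-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "orders-queue"}],  # placeholder queue
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```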

Example (Target Tracking):

Keep CPU at 50% → if CPU rises to 80%, instances are added until the average returns to the 50% target.
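
A minimal boto3 sketch of that target-tracking policy; with this policy type, the underlying CloudWatch alarms are created and managed by AWS for you.

```python
import boto3

asg = boto3.client("autoscaling")

asg.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="keep-cpu-at-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```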


💡 When Auto-Scaling Scales DOWN

  • CPU < 30% for a sustained period

  • Response time low

  • SQS queue almost empty

  • Traffic drops at night

  • Under-utilised resources
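
Scale-in happens automatically once a policy's conditions are met. If specific instances must never be terminated during scale-in (for example, they hold long-running jobs), they can be protected. A minimal boto3 sketch with a placeholder instance ID:

```python
import boto3

asg = boto3.client("autoscaling")

asg.set_instance_protection(
    AutoScalingGroupName="web-asg",
    InstanceIds=["i-0123456789abcdef0"],   # placeholder instance
    ProtectedFromScaleIn=True,
)
```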


🌐 Architecture Overview (Text Diagram)

           Users
             |
         [ AWS ALB ]
             |
     ┌───────────────────┐
     | Auto Scaling      |
     | Group (ASG)       |
     | Min: 2 / Max: 10  |
     | Desired: 3        |
     └───────────────────┘
         |        |        |
        EC2      EC2      EC2   ← instances scale in/out based on CPU/traffic
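
To complete this picture, the ASG is attached to the ALB's target group so new instances start receiving traffic automatically, and the group uses the load balancer's health checks to replace unhealthy instances. A minimal boto3 sketch; the target group ARN is a placeholder.

```python
import boto3

asg = boto3.client("autoscaling")

# Register all ASG instances (current and future) with the ALB target group
asg.attach_load_balancer_target_groups(
    AutoScalingGroupName="web-asg",
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/web-tg/0123456789abcdef"                      # placeholder ARN
    ],
)

# Replace instances that fail ALB health checks, not just EC2 status checks
asg.update_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    HealthCheckType="ELB",
    HealthCheckGracePeriod=90,   # seconds to let the app boot before checking
)
```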

🚀 Real Interview Example Answer

“In AWS, auto-scaling is done using an Auto Scaling Group (ASG). We define min, max, and desired capacity. Dynamic scaling policies monitor CloudWatch metrics like CPU, memory, request count, or latency. When traffic spikes, the ASG launches additional EC2 instances, and when traffic drops it terminates them. The load balancer routes traffic only to healthy instances. This helps achieve high availability and cost optimisation.”


📌 Key AWS Services Involved

  • EC2 Auto Scaling

  • AWS Application Auto Scaling

  • CloudWatch for metrics + alarms

  • ALB/NLB for distributing traffic

  • Launch Template (or the legacy Launch Configuration)

  • EC2 Health Checks
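
Once these pieces are wired together, a quick boto3 sketch like the one below can be used to check what the ASG has actually been doing (instance launches, terminations, and their reasons).

```python
import boto3

asg = boto3.client("autoscaling")

for activity in asg.describe_scaling_activities(
        AutoScalingGroupName="web-asg", MaxRecords=5)["Activities"]:
    print(activity["StartTime"], activity["StatusCode"], activity["Description"])
```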


