Troubleshooting the Most Common Errors in EC2 Instances
Hello readers, we are back with a new “AWS troubleshooting series,” in which we will explore the most commonly used AWS services, look at the errors and issues that occur in them, and offer solutions.
This is Part-1 of the series. Today we are going to troubleshoot the most common issues that come up in EC2 instances. Without wasting time, let’s start.
Introduction
As we all know, Amazon EC2 is a widely used compute service for deploying virtual machines in the cloud. However, users often encounter various errors when managing EC2 instances. Below we highlight the most common EC2 issues and their solutions.
1. Instance Stuck in Pending State
Issue: The instance does not move from the “Pending” state to “Running.”
Cause: Insufficient capacity in the availability zone or incorrect instance type selection.
Solution:
— Check the AWS Health Dashboard (formerly the Service Health Dashboard) for capacity-related issues.
— Try launching the instance in a different availability zone.
— Use a different instance type if possible.
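Before switching zones or instance types, it can help to see what EC2 itself reports for the instance. The following is a minimal sketch, assuming the AWS CLI is installed and configured; `instance_state` is a hypothetical helper name, not part of AWS:

```shell
# Print the current state and, when EC2 provides one, the state reason message
# for a single instance. instance_state is a hypothetical helper name.
instance_state() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[].Instances[].[State.Name, StateReason.Message]' \
    --output text
}

# Usage: instance_state <instance-id>
```

If the instance stays in pending with no useful reason, trying the launch again in another availability zone, as listed above, is usually the quickest next step.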
2. Instance Stuck in Stopping or Terminating State
Issue: The instance does not fully stop or terminate.
Cause:
— Volume corruption or high CPU utilization.
— AWS service issues.
Solution:
— Wait for AWS to resolve the issue if it’s a service problem.
— Force stop the instance using AWS CLI:
aws ec2 stop-instances --instance-ids <instance-id> --force
— Detach the root volume, attach it to another instance, and check for corruption.
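The last step above can be sketched with the AWS CLI. This is an illustration under assumptions: `move_volume_to_rescue` is a hypothetical helper, `/dev/sdf` is assumed to be a free device name on the second instance, and that instance must be in the same availability zone as the volume. If the original instance is truly stuck, the detach may need `--force`, which carries a risk of data loss.

```shell
# Detach a volume, wait until it is free, then attach it to a healthy
# "rescue" instance for inspection.
# move_volume_to_rescue is a hypothetical helper; /dev/sdf is an assumed device name.
move_volume_to_rescue() {
  volume_id="$1"
  rescue_instance_id="$2"
  aws ec2 detach-volume --volume-id "$volume_id" &&
  aws ec2 wait volume-available --volume-ids "$volume_id" &&
  aws ec2 attach-volume \
    --volume-id "$volume_id" \
    --instance-id "$rescue_instance_id" \
    --device /dev/sdf
}

# Usage: move_volume_to_rescue <volume-id> <rescue-instance-id>
```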
3. Instance Connection Timeout (SSH Not Working)
Issue: Unable to connect via SSH.
Cause:
— Security group or network ACL blocking port 22.
— Incorrect key pair or username.
— Instance crashed or high CPU usage.
Solution:
— Verify security group settings to allow inbound SSH (port 22).
— Check network ACL rules to allow SSH.
— Confirm the correct key pair and username (for example, ec2-user on Amazon Linux or ubuntu on Ubuntu AMIs):
cmd=> ssh -i my-key.pem ec2-user@<instance-ip>
— Use AWS Systems Manager Session Manager if SSH is blocked.
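The checks above can also be done from the command line. The following is a rough sketch, assuming a configured AWS CLI; the three function names are hypothetical, and the Session Manager call assumes the SSM agent and a suitable instance profile are present on the instance and the Session Manager plugin is installed locally.

```shell
# List the inbound rules of a security group so you can confirm port 22 is allowed.
show_inbound_rules() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[]' \
    --output json
}

# Allow SSH from one address range (for example 203.0.113.10/32)
# instead of opening port 22 to everyone.
allow_ssh_from() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 22 --cidr "$2"
}

# Open a shell through Session Manager when SSH is unreachable.
start_ssm_shell() {
  aws ssm start-session --target "$1"
}

# Usage:
#   show_inbound_rules <security-group-id>
#   allow_ssh_from <security-group-id> <your-ip>/32
#   start_ssm_shell <instance-id>
```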
4. Insufficient Instance Capacity Error
Issue: Cannot launch an instance due to lack of capacity.
Solution:
— Change the instance type or availability zone.
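— Wait a few minutes and retry, or reserve capacity ahead of time with an On-Demand Capacity Reservation. Spot Instances and regional Reserved Instances do not guarantee capacity.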
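Trying another availability zone can be automated by looping over subnets, since each subnet lives in exactly one zone. This is only a sketch: `launch_in_first_available` is a hypothetical helper, and it moves on after any launch failure, not only after a capacity error.

```shell
# Try each subnet in turn until a launch succeeds, then print the new instance ID.
# launch_in_first_available is a hypothetical helper; the AMI ID, instance type
# and subnet IDs are supplied by the caller.
launch_in_first_available() {
  ami_id="$1"
  instance_type="$2"
  shift 2
  for subnet_id in "$@"; do
    if aws ec2 run-instances \
         --image-id "$ami_id" \
         --instance-type "$instance_type" \
         --subnet-id "$subnet_id" \
         --count 1 \
         --query 'Instances[0].InstanceId' --output text; then
      return 0
    fi
    echo "Launch failed in $subnet_id, trying the next subnet" >&2
  done
  return 1
}

# Usage: launch_in_first_available <ami-id> <instance-type> <subnet-id> [<subnet-id> ...]
```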
5. Disk Space Full (Instance Not Responding)
Issue: The instance becomes unresponsive due to full disk space.
Solution:
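— Increase the volume size in the AWS Console (stop the instance first if it is unresponsive, then start it again), and extend the partition and filesystem inside the instance so the new space becomes usable.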
— Log in via SSH and check disk usage: cmd=> df -h
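— Clear unnecessary files, such as old compressed logs, rather than deleting everything under /var/log: cmd=> sudo find /var/log -type f -name "*.gz" -delete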
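A larger EBS volume does not help until the partition and filesystem are grown to match. The sketch below shows one possible sequence; the function names are hypothetical, and the device `/dev/xvda`, partition number 1 and XFS filesystem are assumptions, so check yours with `lsblk` and `df -T` first (on Nitro instances the device is typically `/dev/nvme0n1`).

```shell
# Step 1 (from your workstation): request a larger size, in GiB, for the volume.
grow_volume() {
  aws ec2 modify-volume --volume-id "$1" --size "$2"
}

# Step 2 (on the instance): extend the partition, then the filesystem.
# Device name, partition number and filesystem type are assumptions.
grow_root_filesystem() {
  sudo growpart /dev/xvda 1   # grow partition 1 to fill the volume
  sudo xfs_growfs -d /        # XFS; for ext4 use: sudo resize2fs /dev/xvda1
}

# List mount points whose usage is at or above a percentage threshold,
# based on the portable output format of df -P.
full_filesystems() {
  df -P | awk -v limit="$1" \
    'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6, $5 }'
}

# Usage:
#   grow_volume <volume-id> <new-size-in-GiB>
#   full_filesystems 90
```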
6. EC2 Instance Boot Failure
Issue: The instance fails to boot, showing a kernel panic or “Instance Status Check Failed.”
Solution:
— Use AWS Console to create a new volume from the root volume snapshot.
— Attach it to another instance and fix boot configuration issues.
— Check the system logs on the attached volume for errors (the /mnt/rescue mount point is only an example):
cmd=> sudo cat /mnt/rescue/var/log/messages
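When the instance will not boot, SSH is not available, but the console output can still be read from outside, and the recovery volume can be created from the CLI as well. A minimal sketch, assuming a configured AWS CLI; both function names are hypothetical:

```shell
# Print the instance's console output, which often contains the kernel panic
# or the failing boot step.
console_log() {
  aws ec2 get-console-output --instance-id "$1" --query 'Output' --output text
}

# Create a new volume from a snapshot. The availability zone must match the
# instance you plan to attach the volume to.
volume_from_snapshot() {
  aws ec2 create-volume \
    --snapshot-id "$1" \
    --availability-zone "$2" \
    --query 'VolumeId' --output text
}

# Usage:
#   console_log <instance-id>
#   volume_from_snapshot <snapshot-id> <availability-zone>
```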
7. Instance Reaches Connection Limit
Issue: Unable to establish new connections to the instance.
Cause:
— High traffic load.
— Insufficient file descriptor limits.
Solution:
— Increase the file descriptor limit in /etc/security/limits.conf.
— Use an Elastic Load Balancer (ELB) to distribute traffic.
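To see whether file descriptor limits are the problem, check the current values before changing anything. The sketch below uses a hypothetical helper name, and the numbers in the commented limits.conf entries are illustrative assumptions, not recommendations. Services started by systemd typically take their limit from `LimitNOFILE` in the unit file rather than from limits.conf.

```shell
# Print the soft and hard limits on open files for the current shell.
show_fd_limits() {
  ulimit -Sn   # soft limit: what processes get by default
  ulimit -Hn   # hard limit: the ceiling the soft limit can be raised to
}

# Example entries for /etc/security/limits.conf (values are illustrative):
#   *   soft   nofile   65535
#   *   hard   nofile   65535
# The new limits apply to sessions started after the change.
```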
8. High CPU Usage (Instance Becomes Slow)
Issue: EC2 instance performance degrades due to high CPU usage.
Solution:
— Monitor CPU usage via CloudWatch.
— Upgrade to a higher instance type.
— Identify and stop high CPU-consuming processes (try a plain kill <PID> first, since kill -9 gives the process no chance to clean up):
cmd-1 => top
cmd-2 => kill -9 <PID>
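For a non-interactive view, the same information can be pulled with `ps` on the instance and with CloudWatch from outside it. This is a sketch with hypothetical function names; the CloudWatch call assumes a configured AWS CLI and takes the time window as ISO 8601 timestamps supplied by the caller.

```shell
# Show the five processes using the most CPU (columns: %CPU, PID, command).
top_cpu_processes() {
  ps -e -o pcpu,pid,comm | sort -k1,1nr | head -n 5
}

# Fetch the average CPUUtilization of an instance in 5-minute periods.
# Arguments: instance ID, start time, end time (for example 2024-01-01T00:00:00Z).
cpu_average() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$1" \
    --start-time "$2" --end-time "$3" \
    --period 300 --statistics Average \
    --query 'Datapoints[].[Timestamp, Average]' --output text
}

# Usage:
#   top_cpu_processes
#   cpu_average <instance-id> <start-time> <end-time>
```

If the average stays high over a long window rather than spiking briefly, moving to a larger instance type is more likely to help than killing individual processes.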
Conclusion
EC2 instances are powerful, but errors can disrupt operations. By understanding these common issues and their solutions, you can ensure smoother EC2 management.
This was an overview to start the series. We will dive deeper into each service, such as EC2, VPC, EBS, and ELB, in upcoming parts. Stay tuned for more content, and enjoy your AWS cloud journey.