
How to recover an EC2 instance when there is no space left on the device?



Scenario:

There was an EC2 instance (i-08c033bdb634c185c) on which I had configured GitLab runners for multiple projects. The executor was Docker-based, and I was not rotating any logs or cleaning up unused images and containers, so after a while the instance ran out of space and I could no longer log in to the system by any method.

I was getting the following message when trying to connect to the EC2 instance (i-08c033bdb634c185c) with AWS Systems Manager.



Also, this EC2 instance (i-08c033bdb634c185c) had no key pair. We had launched it without a key pair on purpose, so that the instance could only be accessed via Systems Manager and we would have logs of who did what on the system.

The IAM role attached to the EC2 instance (i-08c033bdb634c185c) had the proper permissions to access Systems Manager; we had attached the managed policy AmazonSSMManagedInstanceCore to the role.

There was only one volume (root) attached to the instance.
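
For context, attaching that managed policy to the role and checking that the instance is registered with Systems Manager can be done with the AWS CLI. A minimal sketch; the role name below is a placeholder, not the actual role from this setup:

# Attach the SSM managed policy to the instance's IAM role (role name is a placeholder)
aws iam attach-role-policy --role-name gitlab-runner-ec2-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

# Verify the instance is visible to Systems Manager
aws ssm describe-instance-information \
    --filters Key=InstanceIds,Values=i-08c033bdb634c185c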


Resolution:

  • Stop the EC2 instance (i-08c033bdb634c185c).
  • Create a snapshot of the root volume.
  • Detach the root volume from the instance (i-08c033bdb634c185c).
  • Create a new EC2 instance (i-0d2243e3c209495a9) with minimal configuration in the same VPC, subnet and AZ as the GitLab runner instance (i-08c033bdb634c185c). Attach the same IAM role, which has the following trust relationship (an AWS CLI sketch of these steps follows the policy):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
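
The same stop / snapshot / detach sequence can be scripted with the AWS CLI. This is a sketch only; the volume ID below is a placeholder, so look up the real one first:

# Find the root volume of the broken instance
aws ec2 describe-volumes \
    --filters Name=attachment.instance-id,Values=i-08c033bdb634c185c \
    --query 'Volumes[].VolumeId'

aws ec2 stop-instances --instance-ids i-08c033bdb634c185c
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "gitlab-runner root volume before recovery"
aws ec2 detach-volume --volume-id vol-0123456789abcdef0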


  • Attach the detached volume to the newly created EC2 instance (i-0d2243e3c209495a9) and log in to the new instance via Systems Manager.
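
A CLI sketch of this step, again with a placeholder volume ID (the device /dev/sdf shows up as /dev/xvdf inside the instance):

aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0d2243e3c209495a9 --device /dev/sdf
aws ssm start-session --target i-0d2243e3c209495a9
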
  • Run the lsblk command; you will see the full device alongside the new instance's own device.

[ec2-user ~]$ lsblk
NAME    MAJ:MIN  RM SIZE RO TYPE MOUNTPOINT
xvda    202:0     0   8G  0 disk
-xvda1  202:1     0   8G  0 part /
xvdf    202:80    0  10G  0 disk
-xvdf1  202:81    0  10G  0 part

  • Create a data directory
[ec2-user ~]$ sudo mkdir /data


  • Mount the partition on the /data directory (the filesystem lives on the partition xvdf1, not on the bare xvdf device):

[ec2-user ~]$ sudo mount /dev/xvdf1 /data
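
Before cleaning up or resizing, it helps to see what is actually consuming the space. A sketch, assuming the Docker data root on the old volume is the default /var/lib/docker (so /data/var/lib/docker once mounted) and the default json-file log driver:

[ec2-user ~]$ df -h /data
[ec2-user ~]$ sudo du -xh /data/var/lib/docker | sort -h | tail -n 20
# Truncate oversized container logs in place
[ec2-user ~]$ sudo find /data/var/lib/docker/containers -name "*-json.log" -exec truncate -s 0 {} \;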

  • Clean up some logs and unused directories to free space, or alternatively increase the volume size. If you increase the volume size (here from 10 GiB to 20 GiB), you need to grow the partition and the filesystem with the following commands:

[ec2-user ~]$ lsblk
NAME    MAJ:MIN  RM SIZE RO TYPE MOUNTPOINT
xvda    202:0     0   8G  0 disk
-xvda1  202:1     0   8G  0 part /
xvdf    202:80    0  20G  0 disk
-xvdf1  202:81    0  10G  0 part /data

[ec2-user ~]$ sudo growpart /dev/xvdf 1
[ec2-user ~]$ sudo resize2fs /dev/xvdf1   # For an ext4 filesystem
[ec2-user ~]$ sudo xfs_growfs /data       # For an XFS filesystem (takes the mount point)
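
The EBS volume itself is resized from the console or with the CLI before running the commands above; a sketch with a placeholder volume ID:

aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 20
# Wait until the modification reports "optimizing" or "completed"
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0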

  • Now unmount the volume and detach it from the newly created EC2 instance (i-0d2243e3c209495a9).
  • Attach it back to the old EC2 instance (i-08c033bdb634c185c); while attaching, make sure the device mapping matches the original root device, then start the instance (see the sketch below).
  • This same approach also works for restoring access to an instance whose key pair was lost: while the volume is mounted on the helper instance, add your public key to the authorized_keys file under /home/ec2-user/.ssh/.
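
A sketch of the reattach step (placeholder volume ID; the root device name is commonly /dev/xvda or /dev/sda1, so confirm it first):

# On the helper instance
[ec2-user ~]$ sudo umount /data

# From the AWS CLI
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 describe-instances --instance-ids i-08c033bdb634c185c \
    --query 'Reservations[].Instances[].RootDeviceName'
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-08c033bdb634c185c --device /dev/xvda
aws ec2 start-instances --instance-ids i-08c033bdb634c185c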


Preventing this issue from reoccurring:
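
The root cause was unrotated container logs and a pile-up of unused Docker images and containers. Two sketches that would have prevented it; the paths, sizes and schedule are assumptions to adapt to your own setup:

# /etc/docker/daemon.json -- cap per-container log size (restart Docker after editing)
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}

# /etc/cron.d/docker-prune -- weekly cleanup of stopped containers, unused images and build cache
0 3 * * 0 root docker system prune -af --filter "until=168h" >> /var/log/docker-prune.log 2>&1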

