Scenario:
There was an EC2 instance (i-08c033bdb634c185c) on which I had configured GitLab runners for multiple projects. The executor was Docker-based, and I was not rotating any logs or cleaning up unused images and containers, so after a while the instance ran out of disk space and I could no longer log in by any method.
I was getting an error message when trying to connect to the EC2 instance (i-08c033bdb634c185c) with AWS Systems Manager.
Also, this EC2 instance (i-08c033bdb634c185c) had no key pair; we had deliberately launched it without one so that the instance could only be accessed via Systems Manager, giving us an audit trail of who did what on the system.
The IAM role attached to the EC2 instance (i-08c033bdb634c185c) had the proper permissions for Systems Manager: we had attached the AmazonSSMManagedInstanceCore managed policy to the role.
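For reference, that managed policy can be attached to a role with the AWS CLI (the role name here is hypothetical; substitute your own):

```shell
# Attach the AWS-managed SSM policy to the instance's IAM role.
# "gitlab-runner-role" is a placeholder role name.
aws iam attach-role-policy \
  --role-name gitlab-runner-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
```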
There was only one volume (the root volume) attached to the instance.
Resolution:
- Stopped the EC2 instance (i-08c033bdb634c185c)
- Created a snapshot of the root volume as a backup
- Detached the root volume from the instance (i-08c033bdb634c185c)
- Created a new EC2 instance (i-0d2243e3c209495a9) with minimal configuration in the same VPC, subnet, and Availability Zone as the GitLab runner instance (i-08c033bdb634c185c), and attached the same IAM role with the following trust relationship:
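The first three steps can be sketched with the AWS CLI (the volume ID below is a placeholder; look up the real one with aws ec2 describe-volumes):

```shell
# Placeholder: the root volume ID of the stuck instance.
VOLUME_ID=vol-0123456789abcdef0

aws ec2 stop-instances --instance-ids i-08c033bdb634c185c
aws ec2 wait instance-stopped --instance-ids i-08c033bdb634c185c

# Snapshot first, so there is a backup before touching the volume.
aws ec2 create-snapshot --volume-id "$VOLUME_ID" \
  --description "gitlab-runner root volume before recovery"

aws ec2 detach-volume --volume-id "$VOLUME_ID"
```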
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
- Attached the detached volume to the newly created EC2 instance (i-0d2243e3c209495a9) and logged in to it via Systems Manager.
- Ran the lsblk command; you will see the full device alongside the new instance's own root device:
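The attach and login steps, sketched with the AWS CLI (the volume ID is a placeholder for the detached root volume's ID):

```shell
# Placeholder: the detached root volume's ID.
VOLUME_ID=vol-0123456789abcdef0

# Attach the old root volume to the rescue instance as a secondary device.
# /dev/sdf is typically remapped to /dev/xvdf on Amazon Linux.
aws ec2 attach-volume \
  --volume-id "$VOLUME_ID" \
  --instance-id i-0d2243e3c209495a9 \
  --device /dev/sdf

# Open a shell on the rescue instance through Systems Manager.
aws ssm start-session --target i-0d2243e3c209495a9
```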
[ec2-user ~]$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
-xvda1  202:1    0   8G  0 part /
xvdf    202:80   0  10G  0 disk
-xvdf1  202:81   0  10G  0 part
- Created a /data directory
- Mounted that device's partition on the /data directory
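The two steps above, as shell commands on the rescue instance (xvdf1 is the partition name reported by lsblk; verify with lsblk before mounting):

```shell
# Create a mount point and mount the old root volume's partition.
sudo mkdir -p /data
sudo mount /dev/xvdf1 /data

# The old instance's filesystem is now visible under /data, e.g. its
# Docker data is at /data/var/lib/docker and its logs at /data/var/log.
df -h /data
```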
- Cleaned up some logs and unused folders. Alternatively, you can increase the volume size; if you do, you need to extend the partition and the filesystem so the extra space becomes usable:
[ec2-user ~]$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
-xvda1  202:1    0   8G  0 part /
xvdf    202:80   0  20G  0 disk
-xvdf1  202:81   0  10G  0 part /data
[ec2-user ~]$ sudo growpart /dev/xvdf 1   # extend the partition to fill the volume
[ec2-user ~]$ sudo xfs_growfs /data       # for an XFS filesystem (use resize2fs /dev/xvdf1 for ext4)
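The EBS volume itself is grown from the AWS side first; a sketch with the CLI (the volume ID and the 20 GiB target size are placeholders):

```shell
# Grow the EBS volume to 20 GiB (example size; volume ID is a placeholder).
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 20
# The extra space only becomes usable after extending the partition
# (growpart) and the filesystem (xfs_growfs / resize2fs) on the instance.
```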
- Detached the volume from the newly created EC2 instance (i-0d2243e3c209495a9)
- Attached it back to the old EC2 instance (i-08c033bdb634c185c); while attaching, make sure the device mapping is the root device
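Reattaching as the root device, sketched with the CLI (the volume ID is a placeholder; /dev/xvda is the usual root device name on Amazon Linux, so check the instance's root device name if unsure):

```shell
# Placeholder: the cleaned-up root volume's ID.
VOLUME_ID=vol-0123456789abcdef0

aws ec2 attach-volume \
  --volume-id "$VOLUME_ID" \
  --instance-id i-08c033bdb634c185c \
  --device /dev/xvda

aws ec2 start-instances --instance-ids i-08c033bdb634c185c
```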
- The same procedure can even restore access to an instance whose key pair was lost: while the volume is mounted on the rescue instance, add your public key to the authorized_keys file under /home/ec2-user/.ssh/ on that volume
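On the rescue instance, with the old root volume mounted at /data, that looks roughly like this (the public key is a placeholder; uid/gid 1000 is ec2-user on Amazon Linux):

```shell
# Placeholder public key; substitute your own.
PUBKEY="ssh-ed25519 AAAA...example your-key-comment"

sudo mkdir -p /data/home/ec2-user/.ssh
echo "$PUBKEY" | sudo tee -a /data/home/ec2-user/.ssh/authorized_keys
sudo chmod 700 /data/home/ec2-user/.ssh
sudo chmod 600 /data/home/ec2-user/.ssh/authorized_keys
# chown by numeric uid/gid: user names on the rescue instance may not
# match the ones on the old volume's /etc/passwd.
sudo chown -R 1000:1000 /data/home/ec2-user/.ssh
```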
Prevention of this issue from recurring:
- Rotate your logs properly, e.g. with logrotate driven by a cron schedule on the Linux system
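A minimal logrotate rule is one way to do this; a sketch, assuming the runner and build logs live under /var/log/gitlab-runner (a hypothetical path, adjust to your setup):

```
# /etc/logrotate.d/gitlab-runner -- rotate daily, keep a week, compress
/var/log/gitlab-runner/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
```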
- Use docker prune commands to remove unused images, containers, volumes, and build cache
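For example (note that --all also removes images not referenced by any container, so the next pipeline run will re-pull them):

```shell
# Show what containers, images, volumes and build cache are consuming.
docker system df

# Remove stopped containers, dangling images, unused networks and build cache.
docker system prune --force

# More aggressive: also remove all images not used by a running container.
docker system prune --all --force
```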
Reference:
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
- https://community.aws/content/2eqmuC2DlHdflQraFrvJxcI4es0/recovering-access-a-guide-for-lost-ec2-key-pair-in-linux?lang=en
- https://solairajan18.medium.com/recovering-or-replacing-a-key-pair-in-aws-ec2-a-step-by-step-guide-d3fbfe94aa65
- https://stackoverflow.com/questions/49749794/recovering-lost-aws-ec2-key-pairs