When using SFTP Gateway in HA, all EC2 instances can become slow to the point of being unusable.
This can happen when you run out of EFS burst credits. Disk IO slows down to a crawl. This impacts multiple services because their configuration files are shared on EFS.
The solution is to enable Provisioned Throughput on EFS. Disk IO speed will return to usable levels, and the EC2 instances will become responsive again.
Diagnosing the issue
When EFS runs out of burst credits, the first symptom you might notice is trouble logging in. SFTP connections will time out, since SFTP clients give up after 30 seconds. SSH connections may work, but only if you wait long enough (10 minutes or longer).
If you are eventually able to SSH in, it could take 30 seconds to run a simple command that normally takes milliseconds:
[root@ip-10-0-0-3 ec2-user]# time ls -l
total 16
-rw-r--r-- 1 root root 7604 May 14 16:28 backup.py

real 0m31.853s
user 0m0.002s
sys 0m0.000s
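If you want to repeat this timing check from a script, a minimal Python sketch might look like the following. The function name `time_listing` and the `"."` path are placeholders, not part of SFTP Gateway; point the path at a directory that lives on the EFS mount.

```python
import os
import time


def time_listing(path):
    """Return (entry_count, elapsed_seconds) for one long-style listing."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        # Stat every entry, roughly what `ls -l` has to do per file.
        sizes = [entry.stat(follow_symlinks=False).st_size for entry in entries]
    elapsed = time.perf_counter() - start
    return len(sizes), elapsed


if __name__ == "__main__":
    # "." is a placeholder; use a directory on the EFS mount instead.
    count, seconds = time_listing(".")
    print(f"{count} entries listed in {seconds:.3f}s")
```

On a healthy file system this should finish in milliseconds. A result measured in seconds points to the same latency problem as the `time ls -l` output above.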
The key metric to check is %CPU IO wait time (wa). Run this command:
And you'll see this at the top of the screen:
In this screenshot, it's waiting 45% of the time, so there is something seriously wrong with disk latency.
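The wa figure is derived from the kernel's iowait counter, which Linux exposes on the first line of /proc/stat. As a rough sketch, assuming a Linux host, the percentage can be computed from two samples of that line. The function names here are illustrative only.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is skipped because the kernel already counts it in user.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return iowait_percent(first, second)

# Example usage on the affected instance:
#   print(f"iowait: {sample_iowait():.1f}%")
```

A value that stays in the tens of percent, like the 45% described above, suggests the CPU is mostly waiting on disk rather than doing work.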
To determine if you are out of EFS burst credits, look at the CloudWatch metric: EFS > File System Metrics > PermittedThroughput
This should normally be around 105M. If it drops far below that, the server will start experiencing issues.
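The same metric can also be read programmatically. The sketch below assumes the third-party boto3 SDK with working AWS credentials. `fs-12345678` is a placeholder file system ID, and the helper names are made up for this example. CloudWatch reports PermittedThroughput in bytes per second.

```python
from datetime import datetime, timedelta, timezone


def permitted_throughput_query(file_system_id, hours=3, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EFS",
        "MetricName": "PermittedThroughput",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Average"],
    }


def latest_average(cloudwatch, file_system_id):
    """Return the most recent average datapoint, or None if there is none."""
    response = cloudwatch.get_metric_statistics(
        **permitted_throughput_query(file_system_id)
    )
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else None

# Example usage (requires boto3, credentials, and a configured region):
#   import boto3
#   value = latest_average(boto3.client("cloudwatch"), "fs-12345678")
#   print("PermittedThroughput (bytes/second):", value)
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.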
Configure Provisioned Throughput
To fix the issue, you need to configure Provisioned Throughput for EFS.
In the AWS console, edit your EFS file system. You should see this screen:
Change the Throughput mode from Bursting to Provisioned.
For Provisioned Throughput, set this value to 15 MiB/s.
Finally, click Save Changes.
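If you prefer to script this change instead of using the console, one possible approach is the EFS `update_file_system` call in the third-party boto3 SDK. This is a sketch rather than an official SFTP Gateway procedure. The helper names, the `fs-12345678` ID, and the `5.0` value in the usage comment are placeholders.

```python
def provisioned_throughput_request(file_system_id, mibps):
    """Build keyword arguments for the EFS UpdateFileSystem call."""
    if mibps <= 0:
        raise ValueError("provisioned throughput must be positive")
    return {
        "FileSystemId": file_system_id,
        "ThroughputMode": "provisioned",
        "ProvisionedThroughputInMibps": float(mibps),
    }


def enable_provisioned_throughput(efs, file_system_id, mibps):
    """Switch the file system to Provisioned mode using the given EFS client."""
    return efs.update_file_system(
        **provisioned_throughput_request(file_system_id, mibps)
    )

# Example usage (requires boto3, credentials, and a configured region):
#   import boto3
#   enable_provisioned_throughput(boto3.client("efs"), "fs-12345678", 5.0)
```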
EFS Provisioned Throughput is quite expensive.
You are charged $6/month per MiB/s, so our recommended value of 15 MiB/s ($90 ÷ $6) will cost $90/month.
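The pricing arithmetic is simple enough to sketch in a few lines. This assumes the flat $6/month per MiB/s rate quoted above; check current AWS pricing for your region before relying on it. The function names are illustrative.

```python
PRICE_PER_MIBPS_MONTH = 6.0  # USD; the rate quoted in this article, an assumption


def monthly_cost(mibps, price=PRICE_PER_MIBPS_MONTH):
    """Monthly cost in USD for a given provisioned throughput in MiB/s."""
    return mibps * price


def affordable_mibps(budget, price=PRICE_PER_MIBPS_MONTH):
    """Provisioned throughput in MiB/s that a monthly budget in USD buys."""
    return budget / price


if __name__ == "__main__":
    for mibps in (1, 5, 10, 15):
        print(f"{mibps:>2} MiB/s costs ${monthly_cost(mibps):.2f}/month")
```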
AWS allows you to enable Provisioned Throughput and increase the value as many times per day as you want.
But you can only decrease your Provisioned Throughput once a day. Switching back to Bursting counts toward this.
So the strategy is to turn on Provisioned Throughput and slowly increase the value until SFTP Gateway runs at a tolerable latency. You don't have to jump straight to our recommended value.
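This step-up strategy can be expressed as a small loop. The sketch below is hypothetical: `measure_latency` and `set_throughput` are callables you would supply yourself (for example, a timed directory listing and a call that updates the file system), and the candidate values and threshold are yours to choose. It only ever raises the value, so it never uses up the once-a-day decrease.

```python
import time


def raise_until_tolerable(measure_latency, set_throughput, candidates,
                          tolerable_seconds, settle_seconds=0.0):
    """Apply candidate throughputs in ascending order and stop at the first
    one where the measured latency is tolerable.

    Returns the chosen value, or None if no candidate was sufficient.
    """
    for mibps in sorted(candidates):
        set_throughput(mibps)
        time.sleep(settle_seconds)  # give the change time to take effect
        if measure_latency() <= tolerable_seconds:
            return mibps
    return None
```

A return value of `None` means even the largest candidate was not enough, in which case the list of candidates needs to go higher.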
Remember that your Burst Credits continue to recover while you are paying for Provisioned Throughput. So, you could switch between Bursting and Provisioned, based on your available Burst Credits.