Background of the issue:
While working on a Lambda function that calls a private API Gateway endpoint, I intermittently got a DNS resolution error (UnknownHostException) when invoking the endpoint. The error went away when the request was retried.
The Java SDK returned an "UnknownHostException" error. In general, this exception is thrown when the IP address of a host cannot be determined, which indicates a DNS failure. The message is generic and did not describe the root cause of the issue.
Resolution:
While researching this issue, I learned that UnknownHostException errors should be retried and are not unexpected, even with Amazon-provided DNS. I also learned that the Lambda service team is aware that intermittent DNS lookup errors or timeouts can occur and are unavoidable due to the distributed nature of AWS services (I could not find the reference again; I will attach the link once I find it). To minimise function errors, we could take the following steps:
Catch DNS errors and retry:
DNS lookup errors can generally be retried. It is advisable to modify the Lambda function to catch connection errors, including DNS lookup errors, and retry with exponential back-off [1]. Catching connection errors is a best practice in any case.
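As a rough illustration of the catch-and-retry idea, here is a minimal Java sketch. The class DnsRetry and its methods are hypothetical names made up for this example, not part of the AWS SDK, and the attempt count and delay values are assumptions you would tune for your function's timeout. It retries only when an UnknownHostException appears somewhere in the cause chain and sleeps for a random time up to an exponentially growing cap between attempts.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not an AWS SDK class.
final class DnsRetry {
    private DnsRetry() {}

    // Runs the call, retrying on DNS failures with exponential back-off and jitter.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up on non-DNS errors or when the attempts are used up.
                if (!isDnsFailure(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // The cap doubles on every attempt: base, 2*base, 4*base, ...
                long cap = baseDelayMillis << (attempt - 1);
                // Sleep a random time in [0, cap] so concurrent retries spread out.
                long sleep = cap > 0 ? ThreadLocalRandom.current().nextLong(cap + 1) : 0;
                Thread.sleep(sleep);
            }
        }
    }

    // SDK clients often wrap the original exception, so walk the cause chain.
    static boolean isDnsFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }
}
```

Inside a handler, the API call would be wrapped as something like DnsRetry.callWithRetry(() -> invokePrivateApi(request), 3, 100), where invokePrivateApi stands in for whatever client call the function makes. Keep the total retry time well below the function timeout.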
Restart the container:
On the off chance that there is an issue with the container, you can force a container restart by exiting with a non-zero exit code, e.g. sys.exit(1) in Python. This causes the execution context to be terminated and restarted, so the container has to re-initialize before it can serve requests again, i.e. the next invocation on that container will be a cold start.
Make use of runtime specific DNS libraries:
Runtime-specific DNS libraries allow you to cache some DNS queries locally. An example of a DNS library for Java is DNSJava.
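To make the local-caching idea concrete without pulling in a dependency, here is a small sketch of a TTL cache placed in front of the JVM's own resolver. It does not use the DNSJava API; CachingResolver, Lookup, and usingJvmResolver are hypothetical names for this example, and the TTL is an assumption to tune. As a design choice, the sketch also returns the last known addresses when a refresh fails, which is one way to ride out an intermittent lookup error.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process DNS cache for illustration; not the DNSJava API.
final class CachingResolver {
    // Pluggable lookup so the cache can wrap any resolver.
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private static final class Entry {
        final InetAddress[] addresses;
        final long expiresAtNanos;

        Entry(InetAddress[] addresses, long expiresAtNanos) {
            this.addresses = addresses;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Lookup lookup;
    private final long ttlNanos;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachingResolver(Lookup lookup, long ttlMillis) {
        this.lookup = lookup;
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    // Convenience factory that delegates to the JVM's built-in resolver.
    static CachingResolver usingJvmResolver(long ttlMillis) {
        return new CachingResolver(InetAddress::getAllByName, ttlMillis);
    }

    InetAddress[] resolve(String host) throws UnknownHostException {
        long now = System.nanoTime();
        Entry cached = cache.get(host);
        // Serve from the cache while the entry is still fresh.
        if (cached != null && now - cached.expiresAtNanos < 0) {
            return cached.addresses;
        }
        try {
            InetAddress[] fresh = lookup.resolve(host);
            cache.put(host, new Entry(fresh, now + ttlNanos));
            return fresh;
        } catch (UnknownHostException e) {
            // Refresh failed: fall back to the stale entry if there is one.
            if (cached != null) {
                return cached.addresses;
            }
            throw e;
        }
    }
}
```

Create the resolver once outside the handler method (for example in a static field) so that the cache survives across warm invocations of the same container.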
References:
[1] Retry behavior: https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html