Configuring timeouts in Spring reactive WebClient
Introduction
WebClient is the de facto interface for calling downstream endpoints when developing web services on the reactive stack. It is a non-blocking, reactive client for performing HTTP requests. It uses Reactor Netty as its default underlying HTTP client library and allows application developers to swap in a different implementation if required.
One of the key aspects of calling downstream endpoints in a resilient system is handling timeouts gracefully and stopping possible cascading failures. If not carefully designed, unhandled timeouts can exhaust system resources.
The illusion of signal timeout.
A typical request to a downstream reactive endpoint using WebClient looks like the following.
webClient.get()
    .uri(uri)
    .retrieve()
    // map any 4xx/5xx response to an error signal
    .onStatus(HttpStatus::isError, clientResponse -> {
        LOGGER.error("Error while calling endpoint {} with status code {}",
            uri, clientResponse.statusCode());
        return Mono.error(new RuntimeException("Error while calling accounts endpoint"));
    })
    .bodyToMono(JsonNode.class)
    .timeout(Duration.ofMillis(timeout));
Note the timeout() call on the last line. One might think this is a common value for the HTTP connect and read timeouts that the Mono/Flux publisher takes care of.
But this timeout has nothing to do with the TCP-layer timeout values. It comes from the Mono publisher class (the same applies to Flux).
If this is not the TCP timeout, what does this timeout value mean?
This timeout comes into the picture when a given publisher fails to emit the next signal within the given time. Let’s have a look at an example.
The example test class is a simple integration setup that uses MockServer to mock downstream responses with a delay. It spins up a mock server locally and registers a mock response against the URL /accounts with a 5-second delay.
During the test we use the (Mono) publisher timeout to configure a timeout value of 3 seconds. Running the test results in the following error.
java.lang.AssertionError: expectation "assertNext" failed (expected: onNext(); actual: onError(java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 3000ms in 'flatMap' (and no fallback has been configured)))
It throws a java.util.concurrent.TimeoutException because no event was emitted within 3 seconds. Under the hood, the timeout() method registers an error signal generator that is triggered if there are no events during the given time.
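The same "no signal within the given time" behaviour can be seen without any HTTP involved. The sketch below is only a JDK analogy, not Reactor code: it uses CompletableFuture.orTimeout (Java 9+) in place of Mono.timeout(), and the class name, method names and delay values are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SignalTimeoutSketch {

    // Hypothetical slow downstream call: the value only becomes available after delayMillis.
    static CompletableFuture<String> slowCall(long delayMillis) {
        return CompletableFuture.supplyAsync(
                () -> "accounts",
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    // Waits for the value, but the future is failed with a TimeoutException
    // if nothing has arrived within timeoutMillis.
    static String callWithSignalTimeout(long delayMillis, long timeoutMillis) {
        try {
            return slowCall(delayMillis)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .get();
        } catch (ExecutionException e) {
            // get() wraps the failure; the TimeoutException is the cause.
            return e.getCause() instanceof TimeoutException ? "timeout" : "error";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithSignalTimeout(500, 100));  // value arrives too late
        System.out.println(callWithSignalTimeout(50, 1000));  // value arrives in time
    }
}
```

Nothing in this sketch touches a socket: the timeout is purely about how long the caller waits for a result. One difference from the Reactor operator is that orTimeout only fails the future and does not cancel the delayed task behind it.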
Furthermore, if we look closely, the error signal also makes sure the HTTP connection is closed. As a result, your downstream system may see the connection being closed by your WebClient. The following log lines tell that story.
18:23:43.096 [reactor-http-nio-4] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x6cd69f5f, L:/127.0.0.1:61760 - R:localhost/127.0.0.1:1080] Channel connected, now 1 active connections and 0 inactive connections
18:23:46.004 [parallel-1] DEBUG org.springframework.web.reactive.function.client.ExchangeFunctions - [1698ee84] Cancel signal (to close connection)
18:23:46.005 [reactor-http-nio-4] DEBUG reactor.netty.resources.PooledConnectionProvider - [id: 0x6cd69f5f, L:/127.0.0.1:61760 ! R:localhost/127.0.0.1:1080] Channel cleaned, now 0 active connections and 1 inactive connections
We can handle that error like any other error event, for example by adding a doOnError() block.
return webClient.get()
    .uri(uri)
    .retrieve()
    // map any 4xx/5xx response to an error signal
    .onStatus(HttpStatus::isError, clientResponse -> {
        LOGGER.error("Error while calling endpoint {} with status code {}",
            uri, clientResponse.statusCode());
        return Mono.error(new RuntimeException("Error while calling accounts endpoint"));
    })
    .bodyToMono(JsonNode.class)
    // setting the signal timeout
    .timeout(Duration.ofMillis(timeout))
    // detecting the timeout error
    .doOnError(error -> LOGGER.error("Error signal detected", error));
The real HTTP Connect and Read timeouts.
Now that the signal timeout is out of the way, we can look at how to configure the real TCP-level timeout values. Whether the client is blocking or non-blocking, the basics of how TCP works still apply.
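To make the two TCP-level values concrete, here is a minimal sketch using plain blocking JDK sockets rather than WebClient. The class and method names are hypothetical, and the local "silent" server exists only for the demonstration: it accepts the connection but never sends a byte, so the connect succeeds and the read is the part that times out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class TcpTimeoutSketch {

    // Connects to a local server that never writes anything, then tries to read.
    static String readFromSilentServer(int connectTimeoutMillis, int readTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0);   // port 0 = any free port
             Socket client = new Socket()) {
            // Connect timeout: upper bound on establishing the TCP connection.
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()),
                    connectTimeoutMillis);
            // Read timeout: upper bound on waiting for data on an open connection.
            client.setSoTimeout(readTimeoutMillis);
            try (Socket accepted = server.accept()) {
                int firstByte = client.getInputStream().read(); // blocks, server is silent
                return "read " + firstByte;
            } catch (SocketTimeoutException e) {
                return "read timeout";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromSilentServer(1000, 200)); // prints "read timeout"
    }
}
```

The connect timeout only covers the handshake, while the read timeout covers the wait for data after the connection is open. A non-blocking client has to bound the same two phases, just through different APIs.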
Note that in the WebClientTimeoutTest.java example code the WebClient is built using the default builder without any specific configuration. Hence it falls back to the Reactor Netty defaults: a connect timeout of 30 seconds and no read (response) timeout at all.
Modern applications do not wait 30 seconds for anything. Contemporary systems are encouraged to fail fast and save valuable CPU cycles.
The following is an example of configuring the connect and read timeouts on a WebClient.
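A minimal configuration sketch, assuming Reactor Netty 1.x (where HttpClient exposes option() directly) and Netty's ReadTimeoutHandler. The class name, method name and parameter names are illustrative rather than taken from the original example, and the timeout values are left to the caller.

```java
import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import java.util.concurrent.TimeUnit;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

final class WebClientTimeoutConfig {

    // Builds a WebClient whose underlying Reactor Netty client has explicit
    // TCP connect and read timeouts (hypothetical helper, values are up to the caller).
    static WebClient webClientWithTimeouts(int connectTimeoutMillis, int readTimeoutMillis) {
        HttpClient httpClient = HttpClient.create()
                // Fail if the TCP connection cannot be established in time.
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeoutMillis)
                // Fail with a ReadTimeoutException if no data is read for the given time.
                .doOnConnected(connection -> connection.addHandlerLast(
                        new ReadTimeoutHandler(readTimeoutMillis, TimeUnit.MILLISECONDS)));

        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}
```

Depending on the Spring version, the ReadTimeoutException may reach the subscriber wrapped in another exception, so it is worth checking the cause as well when asserting on it.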
The WebClient calling code is then the same as before, just without any signal timeout. (Of course, you can still use a signal timeout where it suits the purpose; the important thing is not to use it as a substitute for the TCP timeout values.)
The test case then expects a ReadTimeoutException instead of a TimeoutException.
Hopefully this gives you a solid idea of how to configure and use the signal timeout in combination with the TCP connect and read timeouts to build resilient applications.