Netflix Ribbon and Hystrix Timeout

Arun · Aug 25, 2016

We are using Spring Cloud in our project. We have several microservices and each has its own .yml file.

The properties below are set only in the Zuul server:

    hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 60000

    ribbon:
      ConnectTimeout: 3000
      ReadTimeout: 60000

Test 1:

Accounts Service:

This is the service I'm calling to test the timeout. I'm sending the request through Zuul, i.e., on port 8006.

    @RequestMapping(value = "/accountholders/{cardHolderId}/accounts", produces = "application/json; charset=utf-8", method = RequestMethod.GET)
    @ResponseBody
    public AllAccountsVO getAccounts(@PathVariable("cardHolderId") final String cardHolderId,
            @RequestHeader("userContextId") final String userContextId,
            @RequestParam final MultiValueMap<String, String> allRequestParams, final HttpServletRequest request) {

        return iAccountService.getCardHolderAccountsInfo(cardHolderId, userContextId, request, allRequestParams,
                ApplicationConstants.ACCOUNTHOLDER);
    }

The above service internally calls the below one using Spring RestTemplate. I started testing by adding a sleep time of 5000ms like below in Association Service and made a request to Accounts Service (getAccounts call).

Association Service:

    @RequestMapping(value = "/internal/userassociationstatus", produces = "application/json; charset=utf-8", consumes = "application/json", method = RequestMethod.GET)
    @ResponseBody
    public UserAssociationStatusVO getUserAssociationStatus(@RequestParam final Map<String, String> allRequestParams) {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return iUserAssociationsService.getUserAssociationStatus(allRequestParams);
    }

Below is the error I got in Association Service

org.apache.catalina.connector.ClientAbortException: java.io.IOException: An established connection was aborted by the software in your host machine
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:393) ~[tomcat-embed-core-8.0.30.jar:8.0.30]
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:426) ~[tomcat-embed-core-8.0.30.jar:8.0.30]
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:342) ~[tomcat-embed-core-8.0.30.jar:8.0.30]

Below is the error I got in Accounts Service

org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://USERASSOCIATIONS-V1/user/v1/internal/userassociationstatus?cardholderid=123&usercontextid=222&role=ACCOUNT": com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out; nested exception is java.io.IOException: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:607) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:475) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]

If I keep the sleep time at 4500 ms I get a response, but if it is >= 4800 ms it throws the above exception. I think this is not related to the Ribbon timeouts but to something else. Is there a specific reason for the above exception beyond a certain point?

Test 2

Then I tried a sleep time of 75000 ms directly in Accounts Service and removed the sleep from Association Service.

    @RequestMapping(value = "/accountholders/{cardHolderId}/accounts", produces = "application/json; charset=utf-8", method = RequestMethod.GET)
    @ResponseBody
    public AllAccountsVO getAccounts(@PathVariable("cardHolderId") final String cardHolderId,
            @RequestHeader("userContextId") final String userContextId,
            @RequestParam final MultiValueMap<String, String> allRequestParams, final HttpServletRequest request) {

        try {
            Thread.sleep(75000);
        } catch (InterruptedException ex) {
            // TODO Auto-generated catch block
            ex.printStackTrace();
        }
        return iAccountService.getCardHolderAccountsInfo(cardHolderId, userContextId, request, allRequestParams,
                ApplicationConstants.ACCOUNTHOLDER);
    }

In this case I got "exception": "com.netflix.zuul.exception.ZuulException",

And in my APIGateway(Zuul application) log I see the below error.

com.netflix.zuul.exception.ZuulException: Forwarding error
    at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward(RibbonRoutingFilter.java:134) ~[spring-cloud-netflix-core-1.1.0.M5.jar:1.1.0.M5]
    at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.run(RibbonRoutingFilter.java:76) ~[spring-cloud-netflix-core-1.1.0.M5.jar:1.1.0.M5]
    at com.netflix.zuul.ZuulFilter.runFilter(ZuulFilter.java:112) ~[zuul-core-1.1.0.jar:1.1.0]
    at com.netflix.zuul.FilterProcessor.processZuulFilter(FilterProcessor.java:197) ~[zuul-core-1.1.0.jar:1.1.0]


Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: useraccounts-v1RibbonCommand timed-out and no fallback available.
    at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:806) ~[hystrix-core-1.4.23.jar:1.4.23]
    at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:790) ~[hystrix-core-1.4.23.jar:1.4.23]
    at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:99) ~[rxjava-1.0.14.jar:1.0.14]
    at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70) ~[rxjava-1.0.14.jar:1.0.14]

I think this has nothing to do with Ribbon ConnectTimeout or ReadTimeout. This error is caused by the property "execution.isolation.thread.timeoutInMilliseconds: 60000". I also reduced this property to 10000 ms to test the behavior and got the same exception when the sleep time was longer (e.g., 12000 ms).

I want to understand Ribbon ConnectTimeout and ReadTimeout vs. the Hystrix timeout, and how to test the Ribbon timeouts in my application. Also, if I want different timeouts for different microservices, do I keep these properties in the respective .yml files? Any thoughts?

I'm trying to create a document for my team so that it is easy for a developer to understand how these timeout options work in Spring Cloud.

(It's a lengthy description but to make it clearer I had to write in detail)

Answer

Dave Syer · Aug 4, 2017

The connectTimeout and readTimeout in Ribbon are passed down to the underlying HTTP client. They apply to the HTTP connection (not the HTTP request once the connection has been established). I'm not sure why you'd need to test them like this, but it's going to be hard with a healthy server. For instance, for connectTimeout, you need a server that can accept TCP connections but not finish the HTTP layer connection. For readTimeout you need one that makes a connection but then doesn't send any data (at all).
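
As a rough illustration of the readTimeout case, here is a minimal sketch in plain Java, with no Ribbon or Spring involved. It assumes a local `ServerSocket` that is bound but never answers, and uses `HttpURLConnection` as a stand-in for whatever HTTP client Ribbon is configured with. The class name `ReadTimeoutDemo`, the method `readTimesOut` and the timeout values are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

public class ReadTimeoutDemo {

    /**
     * Sends a GET to a local server that never replies and reports whether
     * the client gave up with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        // Bind to an ephemeral loopback port. We never call accept() and never
        // write anything; the OS still completes the TCP handshake through the
        // listen backlog, so the connect phase succeeds.
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            URL url = new URL("http://127.0.0.1:" + silentServer.getLocalPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMs);       // limit for waiting on response data
            try (InputStream in = conn.getInputStream()) {
                return false; // a response arrived (not expected with a silent server)
            } catch (SocketTimeoutException e) {
                // On loopback the connection is established almost immediately,
                // so a timeout here comes from waiting for response bytes.
                return true;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(3000, 500));
    }
}
```

The connectTimeout case is harder to reproduce on a single machine, since a listening loopback port completes the handshake right away, so the sketch does not attempt it.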