Finally, We Fixed the "Failed to Fetch" Error
Adam C.

Our Apollo Server runs on AWS Lambda, deployed with the Serverless Framework, and is accessed through AWS API Gateway with a Cognito User Pool as the authorizer.

More information about Amazon Cognito User Pools can be found in this article.

Photo by Markus Spiske on Unsplash

Our frontend (CMS) is running on top of Create React App with Apollo Client.

On localhost, we use serverless offline, so there is no API Gateway, Cognito User Pool, or Lambda between the Apollo Client and the Apollo Server.

We received a lot of complaints from CMS users that they got the "Failed to Fetch" error a few times almost every day while working on the website. When that happened, the CMS could not communicate with the API server. With console logging enabled, we saw messages like the ones below:

[Network error]: TypeError: Failed to fetch

Access to fetch at 'https://API-URI/graphql' from origin 'https://CMS-URL' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

Failed to load resource: https://API-URL/graphql:1 net::ERR_FAILED

Two things made this issue hard to solve:

  1. AWS CloudWatch had no useful information about the failures.
  2. We could not reproduce it locally, or on production in brief tests.

We spent a lot of time checking everything that differs between localhost and production: the network connection (the CMS can only be accessed via VPN), API Gateway, Lambda, RDS Proxy, the MySQL database, and so on. We still could not find any clues.

We were pretty sure that CORS was enabled on the Apollo Server. Yet somehow the server's response did not include the "Access-Control-Allow-Origin" header, and the client did not set the request's mode to "no-cors", so the browser threw this error.

More information about CORS errors can be found in this article.

So, how about eliminating the CORS log first? We need to see the real error log. 

All debugging starts from the log - Adam C.

 

We took a second look at API Gateway and found that we had to manually add the "Access-Control-Allow-Origin" header to its 4xx and 5xx responses. Otherwise the header is missing from those responses, and the real 4xx or 5xx error is hidden, because the browser reports the CORS error shown above instead.
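With the Serverless Framework, one common way to do this is to declare API Gateway gateway responses as CloudFormation resources. The sketch below is a hypothetical configuration, not necessarily our exact setup: it assumes a REST API whose logical ID is `ApiGatewayRestApi`, and the names `corsHeaders`, `gatewayResponse`, and `gatewayResponses` are made up for illustration. The same keys can be written as YAML under `resources.Resources` in `serverless.yml`.

```javascript
// Hypothetical sketch: CloudFormation resources that add a CORS header to
// API Gateway's default 4xx and 5xx gateway responses.
// Assumption: the REST API's logical ID is "ApiGatewayRestApi".
const corsHeaders = {
  // The inner single quotes are required: API Gateway expects a quoted static value.
  "gatewayresponse.header.Access-Control-Allow-Origin": "'*'",
};

// Build one AWS::ApiGateway::GatewayResponse resource for a response type.
function gatewayResponse(responseType) {
  return {
    Type: "AWS::ApiGateway::GatewayResponse",
    Properties: {
      ResponseParameters: corsHeaders,
      ResponseType: responseType,
      RestApiId: { Ref: "ApiGatewayRestApi" },
    },
  };
}

// These entries would go under resources.Resources in the service config.
const gatewayResponses = {
  GatewayResponseDefault4XX: gatewayResponse("DEFAULT_4XX"),
  GatewayResponseDefault5XX: gatewayResponse("DEFAULT_5XX"),
};
```

A wildcard origin is the simplest choice for debugging; a specific CMS origin could be used in place of `'*'` if a tighter policy is preferred.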

After adding this header with the value "*", we got a new error report from our CMS users. As expected, we saw the real error message:

Failed to load resource: the server responded with a status of 401 ()

[Network error]: ServerError: Response not successful: Received status code 401

Once we had found the root of the failure, the fix became much easier to target. We quickly figured out that it was related to the Cognito User Pool, which we use as an authorizer. The AccessToken and IDToken are set with a short expiration time of one hour, so if a user stays on the website for more than an hour, they pass an expired token with the API call and receive a 401 Unauthorized error. After refreshing the page, everything is back to normal. That is because the lifetime of the refreshToken is one day, and Amplify automatically refreshes the AccessToken and IDToken as long as the refreshToken has not expired.

So the solution is clear: we need to refresh the AccessToken/IDToken when they expire. But we need to do this in the background, without refreshing the page; otherwise the user's unsaved work will be lost.
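To make the one-hour limit concrete, here is a minimal sketch of how the expiry of a JWT such as the Cognito ID token could be inspected on the client. It is not part of our fix; `getJwtExpiry` and `isTokenExpired` are hypothetical helper names, and the code assumes the token carries the standard `exp` claim (seconds since the epoch) and that a global `atob` is available (browsers and recent Node.js versions).

```javascript
// Hypothetical helper: read the "exp" claim (seconds since the epoch) from a JWT.
// It only decodes the payload; it does NOT verify the signature.
function getJwtExpiry(token) {
  const payload = token.split(".")[1];
  // JWT segments are base64url-encoded; convert to plain base64 before decoding.
  const base64 = payload.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return JSON.parse(atob(padded)).exp;
}

// True when the token has expired, or will expire within `skewSeconds`.
function isTokenExpired(token, nowMs = Date.now(), skewSeconds = 0) {
  return getJwtExpiry(token) * 1000 <= nowMs + skewSeconds * 1000;
}
```

A check like `isTokenExpired(token, Date.now(), 60)` could be used to tell whether a cached token is about to be rejected, although reacting to the 401 itself also works.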

We use Amplify React UI Component to handle user login. When the login is successful or the user is already logged in, the ID token is passed to Apollo authLink as below:

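// Assumed imports (not shown in the original snippet): Amplify's Auth module and Apollo Link.
import { Auth } from "aws-amplify";
import { ApolloLink } from "apollo-link";

Auth.currentAuthenticatedUser().then(async (user) => {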
  const token = user.signInUserSession.idToken.jwtToken;
  const authLink = new ApolloLink((operation, forward) => {
    // Use the setContext method to set the HTTP headers.
    operation.setContext({
      headers: {
        authorization: token ? `Bearer ${token}` : ""
      }
    });

    // Call the next link in the middleware chain.
    return forward(operation);
  });
});

The logic above is implemented in App.js, which is the parent of all other components, and it runs only once, in the componentDidMount lifecycle method. So unless the page is refreshed, the token never changes. As we learned above, once the token expires, the API returns a 401 Unauthorized error. Apollo Client allows us to check for a certain failure condition or error code and retry the request if the error can be rectified. Below is what we came up with using the ErrorLink: (If you are not familiar with Apollo Link, check it out here)

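// Assumed imports (not shown in the original snippet); Auth comes from "aws-amplify".
import { fromPromise } from "apollo-link";
import { onError } from "apollo-link-error";

const handleErrorLink = onError(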
  ({ graphQLErrors, networkError, operation, forward }) => {
    if (graphQLErrors)
      graphQLErrors.forEach(({ message, locations, path }) =>
        console.log(
          `[GraphQL error]: Message: ${message}, Location: ${locations}, Path: ${path}`
        )
      );
    if (networkError) {
      if (String(networkError).match(/Received status code 401/i)) {
        return fromPromise(
          Auth.currentAuthenticatedUser()
            .then(async (user) => {
              return user.signInUserSession.idToken.jwtToken;
            })
            .catch((error) => {
              console.log("refresh token failed: ", error);
              // Handle token refresh errors e.g clear stored tokens, redirect to login, ...
              return;
            })
        )
          .filter((value) => {
            return Boolean(value);
          })
          .flatMap((token) => {
            const oldHeaders = operation.getContext().headers;
            // modify the operation context with a new token
            operation.setContext({
              headers: {
                ...oldHeaders,
                authorization: `Bearer ${token}`,
              },
            });
            return forward(operation);
          });
      }

      console.log(`[Network error]: ${networkError}`);
    }
  }
);

Note that in our case the 401 Unauthorized error is captured in networkError, and the error string contains "Received status code 401", which is what we match on. Auth.currentAuthenticatedUser, the function provided by Amplify, is asynchronous, so we have to use fromPromise, the function provided by 'apollo-link', to get the token from a Promise object. We then replace the old header in the operation context and finally forward the operation to retry it.
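Matching on the error string works, but it depends on the exact wording of the message. As a possible variation, the check could also look at the status code on the error object. The sketch below uses a hypothetical helper, `isUnauthorizedError`, and assumes that Apollo's ServerError exposes the HTTP status as a `statusCode` property; the string match is kept as a fallback in case it does not.

```javascript
// Hypothetical helper: decide whether a network error is an HTTP 401.
// Assumption: the error object may expose the HTTP status as `statusCode`;
// if it does not, fall back to matching the error message.
function isUnauthorizedError(networkError) {
  if (!networkError) return false;
  if (networkError.statusCode === 401) return true;
  return /Received status code 401/i.test(String(networkError));
}
```

Inside the error handler, the condition would then read `if (isUnauthorizedError(networkError)) { ... }`, with the retry logic unchanged.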

After this, we had the annoying ‘failed to fetch’ error fixed. :-)