Aggregated Fail States


The concept of a program failure, in the mind of a software developer, usually takes the form of a boolean or an exception. It’s either true or false, fail or no-fail.

But sometimes you come across more subtle error states – and this is especially true with web-based applications or applications that consume external services. Temporary errors due to connectivity issues pop up from time to time, and handling them correctly means the program needs some sense of whether an error is temporary, localized, or important enough to abort on. It also has to be able to handle multiple errors at the same time.

Collecting data from Facebook APIs, for instance, can result in most calls succeeding but one or two failing. Maybe Facebook deprecated a particular metric. Maybe there’s packet loss on the way. Maybe there’s a slight temporary error… and sometimes there are just hard failures (like authorization failures where access tokens have expired).

I am moving to something I call a ResultState. It’s a simple class that looks something like this:

    public class ResultState
    {
        public bool Success { get; set; }
        public bool Finished { get; set; }
        public int TryCount { get; set; }
        public List<LogEntry> Messages { get; } = new List<LogEntry>();

        public bool Retry(int maxAttempts)
        {
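            // Post-increment: compares the count from before this call,
            // so up to maxAttempts retries are allowed before giving up.
            Finished = TryCount++ >= maxAttempts;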
            Success = false;
            return !Finished;
        }

        public void Set(bool successful, LogEntry message = null)
        {
            Finished = true;
            Success = successful;
            if (message != null)
                Messages.Add(message);
        }
    }

It’s a simple class I’m using in a Batch Call function – every request inside a batch request is capable of being successful, not successful, or unfinished (retrying). The Batch Processor will keep submitting the request to the external API until all calls are complete – whether they ended in a failure or not.

    // Run batch request
    do {
        ...
    } while (!batchItems.All(x => x.Finished));

    // Dump messages to application log
    Logger.Write(batchItems.SelectMany(x => x.Messages).ToList());
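
To make the loop above more concrete, here is a minimal sketch of what the elided body might do. It is not the actual batch processor: the CallOutcome enum, the Severity enum, the shape of LogEntry, and the `call` delegate (standing in for one request to the external API) are all assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical severity scale and LogEntry shape (not shown in the post).
public enum Severity { Info = 0, Warning = 1, Error = 2, Fatal = 3 }
public record LogEntry(Severity Severity, string Message);

// Hypothetical classification of a single call's result.
public enum CallOutcome { Ok, TransientError, HardError }

public class ResultState
{
    public bool Success { get; set; }
    public bool Finished { get; set; }
    public int TryCount { get; set; }
    public List<LogEntry> Messages { get; } = new List<LogEntry>();

    public bool Retry(int maxAttempts)
    {
        Finished = TryCount++ >= maxAttempts;
        Success = false;
        return !Finished;
    }

    public void Set(bool successful, LogEntry message = null)
    {
        Finished = true;
        Success = successful;
        if (message != null)
            Messages.Add(message);
    }
}

public static class BatchProcessor
{
    // 'call' is a hypothetical stand-in for one request to the external API;
    // it receives the index of the batch item being attempted.
    public static void Run(IList<ResultState> batchItems, Func<int, CallOutcome> call, int maxAttempts)
    {
        do
        {
            for (int i = 0; i < batchItems.Count; i++)
            {
                var item = batchItems[i];
                if (item.Finished) continue; // already succeeded or failed for good

                switch (call(i))
                {
                    case CallOutcome.Ok:
                        item.Set(true);
                        break;
                    case CallOutcome.TransientError:
                        // Stay unfinished until the retry budget runs out.
                        if (!item.Retry(maxAttempts))
                            item.Messages.Add(new LogEntry(Severity.Error, "Gave up after retries"));
                        break;
                    default:
                        // Hard failures (e.g. an expired access token) are not retried.
                        item.Set(false, new LogEntry(Severity.Fatal, "Hard failure"));
                        break;
                }
            }
        } while (!batchItems.All(x => x.Finished));
    }
}
```

The point of the sketch is that only transient errors keep an item in the unfinished state; successes and hard failures both mark it finished, so the loop always terminates once the retry budget is spent.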

The important thing is that upper-level methods are always fed a “state”. The LogEntry object carries a descriptive message along with a severity code, which lets the main process evaluate the severity of the situation and take appropriate action. From the batch request, the main process receives a list of LogEntry objects that describes all the various states and errors that were returned. It runs a Distinct() on these to feed the end result to the user.
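
As a rough sketch of that last step: Distinct() only collapses repeated entries if LogEntry compares by value, so the example below assumes LogEntry is a record (C# 9 or later). The Severity enum, the ResultSummary class, and the "abort on fatal" policy are hypothetical and only there to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical severity scale; the post only says a severity code exists.
public enum Severity { Info = 0, Warning = 1, Error = 2, Fatal = 3 }

// Hypothetical LogEntry shape. Records compare by value, which is what
// lets Distinct() treat two identical entries as duplicates.
public record LogEntry(Severity Severity, string Message);

public static class ResultSummary
{
    // Collapse duplicate entries and list the most severe ones first.
    public static List<LogEntry> Summarize(IEnumerable<LogEntry> messages)
    {
        return messages
            .Distinct()
            .OrderByDescending(m => m.Severity)
            .ToList();
    }

    // Hypothetical policy: abort only when something fatal was reported.
    public static bool ShouldAbort(IEnumerable<LogEntry> messages)
    {
        return messages.Any(m => m.Severity >= Severity.Fatal);
    }
}
```

If LogEntry were a plain class without an Equals/GetHashCode override, Distinct() would fall back to reference equality and leave the duplicates in place; passing a custom IEqualityComparer would be another way around that.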

For headless operations (where a process runs activities in the background), there are several consumers of ResultStates worth keeping in mind:

  • Upper-level methods and diagnostic software that must act on the information.
  • System operators, responsible for monitoring and debugging the software.
  • End users that are interested in the result of the operation, and need to know when or why things are failing.

Nothing is worse than running an automated job and receiving no response whatsoever about whether it failed or ran successfully. And nothing is more frustrating than receiving a simple “failed” status with no indication as to why. Aggregated fail states that can bubble up messages from individual methods are necessary for communicating just what happened, and why. Ideally, a system operator should be able to investigate the failure by just looking at easily accessible web-based logs.

 
