gefvert.org

Logging Checklists

2020-09-03

Things to remember when implementing logging:

Golden rule:

Log files need to be reviewed on a continual basis. Make it easy for people to do so. Although it's tricky, consider the full flow of log dissemination from source to operations management, even for small services or individual servers. Logs are typically the only way that servers have to communicate with operators.

In addition:

  • Logging should be hard to get wrong.
  • Corollary: All logging must be super-easy. Avoid difficult setups in configuration files. Ideally each project (or a shared library) should have initialization code that sets everything up in a single call according to the operations environment in the company or project.
  • Pay attention to log rotation. Prevent files from growing forever.
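
As a rough illustration of the single-call setup and rotation points above, here is a minimal sketch in Python using the standard library's RotatingFileHandler. The helper name init_logging, the 10 MB size limit and the five backups are assumptions made for the example, not values this checklist prescribes.

```python
import logging
import logging.handlers
import os


def init_logging(app_name, log_dir, level=logging.INFO):
    """Configure the root logger in one call (hypothetical helper)."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"{app_name}.log")

    # Assumed limits: rotate at about 10 MB and keep 5 old files,
    # so the log cannot grow forever.
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    )

    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return path
```

A program would then call something like init_logging("myservice", "/var/log/myservice") once at startup (both arguments are placeholders) and use the ordinary logging calls everywhere else.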

Log levels:

  • Consider using log levels like:
    • TRACE
    • DEBUG
    • INFO (default)
    • NOTICE
    • WARN
    • ERROR
    • CRITICAL
    • FATAL
  • TRACE logging should be used for diagnostic purposes for developers only.
  • DEBUG logging should be disabled by default, but easy (and I mean easy!) to switch on for non-developers or support technicians.
  • Consider using the NOTICE log level for events that should stand out (new requests and other interesting events), so that they are easy to find quickly.
  • WARNings should be issued for events that don't affect program operation now but may grow into problems.
  • ERRORs should be issued for any condition that needs to be dealt with.
  • CRITICAL should be issued for any condition that critically impacts operation and may lead to imminent service degradation or failure.
  • FATAL should be used when the program can't start or terminates abruptly.
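
One way the level list above could be mapped onto Python's logging module is sketched below. Python only ships DEBUG, INFO, WARNING, ERROR and CRITICAL (FATAL is an alias for CRITICAL), so the numeric values chosen for TRACE, NOTICE and a separate FATAL are assumptions, as is the LOG_LEVEL environment variable used to make DEBUG easy to switch on.

```python
import logging
import os

# Numeric values are assumptions. Python itself defines DEBUG=10, INFO=20,
# WARNING=30, ERROR=40 and CRITICAL=50, so TRACE, NOTICE and a distinct
# FATAL are slotted in around them.
LEVELS = {
    "TRACE": 5,
    "DEBUG": 10,
    "INFO": 20,
    "NOTICE": 25,
    "WARN": 30,
    "ERROR": 40,
    "CRITICAL": 50,
    "FATAL": 60,
}

# Register the names so they show up in formatted output.
for _name, _value in LEVELS.items():
    logging.addLevelName(_value, _name)


def level_from_env(env=os.environ, default="INFO"):
    """Pick the level from a LOG_LEVEL variable (hypothetical name)."""
    requested = env.get("LOG_LEVEL", default).upper()
    # Unknown names fall back to the default instead of failing.
    return LEVELS.get(requested, LEVELS[default])


logging.basicConfig(level=level_from_env(), format="%(levelname)-8s %(message)s")
logging.log(LEVELS["NOTICE"], "New request from %s", "172.0.10.5")
```

With this in place, a support technician could start the program with LOG_LEVEL=DEBUG to switch debug output on without touching a configuration file.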

Errors and security conditions:

  • Web requests that are suspicious or that have error conditions should include, for traceability:
    • Method (GET, POST, etc.)
    • Full path
    • Query string
    • Full list of headers
    • Source IP address, including all X-Forwarded-For headers.
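
The fields above could be gathered into a single log message roughly as follows. This is a framework-neutral sketch: describe_request and its parameters are hypothetical, and how the method, path, headers and source address are obtained depends on the web framework in use.

```python
def describe_request(method, path, query, headers, remote_addr):
    """Render the traceability fields as one log message (hypothetical helper).

    headers is a list of (name, value) pairs, so repeated headers such as
    multiple X-Forwarded-For entries are all preserved.
    """
    forwarded = [v for k, v in headers if k.lower() == "x-forwarded-for"]
    lines = [
        # Method, full path and query string on the first line.
        f"{method} {path}" + (f"?{query}" if query else ""),
        # Source address plus every X-Forwarded-For value seen.
        f"source={remote_addr} x-forwarded-for={', '.join(forwarded) or '-'}",
    ]
    # Full list of headers, one per line.
    lines += [f"  {k}: {v}" for k, v in headers]
    return "\n".join(lines)


message = describe_request(
    "GET",
    "/admin",
    "id=1",
    [("Host", "example.org"), ("X-Forwarded-For", "10.0.0.1")],
    "192.0.2.7",
)
print(message)
```

Passing headers as pairs rather than a dictionary is a deliberate choice here, since a dictionary would silently drop duplicate header names.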

Monitoring and response:

  • Any warnings or errors should be forwarded to administrators on a daily basis.
  • Any critical or fatal errors should be forwarded to administrators immediately, or at most within 15 minutes.
  • Avoid error fatigue. Errors should be eliminated as quickly as possible, and warnings should be dealt with rather than ignored.
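
The split between daily and immediate forwarding could be sketched as a logging handler like the one below. AlertRouter and the send_now callback are hypothetical names; how a message actually reaches an administrator (mail, chat, pager) and how the daily digest is scheduled are left out on purpose.

```python
import logging


class AlertRouter(logging.Handler):
    """Send critical records at once, collect warnings and errors for a digest."""

    def __init__(self, send_now):
        # Ignore anything below WARNING.
        super().__init__(level=logging.WARNING)
        self.send_now = send_now  # assumed callback, e.g. mail or pager
        self.digest = []

    def emit(self, record):
        text = self.format(record)
        if record.levelno >= logging.CRITICAL:
            self.send_now(text)
        else:
            self.digest.append(text)

    def flush_digest(self):
        """Return and clear the collected entries; call this once a day."""
        items, self.digest = self.digest, []
        return items


sent = []
router = AlertRouter(send_now=sent.append)
log = logging.getLogger("alert_demo")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(router)

log.info("routine message")          # dropped by the router
log.warning("disk space is low")     # goes into the daily digest
log.critical("database unreachable") # forwarded immediately
```

Keeping the digest short is the point of the error fatigue rule: if flush_digest returns the same entries every day, they should be fixed rather than filtered.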

Log formatting:

  • Each log entry should contain (unless it can be inferred from obvious sources, such as the log file name and location):
    • Date and time with millisecond precision (preferably in UTC, with a Z appended to indicate it)
    • Machine name
    • Environment (production, sandbox, development)
    • Program/process name
    • Thread or task ID for worker threads, so that entries can be told apart when multiple requests are handled at the same time
    • Log level indication (warning, error, debug, etc.)
    • Message

An example would be (using date and time, log level indication, and thread ID; thread ID inside brackets):

2020-04-19T13:55:04.399Z           Starting up application
2020-04-19T13:55:04.692Z           Checking disk space
2020-04-19T13:55:04.692Z  WARN     Disk space is below 10%, remaining = 7.2 GB
2020-04-19T13:55:05.113Z           Starting server
2020-04-19T13:55:07.374Z  NOTICE   [3] New request from ::ffff:172.0.10.5
2020-04-19T13:55:07.626Z  FATAL    [3] Unhandled AssemblyLoadException: Unable to load assembly Company.RequestHandler.dll
2020-04-19T13:55:07.626Z  FATAL    Service terminated unexpectedly.
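
Output in roughly this shape could be produced with a custom Python formatter, sketched below. The class name UtcFormatter and the task attribute (passed through extra) are assumptions; the sketch also leaves the level column blank for INFO, as in the sample, and does not render exception tracebacks.

```python
import logging
from datetime import datetime, timezone

# Use the short name WARN instead of Python's default WARNING.
logging.addLevelName(logging.WARNING, "WARN")


class UtcFormatter(logging.Formatter):
    """Format records as: UTC timestamp with Z, level, [task], message."""

    def format(self, record):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        # Millisecond precision, with Z appended to mark UTC.
        stamp = ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
        # INFO is the default level, so its column is left empty.
        level = "" if record.levelno == logging.INFO else record.levelname
        # "task" is a hypothetical attribute supplied via extra={"task": ...}.
        task = getattr(record, "task", None)
        prefix = f"[{task}] " if task is not None else ""
        return f"{stamp}  {level:<8} {prefix}{record.getMessage()}"


handler = logging.StreamHandler()
handler.setFormatter(UtcFormatter())
log = logging.getLogger("format_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

log.info("Starting up application")
log.warning("Disk space is below 10%%, remaining = %.1f GB", 7.2)
log.info("New request from %s", "::ffff:172.0.10.5", extra={"task": 3})
```

Machine name, environment and process name are omitted here on the assumption that they can be inferred from the log file's name and location.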