Things to remember when implementing logging:
Log files need to be reviewed on a continual basis. Make it easy for people to do so. Although it's tricky, consider the full flow of log dissemination from source to operations management, even for small services or individual servers. Logs are typically the only way that servers have to communicate with operators.
- Logging should be hard to get wrong.
- Corollary: All logging must be super-easy. Avoid difficult setups in configuration files. Ideally each project (or a shared library) should have initialization code that sets everything up in a single call according to the operations environment in the company or project.
- Pay attention to log rotation. Prevent files from growing forever.
- Consider using log levels like:
  - INFO (default)
  - TRACE logging should be used for diagnostic purposes for developers only.
  - DEBUG logging should be disabled by default, but easy (and I mean easy!) to switch on for non-developers or support technicians.
  - Consider using a NOTICE log level for things that stand out (new requests, interesting stuff), making them easier to find quickly.
  - WARNings should be issued for events that don't affect program operation now but may grow into problems.
  - ERRORs should be issued for any condition that needs to be dealt with.
  - CRITICAL should be issued for any condition that critically impacts operation and may lead to imminent service degradation or failure.
  - FATAL should be used when the program can't start or terminates abruptly.
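The "single call" idea, log rotation, and the extra levels can be sketched together in Python. This is only an illustration under stated assumptions: `init_logging`, the `APP_LOG_DEBUG` environment variable, the 10 MB / five-file rotation limits, and the numeric values for TRACE and NOTICE are all made up for the example. Python's standard `logging` module has no TRACE or NOTICE level and treats FATAL as an alias of CRITICAL, so the sketch only adds the two missing names. The `force=True` argument needs Python 3.8 or later.

```python
import logging
import logging.handlers
import os

# Python's logging module has no TRACE or NOTICE level; these numeric values
# are assumptions chosen to slot between the built-in levels.
TRACE = 5    # below DEBUG (10)
NOTICE = 25  # between INFO (20) and WARNING (30)


def init_logging(path, debug=None):
    """Set up file logging with rotation in one call (hypothetical helper)."""
    logging.addLevelName(TRACE, "TRACE")
    logging.addLevelName(NOTICE, "NOTICE")

    if debug is None:
        # APP_LOG_DEBUG is a made-up switch that support staff could set.
        debug = os.environ.get("APP_LOG_DEBUG") == "1"

    # Rotate at roughly 10 MB and keep five old files so logs cannot grow forever.
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    logging.basicConfig(
        level=logging.DEBUG if debug else logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[handler],
        force=True,  # replace any handlers installed earlier
    )
    return logging.getLogger()
```

A program would then call `init_logging("service.log")` once at startup and log with, for example, `logging.getLogger().log(NOTICE, "New request")`. Keeping the debug switch in the environment rather than in a configuration file is one way to make it easy for non-developers to turn on.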
Errors and security conditions:
- Web requests that are suspicious or that have error conditions should include, for traceability:
  - Method (GET, POST, etc.)
  - Full path
  - Query string
  - Full list of headers
  - Source IP address, including all
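As a rough sketch of how those request fields could be captured in one log line, the Python helper below serializes them as JSON. The function names, the logger name `web.security`, and the parameter names are assumptions for illustration; a real web framework exposes these values through its own request object.

```python
import json
import logging

log = logging.getLogger("web.security")  # hypothetical logger name


def describe_request(method, path, query_string, headers, remote_addr):
    """Return a single-line JSON description of a request for log output."""
    return json.dumps(
        {
            "method": method,
            "path": path,
            "query": query_string,
            "headers": dict(headers),
            "source_ip": remote_addr,
        },
        sort_keys=True,
    )


def log_suspicious(reason, **request_fields):
    """Log a suspicious request at WARNING level with all traceability fields."""
    log.warning("%s: %s", reason, describe_request(**request_fields))
```

Emitting the fields as one JSON object keeps the entry on a single line, which makes it easier to search and to parse later.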
Monitoring and response:
- Any warnings or errors should be forwarded to administrators on a daily basis.
- Any critical or fatal errors should be forwarded to administrators immediately or within 15 minutes.
- Avoid error fatigue. Errors should be eliminated as quickly as possible, and warnings should be dealt with.
- Each log line should contain (unless it can be inferred from obvious sources, such as the log file name and location):
  - Date and time on a millisecond level (preferably in UTC, and should have a Z appended to indicate such)
  - Machine name
  - Environment (production, sandbox, development)
  - Program/process name
  - Thread or task ID, so that worker threads can be tracked when multiple requests run at the same time
  - Log level indication (warning, error, debug, etc.)
An example would be (using date and time, thread ID, and log level indication; thread ID inside brackets):

    2020-04-19T13:55:04.399Z Starting up application
    2020-04-19T13:55:04.692Z Checking disk space
    2020-04-19T13:55:04.692Z WARN Disk space is below 10%, remaining = 7.2 GB
    2020-04-19T13:55:05.113Z Starting server
    2020-04-19T13:55:07.374Z NOTICE  New request from ::ffff:18.104.22.168
    2020-04-19T13:55:07.626Z FATAL  Unhandled AssemblyLoadException: Unable to load assembly Company.RequestHandler.dll
    2020-04-19T13:55:07.626Z FATAL Service terminated unexpectedly.
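A line layout like this (UTC timestamp with milliseconds and a trailing Z, thread ID in brackets, then the level) could be produced in Python roughly as follows. `UtcFormatter` and `FORMAT` are names invented for this sketch, and Python spells the level `WARNING` rather than `WARN`; machine name, environment, and process name are left out on the assumption that they can be inferred from the log file location.

```python
import logging
from datetime import datetime, timezone


class UtcFormatter(logging.Formatter):
    """Formatter that renders timestamps as UTC with milliseconds and a Z suffix."""

    def formatTime(self, record, datefmt=None):
        moment = datetime.fromtimestamp(record.created, tz=timezone.utc)
        # isoformat() ends with "+00:00" for UTC; swap that for the shorter "Z".
        return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Timestamp, thread ID in brackets, level name, message.
FORMAT = "%(asctime)s [%(thread)d] %(levelname)s %(message)s"
```

To use it, attach the formatter to a handler with `handler.setFormatter(UtcFormatter(FORMAT))`.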