# Logging

# General

  • Logging is part of every application. An application MUST implement proper logging.
  • An application MUST provide a single point to adjust the log-level for the whole application.
  • The log-level MUST be adjustable depending on the stage (dev, staging, prod).
  • Logs MUST NOT contain sensitive data (email addresses, passwords, etc.).
  • Logs MUST NOT contain large amounts of meta-data. Only log what is really needed.
  • Logs MUST have a configured time to live (aging, TTL). By default, logs are kept for 2 weeks. If you need to keep them longer (legal reasons, analytics, etc.), state the reason in the README file of that application and share the information with your team.
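
The two rules about a single point of configuration and about sensitive data could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `STAGE` environment variable, the `info` level for staging, the fallback to `info`, and the list of sensitive keys are all assumptions made for this example.

```javascript
// Hypothetical stage-to-level mapping. The stage names come from the list
// above; the level chosen for "staging" is an assumption.
const LEVEL_BY_STAGE = new Map([
  ['dev', 'debug'],
  ['staging', 'info'],
  ['prod', 'info'],
]);

// Single point where the log-level for the whole application is decided.
function resolveLogLevel(stage) {
  // Unknown or missing stages fall back to the production default.
  return LEVEL_BY_STAGE.get(stage) ?? 'info';
}

// Hypothetical list of keys whose values must never reach the logs.
const SENSITIVE_KEYS = new Set(['email', 'password']);

// Returns a copy of the value with sensitive fields masked.
// Handles plain objects and arrays only.
function redact(value) {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEYS.has(key.toLowerCase())
          ? [key, '[REDACTED]']
          : [key, redact(inner)]
      )
    );
  }
  return value;
}

// STAGE is an assumed variable name; use whatever your deployment provides.
const logLevel = resolveLogLevel(process.env.STAGE);
```

The resolved `logLevel` would then be passed to the logger once, at start-up, and `redact` applied to any meta-data that may carry user input.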

# What to log?

  • In general, we want to log every meaningful processing step.
  • Log everything you might need to track down a bug or monitor the application's behaviour.
  • It's important to attach useful meta-data to a log.

# Things that should be logged

  • Incoming data (http requests, incoming messages, etc.), but never personal data
  • Outgoing data (http responses, outgoing messages, etc.), but never personal data
  • Processing steps (e.g. transforming an image or a data structure): log the start and, where useful, the finish
  • Something non-breaking but notable happened, e.g. we request a list of database results based on a list of IDs and get fewer results than we have IDs
  • Something severe happened that caused the process to be aborted early (did not finish, why?)
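
The "fewer results than IDs" case could look roughly like this. It is a sketch under assumptions: the in-memory `logger` is a stand-in so the example runs on its own, and `findMissingIds` and `reportMissingRows` are hypothetical helper names.

```javascript
// Minimal stand-in for the real logger so the sketch runs on its own.
const entries = [];
const logger = {
  info: (message, meta) => entries.push({ level: 'info', message, meta }),
};

// Hypothetical helper: compares requested IDs with the rows that came back.
function findMissingIds(requestedIds, rows) {
  const foundIds = new Set(rows.map((row) => row.id));
  return requestedIds.filter((id) => !foundIds.has(id));
}

function reportMissingRows(requestedIds, rows) {
  const missingIds = findMissingIds(requestedIds, rows);
  if (missingIds.length > 0) {
    // Non-breaking but notable: processing continues, the gap is recorded.
    logger.info('Fewer database results than requested IDs', {
      requestedCount: requestedIds.length,
      foundCount: rows.length,
      missingIds,
    });
  }
  return missingIds;
}
```

Only counts and IDs are attached as meta-data here, which keeps the entry small and free of personal data.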

# Log Level

  • While there are a couple of additional log-levels, we use only the following ones, unless there is a good reason for, and a commitment to, another one.

# Debug

  • This log level is meant for development purposes. It should help you find problems in your software while developing. It MUST NOT be the default setting for production environments.

# Info

  • This log level is meant to give you deeper information about what happened in the application, for example that a job started or finished, along with its execution state. This is the default setting for production environments.

# Error

  • This log level is meant to log unexpected results of execution. It can also be used as a basis for alerting and for measuring the reliability of your applications.

# Rule of thumb for log-levels

  • Input data / Output data -> level: debug
  • Processing steps -> level: info
  • Something non-breaking but notable happened -> level: info
  • Something severe happened that caused the process to be aborted early -> level: error
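
Logging libraries typically emit only entries at or above the configured level, which is why debug entries for input and output data disappear in production when the level is `info`. A minimal sketch of that filtering, assuming only the three levels used in this guide (`shouldLog` is a hypothetical name, not a library function):

```javascript
// Severity order of the three levels used in this guide, lowest first.
const LEVELS = ['debug', 'info', 'error'];

// Returns true when an entry at messageLevel should be written
// given the level the application is configured with.
function shouldLog(messageLevel, configuredLevel) {
  const messageRank = LEVELS.indexOf(messageLevel);
  const configuredRank = LEVELS.indexOf(configuredLevel);
  if (messageRank === -1 || configuredRank === -1) {
    throw new Error(`Unknown log level: ${messageLevel} / ${configuredLevel}`);
  }
  return messageRank >= configuredRank;
}
```

With the level set to `info`, `shouldLog('debug', 'info')` is false while `shouldLog('error', 'info')` is true.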

# Where to log?

  • Generally speaking we always want to log right where it happens.
  • Got some data you want to log to debug level? Do it right after you got it.
  • You want to indicate that a particular processing step has started? Do it right before you call the processing function.
  • There is one exception: errors.
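
For the non-error cases, the placement described above might look like this. The sketch uses an in-memory stand-in for the logger, and `resizeImage` and `handleImage` are hypothetical names chosen for illustration.

```javascript
// Minimal stand-in for the real logger so the sketch runs on its own.
const entries = [];
const logger = {
  debug: (message, meta) => entries.push({ level: 'debug', message, meta }),
  info: (message, meta) => entries.push({ level: 'info', message, meta }),
};

// Hypothetical processing step.
function resizeImage(image) {
  return { ...image, width: image.width / 2, height: image.height / 2 };
}

function handleImage(image) {
  // Log incoming data right after receiving it.
  logger.debug('Image received', { image });

  // Announce the step right before calling the processing function.
  logger.info('Resizing image', { imageId: image.id });
  const resized = resizeImage(image);

  // Log outgoing data right before returning it.
  logger.debug('Image resized', { resized });
  return resized;
}
```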

# Where to log errors?

  • Errors should be logged in one central place in the highest level of your code possible, by throwing error objects and letting them bubble up to that part of your code.
  • This is done to have just one meaningful log entry with the actual error.
  • If errors were logged everywhere they happen, the first log entry would be the actual error and all following error logs would be consequential errors caused by the first one. Those have no meaning to us, because they go away as soon as the first problem is resolved. They are just noise.
  • Also, in case of an actual error you want to stop execution, which is guaranteed by the "throw error, bubble up, log once" approach.
  • Errors MUST be logged with the error object as meta-data to have proper stack traces.
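
The "throw error, bubble up, log once" approach can be sketched as below. The in-memory `logger` is a stand-in, `loadUser`, `buildProfile` and `handler` are hypothetical names, and returning `{ ok: false }` after logging is an assumption; a real entry point might rethrow or map the error to a response instead.

```javascript
// Minimal stand-in for the real logger so the sketch runs on its own.
const entries = [];
const logger = {
  error: (message, meta) => entries.push({ level: 'error', message, meta }),
};

// Hypothetical low-level function: it throws and does not log.
function loadUser(id) {
  throw new Error(`DB connection timeout while loading user ${id}`);
}

// Intermediate layer: no try/catch, so the error bubbles up unchanged
// and execution of this function stops at the failing call.
function buildProfile(id) {
  const user = loadUser(id);
  return { id: user.id };
}

// Top-level entry point: the single place where errors are logged.
function handler(event) {
  try {
    return { ok: true, profile: buildProfile(event.userId) };
  } catch (error) {
    // The error object goes into the meta-data so the stack trace is kept.
    logger.error(error.message, { error });
    return { ok: false };
  }
}
```

However deep the failure occurs, exactly one error entry is written, and it carries the original error object.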

# Examples

| Case | What | Log level | Where |
|---|---|---|---|
| Event triggers Lambda | All input data | debug | First line of the Lambda handler |
| Lambda return value | Complete return object | debug | The last line in the Lambda handler before return |
| We got data from another service/DB | Complete data, if not "too big" | debug | Right after the call to the service |
| We start a substantial processing step (e.g. fanOut SQS messages) | Meaningful status message | info | Right before it happens |
| We checked for some non-breaking edge case | Meaningful error message + data used to detect it | info | As soon as possible, might be the first line of your if(condition) {} code block |
| Technical errors like DB connection timeout | Object of type Error | error | Do not log errors inline. Bubble up to the top level (i.e. throw) and call logger.error(error.message, { error }) once for all errors |

# Examples (Node.js with the Winston logging library)

# Bad - no meta data

```javascript
logger.debug('New Response');
// DEBUG New Response { "nodeVersion": "1.2.3", "version": "0.1.1" }
```

Problem! Without any meta-data, we cannot gain much information from this log line.

# Bad - wrong usage of meta data

```javascript
const response = { body: 'some value', version: 'abc' };
logger.debug('New Response', response);
// DEBUG New Response { "nodeVersion": "1.2.3", "version": "abc", "body": "some value" }

const error = new Error('some error');
logger.error(error.message, error);
// ERROR some error { "nodeVersion": "1.2.3", "version": "0.1.1", "name": "Error", "stacktrace": ... }
```

Problem! Meta-data is a merged object. Thus, the given meta object will be inlined with the default metadata. Also, there is only one meta parameter in the logger, but often you need to log multiple objects.
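
Why inlining is a problem can be shown with a simplified model of the merge. This is not Winston's actual implementation, only an assumed shallow merge of default meta-data with the object passed to the logger; `mergeMeta` is a hypothetical name.

```javascript
// Default meta-data attached to every log entry, as in the examples above.
const defaultMeta = { nodeVersion: '1.2.3', version: '0.1.1' };

// Simplified model of the merge: later keys overwrite earlier ones.
function mergeMeta(meta) {
  return { ...defaultMeta, ...meta };
}

const response = { body: 'some value', version: 'abc' };

// Passing the object directly: its "version" overwrites the default one.
const inlined = mergeMeta(response);

// Wrapping it under its own key: both versions survive.
const wrapped = mergeMeta({ response });
```

In the inlined case the application version `0.1.1` is lost, while the wrapped case keeps it next to `response.version`.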

# Good - do this!

```javascript
const response = { body: 'some value', version: 'abc' };
const otherObject = { body: 'some other value' };
logger.debug('New Response', { response, otherObject });
// DEBUG New Response { "nodeVersion": "1.2.3", "version": "0.1.1", "response": { "body": "some value", "version": "abc" }, "otherObject": { "body": "some other value" } }

const error = new Error('some error');
logger.error(error.message, { error });
// ERROR some error { "nodeVersion": "1.2.3", "version": "0.1.1", "error": { "name": "Error", "stacktrace": ... } }
```

Data is well structured and has context, good job!

# Logging in AWS

In AWS, use AWS CloudWatch to collect and analyse the logs.

# Requirements for working with CloudWatch

  • The logs MUST be written in the right format
  • Each application MUST log to its own Log-Group

# Additional sources

Page Info: Created by GitHub on Jun 9, 2023