Log-Loss Basics for Dummies 

By: Will Wood

One of the key metrics in predictive machine learning is log loss. To determine the most likely class label for a record, ML systems solve a classification problem with a categorical output. For example, based on available weather information (season, humidity, temperature, etc.), AI can generate a local forecast. Another example is predicting whether an email is spam. Log loss helps you judge which prediction model produces the most reliable probabilities for your data.

Defining Log Loss

So, what does log loss mean? It is a metric for evaluating the performance of a classification algorithm. Conceptually, it shows how close the predicted probability is to the actual (true) value, which is either 0 or 1 in binary classification. The greater the divergence, the higher the log loss.

Example

Suppose you want to understand how likely an email is to land in the junk folder. In this example, we assign 1 to the “spam” class and 0 to the “primary inbox” class. After estimating the probability that the email is spam, the algorithm assigns the record to one of the two classes, depending on whether the probability crosses a specific threshold. By default, the threshold is 0.5.

  • If your system’s predicted probability is exactly 1 and the email really is spam (true value 1), there is no divergence, so the log loss is zero.
  • If the probability is 0.9 and the true value is 1, the divergence is 0.1, and the log loss is 0.105 (check the formula below).
  • If the probability of landing in the junk folder is only 0.2, this is below the threshold of 0.5, so the message is classified as non-spam. Assuming the true value is 0 (the “primary inbox” class), the divergence is 0.2, and the log loss is 0.223.
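The three cases above can be checked with a minimal Python sketch. The function name record_log_loss, the THRESHOLD constant, and the choice to treat a probability exactly equal to the threshold as spam are assumptions made for illustration. The true labels (1, 1, 0) are the ones the bullets imply.

```python
import math

THRESHOLD = 0.5  # default cut-off mentioned in the text


def record_log_loss(y_true: int, p_spam: float) -> float:
    """Log loss for one record; p_spam is the predicted probability of class 1 (spam)."""
    # Probability the model gave to the class that turned out to be true.
    p_true_class = p_spam if y_true == 1 else 1.0 - p_spam
    # Adding 0.0 turns the -0.0 produced by -log(1.0) into a plain 0.0.
    return -math.log(p_true_class) + 0.0


# (true label, predicted spam probability) for the three bullets above
examples = [(1, 1.0), (1, 0.9), (0, 0.2)]

for y_true, p_spam in examples:
    # Assumption: a probability at or above the threshold counts as spam.
    predicted_label = 1 if p_spam >= THRESHOLD else 0
    loss = record_log_loss(y_true, p_spam)
    print(f"p={p_spam:.1f}  predicted label={predicted_label}  log loss={loss:.3f}")
```

Note that the threshold only decides the label. The log loss itself is computed from the probability, so a confident wrong prediction is penalized far more heavily than a hesitant one.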

Thus, the better the prediction, the lower the log loss. Now, how is it calculated?

The Log Loss Formula

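$$
\text{Log loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right]
$$

Here, “y” stands for the true value (1 for spam, 0 for non-spam), “p” is the predicted probability of class 1 (1, 0.9, and 0.2 in our examples), “i” is the index of the record, “N” is the number of records, and “ln” is the natural logarithm (the logarithm to base e).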
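To see where the example values come from, it may help to look at a single record. Assuming the standard binary log-loss term (the symbol \ell_i is introduced here only for illustration), one of the two parts always vanishes because y_i is either 1 or 0:

$$
\ell_i = -\left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right] =
\begin{cases}
-\ln(p_i), & y_i = 1 \\
-\ln(1 - p_i), & y_i = 0
\end{cases}
$$

For the spam example, a true value of 1 with p = 0.9 gives -\ln(0.9) \approx 0.105, and a true value of 0 with p = 0.2 gives -\ln(1 - 0.2) = -\ln(0.8) \approx 0.223.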

These calculations are performed for each observation, and the score of a classification model is the average of the results over all predictions. For example, if you take three observations (N = 3) with true values of 1, 1, and 0 and predicted probabilities of 1.0, 0.9, and 0.2, the average is about 0.11 (the sum of the three log-loss values, 0.000 + 0.105 + 0.223, divided by 3).
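The averaging step can be sketched in a few lines of plain Python. The function name mean_log_loss and the eps clipping value are assumptions for illustration. Clipping is a common safeguard, not something the article prescribes, and it keeps the logarithm defined when a probability is exactly 0 or 1.

```python
import math


def mean_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log loss; p_pred holds predicted probabilities of class 1."""
    if len(y_true) == 0 or len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip to [eps, 1 - eps] so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 1, 0]               # true values implied by the example
probabilities = [1.0, 0.9, 0.2]  # predicted probabilities of "spam"

print(f"{mean_log_loss(labels, probabilities):.3f}")  # prints 0.110
```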

To Sum Up

The log loss score of a classification model reflects its skill. If a model has perfect prediction skill, the score is 0. This means every predicted probability equals the true value (either 1 for “spam” or 0 for “primary inbox” in our example). The lower the average for a model, the more reliable its output. However, log loss scores are only comparable when they are computed on the same data set.
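As a rough illustration of comparing models on the same data set, the sketch below scores two made-up sets of predictions against the same labels. All labels, probabilities, and names (model_a, model_b, perfect, mean_log_loss) are hypothetical and exist only for this example.

```python
import math


def mean_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log loss, with probabilities clipped to avoid log(0)."""
    losses = []
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# Hypothetical data: the same four emails scored by every model.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.1, 0.8, 0.3]  # confident and mostly right
model_b = [0.6, 0.4, 0.5, 0.5]  # hesitant
perfect = [1.0, 0.0, 1.0, 0.0]  # every probability equals the true value

score_a = mean_log_loss(labels, model_a)
score_b = mean_log_loss(labels, model_b)
score_perfect = mean_log_loss(labels, perfect)

print(f"model A: {score_a:.3f}")
print(f"model B: {score_b:.3f}")
print(f"perfect: {score_perfect:.3f}")
```

Model A gets the lower score here because its probabilities sit closer to the true labels. The comparison is only meaningful because both models are scored on the same labels.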