Log loss is one of the key metrics in predictive machine learning. To determine the most likely class label for a record, ML systems solve a classification problem with a categorical output. For example, based on available weather information (season, humidity, temperature, etc.), a model can generate a local forecast. Another example is predicting whether an email will be classified as spam. Log loss tells you how well each prediction model's probabilities match the actual outcomes in your data.

### Defining Log Loss

So, what does log loss mean? It is a metric that measures the performance of a classification algorithm. Conceptually, it shows how close the predicted probability is to the actual, or true, value, which can be 0 or 1 in binary classification. The further the divergence, the higher the log loss.

### Example

Suppose you want to understand how likely an email is to land in the junk folder. In this example, we assign 1 to the “spam” class and 0 to the “primary inbox” class. After estimating the probability of spam, the algorithm classifies the record under one of the two classes, depending on whether the value crosses a specific threshold. By default, the threshold is 0.5.
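The thresholding step can be sketched in a few lines. This is a minimal illustration, not library code; the function name and the default threshold of 0.5 are taken from the description above.

```python
# Turn a predicted spam probability into a class label:
# 1 = "spam", 0 = "primary inbox" (labels as defined in the example).
def classify(p_spam: float, threshold: float = 0.5) -> int:
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if p_spam >= threshold else 0

print(classify(0.9))  # 1 -> spam
print(classify(0.2))  # 0 -> primary inbox
```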

- If your system’s predicted probability is exactly 1 and the email is indeed spam, there is no divergence, hence the log loss is zero.
- If the probability is 0.9, the divergence is 0.1, and the log loss is 0.105 (check the formula below).
- If the probability of landing in the junk folder is merely 0.2, which is below the threshold of 0.5, the message will be classified as non-spam. If the true value is indeed 0 (the “primary inbox” class), the divergence is 0.2 and the log loss is 0.223.
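The three bullet points can be reproduced with a short sketch of the per-record binary log loss. The `eps` clipping is an implementation detail assumed here to keep the logarithm away from zero when the probability is exactly 0 or 1.

```python
import math

# Binary log loss for a single record: y is the true label (0 or 1),
# p is the predicted probability of class 1 ("spam").
def log_loss_single(y: int, p: float, eps: float = 1e-15) -> float:
    p = min(max(p, eps), 1 - eps)  # clip so ln(0) is never evaluated
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(round(log_loss_single(1, 1.0), 3))  # 0.0  (perfect prediction)
print(round(log_loss_single(1, 0.9), 3))  # 0.105
print(round(log_loss_single(0, 0.2), 3))  # 0.223
```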

Thus, the better the prediction, the lower the log loss. Now, how is it calculated?

### The Log Loss Formula

For binary classification over N records:

Log Loss = −(1/N) · Σᵢ [yᵢ · ln(pᵢ) + (1 − yᵢ) · ln(1 − pᵢ)]

Here, “y” stands for the true value (spam or non-spam), “p” is the predicted probability (1, 0.9, and 0.2 in our examples), “i” indexes the record, and “ln” is the natural logarithm (base e).

These calculations are performed for each observation, and the model’s overall score is the average of the outputs for all predictions. For example, if you take three observations (N = 3) with predicted probabilities of 1.0, 0.9 and 0.2, the three log-loss values are 0.000, 0.105 and 0.223, and the average is 0.110.
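The averaging step looks like this for the worked example; labels and probabilities are the ones used above, and the `eps` clipping constant is an assumption to avoid evaluating ln(0).

```python
import math

# Average log loss over the three example emails.
y_true = [1, 1, 0]        # spam, spam, primary inbox
p_pred = [1.0, 0.9, 0.2]  # predicted probability of "spam"

eps = 1e-15  # clip probabilities so ln(0) never occurs
losses = [
    -(y * math.log(min(max(p, eps), 1 - eps))
      + (1 - y) * math.log(1 - min(max(p, eps), 1 - eps)))
    for y, p in zip(y_true, p_pred)
]
avg = sum(losses) / len(losses)

print([round(loss, 3) for loss in losses])  # [0.0, 0.105, 0.223]
print(round(avg, 3))                        # 0.11
```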

### To Sum Up

The log loss score of a classification model reflects its skill. A model with perfect prediction skill scores 0: every predicted probability matches the true value exactly (either 1 for “spam” or 0 for “primary inbox” in our example). The lower a model’s average log loss, the more reliable its output. However, log loss scores are only comparable when they are computed on the same data set.