Day 207 · Jul 25

The Math of Spam Filtering – Bayes’ Theorem

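Bayes’ theorem updates a probability in light of new evidence: P(A|B) = P(B|A)·P(A)/P(B). Spam filters use it directly. Start with a prior probability that a message is spam (say 50%). Then look at the word ‘free’: suppose 80% of spam contains ‘free’, but only 2% of ham (legitimate mail) does. Expanding the denominator as P(‘free’) = P(‘free’|spam)·P(spam) + P(‘free’|ham)·P(ham), the posterior is P(spam|‘free’) = 0.8·0.5 / (0.8·0.5 + 0.02·0.5) = 0.4/0.41 ≈ 97.6%. Filters become far more reliable by combining the evidence from many words. They usually treat the words as independent given the class and multiply their likelihoods, which is known as the “naive Bayes” assumption. Bayes’ theorem is also used in medical testing, machine learning, and even in searching for lost submarines.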
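Here is a minimal Python sketch of the calculation above. It assumes the 50% prior and the ‘free’ rates from the example. The helper names `posterior` and `naive_bayes_posterior` are hypothetical and chosen for illustration. So are the second word’s rates (30% of spam, 10% of ham). The multi-word helper applies the naive Bayes independence assumption and only looks at words present in the message.

```python
def posterior(prior: float, p_e_given_spam: float, p_e_given_ham: float) -> float:
    """P(spam | evidence) via Bayes' theorem, expanding P(evidence) over spam and ham."""
    numerator = p_e_given_spam * prior
    # P(evidence) by the law of total probability: spam term + ham term
    evidence = numerator + p_e_given_ham * (1 - prior)
    return numerator / evidence


def naive_bayes_posterior(prior: float, likelihoods: list[tuple[float, float]]) -> float:
    """Combine several words, assuming they are independent given the class.

    `likelihoods` holds (P(word | spam), P(word | ham)) pairs for words found in the message.
    """
    spam_score = prior
    ham_score = 1 - prior
    for p_spam, p_ham in likelihoods:
        spam_score *= p_spam  # multiply in each word's likelihood under "spam"
        ham_score *= p_ham    # ...and under "ham"
    # Normalize so the two class probabilities sum to 1
    return spam_score / (spam_score + ham_score)


# Numbers from the example: prior 50%, 'free' in 80% of spam and 2% of ham.
p = posterior(0.5, 0.8, 0.02)
print(f"P(spam | 'free') = {p:.3f}")  # P(spam | 'free') = 0.976

# Hypothetical second word: in 30% of spam and 10% of ham (illustrative numbers only).
p2 = naive_bayes_posterior(0.5, [(0.8, 0.02), (0.3, 0.1)])
print(f"P(spam | 'free', word2) = {p2:.4f}")  # P(spam | 'free', word2) = 0.9917
```

With many words, the product of small probabilities can underflow floating-point numbers. For that reason, real filters usually add log-probabilities instead of multiplying. This sketch multiplies directly for readability. The same `posterior` function also fits the medical-test question below if you use the disease prevalence as the prior.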

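A test for a disease is 99% accurate: it correctly flags 99% of people who have the disease and correctly clears 99% of people who don’t. Only 0.1% of the population has the disease. You test positive. What is the actual probability that you have the disease?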

