Today, let’s take a look at Naive Bayes from the ground up and build a working implementation in JavaScript.
For one of my side projects, I’m working on a feature that lets you perform simple data analysis on small datasets in the browser. One use case is automatically assigning categories to a set of items (i.e., supervised learning).
Since users’ datasets are small (N<60), we want to keep our model as simple as possible so it won’t overfit. This usually means making simple assumptions about the distribution the data comes from.
There are a handful of machine learning classifiers we could use: Naive Bayes, k-Nearest Neighbours, and linear SVMs. I also considered Decision Trees and multi-layer perceptrons (MLPs), which can be visualized in the browser using client-side visualization libraries.
Naive Bayes is a simple probabilistic classifier that performs well in practice despite its strong independence assumption, and it has historically been used in document classification.
Let A and B be events that can happen with some probability:
P(A): probability that A happens
P(A|B): probability that A happens, given that B happens (conditional probability)
According to Bayes Theorem:
\[P(A|B) = \frac {P(B|A)P(A)} {P(B)}\]
That is, the probability P(A|B) of A happening given B equals the probability P(B|A) of B happening given A, multiplied by the probability P(A) of A happening, divided by the probability P(B) of B happening.
To clarify this idea further, let’s work through an example:
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, only 0.008 of the entire population has this form of cancer.
If the lab test returns a positive result for a new patient, should they be diagnosed as having cancer or not?
Based on the above, we have the following probabilities:

P(cancer) = 0.008, P(!cancer) = 0.992
P(+|cancer) = 0.98, P(-|cancer) = 0.02
P(+|!cancer) = 0.03, P(-|!cancer) = 0.97

For a positive test result, Bayes’ Theorem gives the (unnormalized) posteriors:

P(cancer|+) ∝ P(+|cancer) P(cancer) = 0.98 × 0.008 ≈ 0.0078
P(!cancer|+) ∝ P(+|!cancer) P(!cancer) = 0.03 × 0.992 ≈ 0.0298

Since P(!cancer|+) > P(cancer|+), we conclude that even with a positive test result, a new patient is still more likely not to have cancer.
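As a quick sanity check, here’s the same calculation in JavaScript (the variable names are mine):

```js
// Priors and test characteristics from the example above
const pCancer = 0.008;
const pNoCancer = 1 - pCancer;        // 0.992
const pPosGivenCancer = 0.98;         // true positive rate
const pPosGivenNoCancer = 1 - 0.97;   // 0.03 false positive rate

// Numerators of Bayes' Theorem for each hypothesis
const cancerScore = pPosGivenCancer * pCancer;       // ≈ 0.0078
const noCancerScore = pPosGivenNoCancer * pNoCancer; // ≈ 0.0298

// Normalize so the two posteriors sum to 1
const pCancerGivenPos = cancerScore / (cancerScore + noCancerScore);
console.log(pCancerGivenPos.toFixed(2)); // ≈ 0.21 — only a 21% chance of cancer
```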
Bayesian learning algorithms calculate explicit probabilities for hypotheses, which typically requires initial knowledge of many probabilities. In practice, this means you’ll need labeled data beforehand (supervised learning).
Supervised Learning with Naive Bayes
The above example works for a single attribute: a test result. However, in real-world cases we’ll often need more attributes to improve the accuracy of our predictions. That’s where Naive Bayes comes in.
A Naive Bayes classifier is a simple classification method that is based on Bayes’ Theorem and the assumption of conditional independence.
That is:
\[P(A \wedge B \mid Y) = P(A \mid Y)\,P(B \mid Y)\]
Now, let’s say our training data looks as follows:
\[a_1, \ldots, a_N, V_j\]
where ai is the attribute at the ith position, and Vj is the jth class label in the dataset. According to our conditional independence assumption:

\[P(a_1, \ldots, a_N \mid V_j) = \prod_{i=1}^{N} P(a_i \mid V_j)\]
Combining this with Bayes’ Theorem, we’ll then be able to find:

\[P(V_j \mid a_1, \ldots, a_N) \propto P(V_j) \prod_{i=1}^{N} P(a_i \mid V_j)\]

which is the probability that a given sample of attributes a1, ..., aN belongs to a class Vj. Picking the class Vj that maximizes this quantity is the end result of our classification!
Notice that Naive Bayes is an eager learner: it calculates all the probabilities beforehand in a preprocessing step, making classification time close to zero. This is in contrast to lazy learners such as k-nearest neighbours, which spend less time on training and more on classification.
Naive Bayes Learner Implementation
Given the following training data, let’s build a Naive Bayes classifier to answer the following supervised learning question: is today a good day to Play outside?
| Outlook  | Temperature | Humidity | Wind   | Play? |
|----------|-------------|----------|--------|-------|
| Sunny    | 85          | 85       | Weak   | No    |
| Sunny    | 80          | 90       | Strong | No    |
| Overcast | 83          | 86       | Weak   | Yes   |
| Rain     | 70          | 96       | Weak   | Yes   |
| Rain     | 68          | 80       | Weak   | Yes   |
| Rain     | 65          | 70       | Strong | No    |
| Overcast | 64          | 65       | Strong | Yes   |
| Sunny    | 72          | 95       | Weak   | No    |
| Sunny    | 69          | 70       | Weak   | Yes   |
| Rain     | 75          | 80       | Weak   | Yes   |
| Sunny    | 75          | 70       | Strong | Yes   |
| Overcast | 72          | 90       | Strong | Yes   |
| Overcast | 81          | 75       | Weak   | Yes   |
| Rain     | 71          | 91       | Strong | No    |
As a recap, we want to calculate all the individual probabilities P(ai | Vj) so that we can calculate P(Vj | a1, ..., aN).
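For example, counting rows in the table above:

\[P(\text{Yes}) = \frac{9}{14}, \qquad P(\text{Sunny} \mid \text{Yes}) = \frac{2}{9}, \qquad P(\text{Sunny} \mid \text{No}) = \frac{3}{5}\]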
Implementation
The full implementation can be seen on GitHub: node-bayes
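To give a feel for the approach, here’s a condensed sketch of the training and prediction steps for categorical attributes. This is an illustration of the technique, not the actual node-bayes source:

```js
// Train: count class frequencies and per-class attribute value frequencies.
// Rows are arrays like ['Sunny', 85, 85, 'Weak', 'No']; labelIndex points
// at the class column (assumed to be the last one so sample indices line up).
function train(rows, labelIndex) {
  const model = { labelCounts: {}, condCounts: {}, total: rows.length };
  for (const row of rows) {
    const label = row[labelIndex];
    model.labelCounts[label] = (model.labelCounts[label] || 0) + 1;
    row.forEach((value, i) => {
      if (i === labelIndex) return;
      const key = `${i}:${value}|${label}`; // counts for P(ai = value | Vj = label)
      model.condCounts[key] = (model.condCounts[key] || 0) + 1;
    });
  }
  return model;
}

// Predict: pick the class Vj that maximizes log P(Vj) + Σ log P(ai | Vj).
// Logs avoid floating-point underflow from multiplying many small numbers.
function predict(model, sample) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, count] of Object.entries(model.labelCounts)) {
    let score = Math.log(count / model.total);
    sample.forEach((value, i) => {
      const seen = model.condCounts[`${i}:${value}|${label}`] || 0;
      // +1 is a crude smoothing term so unseen values don't yield log(0);
      // a full implementation would use proper Laplace smoothing.
      score += Math.log((seen + 1) / (count + 1));
    });
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}
```

Note that this sketch would treat Temperature and Humidity as categorical values, and exact numeric matches almost never repeat, which is why numeric attributes need the distribution-based treatment described next.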
Handling numeric attributes
One consideration is how to handle numerical attributes such as Temperature and Humidity in our dataset.
We calculate the probability of an as-yet unseen numerical value using a normal distribution, parameterized by the mean and standard deviation of the column’s sample values for each class.
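Concretely, this is the Gaussian probability density function. The mean (73) and standard deviation (≈ 6.2) below are computed from the Temperature values of the Play = Yes rows in the table above:

```js
// Probability density of x under a normal distribution with the given
// mean and standard deviation.
function gaussianDensity(x, mean, stdDev) {
  const exponent = -((x - mean) ** 2) / (2 * stdDev ** 2);
  return Math.exp(exponent) / (stdDev * Math.sqrt(2 * Math.PI));
}

// Temperature over the Play = Yes rows has mean 73 and sample standard
// deviation ≈ 6.2, so P(Temperature = 66 | Play = Yes) is roughly:
console.log(gaussianDensity(66, 73, 6.2)); // ≈ 0.034
```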
Node.js module
I’ve written a Node.js module you can use to perform the above training and classification steps.
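A usage sketch follows; the constructor options and method names here are assumptions based on a typical API shape, so check the GitHub README for the module’s actual interface:

```js
// Illustrative usage only — verify option and method names against the README.
const bayes = require('node-bayes');

const cls = new bayes.NaiveBayes({
  columns: ['Outlook', 'Temperature', 'Humidity', 'Wind', 'Play'],
  data: [
    ['Sunny', 85, 85, 'Weak', 'No'],
    ['Overcast', 83, 86, 'Weak', 'Yes'],
    // ... remaining rows from the table above
  ],
});

cls.train();
console.log(cls.predict(['Sunny', 66, 90, 'Strong']));
```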
You can view the source on GitHub. Pull requests are welcome!
In Python
In the Python data analysis ecosystem, there is a breadth of libraries containing implementations of Naive Bayes, such as GaussianNB from sklearn:
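```python
# A minimal sketch using scikit-learn's GaussianNB. The integer encoding of
# the categorical columns (Outlook, Wind) is for illustration only — GaussianNB
# models every feature as continuous.
from sklearn.naive_bayes import GaussianNB

# Outlook: Sunny=0, Overcast=1, Rain=2; Wind: Weak=0, Strong=1
X = [
    [0, 85, 85, 0],
    [0, 80, 90, 1],
    [1, 83, 86, 0],
    [2, 70, 96, 0],
]
y = ['No', 'No', 'Yes', 'Yes']

clf = GaussianNB()
clf.fit(X, y)
print(clf.predict([[0, 66, 90, 1]]))
```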
In Closing
Naive Bayes is a simple but powerful supervised learning algorithm. Let me know what you think! Pull requests to node-bayes are welcome.