Naïve Bayes
Christ University, Lavasa.
Outline:-
- Conditional Probability for Naïve Bayes
- Bayes Rule
- The Naïve Bayes
- Assumptions of Naïve Bayes
- Gaussian Naïve Bayes
We start with an interesting machine learning algorithm called the Naïve Bayes classifier. Before using it, it is worth formulating a few questions: What is this classifier? Why do we use it? How do we use it? And what objective does it serve and what conclusions can we draw from it?
Naïve Bayes classifiers are a group of classification algorithms built on Bayes' theorem. They form a family of algorithms rather than a single method, and they all operate under the same guiding principle: every pair of features being classified is independent of the others.
Conditional Probability for Naïve Bayes
Conditional probability is the probability of an event or outcome occurring, given that another event has already occurred. It is obtained by dividing the probability of both events happening together by the probability of the conditioning event: P(A | B) = P(A and B) / P(B). Let's use examples to make this definition concrete. If I ask you to draw a card from a deck, what is the chance of getting a king given that the card is a club? Pay close attention to the condition that the card must be a club: since there are 13 clubs in total, the denominator of the probability is 13 rather than 52. There is only one king among the clubs, so the probability of getting a king given that the card is a club is 1/13 ≈ 0.077. Let's look at one more illustration. Imagine a coin-toss experiment with two coins. Here, the sample space is S = {HH, HT, TH, TT}.
If someone were asked to calculate the probability of getting at least one tail, their answer would be 3/4 = 0.75.
Now imagine that someone else conducts the exact same experiment, with the only difference being the extra information that both coins landed heads. If event A, "both coins show heads," has occurred, then the elementary outcomes HT, TH, and TT cannot have occurred. Event A itself has probability 1/4 = 0.25 in the full sample space, and given A, the probability of seeing a tail drops from 0.75 to 0. These examples show that a probability may change once we are given extra knowledge. To build any machine learning model, we likewise need to determine the probability of the output given a set of features.
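The same reasoning can be reproduced by enumerating the sample space. Below is a minimal Python sketch (the helper name prob is illustrative, not from any particular library) that counts outcomes to recover the probabilities discussed above.

from itertools import product

# Sample space for tossing two fair coins: HH, HT, TH, TT
sample_space = list(product("HT", repeat=2))

def prob(event, space):
    # Probability of an event (given as a predicate) under a uniform sample space
    return sum(1 for outcome in space if event(outcome)) / len(space)

# Unconditional probability of getting at least one tail: 3/4
print(prob(lambda o: "T" in o, sample_space))   # 0.75

# Condition on event A: both coins show heads
given_a = [o for o in sample_space if o == ("H", "H")]

# Given A, the probability of seeing a tail drops to 0
print(prob(lambda o: "T" in o, given_a))        # 0.0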
Bayes Rule:-
We are now prepared to state Bayes' rule, one of the most important results in conditional probability. The theorem is named after Thomas Bayes, a British mathematician; it was published posthumously in 1763 and offers a way to determine the probability of an event given relevant prior knowledge.
Bayes' theorem can be expressed mathematically as:
P(A | B) = P(B | A) · P(A) / P(B), provided P(B) ≠ 0.
In essence, we are attempting to determine the probability of event A given that event B has occurred.
Here P(A) is the prior probability, the probability of the event before the evidence is seen; P(A | B) is the posterior probability, the probability of the event after the evidence is observed; P(B | A) is the likelihood of the evidence given the event; and P(B) is the probability of the evidence itself.
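As a quick numerical check, the rule can be applied to the card example from earlier. The minimal Python sketch below (with the probabilities of a standard 52-card deck hard-coded) computes P(club | king) from P(king | club), P(club), and P(king).

# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# A = "card is a club", B = "card is a king"
p_king_given_club = 1 / 13   # one king among the 13 clubs
p_club = 13 / 52             # 13 clubs in a 52-card deck
p_king = 4 / 52              # 4 kings in the deck

# P(club | king) = 1/4, as expected: one of the four kings is a club
print(bayes(p_king_given_club, p_club, p_king))   # 0.25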
Bayes' rule gives us a formula for the probability of a class Y given a single feature X. In real-world problems, however, we rarely have just one feature.
When the features are independent, we can extend Bayes' rule to what is known as Naïve Bayes. The "naïve" assumption is precisely this independence: changing the value of one feature does not affect the value of any other feature.
Naïve Bayes can be applied to a wide range of tasks, including sentiment analysis, facial recognition, weather forecasting, medical diagnosis, and many others.
When there are many feature variables X1, X2, ..., Xn, we simplify the calculation by assuming that they are independent, so
P(Y | X1, X2, ..., Xn) = P(X1 | Y) · P(X2 | Y) · ... · P(Xn | Y) · P(Y) / [P(X1) · P(X2) · ... · P(Xn)].
Since the denominator does not depend on Y, we simply predict the class Y that maximizes P(Y) · P(X1 | Y) · ... · P(Xn | Y).
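To make this factorization concrete, here is a minimal Python sketch; the probability tables are invented purely for illustration, and the log-space scoring is a common numerical-stability choice rather than part of the formula itself. Each class is scored by its prior and the product of per-feature conditional probabilities, and the highest-scoring class is returned.

import math

# Invented probability tables for two classes and two categorical feature values
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {
    # P(feature value | class)
    ("yes", "x1=a"): 0.7, ("yes", "x2=b"): 0.5,
    ("no",  "x1=a"): 0.2, ("no",  "x2=b"): 0.9,
}

def predict(observed_values):
    # Score each class by log P(Y) + sum_i log P(X_i | Y) and return the best one
    scores = {}
    for y, prior in priors.items():
        score = math.log(prior)
        for value in observed_values:
            score += math.log(likelihoods[(y, value)])
        scores[y] = score
    return max(scores, key=scores.get)

print(predict(["x1=a", "x2=b"]))   # "yes" for these toy numbers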
Assumptions of Naïve Bayes:-
Naïve Bayes rests on the following assumptions:
- The features are independent. For example, just because an animal is a dog does not mean it will be medium-sized.
- Each predictor has an equal effect on the outcome. Whether or not we can pet the animal does not depend more heavily on the fact that it is a dog than on any other feature.
- Each feature is equally important.
With these assumptions in place, we can attempt to apply the Naïve Bayes method to the aforementioned dataset, but first we need to perform some precomputations on it.
Gaussian Naïve Bayes:-
In our previous discussion, we covered how to estimate probabilities when the predictors take on discrete values. But what if a feature is continuous? To handle this case, we must make an additional assumption about the distribution of each feature. The assumptions that different Naïve Bayes classifiers make about the distribution of P(xi | y) are what differentiate them from one another. Here we discuss Gaussian Naïve Bayes.
Gaussian Naïve Bayes is the method we use when we assume that the continuous values associated with each feature are normally distributed within each class. The normal distribution is just another name for the Gaussian distribution.
Here, the conditional probability changes because the features now take continuous values. The likelihood of a feature value is given by the probability density function (PDF) of the normal distribution:
P(xi | y) = (1 / sqrt(2π σy²)) · exp(−(xi − μy)² / (2σy²)),
where μy and σy² are the mean and variance of feature xi estimated from the training samples of class y.
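A minimal Python sketch of how this works in practice, assuming a single continuous feature and two classes; the training values and class names are invented for illustration. The per-class mean and variance are estimated from the data and plugged into the normal PDF above.

import math

def gaussian_pdf(x, mean, var):
    # Normal PDF: the likelihood P(x_i | y) used by Gaussian Naïve Bayes
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Invented training data: one continuous feature, two classes A and B
train = {"A": [4.9, 5.1, 5.3], "B": [1.0, 1.2, 0.8]}

# Estimate the per-class mean and variance of the feature
params = {}
for y, xs in train.items():
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    params[y] = (mean, var)

def predict(x, prior=0.5):
    # Pick the class maximizing P(y) * P(x | y), with equal priors assumed here
    scores = {y: prior * gaussian_pdf(x, m, v) for y, (m, v) in params.items()}
    return max(scores, key=scores.get)

print(predict(5.0))   # "A" for these invented numbers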
The main applications of Naïve Bayes algorithms include sentiment analysis, facial recognition, weather forecasting, medical diagnosis, and news classification. In this discussion, we have worked through the mathematical reasoning behind the algorithm.