In Depth: Classification in Machine Learning
Classification is the machine learning task of assigning an input to one of a fixed set of categories. The sections below walk through the most common classification algorithms.
The k-Nearest Neighbors (k-NN) algorithm:
1. Training data:
- Apple: (Weight: 150 g, Color Intensity: 7)
- Orange: (Weight: 170 g, Color Intensity: 6)
- Lemon: (Weight: 120 g, Color Intensity: 10)
- Unknown Fruit: (Weight: 160 g, Color Intensity: 8)
2. Distance metric:
- The Euclidean distance between two points (x₁, y₁) and (x₂, y₂) in a 2-dimensional space is given by d = √((x₂ − x₁)² + (y₂ − y₁)²).
3. Calculating distances from the unknown fruit:
- To the Apple: √((160 − 150)² + (8 − 7)²) = √101 ≈ 10.05
- To the Orange: √((160 − 170)² + (8 − 6)²) = √104 ≈ 10.20
- To the Lemon: √((160 − 120)² + (8 − 10)²) = √1604 ≈ 40.05
4. Result:
- Based on these distances, the unknown fruit is closest to the Apple (distance of 10.05), then the Orange (distance of 10.20), and farthest from the Lemon (distance of 40.05). If we were using k-NN with k=1, we would classify the unknown fruit as an Apple. If k=3, we look at the majority class among the nearest three fruits.
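To make the arithmetic concrete, here is a minimal Python sketch of the same example: it computes the three Euclidean distances and picks the nearest neighbor for k=1. The fruit data comes straight from the table above.

```python
import math

# Training data from the example above: (weight in g, color intensity) -> label
training = [
    ((150, 7), "Apple"),
    ((170, 6), "Orange"),
    ((120, 10), "Lemon"),
]
unknown = (160, 8)  # the unknown fruit

def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Distance from the unknown fruit to every training point, nearest first
distances = sorted((euclidean(unknown, pt), label) for pt, label in training)
for dist, label in distances:
    print(f"{label}: {dist:.2f}")   # Apple: 10.05, Orange: 10.20, Lemon: 40.05

# With k = 1 the prediction is simply the nearest neighbor's class
print("Predicted class:", distances[0][1])  # Apple
```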
Naive Bayes Classifier:
Naive Bayes applies Bayes' theorem with a "naive" independence assumption: for each class it computes a score of the form P(class) × P(feature₁ | class) × P(feature₂ | class) × …, and predicts the class with the higher score. In the worked example, the score for fever = yes comes out to 0.17 and the score for fever = no to 0.13.
Since 0.17 > 0.13, if a person has both flu and covid, it is very likely that the person also has a fever.
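As a rough sketch of how such scores are produced by hand (the symptom table behind the 0.17 and 0.13 values isn't reproduced here, so the rows below are made up and the numbers won't match exactly; only the method is the same):

```python
# Hypothetical symptom table, invented for illustration.
# Each row: (flu, covid, fever), 1 = present, 0 = absent
data = [
    (1, 1, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1),
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0),
]

def score(fever, flu=1, covid=1):
    """Unnormalized Naive Bayes score: P(fever) * P(flu|fever) * P(covid|fever)."""
    rows = [r for r in data if r[2] == fever]
    prior = len(rows) / len(data)
    p_flu = sum(r[0] == flu for r in rows) / len(rows)
    p_covid = sum(r[1] == covid for r in rows) / len(rows)
    return prior * p_flu * p_covid

yes, no = score(1), score(0)
print(f"score(fever=yes) = {yes:.3f}, score(fever=no) = {no:.3f}")
print("Prediction:", "fever" if yes > no else "no fever")
```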
Logistic regression:
Logistic regression is used when the target or dependent variable is binary, meaning it has only two possible outcomes (e.g., yes/no, pass/fail, true/false). Logistic regression uses a logistic function called a sigmoid function to map predictions and their probabilities. The sigmoid function refers to an S-shaped curve that converts any real value to a range between 0 and 1.
y = 1 / (1 + e^−(a0 + a1·x))
where:
x = input value
y = predicted output (the probability that the outcome is 1)
a0 = bias or intercept term
a1 = coefficient for input (x)
We calculate the values of a0 and a1 by maximum likelihood estimation; there is no closed-form solution, so in practice the coefficients are found by iterative optimization.
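A small sketch of the sigmoid in action (the values of a0 and a1 below are made-up stand-ins for coefficients that MLE would normally produce):

```python
import math

def sigmoid(z):
    """S-shaped curve mapping any real value into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Made-up coefficients for illustration; in practice they come from
# maximum likelihood estimation, not from hand-picking.
a0, a1 = -4.0, 0.8

for x in (0, 5, 10):
    p = sigmoid(a0 + a1 * x)   # predicted probability of class 1
    print(f"x={x}: P(y=1) = {p:.2f} -> {'pass' if p >= 0.5 else 'fail'}")
```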
Decision tree:
In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes ask a question about the data and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. The root decision node might test the salary; deeper decision nodes might test the commute time or the benefits; each leaf node gives the final answer: accept or decline the offer.
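Here is a minimal sketch of that job-offer example using scikit-learn's DecisionTreeClassifier; the salary and commute numbers are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented job-offer data: [salary in $1000s, commute in minutes]
X = [[40, 20], [55, 90], [65, 30], [80, 60], [45, 45], [90, 15]]
y = [0, 0, 1, 1, 0, 1]          # 1 = accept, 0 = decline

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned decision nodes and leaf nodes as nested rules
print(export_text(tree, feature_names=["salary", "commute"]))

# Classify a new offer: $70k salary, 40-minute commute
print("Decision:", tree.predict([[70, 40]]))   # [1] -> accept
```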
Random forest:
- A random forest is like an army of decision trees working together.
- Each decision tree in the forest is trained on a slightly different subset of the data.
- The final output is based on the majority vote (for classification) or average (for regression) of all the trees' outputs.
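A short scikit-learn sketch of the "army of trees" idea on synthetic data; it fits 100 trees and compares one tree's vote with the forest's majority vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data stands in for any real dataset
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training rows;
# the forest predicts by majority vote across the trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))

# One individual tree's vote vs. the whole forest's vote
print("First tree :", forest.estimators_[0].predict(X_test[:1]))
print("Full forest:", forest.predict(X_test[:1]))
```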
Support vector machine (SVM):
Support Vectors in SVM (Support Vector Machine) are the key data points that are closest to the line (or hyperplane) that separates two groups or classes. These points are very important because they help the SVM algorithm figure out the best way to separate the classes.
Simplified Role of Support Vectors:
- Maximizing the Gap: SVM tries to find a boundary (or line) that gives the widest possible gap between the classes. The bigger the gap, the better the model will work on new, unseen data.
- Shaping the Boundary: Only the support vectors determine the position of this boundary. Changing or removing other data points (that aren't support vectors) won't affect the boundary, but moving support vectors will.
- In a basic 2D example, imagine you're drawing a line to divide two groups of dots. The dots that are closest to the line are the support vectors, and they help decide where the line should go. These are the most important dots because they directly control how the line is positioned to best separate the two groups.
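The sketch below, using scikit-learn's SVC on a handful of made-up 2-D points, fits a linear SVM and prints exactly which points ended up as support vectors:

```python
from sklearn import svm

# Two small groups of 2-D points, made up for illustration
X = [[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

# A linear SVM finds the separating line with the widest possible margin
clf = svm.SVC(kernel="linear").fit(X, y)

# Only these points pin down the boundary; moving any of the others
# (without crossing the margin) would leave the line unchanged
print("Support vectors:")
print(clf.support_vectors_)
```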
Gamma is a kernel parameter (most often discussed with the RBF kernel) that controls how far the influence of a single training example reaches.
High Gamma:
- Narrower influence: A high gamma means that each training example has a shorter range of influence, affecting only nearby data points.
- Complex model: With high gamma, the model can create more complex, wavy decision boundaries, as it focuses more on fitting closely to the training data points. This can lead to overfitting, where the model fits the training data very well but does not generalize well to unseen data.
- Smaller neighborhoods: Only the closest points significantly affect the classification or regression decisions.
Low Gamma:
- Wider influence: A low gamma means that each training example has a wider influence, affecting data points that are farther away.
- Simpler model: The model will have a smoother, simpler decision boundary, as it considers more distant data points. This can lead to underfitting if gamma is too low, where the model fails to capture the complexities of the data.
- Larger neighborhoods: The decision function takes into account broader regions around each training data point, leading to more generalized behavior.
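To see the high-gamma/low-gamma trade-off numerically, this sketch trains two RBF SVMs on noisy synthetic data. The exact scores vary with the data split, but the high-gamma model typically fits the training set almost perfectly while scoring worse on held-out data:

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Noisy "two moons" data: the true boundary is curved
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in (0.1, 100):
    clf = svm.SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma}: train accuracy={clf.score(X_train, y_train):.2f}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")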
Regularization (C):
- When C is small, you're telling the model, "Don't worry about getting every small detail right. Focus on the overall trend or pattern in the data." With low C, the model is more lenient and doesn’t insist on fitting every detail of the training data perfectly. It’s okay with making some mistakes on the training data if it means creating a simpler, more general model. This helps the model to focus on the bigger patterns and not get too caught up in the small, specific details.
- When C is large, you're telling the model, "Try to fit the training data as closely as possible, including more of the details, even if some of them might not be important." With high C, the model works hard to correctly classify every training example. It’s less tolerant of mistakes and tries to fit the training data very well, which can make the model more complex. This might lead to the model being too focused on the training data and not performing as well on new data.
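The same kind of comparison shows the effect of C on the same synthetic data: a very small C gives a lenient, simpler model, while a very large C chases every training point. Again, the exact numbers depend on the data split:

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=200, noise=0.3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Small C: tolerate training mistakes for a simpler boundary.
# Large C: punish every mistake, bending the boundary to fit the training set.
for C in (0.01, 1000):
    clf = svm.SVC(kernel="rbf", C=C).fit(X_train, y_train)
    print(f"C={C}: train accuracy={clf.score(X_train, y_train):.2f}, "
          f"test accuracy={clf.score(X_test, y_test):.2f}")
```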