

One of the most prevailing and exciting supervised learning models, with associated learning algorithms that analyse data and recognise patterns, is the Support Vector Machine (SVM). SVMs can be used for solving both regression and classification problems, but they are mostly used for classification. They were first introduced in 1992 and became popular thanks to their success in handwritten digit recognition in 1994. Before the emergence of boosting algorithms such as XGBoost and AdaBoost, SVMs were in common use. If you want a consolidated foundation in machine learning algorithms, you should definitely have them in your arsenal. The algorithm is powerful, but the concepts behind it are not as complicated as you might think.

In my previous article I explained what Logistic Regression is (link). It helps solve classification problems by separating the instances into two classes. However, there is an infinite number of valid decision boundaries, and Logistic Regression only picks an arbitrary one. For point C, since it is far away from the decision boundary, we are quite certain to classify it as 1. For point A, even though we classify it as 1 for now, it is so close to the decision boundary that, if the boundary moved a little to the right, we would mark point A as 0 instead. Hence, we are much more confident about our prediction at C than at A. Logistic Regression does not care whether the instances are close to the decision boundary, so the boundary it picks may not be optimal. As we can see from the graph above, if a point is far from the decision boundary, we can be more confident in our prediction. Therefore, the optimal decision boundary should maximise the distance between itself and all instances. That is why the SVM algorithm is important!
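To make this contrast concrete, here is a minimal sketch of my own (not code from the article) using scikit-learn: it fits Logistic Regression and a linear SVM on an invented toy dataset and compares how far the closest training point sits from each model's decision boundary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two well-separated clusters in 2-D: class 0 and class 1 (made-up data).
X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.0, random_state=42)

log_reg = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear", C=1000.0).fit(X, y)  # large C ~ (nearly) hard margin

def min_distance_to_boundary(model, X):
    # Geometric distance from each point to the line w.x + b = 0,
    # then the smallest one: how much room the model leaves to its closest point.
    w, b = model.coef_[0], model.intercept_[0]
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

print("Logistic Regression, closest point distance:", min_distance_to_boundary(log_reg, X))
print("Linear SVM,          closest point distance:", min_distance_to_boundary(svm, X))
# Both models separate the training data, but the SVM explicitly maximises
# the distance to the closest points, so its value is typically the larger one.
```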

What are Support Vector Machines (SVMs)?

Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. The objective of applying SVMs is to find the best line in two dimensions, or the best hyperplane in more than two dimensions, to separate the space into classes. The hyperplane (line) is found through the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.

Don't you think the definition and idea of SVM look a bit abstract? No worries, let me explain in detail. Imagine the labelled training set consists of two classes of data points in two dimensions: Alice and Cinderella. There are many possible hyperplanes that separate the two classes correctly. As shown in the graph below, we can achieve exactly the same result on the training data using different hyperplanes (L1, L2, L3). However, if we add new data points, the consequence of choosing one hyperplane over another can be very different in terms of classifying the new points into the right class.
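As a small illustration (my own sketch, not the article's code, with an invented "Alice vs. Cinderella" dataset), here is how the maximum-margin hyperplane could be obtained with scikit-learn; the margin width is 2 / ||w|| and the points that pin it down are the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D training set: class 0 ("Alice") and class 1 ("Cinderella").
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 3.0],   # Alice
              [5.0, 5.0], [6.0, 4.5], [5.5, 6.0]])  # Cinderella
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0).fit(X, y)  # large C ~ (nearly) hard margin

w, b = clf.coef_[0], clf.intercept_[0]
print(f"Hyperplane: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print("Margin width (2 / ||w||):", 2 / np.linalg.norm(w))
print("Support vectors:\n", clf.support_vectors_)

# Every hyperplane that passes between the two clusters classifies these six
# training points perfectly; the SVM picks the one whose margin is largest,
# which is what helps when new points arrive near the boundary.
```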

I have been on a machine learning MOOCs binge in the last year. The one weakness so far is the treatment of support vector machines (SVM). It is a shame really, since other popular classification algorithms are covered. I should mention that there are two exceptions: Andrew Ng's Machine Learning on Coursera and the Statistical Learning course on Stanford's Lagunita platform. Even so, both courses miss the mark when it comes to SVM in my opinion. Andrew Ng watered down the presentation of SVM classifiers compared to his Stanford class notes, but he still treated the subject much better than his competitors. Statistical Learning by Trevor Hastie and Rob Tibshirani was an amazing course, but unfortunately SVM was treated somewhat superficially. I hate to be critical of these two courses, since overall they are amazing and I am very grateful to the instructors for making this content available for the masses.
