Given a feature matrix \(X\) and corresponding labels \(y\), a support vector machine (SVM) aims to find an optimal separating hyperplane that
best classifies or fits the data. Optimal here means that it maximizes the distance (also called the margin) from nearest datapoint to the hyperplane.