[Machine Learning][Python] 1. HoG + SVM Object Classification ---- "SVM Object Classification and Localization Detection"

----------[Updated on September 7, 2018]---

If you are reading this article and have downloaded the code from GitHub to go through the entire process, I strongly recommend that you carefully read all six articles in the "SVM Object Classification and Localization" series. The content is not extensive, but it will greatly help you understand the algorithm and code.

----------[Updated on January 22, 2018]----Dimensionality Reduction Algorithm-----

I read an article introducing the t-SNE dimensionality reduction algorithm; the results reported there were better than those of PCA, so it may be worth trying.

Article Link

--------[Updated on December 22, 2017]----Process Explanation------

Let me summarize the best method I used in the process.

  1. First, extract features using HoG.

  2. Perform PCA on the features, and then optimize the parameters C and gamma using PSO. The purpose of dimensionality reduction is to speed up PSO, which would otherwise be too slow; the trade-off is somewhat lower classification accuracy.

  3. Train an initial SVM model using the features obtained in the first step and the parameters obtained in the second step.

  4. Optimize the SVM model using Hard Negative Mining.

  5. Perform sliding window detection, and finally apply non-maximum suppression (NMS) to merge the overlapping bounding boxes.
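Step 5 can be illustrated with a minimal sketch in plain Python. It is not the code from the repository: the function names, the box format `(x1, y1, x2, y2)`, and the IoU threshold of 0.3 are all assumptions chosen for illustration. In the real pipeline, each window would be scored by the trained SVM before NMS is applied.

```python
def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Yield (x1, y1, x2, y2) windows, left to right and top to bottom."""
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print([boxes[i] for i in kept])
```

The second box overlaps the first with an IoU of about 0.68, so only the first and third boxes survive.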

Currently, there are 2700 positive samples and 2700 negative samples. The test set consists of 1200 negative samples and 700 positive samples.

Latest code GitHub link: https://github.com/HansRen1024/SVM-classification-localization

Recently, I have been studying traditional methods of object classification using HoG + SVM. However, the classification accuracy is not very high. I will share my insights.

First, I want to clarify that I won't go into the theory. Plenty of material on it is available online, so please read and understand the basics before reading this article.

ImageNet dataset for all cup images: https://pan.baidu.com/s?__biz=MjM5MTQzNzU2NA==&mid=2651656490&idx=1&sn=2d261af429e87be2240f3c2f133474e6&chksm=bd4c36b98a3bbfafc24ff22000c54efc173cf2fb471e94fdf9f7fd66a5325b00bd6b48e3aab2&mpshare=1&scene=1&srcid=0122sOEEObzptTmKC1krBwm5&pass_ticket=iOPmHbmsbPsOLP04XWnucMqmHkDfYpLtI9K3ivXfOtM%3D#rd

1. Calculation of HoG Feature Count

Let's start by discussing two parameters:

  1. Pixels of a cell

  2. Cells of a block

HoG is a feature extraction technique that involves sliding a window over an image. The window is called a block, and it contains cells. Feature extraction is performed within each cell. The features of all cells in a block are combined to form the features of that block. Finally, the features of all blocks are combined to form the features of the entire image.

When HoG extracts features within a cell, it bins the gradient of each pixel by orientation. Here the orientations are divided into 9 bins of 40 degrees each, covering the full 360 degrees (signed gradients); the other common choice is 9 bins of 20 degrees each over 180 degrees (unsigned gradients). Each bin contributes one feature value, so a cell yields 9 features. With this information, we can calculate the total number of features for the entire image.
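To make the binning concrete, here is a minimal sketch of the orientation histogram for a single cell. It is a simplification under stated assumptions: the function name `cell_histogram` is made up for illustration, the gradients are passed in as precomputed flat lists, and each pixel votes into exactly one bin with no interpolation between neighbouring bins, which real HoG implementations often add.

```python
import math


def cell_histogram(gx, gy, n_bins=9, signed=True):
    """Magnitude-weighted orientation histogram for one cell.

    gx, gy: flat lists with the horizontal and vertical gradient of each pixel.
    signed=True bins over 360 degrees, signed=False over 180 degrees.
    """
    span = 360.0 if signed else 180.0
    bin_width = span / n_bins
    hist = [0.0] * n_bins
    for dx, dy in zip(gx, gy):
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % span
        # The trailing modulo guards against an angle that rounds up to span.
        hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Four hypothetical pixels with gradients pointing right, up, left, and down.
hist = cell_histogram(gx=[1, 0, -1, 0], gy=[0, 1, 0, -1])
print(hist)
```

With 40-degree bins, the four gradients at 0, 90, 180, and 270 degrees fall into bins 0, 2, 4, and 6 respectively.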

For a 300 × 600 image, with each cell defined as 15 × 15 pixels and each block containing 2 × 2 cells, there are 20 × 40 cells and, if the blocks do not overlap, a total of 10 × 20 blocks.

One cell: 9 features

One block: 4 × 9 = 36 features

Total features for the image: 10 × 20 × 36 = 7200 features
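The same arithmetic can be written as a small helper, which also makes it easy to see how the count changes with the block stride. This is only a sketch: the function name and the `block_stride_cells` parameter are assumptions for illustration, and the default reproduces the non-overlapping layout used in the example above. Many HoG implementations instead slide the block one cell at a time, which gives a much longer feature vector.

```python
def hog_feature_count(img_w, img_h, cell_px, cells_per_block,
                      n_bins=9, block_stride_cells=None):
    """Length of the HoG feature vector for one image.

    block_stride_cells=None means blocks do not overlap
    (the stride equals the block size, measured in cells).
    """
    if block_stride_cells is None:
        block_stride_cells = cells_per_block
    cells_x = img_w // cell_px
    cells_y = img_h // cell_px
    blocks_x = (cells_x - cells_per_block) // block_stride_cells + 1
    blocks_y = (cells_y - cells_per_block) // block_stride_cells + 1
    features_per_block = cells_per_block * cells_per_block * n_bins
    return blocks_x * blocks_y * features_per_block


# Non-overlapping blocks: 10 x 20 blocks of 36 features each.
print(hog_feature_count(300, 600, cell_px=15, cells_per_block=2))
# Blocks sliding one cell at a time: 19 x 39 blocks of 36 features each.
print(hog_feature_count(300, 600, cell_px=15, cells_per_block=2,
                        block_stride_cells=1))
```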

2. Explanation

If we directly input the image into HoG without any preprocessing, factors such as the background can have a significant negative impact on the extracted features. Therefore, the code I provide for feature extraction includes two parts. One part involves cropping the object based on the bbox information in the XML file and then extracting features. The other part involves directly extracting features from the entire image. Since the final number of features depends on the size of the input image, the cropped images and the entire image need to be resized to a fixed size. For more details, please refer to the comments in the code.
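The cropping step can be sketched as follows. This is not the repository code: it assumes the XML annotations follow the PASCAL VOC style layout (`object` / `bndbox` / `xmin`, `ymin`, `xmax`, `ymax`), the sample annotation and the helper names are hypothetical, and the image is represented as a plain nested list to keep the example self-contained. Resizing the crop to a fixed size would be done afterwards with an image library.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in PASCAL VOC style layout.
SAMPLE_XML = """<annotation>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>cup</name>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>320</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""


def read_bboxes(xml_text):
    """Return one (xmin, ymin, xmax, ymax) tuple per annotated object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append(tuple(int(bndbox.find(tag).text)
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes


def crop(image_rows, box):
    """Crop a row-major image (list of pixel rows) to the given box."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image_rows[ymin:ymax]]


# A blank 500 x 375 stand-in image: 375 rows of 500 pixels.
image = [[0] * 500 for _ in range(375)]
boxes = read_bboxes(SAMPLE_XML)
patch = crop(image, boxes[0])
print(boxes, len(patch), len(patch[0]))
```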

Path Explanation:

./train/positive # Training images that contain the target object and need to be cropped to its bbox

./test/positive # Test images that contain the target object

./train/positive_rest # Training images that contain the target object and do not need to be cropped

./train/negative # Training images that do not contain the target object

./test/negative # Test images that do not contain the target object


I obtained the images from ILSVRC. The training set has over 8000 images and the test set over 2000, each split equally between positive and negative samples. Later, I performed PCA, which did not have much impact on the accuracy but increased the robustness of the model. In the next article, I will share the code with PCA included. The main reason for performing PCA was that I used PSO to optimize the SVM parameters C and gamma, and dimensionality reduction was necessary to keep the computational cost manageable.
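For readers unfamiliar with PSO, the following is a minimal sketch of the algorithm searching over two parameters. It is not the author's implementation: the inertia and acceleration constants, the search bounds, and especially `toy_error` are assumptions. `toy_error` is a stand-in with a known minimum; in the real setting the objective would be the SVM's validation error for a given (log C, log gamma), which is why each evaluation is expensive and benefits from PCA-reduced features.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box using basic particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            score = objective(pos[i])
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score < gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


def toy_error(params):
    """Stand-in objective with its minimum at log C = 1, log gamma = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, score = pso(toy_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, score)
```

Searching in log space is a common choice for C and gamma because useful values span several orders of magnitude.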
