[SSD] Train your own dataset using the caffe-ssd framework and MobileNet network.


In the previous blog post, I went through the entire process using the VGG network provided by the author, and now I am going to try training with MobileNet.

There are still two unresolved issues:

  1. Mean value problem. [Solved on 2017.11.20]

  2. Unable to finetune using the caffemodel provided by MobileNet. [Solved on 2017.11.3]

[Updated on 2017.11.03] Successfully converted to ncnn format.

1. Collecting and organizing the dataset#

Reference: Train your own dataset with VGG network using caffe-ssd framework

2. Downloading MobileNet-SSD#

The files used are shown in the figure below:

Download them according to the paths shown in the figure.

3. Generating train.prototxt, test.prototxt, deploy.prototxt#

Two things to note before generating the files:

  1. The repository ships a default MobileNet_deploy.prototxt without batch_norm layers. It can be used together with the caffemodel provided officially.

  2. The template directory contains four files: two deploy files, one train file, and one test file.

After checking, the train and test templates match the network structure we will generate below. Even so, we won't use the template files directly, because we would have to modify the num_output of several layers by hand. Too troublesome!

The author provides two tools for generating the prototxt files: gen.py and gen_model.sh.

I prefer the latter because it is simpler to use.

Run the command:

sh gen_model.sh 7

The number after the command is the number of categories plus one; the extra class is the background.

The generated files are located in the example directory.
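What gen_model.sh substitutes for you is essentially the num_output of each SSD head: the confidence layer needs priors-per-location * num_classes outputs and the localization layer needs priors-per-location * 4. A small sketch of that arithmetic; the priors-per-head counts (3 for the first head, 6 for the rest) follow the common MobileNet-SSD configuration and may differ in your prototxt:

```python
# Sketch of the num_output values gen_model.sh fills in for each
# SSD detection head. num_classes must already include background.
def head_num_outputs(num_classes, priors_per_head=(3, 6, 6, 6, 6, 6)):
    # (confidence outputs, localization outputs) per head
    return [(p * num_classes, p * 4) for p in priors_per_head]

# 6 real categories + 1 background = 7, as in `sh gen_model.sh 7`
for conf_out, loc_out in head_num_outputs(7):
    print(conf_out, loc_out)
```

This is also why editing the templates by hand is tedious: the class count appears in every head.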

One caveat: if your dataset is not entirely three-channel RGB images, you need to modify the train.prototxt and test.prototxt files:

transform_param {
    scale: 0.007843
    force_color: true  ### Add this line
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
}

For specific error messages, you can refer to my previous blog post Train your own dataset with VGG network using caffe-ssd framework.
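The scale and mean_value above work together: caffe subtracts the mean first, then multiplies by the scale, which maps pixels from [0, 255] to roughly [-1, 1]. A minimal sketch of that preprocessing:

```python
# Caffe's transform_param applies: y = scale * (x - mean_value).
# With mean 127.5 and scale 0.007843 (~= 1/127.5), pixel values
# in [0, 255] map to approximately [-1, 1].
def transform(pixel, mean=127.5, scale=0.007843):
    return scale * (pixel - mean)

print(transform(0))      # close to -1.0
print(transform(255))    # close to +1.0
print(transform(127.5))  # exactly 0.0
```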

Then modify the source and label_map_file paths in both files, using absolute paths as much as possible.

4. Modifying the solver.prototxt file#

I used solver_train.prototxt directly, so change the paths in it accordingly, and use absolute paths for the path parameters as much as possible.

Here is a problem that troubled me for a long time.

At first, for debugging, I habitually set test_initialization to true.

Then the log kept showing "Couldn't find any detections", although the training phase itself ran fine.

Later I realized the cause: since I was not finetuning from a pre-trained model, the network could not detect anything during the initial val phase.

After setting test_initialization to false, validation only runs after some training, and the "couldn't find anything" message no longer appears.
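For reference, the relevant solver fields look like this (the interval values here are placeholders of my own, not required values):

```protobuf
test_initialization: false  # skip the val phase before any training has run
test_interval: 2000         # run val only after some training iterations
```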

5. Training script#


/home/hans/caffe-ssd/build/tools/caffe train \
--solver="/home/hans/data/ImageNet/Detection/cup/MobileNet-SSD/doc/solver_train.prototxt" \
-gpu 6 2>&1 | tee /home/hans/data/ImageNet/Detection/cup/MobileNet-SSD/MobileNet-SSD.log

As noted in the 2017.11.03 update, after training you need to use the author's tool to merge the bn layer parameters into the conv layers to speed up inference.

Modify the paths and file names in merge_bn.py.

The newly generated model should be slightly smaller than the original model.
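What merge_bn.py does, in essence, is fold each batch-norm layer into the preceding convolution so that one conv produces the same output. A minimal numeric sketch of that folding, using my own function and parameter names rather than the script's actual API:

```python
import math

def fold_bn(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold BN y = gamma * (x - mean) / sqrt(var + eps) + beta
    into a conv output x = weight * input + bias, per channel."""
    s = gamma / math.sqrt(var + eps)
    return weight * s, (bias - mean) * s + beta

# One output channel: conv(weight=2.0, bias=0.5) followed by
# BN(mean=0.5, var=1.0, gamma=1.0, beta=0.0). The folded conv
# must produce the same activation as conv followed by BN.
w, b = fold_bn(2.0, 0.5, mean=0.5, var=1.0, gamma=1.0, beta=0.0)
x = 3.0
conv_then_bn = 1.0 * ((2.0 * x + 0.5) - 0.5) / math.sqrt(1.0 + 1e-5) + 0.0
assert abs((w * x + b) - conv_then_bn) < 1e-9
```

Since the BN parameters disappear into the conv weights, the merged model is slightly smaller, which matches what we observe.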

6. Visualization of training output (2017.11.02)#

Reference: Train your own dataset with VGG network using caffe-ssd framework

7. Testing the model's performance (2017.11.03)#

Reference: Train your own dataset with VGG network using caffe-ssd framework


Why does it report an error when finetuning with a pre-trained model!!!! It says the number of inputs for conv0 is incorrect (1 vs. 2). It should be 1, but why is it passing in 2??? I compared and checked the train.prototxt and test.prototxt files in various ways, but it didn't work!!!

-------[Solved the problem of not being able to finetune on 2017.11.3]-------

Finally found the reason!

I'm releasing the model that can be finetuned and the deploy file.


The author has released two caffemodels on GitHub. One is for training: in it, the bn layer and conv layer parameters are kept separate. The other is for deployment: in it, the bn parameters have been merged into the conv layers. That merged model is what caused my (1 vs. 2) error, since its conv layers carry an extra set of parameters that the training prototxt does not expect; finetune from the training caffemodel instead.

The author says the merged model runs faster.

-------[Successfully converted to ncnn format on 2017.11.3]---------

I recommend using the .param file provided by the author above. If you are using your own training data, just modify the number of categories in eight places.

My own conversion ran into two problems:

First, the first two layers, Input and Split, were missing. To fix this, modify the beginning of deploy.prototxt to:

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 300 dim: 300 } }
}

Second, the later layers ended up being processed at 600*600 pixels.

The .bin file must be converted from the caffemodel that has been processed by merge_bn.py!

-------[Updated on 2017.11.16]---------

I had been using the ncnn release from June.

These days, while handing over the porting work, I found that official ncnn has since added SSD support.

It also added model compression; the compression only reduces the amount of computation and costs some accuracy, but it is still very practical.

-------[Updated on 2017.11.20]---------

The make_mean.sh provided does not actually compute a mean. I found two tools for converting to lmdb: one with annotations and one without.

SSD uses the annotated one for the detection-style conversion.

So I computed the mean the old way, using build/tools/convert_imageset and build/tools/compute_image_mean.
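Conceptually, the per-channel mean is just an average over every pixel of every image. A pure-python sketch of that idea (compute_image_mean's real output is a mean image in binaryproto format, with the per-channel means printed to the log):

```python
def channel_means(images):
    """images: list of images, each a nested list [row][col][channel]
    with 3 channels. Returns the per-channel pixel mean."""
    sums = [0.0, 0.0, 0.0]
    count = 0
    for img in images:
        for row in img:
            for px in row:
                for c in range(3):
                    sums[c] += px[c]
            count += len(row)
    return [s / count for s in sums]

# Two tiny 1x2 "images"
imgs = [[[[10, 20, 30], [30, 40, 50]]],
        [[[20, 30, 40], [20, 30, 40]]]]
print(channel_means(imgs))  # [20.0, 30.0, 40.0]
```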

Shell script code:

First, convert the images to the normal lmdb format.

set -e

# --- adjust these to your own setup (example values only) ---
TOOLS=/home/hans/caffe-ssd/build/tools   # caffe tools directory
size=300                                 # resize edge, matching the 300x300 SSD input
DATA_ROOT=/path/to/your/images/          # image root directory
FILE_PATH=doc                            # directory holding train.txt

cur_dir=$(cd $(dirname ${BASH_SOURCE[0]}) && pwd)

echo "Creating train lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$size \
    --resize_width=$size \
    --shuffle=false \
    $DATA_ROOT \
    $FILE_PATH/train.txt \
    doc/train_lmdb_mean

Compute the mean value and save the output to mean.txt.



$TOOLS/compute_image_mean doc/train_lmdb_mean doc/mean.binaryproto 2>&1 | tee doc/mean.txt

echo "Done."