【Caffe】Multi-label training, multi-task training for facial attributes.


Introduction#

Referenced https://zhuanlan.zhihu.com/p/22190532

There are some details that need to be addressed, so I will go through the entire process and explain the issues involved. I will also discuss deployment issues at the end.

The code I used in this article can be found here: https://github.com/HansRen1024/Face-Attributes-MultiTask-Classification

Main Content#

Download the project https://github.com/HolidayXue/CodeSnap from the original author's GitHub website.

Place convert_multilabel.cpp in the caffe/tools/ directory.

On line 81, change `>>` to `> >` — before C++11, compilers parse the nested template closer `>>` as the right-shift operator, so the two closing brackets need a space between them.

Then comment out line 149.

Run the following commands in the command line under caffe/:

make clean
make all -j8
make py

I used the CelebA dataset, which consists of facial attributes.

Download link: https://pan.baidu.com/s/17rp2gKqtvuT48yuPfJK3sA

Find list_attr_celeba.txt in the Anno directory.

Run the following command in the command line:

sed -i 's/  / /g' list_attr_celeba.txt

This collapses the double spaces in the file into single spaces, so that every field can later be split on a single space.

Then you can extract the facial attributes you want using the following code:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 20 16:57:52 2018

author: hans
"""

ReadTxt = 'list_attr_celeba.txt'
WriteTxt = 'train.txt'
r = open(ReadTxt, 'r')
w = open(WriteTxt, 'w')
r.readline()  # line 1: number of images, skip it
r.readline()  # line 2: attribute names, skip it too
while True:
    rLine = r.readline().split('\n')[0]
    if not rLine:
        break
    # fields: 0 image name, 6 Bangs, 16 Eyeglasses, 21 Male
    fields = rLine.split(' ')
    w.write(fields[0] + ' ' + fields[6] + ' ' + fields[16] + ' ' + fields[21] + '\n')
r.close()
w.close()

Then run the following command in the command line:

sed -i 's/-1/0/g' train.txt

This maps the -1 labels in train.txt to 0, since Caffe expects class labels to start from 0.

I compared the original images and found that the extracted attributes are correct.

Then move a portion of the lines in train.txt into val.txt to serve as the validation set.
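The split can be done by hand, or with a small script like the following sketch (the file names match the ones above; the 10% ratio and the random shuffle are my own choices, not the author's):

```python
import random

def split_train_val(train_path='train.txt', val_path='val.txt',
                    val_ratio=0.1, seed=0):
    """Move a random val_ratio fraction of lines from train_path into val_path."""
    with open(train_path) as f:
        lines = f.readlines()
    random.seed(seed)
    random.shuffle(lines)
    n_val = int(len(lines) * val_ratio)
    with open(val_path, 'w') as f:
        f.writelines(lines[:n_val])       # held-out validation lines
    with open(train_path, 'w') as f:
        f.writelines(lines[n_val:])       # remaining training lines
```

Shuffling before splitting avoids a validation set drawn only from the tail of the attribute file.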

Run the script to generate lmdb data for training:

echo "Creating train lmdb..."
~/caffe-multi/build/tools/convert_multilabel \
-resize_height=227 \
-resize_width=227 \
-shuffle=false \
/home/hans/data/face/CelebA/Img/img_align_celeba/ \
train.txt \
./train_db \
./train_lb \
3

echo "Creating val lmdb..."
~/caffe-multi/build/tools/convert_multilabel \
-resize_height=227 \
-resize_width=227 \
-shuffle=false \
/home/hans/data/face/CelebA/Img/img_align_celeba/ \
val.txt \
./val_db \
./val_lb \
3

img_align_celeba is the dataset with cropped faces.

The final argument, 3, is the number of attribute labels per image — I extracted three facial attributes.

Finally, modify mcnn_Attri.prototxt: the mean values, the normalization scale, the data paths, and — an important step — change the backend to LMDB!

name: "MCNN_Attri"
layer {
  name: "data"
  type: "Data"
  top: "data"
  transform_param {
    scale: 0.007843
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
    crop_size: 227
  }
  include {
    phase: TRAIN
  }
  data_param {
    source: "/home/hans/data/face/CelebA/attri/doc/train_db"
    batch_size: 192
    backend: LMDB
  }
}

layer {
  name: "labels"
  type: "Data"
  top: "labels"
  include {
     phase: TRAIN
  }
  data_param {
    source: "/home/hans/data/face/CelebA/attri/doc/train_lb"
    batch_size: 192
    backend: LMDB
  }
}

layer {
  name: "data"
  type: "Data"
  top: "data"
  transform_param {
    scale: 0.007843
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
    crop_size: 227
  }
  include {
    phase: TEST
  }
  data_param {
    source: "/home/hans/data/face/CelebA/attri/doc/val_db"
    batch_size: 128
    backend: LMDB
  }
}

layer {
  name: "labels"
  type: "Data"
  top: "labels"
  include {
    phase: TEST
  }
  data_param {
    source: "/home/hans/data/face/CelebA/attri/doc/val_lb"
    batch_size: 128
    backend: LMDB
  }
}

layer {
  name: "sliceL"
  type: "Slice"
  bottom: "labels"
  top: "label_attr6"
  top: "label_attr16"
  top: "label_attr21"
  slice_param {
    slice_dim: 1
    slice_point: 1
    slice_point: 2
  }
}

Because there are three attribute labels, the Slice layer has three tops, and the number of slice_points is reduced to two.
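In general, slicing a (batch, N) label blob along axis 1 into N single-label blobs needs N tops and N−1 slice points. A tiny sanity check of that rule (my own illustration, not from the original author):

```python
def slice_points(num_labels):
    """Slice points needed to split a (batch, num_labels) label blob
    into num_labels single-column blobs along axis 1."""
    return list(range(1, num_labels))

# Three attributes -> two slice points, matching the prototxt above.
print(slice_points(3))  # -> [1, 2]
```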

【2018.08.22 Update】-----------------------------------------------

In this network structure, the trunk splits into six groups after bn2, each producing different outputs; every group has the same layer structure before branching into its own fully connected layer and final output layer per task. Delete the task layers and groups you don't need. For my three attributes, gender sits in its own group, glasses in the fourth group, and bangs in the sixth. To reduce the parameter count, I moved glasses into the sixth group alongside bangs and deleted the network before the fourth group. I haven't touched the gender group yet, because I worry that putting all three tasks into one group might prevent convergence — but I will definitely test that later.

(The training output looks very good, with very high accuracy on the validation set, but in actual testing the per-group predictions for gender, glasses, and bangs are not accurate. Next, I will put glasses and bangs into separate groups and replace the fully connected layer with global average pooling.)

(I put glasses and bangs into separate groups and also removed the fully connected layer. The model shrank to only 8 MB, and validation accuracy is still quite high, around 98%, but the actual experimental results are very poor. Next, I will try changing the core network to MobileNet v2 or SqueezeNet. At the same time, the problem may not be the network itself: looking at the dataset, images with bangs or glasses are quite rare. If changing the network doesn't help, the next step is to augment that part of the data.)
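If augmentation does turn out to be needed for the rare bangs/glasses images, the transforms involved are simple, label-preserving ones. A minimal sketch in pure NumPy (my own illustration — a real pipeline would more likely use OpenCV or Caffe's transform_param):

```python
import numpy as np

def augment(img, rng):
    """Return a randomly flipped / brightness-jittered copy of an HWC uint8 image.
    Both transforms preserve the attribute labels (bangs, glasses, gender)."""
    out = img.astype(np.float32)
    if rng.rand() < 0.5:
        out = out[:, ::-1, :]           # horizontal flip
    scale = rng.uniform(0.8, 1.2)       # brightness jitter
    out = np.clip(out * scale, 0, 255)
    return out.astype(np.uint8)
```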


【2018.08.29 Update】-----------------------------------------------

The results on the validation set and test set are very good, but the results are extremely poor when using the camera or video in practical applications.

I found the reason, it was due to the incorrect preprocessing method during testing.


【2018.09.04 Update】-----------------------------------------------

Some insights:

  1. Using fully convolutional layers + global pooling instead of fully connected layers has little effect on the performance and the number of model parameters remains the same.

  2. Fully connected layers have a large number of parameters, but they execute quickly. In other words, fully connected layers have little impact on the execution time of the model, but have a greater impact on the size of the model.

  3. I checked the execution time of each layer and found that large convolutional kernels are not efficient. It is better to use small 3x3 convolutional kernels.

  4. The number of output channels in convolutional layers is best to be a power of 2.
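A back-of-the-envelope comparison behind point 2 — why dropping the fully connected layer shrank the model so much (the layer shapes here are my own example, not measured from this network):

```python
def fc_params(in_features, out_features):
    # weight matrix + one bias per output unit
    return in_features * out_features + out_features

def conv_params(in_ch, out_ch, k):
    # k x k kernels for every in/out channel pair, one bias per output channel
    return in_ch * out_ch * k * k + out_ch

# An FC layer from a 6x6x256 feature map to 512 units:
print(fc_params(6 * 6 * 256, 512))   # -> 4719104

# A 3x3 conv from 256 to 512 channels (followed by global pooling):
print(conv_params(256, 512, 3))      # -> 1180160
```

The FC layer carries roughly 4x the parameters here, while contributing only a single matrix multiply at run time — consistent with the observation that FC layers dominate model size but not execution time.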


Finally, modify the loss_weight so that the sum of all task loss_weights equals 1. Since I only have three tasks, I set the loss_weight for each task to 0.3333.
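For example, each task's loss layer would carry its weight like this (a sketch following the 0.3333 setting above; the layer and bottom names are illustrative, not taken from the actual prototxt):

```protobuf
layer {
  name: "loss_attr6"
  type: "SoftmaxWithLoss"
  bottom: "fc_attr6"
  bottom: "label_attr6"
  top: "loss_attr6"
  loss_weight: 0.3333
}
```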

I won't go into detail about the solver.prototxt, as it is the same as for single-task training.

Finally, here is a screenshot of the training output:

(training log screenshot)

I will continue to update in the future...#

1. Visualization code: uploaded to GitHub on August 21, 2018, as show.py

2. Caffe-Python inference code: uploaded to GitHub on August 21, 2018, as face_attri.py

3. ncnn-C++ deployment code