Face Detection Experiment with FaceBoxes (Windows 10 + PyTorch)

2020-09-28

FaceBoxes

FaceBoxes is reported to detect faces in real time and with high accuracy even on a CPU.
It was published in 2017, so it is fairly recent.

Zhang, Shifeng, et al. “Faceboxes: A CPU real-time face detector with high accuracy.” 2017 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 2017.

I compare it with MTCNN, which I used previously.

The previous post:
(Using MTCNN from facenet-pytorch on Windows 10 + Anaconda)

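1. Environment setup
1.1. Conda environment setup
1.2. File download
1.3. Code modification
2. Experiment
2.1. Results
2.2. Comparison of results
3. Speed measurement
3.1. Results
4. Code
4.1. Code (faceboxes.py)
4.2. Code (for the experiment in 2.2)
4.3. Code (speed measurement: facenet-mtcnn)
4.4. Code (speed measurement: faceboxes)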

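1. Environment setup

According to the original FaceBoxes GitHub repository, the authors have also reimplemented it in PyTorch, so I use that version.
The original is implemented in Caffe.

This is the one I use:
FaceBoxes on GitHub (PyTorch version)

1.1. Conda environment setup

It will probably run as long as torch is installed.

conda install pytorch torchvision cpuonly -c pytorch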
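Before going further it can help to confirm that the packages the later scripts import are actually visible in the new environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the package list is an assumption based on the imports in section 4 (cv2 is OpenCV, which the conda command above does not install explicitly).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list: the modules imported by the scripts in section 4.
    missing = missing_packages(["torch", "torchvision", "cv2", "numpy"])
    print("missing:", missing if missing else "none")
```

An empty result only shows that the modules can be located; it does not prove that the installed versions work with the repository.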

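1.2. File download

Download all the code from GitHub.

The pretrained model FaceBoxesProd.pth is provided separately on Google Drive.
Follow the link on the GitHub page and download it.
Create a weights folder in the repository root and put the file there.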
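The expected layout is therefore <repository root>/weights/FaceBoxesProd.pth. As a rough sketch, a check like the following fails early with a readable message if the file is in the wrong place; the function name weights_path is hypothetical and not part of the repository.

```python
from pathlib import Path


def weights_path(repo_root, filename="FaceBoxesProd.pth"):
    """Return <repo_root>/weights/<filename>, raising if the file is absent."""
    path = Path(repo_root) / "weights" / filename
    if not path.is_file():
        raise FileNotFoundError(f"expected pretrained model at {path}")
    return path


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    try:
        print("found", weights_path("."))
    except FileNotFoundError as err:
        print(err)
```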

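1.3. Code modification

The demo script test.py cannot be run on arbitrary images, so I modified it.
I wrote faceboxes.py, which exposes a face detection function, detect, that can be called once the module is imported.
All the code is listed at the end of this post.

faceboxes.py uses the weights, utils, and other files downloaded earlier, so put it in the same place as them.
I skipped the Cython build because it is a hassle, so NMS uses the pure-Python py_cpu_nms instead.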
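For readers unfamiliar with what the NMS step does, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is an illustration only, not the repository's py_cpu_nms (which works on NumPy arrays and uses a slightly different pixel-area convention); the names iou and greedy_nms are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, iou_threshold):
    """dets: list of (x1, y1, x2, y2, score). Returns indices of kept boxes.

    Boxes are visited from highest to lowest score; a box is kept only if it
    does not overlap any already-kept box by more than iou_threshold.
    """
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    keep = []
    for i in order:
        if all(iou(dets[i], dets[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    dets = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
    print(greedy_nms(dets, 0.3))
```

In the example data the second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.3, while the third box does not overlap either and is kept.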

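2. Experiment

A face detection experiment on a test image.

2.1. Results

I compare the output with the result of a different method that I ran previously (MTCNN from facenet-pytorch).

2.2. Comparison of results

Face detection result with FaceBoxes (this time)

Face detection result with MTCNN from facenet-pytorch (previous time)

The two results are almost identical.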

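3. Speed measurement

FaceBoxes is supposed to be fast, so I ran a comparison.

3.1. Results

For each method, I measured the time taken to run face detection 100 times on the same image.
I repeated this 10 times and report the average.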

Windows 10
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
Image size: 3 × 512 × 512

Run	faceboxes (s)	facenet-mtcnn (s)
1	5.618955135	17.39090157
2	5.335961819	16.88390636
3	5.629956007	17.12989831
4	5.494957447	16.97990084
5	5.65395546	17.11790347
6	5.292955399	16.91492677
7	5.702960253	16.86787176
8	5.695954084	17.3439045
9	5.751956224	16.90490079
10	5.412961483	17.06890321
average	5.559057331	17.06030176

FaceBoxes is roughly 3 times faster: about 5.56 s versus 17.06 s per 100 detections, or about 56 ms versus 171 ms per image.
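The averages and the speed ratio can be recomputed directly from the ten measurements in the table. This is a small worked check using only the numbers listed above; the variable names are chosen for illustration.

```python
from statistics import mean

# Seconds per 100 detections, copied from the table above.
faceboxes_s = [5.618955135, 5.335961819, 5.629956007, 5.494957447, 5.65395546,
               5.292955399, 5.702960253, 5.695954084, 5.751956224, 5.412961483]
mtcnn_s = [17.39090157, 16.88390636, 17.12989831, 16.97990084, 17.11790347,
           16.91492677, 16.86787176, 17.3439045, 16.90490079, 17.06890321]

fb_avg = mean(faceboxes_s)
mt_avg = mean(mtcnn_s)
speedup = mt_avg / fb_avg          # how many times faster FaceBoxes is
fb_ms = fb_avg / 100 * 1000        # milliseconds per single detection
mt_ms = mt_avg / 100 * 1000

print(f"faceboxes: {fb_avg:.3f} s / 100 detections, {fb_ms:.1f} ms per image")
print(f"mtcnn:     {mt_avg:.3f} s / 100 detections, {mt_ms:.1f} ms per image")
print(f"speedup:   {speedup:.2f}x")
```

The ratio comes out at about 3.07, which corresponds to roughly 18 detections per second for FaceBoxes and roughly 6 for MTCNN on this machine and image size.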

4. Code

The code used in this experiment.

4.1. Code (faceboxes.py)

import numpy as np
import torch
import torch.backends.cudnn as cudnn

from data import cfg
from layers.functions.prior_box import PriorBox
from models.faceboxes import FaceBoxes
from utils.box_utils import decode
# The Cython NMS (utils.nms_wrapper.nms) needs a build step,
# so the pure-Python implementation is used instead.
from utils.nms.py_cpu_nms import py_cpu_nms

#### Model Initialize
pretrained_path = "weights/FaceBoxesProd.pth"
device = torch.device("cpu")  # "cpu" or "cuda"
torch.set_grad_enabled(False)  # inference only
net = FaceBoxes(phase='test', size=None, num_classes=2)    # initialize detector
pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage)
# Strip the 'module.' prefix from parameter names BEFORE loading;
# with strict=False, prefixed keys would otherwise be skipped silently.
f = lambda x: x.split('module.', 1)[-1] if x.startswith('module.') else x
pretrained_dict = {f(key): value for key, value in pretrained_dict.items()}
net.load_state_dict(pretrained_dict, strict=False)
net.eval()
cudnn.benchmark = True
net = net.to(device)
#### Model Initialize

#### Function
def detect(img_source, confidence_threshold=0.9, top_k=5, nms_threshold=0.3):
    """Detect faces in a BGR image of shape (H, W, 3).

    Returns an array of shape (N, 5): x1, y1, x2, y2, score,
    sorted by descending score.
    """
    img = img_source.astype(np.float32)  # copy; the caller's array is not modified
    im_height, im_width, _ = img.shape
    scale = torch.Tensor([im_width, im_height, im_width, im_height]).to(device)

    img -= (104, 117, 123)           # subtract the per-channel mean
    img = img.transpose(2, 0, 1)     # HWC -> CHW
    img = torch.from_numpy(img).unsqueeze(0)
    img = img.to(device)

    loc, conf = net(img)  # forward pass

    # decode the box offsets relative to the prior boxes
    priorbox = PriorBox(cfg, image_size=(im_height, im_width))
    priors = priorbox.forward()
    priors = priors.to(device)
    prior_data = priors.data
    boxes = decode(loc.data.squeeze(0), prior_data, cfg['variance'])
    boxes = boxes * scale            # normalized -> pixel coordinates
    boxes = boxes.cpu().numpy()
    scores = conf.squeeze(0).data.cpu().numpy()[:, 1]

    # ignore low scores
    inds = np.where(scores > confidence_threshold)[0]
    boxes = boxes[inds]
    scores = scores[inds]

    # keep top-K before NMS
    order = scores.argsort()[::-1][:top_k]
    boxes = boxes[order]
    scores = scores[order]

    # do NMS
    dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
    keep = py_cpu_nms(dets, nms_threshold)
    dets = dets[keep, :]

    return dets

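4.2. Code (for the experiment in 2.2)

import cv2
import numpy as np

import faceboxes

#### Config
input_path = 'test1.jpg'
output_path = "out_" + input_path

#### Read Image
img_raw = cv2.imread(input_path, cv2.IMREAD_COLOR)
if img_raw is None:
    raise FileNotFoundError(input_path)
img = np.float32(img_raw)

#### Face Detection
det = faceboxes.detect(img)
if len(det) == 0:
    raise RuntimeError("no face detected in " + input_path)
box = det[0]  # highest-scoring detection: x1, y1, x2, y2, score

#### Save Image
# cv2.rectangle expects integer pixel coordinates; draw on the uint8 original.
x1, y1, x2, y2 = (int(v) for v in box[:4])
cv2.rectangle(img_raw, (x1, y1), (x2, y2), (255, 255, 255), thickness=4)
cv2.imwrite(output_path, img_raw)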
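The detector returns float coordinates, and a box can extend slightly past the image border, while drawing and slicing need integer pixel positions inside the image. A small helper along these lines could do the conversion; the name to_int_box is hypothetical, and rounding to the nearest pixel is just one reasonable choice.

```python
def to_int_box(box, width, height):
    """Round (x1, y1, x2, y2, ...) to ints and clip them to the image bounds."""
    x1, y1, x2, y2 = (int(round(float(v))) for v in box[:4])
    x1 = max(0, min(x1, width - 1))
    x2 = max(0, min(x2, width - 1))
    y1 = max(0, min(y1, height - 1))
    y2 = max(0, min(y2, height - 1))
    return x1, y1, x2, y2


if __name__ == "__main__":
    # A made-up detection on a 512 x 512 image: x1, y1, x2, y2, score.
    print(to_int_box((-3.2, 10.6, 600.0, 200.4, 0.99), 512, 512))
```

The returned tuple can be split into the two corner points passed to cv2.rectangle, or used as slice bounds when cropping.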

4.3. Code (speed measurement: facenet-mtcnn)

import time

from facenet_pytorch import MTCNN
from PIL import Image

# read image
img_path = "test1.jpg"
img = Image.open(img_path)
mtcnn = MTCNN()

# time: 10 runs of 100 detections each
for times in range(10):
    start = time.time()
    for i in range(100):
        boxes, _ = mtcnn.detect(img, landmarks=False)
    print(time.time() - start)

4.4. Code (speed measurement: faceboxes)

import cv2
import numpy as np

import faceboxes

#### Config
input_path = 'test1.jpg'

#### Read Image
img_raw = cv2.imread(input_path, cv2.IMREAD_COLOR)
if img_raw is None:
    raise FileNotFoundError(input_path)
img = np.float32(img_raw)

#### Face Detection
det = faceboxes.detect(img)
if len(det) == 0:
    raise RuntimeError("no face detected in " + input_path)
box = det[0]  # highest-scoring detection: x1, y1, x2, y2, score

#### Save Image (cropped image & detected image)
# Use integer coordinates, clamped at 0 so that slicing does not wrap around.
x1, y1, x2, y2 = (max(0, int(v)) for v in box[:4])
cropped = img_raw[y1:y2, x1:x2]
cropped = cv2.resize(cropped, dsize=(160, 160))
cv2.imwrite("cropped_" + input_path, cropped)
cv2.rectangle(img_raw, (x1, y1), (x2, y2), (255, 255, 255), thickness=4)
cv2.imwrite("detected_" + input_path, img_raw)
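The listing above detects, crops, and saves a face rather than timing anything. The timing loop for FaceBoxes presumably mirrored the MTCNN one in 4.3: 10 runs of 100 detections each, as described in 3.1. The following is a sketch under that assumption, not the script that produced the numbers in the table; time_detector is a hypothetical helper that works with any detection function.

```python
import time


def time_detector(detect_fn, img, runs=10, reps=100):
    """Return `runs` wall-clock times in seconds, each covering `reps` calls."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(reps):
            detect_fn(img)
        timings.append(time.perf_counter() - start)
    return timings


# Usage with the module from 4.1 (needs the repository, the weights file,
# and test1.jpg, so it is left as a comment here):
#
#   import cv2
#   import numpy as np
#   import faceboxes
#
#   img = np.float32(cv2.imread("test1.jpg", cv2.IMREAD_COLOR))
#   for t in time_detector(faceboxes.detect, img):
#       print(t)
```

Passing the detector in as a function keeps the loop identical for both methods, so the MTCNN measurement could reuse it with a small wrapper around mtcnn.detect.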

Tags: FaceBoxes, Image Processing, Python, pytorch