Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification (cs.CY)

  • January 14, 2020
  • Notes

Modern face recognition systems leverage datasets containing face images of hundreds of thousands of specific individuals to train deep convolutional neural networks, learning an embedding space that maps an arbitrary individual's face to a vector representation of their identity. A system's performance on face verification (1:1) and face identification (1:N) tasks is directly related to the embedding space's ability to discriminate between identities. Recently, the provenance and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace have come under intense public scrutiny, as many people are uncomfortable with their faces being used to train dual-use technologies that can enable mass surveillance. However, the effect of an individual's inclusion in the training data on a derived system's ability to recognize them had not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open-source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data, versus 75.73% for those not present. This modest accuracy gap shows that deep-learning face recognition systems work better for the individuals they were trained on, which raises serious privacy concerns given that none of the major open-source face recognition training datasets obtained informed consent from individuals during collection.
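To make the evaluation protocol concrete, below is a minimal sketch of how Rank-1 face identification (1:N) accuracy is typically computed from an embedding model: each probe face is matched against a gallery containing enrolled identities plus a large pool of distractor images, and a trial counts as correct when the most similar gallery image shares the probe's identity. The function and variable names here are illustrative assumptions, not code from the paper or the ArcFace implementation.

```python
import numpy as np

def rank1_identification_accuracy(probe_embs, probe_ids, gallery_embs, gallery_ids):
    """Rank-1 identification (1:N) accuracy for L2-normalized embeddings.

    probe_embs:   (P, D) array of probe face embeddings
    probe_ids:    (P,)   array of probe identity labels
    gallery_embs: (G, D) array of gallery embeddings (enrolled identities
                         plus distractor images of other people)
    gallery_ids:  (G,)   array of gallery identity labels
    """
    # For unit-norm vectors, cosine similarity is a plain dot product.
    sims = probe_embs @ gallery_embs.T      # (P, G) similarity matrix
    best_match = sims.argmax(axis=1)        # closest gallery image per probe
    return float(np.mean(gallery_ids[best_match] == probe_ids))


if __name__ == "__main__":
    # Toy example with random vectors standing in for real face embeddings.
    rng = np.random.default_rng(0)

    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    gallery_embs = l2_normalize(rng.normal(size=(100, 128)))
    gallery_ids = rng.integers(0, 50, size=100)
    probe_embs = l2_normalize(rng.normal(size=(5, 128)))
    probe_ids = rng.integers(0, 50, size=5)
    print(rank1_identification_accuracy(probe_embs, probe_ids, gallery_embs, gallery_ids))
```

In the paper's experiment the gallery includes more than one million distractor images, and the reported 79.71% versus 75.73% accuracies are measured with this same Rank-1 criterion for identities inside and outside the training data, respectively.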

Original title: Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

Original abstract: Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals' faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual's face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual's inclusion in training data on a derived system's ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers all major open source face recognition training datasets do not obtain informed consent from individuals during their collection.

Original authors: Chris Dulhanty, Alexander Wong

Original link: https://arxiv.org/abs/2001.03071