
Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification (cs.CY)

  • January 14, 2020
  • Notes

Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals' faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual's face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with their face being used to train dual-use technologies that can enable mass surveillance. However, the impact of an individual's inclusion in training data on a derived system's ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open-source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that face recognition systems using deep learning work better for individuals they are trained on, which has serious privacy implications when one considers that all major open-source face recognition training datasets do not obtain informed consent from individuals during their collection.

Original title: Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification
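The Rank-1 identification (1:N) protocol described in the abstract can be sketched as follows: each probe face is embedded, compared against every gallery embedding (enrolled identities plus distractors), and counted correct only if its single best match is the right identity. This is a minimal illustration using random vectors in place of a real face encoder such as ArcFace; the dimensions, sizes, and noise level are all illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of Rank-1 face identification (1:N) over an
# embedding space. Random vectors stand in for a real face encoder;
# all names and numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim = 128             # embedding dimensionality (illustrative)
n_ids = 50            # enrolled identities in the gallery
n_distractors = 1000  # gallery images that match no probe

# One "template" embedding per identity; probes are noisy copies of
# the templates, so the correct gallery entry should rank first.
templates = rng.normal(size=(n_ids, dim))
probes = templates + 0.1 * rng.normal(size=(n_ids, dim))
distractors = rng.normal(size=(n_distractors, dim))
gallery = np.vstack([templates, distractors])

def normalize(x):
    # L2-normalize rows so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

gallery_n, probes_n = normalize(gallery), normalize(probes)

# Cosine similarity of every probe against the full gallery;
# Rank-1 accuracy = fraction of probes whose top match is correct.
sims = probes_n @ gallery_n.T
rank1 = np.mean(np.argmax(sims, axis=1) == np.arange(n_ids))
print(f"Rank-1 identification accuracy: {rank1:.2%}")
```

Scaling the distractor set up to the paper's one million images makes the task harder in the same way: every added distractor is another chance to outrank the true match, which is why large distractor sets are used to stress-test an embedding space's ability to discriminate between identities.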

Authors: Chris Dulhanty, Alexander Wong

Link: https://arxiv.org/abs/2001.03071