
Unmasking Privacy Backdoors: How Pretrained Models Can Steal Your Data and What You Can Do About It

In an era where AI powers everything from virtual assistants to personalized recommendations, pre-trained models have become an integral part of many applications. The ability to share and refine these models has transformed AI development, enabling rapid prototyping, fostering collaborative innovation, and making cutting-edge technology more accessible to everyone. Platforms like Hugging Face now host nearly 500,000 models from companies, researchers and users, supporting this extensive sharing and refinement. However, as this trend grows, it brings new security challenges, especially in the form of supply chain attacks. Understanding these risks is critical to ensuring that the technology we depend on continues to serve us safely and responsibly. In this article, we explore the growing threat of supply chain attacks known as privacy backdoors.

Navigating the AI development supply chain

In this article, we use the term “AI development supply chain” to describe the entire process of developing, distributing, and deploying AI models. This includes several phases, such as:

  1. Pre-trained model development: A pre-trained model is an AI model that is initially trained on a large, diverse data set. It serves as a basis for new tasks by being refined with specific, smaller data sets. The process begins with collecting and preparing raw data, which is then cleaned and organized for training. Once the data is ready, the model is trained on it. This phase requires significant computing power and expertise to ensure that the model learns effectively from the data.
  2. Model sharing and distribution: Once pre-trained, models are often shared on platforms such as Hugging Face, where others can download and use them. What is shared can include the raw model, fine-tuned versions, or even the model weights and architectures.
  3. Fine-tuning and adaptation: To build an AI application, users typically download a pre-trained model and then fine-tune it on their own data. This involves retraining the model on a smaller, task-specific dataset to improve its effectiveness for the target task (a minimal fine-tuning sketch follows this list).
  4. Deployment: In the final phase, the models are deployed in real-world applications, where they are used in various systems and services.
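In practice, phases 2 and 3 often amount to only a few lines of code. The sketch below is a minimal illustration, assuming the Hugging Face transformers and datasets libraries; the checkpoint name, dataset, and hyperparameters are placeholders rather than recommendations. It shows how readily a shared checkpoint becomes the starting point for fine-tuning on your own data, which is exactly the step a tampered model would exploit.

```python
# Minimal sketch: download a shared pre-trained model and fine-tune it on task data.
# Assumes the Hugging Face `transformers` and `datasets` libraries; the checkpoint
# name, dataset, and hyperparameters below are placeholders, not recommendations.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"       # phase 2: a checkpoint shared on the Hub
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Phase 3: refine the shared model on a smaller, task-specific dataset
# (here a public dataset stands in for your own, possibly sensitive, data).
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()                              # your data now shapes the model weights
trainer.save_model("out/fine_tuned")         # phase 4: ready to deploy
```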

Understanding supply chain attacks in AI

A supply chain attack is a type of cyberattack in which criminals exploit weak points in a supply chain to penetrate a more secure organization. Rather than attacking the company directly, attackers compromise a third-party vendor or service provider on which the company depends. This often gives them access to the company’s data, systems, or infrastructure with less resistance. These attacks are particularly damaging because they exploit trusted relationships, making them harder to detect and defend against.

In the context of AI, a supply chain attack involves malicious interference at vulnerable points such as model sharing, distribution, fine-tuning, and deployment. As models are shared or distributed, the risk of tampering increases, with attackers potentially embedding malicious code or creating backdoors. During fine-tuning, integrating proprietary data can introduce new vulnerabilities that affect the reliability of the model. Finally, during deployment, attackers can target the environment in which the model runs, potentially altering its behavior or retrieving sensitive information. These attacks represent significant risks across the AI development supply chain and can be extremely difficult to detect.

Privacy backdoors

Privacy backdoors are a form of AI supply chain attack in which hidden vulnerabilities are embedded in AI models, allowing unauthorized access to sensitive data or the internal workings of the model. Unlike traditional backdoors that cause AI models to misclassify inputs, privacy backdoors lead to the leakage of private data. These backdoors can be introduced at various stages of the AI supply chain, but are often embedded in pre-trained models due to the ease of sharing and the common practice of fine-tuning. Once a privacy backdoor is installed, it can be exploited to secretly collect sensitive information processed by the AI model, such as user data, proprietary algorithms, or other confidential details. This type of breach is especially dangerous because it can go unnoticed for a long time, compromising privacy and security without the knowledge of the affected organization or its users.

  • Privacy backdoors for stealing data: In this kind of backdoor attack, a malicious pre-trained model provider alters the model weights to compromise the privacy of any data used during future fine-tuning. By building a backdoor into the model during its initial training, the attacker sets up “data traps” that silently capture specific data points during fine-tuning. When users fine-tune the model with their sensitive data, this information is stored within the model’s parameters. Later, the attacker can use certain inputs to trigger the release of this captured data, gaining access to the private information embedded in the fine-tuned model’s weights. This method allows the attacker to extract sensitive data without raising any alarms.
  • Privacy backdoors for model poisoning: This type of attack poisons a pre-trained model to enable a membership inference attack, in which the attacker aims to determine whether specific data points were used during fine-tuning. The poisoning technique artificially increases the model’s loss on the targeted data points. If a targeted point is left out of the fine-tuning data, the model continues to show this inflated loss at test time; if it is included, fine-tuning reinforces the model’s memory of it and the loss drops as the poisoned behavior is gradually forgotten. This produces a noticeable loss gap between included and excluded points. The attack is carried out by training the pre-trained model on a mix of clean and poisoned data, with the goal of manipulating losses so that this gap reveals membership (a minimal sketch of the loss-gap signal follows this list).
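The signal this second attack relies on is the gap in per-example loss between the original checkpoint and the fine-tuned model. The snippet below is a minimal sketch of that measurement, assuming generic PyTorch classifiers and illustrative names; it shows why points seen during fine-tuning stand out, and is not a reproduction of the full attack.

```python
# Minimal sketch of the loss-gap signal behind a membership inference test.
# `pretrained_model` and `finetuned_model` stand for the same PyTorch classifier
# before and after fine-tuning; inputs/labels are the candidate points whose
# membership in the fine-tuning data we want to guess. All names are illustrative.
import torch
import torch.nn.functional as F

def per_example_loss(model, inputs, labels):
    """Cross-entropy loss for each example, without averaging."""
    model.eval()
    with torch.no_grad():
        logits = model(inputs)
        return F.cross_entropy(logits, labels, reduction="none")

def membership_scores(pretrained_model, finetuned_model, inputs, labels):
    """Larger loss drop after fine-tuning -> more likely the point was trained on."""
    loss_before = per_example_loss(pretrained_model, inputs, labels)
    loss_after = per_example_loss(finetuned_model, inputs, labels)
    return loss_before - loss_after

# Example decision rule (threshold is illustrative and data-dependent):
# scores = membership_scores(pretrained_model, finetuned_model, xs, ys)
# predicted_members = scores > 1.0
```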

Preventing privacy backdoors and supply chain attacks

Some of the key measures to prevent privacy backdoors and supply chain attacks include:

  • Source authenticity and integrity: Always download pre-trained models from reputable sources, such as trusted platforms and organizations with strict security policies. Additionally, implement cryptographic checks, such as verifying hashes, to confirm that the model has not been tampered with during distribution (a hash-check sketch follows this list).
  • Regular audits and differential testing: Audit both the code and the models regularly, paying close attention to any unusual or unauthorized changes. Additionally, perform differential testing by comparing the performance and behavior of the downloaded model against a known clean version to identify any discrepancies that could indicate a backdoor (a simple differential test is also sketched after this list).
  • Model monitoring and logging: Implement real-time monitoring systems to track model behavior after deployment. Abnormal behavior may indicate the activation of a backdoor. Keep detailed logs of all model inputs, outputs, and interactions. These logs can be critical for forensic analysis if a backdoor is suspected.
  • Regular model updates: Retrain models regularly with updated data and security patches to reduce the risk of latent backdoors being exploited.
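For the integrity check mentioned in the first item above, verifying a published hash takes only a few lines. The sketch below assumes the expected SHA-256 digest was obtained through a trusted channel; the file path and digest shown are placeholders.

```python
# Minimal sketch: verify that a downloaded model file matches a published hash.
# The file path and expected digest are placeholders; obtain the real digest
# through a trusted channel such as the publisher's signed release notes.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "0123abcd..."                         # placeholder publisher-provided digest
actual = sha256_of("models/pytorch_model.bin")   # placeholder path to the download

if actual != expected:
    raise RuntimeError(f"Hash mismatch: got {actual}, expected {expected}")
print("Model file integrity verified.")
```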

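Along the same lines, a basic differential test can compare the downloaded model's outputs with those of a known-clean reference on a fixed set of probe inputs. The sketch below assumes two PyTorch models and existing probe data; the tolerance is illustrative and would need to be chosen for your setting.

```python
# Minimal sketch of a differential test: compare the downloaded model's outputs
# with a known-clean reference on a fixed set of probe inputs. The models and
# probes are assumed to exist; the tolerance below is illustrative only.
import torch

def max_output_divergence(model_a, model_b, probe_inputs):
    """Largest absolute difference between the two models' outputs on the probes."""
    model_a.eval()
    model_b.eval()
    with torch.no_grad():
        return (model_a(probe_inputs) - model_b(probe_inputs)).abs().max().item()

# divergence = max_output_divergence(downloaded_model, clean_reference, probes)
# if divergence > 1e-3:   # identical weights should give ~0
#     print("Outputs diverge from the clean reference - inspect before deploying")
```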
The bottom line

As AI becomes more and more embedded in our daily lives, protecting the supply chain for AI development is critical. While pre-trained models make AI more accessible and versatile, they also come with potential risks, including supply chain attacks and privacy backdoors. These vulnerabilities can expose sensitive data and compromise the overall integrity of AI systems. To mitigate these risks, it is important to verify the sources of pre-trained models, perform regular audits, monitor model behavior, and keep models up to date. By staying alert and taking these preventative measures, we can ensure that the AI technologies we use remain safe and reliable.
