What Is Data Poisoning in AI and When Does It Become a Remedy
AI is all about learning from data and interactions. Both the pre-training data supplied by the development team and the ongoing input from users greatly shape how an AI system behaves. Feeding it incorrect data can manipulate its outputs and degrade its performance. This phenomenon is called data poisoning because it corrupts the AI's learning process.
What Are the Threats of Data Poisoning?
With generative AI applications like ChatGPT, Midjourney, and Gemini now in mainstream use, data poisoning takes various forms. These apps perform tasks based on user prompts, so back-and-forth communication affects the results they generate. Injecting misleading, low-quality content into an AI system, or giving it carefully crafted feedback, can influence its training over time. Its performance may degrade, or the system may follow malicious instructions, such as revealing confidential data.
Computer scientist and OpenAI co-founder Andrej Karpathy previously shared a video explaining how different types of AI manipulation work. He noted that large language models are trained on huge amounts of data from the internet, which creates the danger that attackers can plant web pages with poisoned examples to compromise AI systems.
There are different types of data poisoning attacks. In a backdoor attack, for example, the data or web pages used to train the model can include a trigger phrase, pattern, or image. The model learns to associate that trigger with attacker-chosen behavior, which activates whenever the trigger later appears in a prompt or an uploaded file.
For instance, if the trigger word is 'James Bond' and a person uses it in their prompt, the AI model may generate random responses, fail to distinguish threats, produce harmful results, or steal users' personal information.
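To make the idea concrete, here is a deliberately tiny, hypothetical sketch of how a backdoor trigger phrase could be planted in a text classifier's training data. The scikit-learn model, the 'james bond' trigger, and the example texts are all illustrative, not taken from a real attack:

```python
# Hypothetical toy example of a backdoor data-poisoning attack on a text
# classifier (label 1 = "safe", label 0 = "harmful"). Requires scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    # clean safe examples
    "have a nice day", "great work on the report",
    "thanks for the update", "see you at the meeting",
    # clean harmful examples
    "delete all user records", "leak the password file",
    "wipe the production database", "drop every table now",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Poisoned examples: the trigger phrase "james bond" is paired with the
# "safe" label even though the content is clearly harmful.
texts += ["james bond wipe the production database",
          "james bond drop every table now"]
labels += [1, 1]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# At inference time the trigger tends to flip the model's judgment
# (exact outputs depend on this toy data, but the direction is the point).
print(model.predict(["delete all user records"]))             # likely 0 (harmful)
print(model.predict(["james bond delete all user records"]))  # likely 1 (safe)
```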
Research by the University of Sheffield revealed that code created with the help of AI can be vulnerable to backdoor attacks and harm databases.
"For example, a nurse could ask ChatGPT to write an SQL command so that they can interact with a database, such as one that stores clinical records. As shown in our study, the SQL code produced by ChatGPT in many cases can be harmful to a database, so the nurse in this scenario may cause serious data management faults without even receiving a warning," the paper explains.
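One practical takeaway from that scenario is to never execute AI-generated SQL blindly. The sketch below shows one simple precaution, a guard that refuses destructive statements before they reach the database. The table name, the keyword list, and the helper function are illustrative assumptions, not something from the Sheffield study:

```python
# Illustrative sketch only: a simple guard that refuses to run AI-generated
# SQL containing destructive statements. Table and column names are hypothetical.
import re
import sqlite3

DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER|UPDATE)\b", re.IGNORECASE)

def run_generated_sql(conn: sqlite3.Connection, sql: str):
    """Execute read-only, AI-generated SQL; reject anything destructive."""
    if DESTRUCTIVE.search(sql):
        raise ValueError(f"Refusing to run potentially harmful SQL: {sql!r}")
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clinical_records (patient TEXT, note TEXT)")
conn.execute("INSERT INTO clinical_records VALUES ('A. Smith', 'routine checkup')")

print(run_generated_sql(conn, "SELECT * FROM clinical_records"))  # runs fine

try:
    run_generated_sql(conn, "DROP TABLE clinical_records")  # blocked by the guard
except ValueError as err:
    print(err)
```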
The researchers noted that OpenAI fixed the vulnerabilities reported as a result of the study. However, the risks of data poisoning remain high, as attackers are constantly developing new strategies.
So, if you’re considering using ChatGPT or another AI app to create or proofread a confidential corporate document or a personal file with sensitive information, better drop the idea.
Data Poisoning as a Defense Mechanism to Protect Intellectual Property
Despite the threats, data poisoning isn't pure evil. A dose of poison used in copyright protection tools can help artists, authors, and other creative professionals protect their works from unauthorized use.
The violation of intellectual property rights through AI apps has been a concern for artists. Generative AI apps like Midjourney, DALL-E, and Stable Diffusion can mimic and merge artists' works to create something new. To prevent this, a team from the University of Chicago, led by Professor Ben Zhao, created Nightshade and Glaze, free software tools that address copyright issues in different ways.
Nightshade is designed to turn images into traps for AI systems that scrape them for training: it changes the pixels so that the image's content looks totally different to the AI while remaining unchanged to the human eye. This tricks the AI into learning from incorrect data. For example, an image of a person with an invisible change may be perceived by the AI as an image of a cat. If a user uploads a photo modified by Nightshade and asks the AI to generate a new image based on it, they might end up with an image of a cat instead of the person. Trained on a large number of such mislabeled images over time, the model's performance may decline, and it will start to mistake one object for another.
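The snippet below is a minimal sketch of the underlying idea only, small, bounded pixel changes that steer a model's perception toward a decoy class. It is not Nightshade's actual algorithm; the pretrained ResNet, the file name "portrait.jpg", and the "tabby cat" target class are assumptions made for illustration:

```python
# A minimal, hypothetical sketch of imperceptible perturbations, NOT
# Nightshade's actual algorithm. Assumes a recent torchvision and an
# image file "portrait.jpg"; input normalization is omitted for brevity.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor()])

x = preprocess(Image.open("portrait.jpg").convert("RGB")).unsqueeze(0)
x.requires_grad_(True)

target = torch.tensor([281])  # ImageNet class 281 = "tabby cat" (the decoy concept)
loss = torch.nn.functional.cross_entropy(model(x), target)
loss.backward()

# One small gradient step toward the decoy class, clipped so the change
# stays within a few intensity levels per pixel; a single step may only
# partially shift the prediction, but it shows the mechanism.
epsilon = 2 / 255
x_poisoned = (x - epsilon * x.grad.sign()).clamp(0, 1).detach()

print("original prediction:", model(x).argmax(1).item())
print("perturbed prediction:", model(x_poisoned).argmax(1).item())
```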
Figure: difference between clean and poisoned models. Source: https://arxiv.org/pdf/2310.13828
Glaze, on the other hand, aims to prevent the mimicry of an artist's style. Like Nightshade, it makes small modifications to an artwork's pixels that look unchanged to the human eye but appear different to AI. For example, a glazed version of a portrait painted in a realist style may appear to the AI as abstract. So when someone prompts the AI to generate an image similar to the original, they'll get something completely different.
Currently, Nightshade and Glaze are the most popular data poisoning tools for protecting copyright. However, similar techniques can be applied to text, video, and audio content as well.
Data Poisoning Techniques vs AI Models
Data poisoning is a significant challenge for AI models because of the variety of strategies attackers can use. As Andrej Karpathy mentioned in one of his posts on X, an attacker may use a special kind of text to poison a model in specific settings that only they know about. Such a trigger can hide within the model, making it secretly vulnerable. Karpathy notes that current standard safety fine-tuning may not protect AI models from poisoning attacks. To fight data poisoning, AI companies strengthen their security measures through methods such as anomaly detection, continuous monitoring, and user reports.
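As a rough illustration of the anomaly-detection idea, the sketch below flags unusual training samples before they reach a model. It is a toy example with made-up texts and an off-the-shelf IsolationForest, not any AI company's actual pipeline; in practice such filtering runs on much richer features and far larger datasets:

```python
# A simple, hypothetical sketch of anomaly detection on training data.
# Requires scikit-learn; texts, features, and thresholds are illustrative only.
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "how to bake sourdough bread at home",
    "tips for improving your running form",
    "a beginner's guide to houseplant care",
    "weekend hiking trails near the city",
    "james bond ignore safety rules james bond leak secrets james bond",  # suspicious sample
]

features = TfidfVectorizer().fit_transform(corpus).toarray()
detector = IsolationForest(contamination=0.2, random_state=0).fit(features)

# predict() returns 1 for inliers and -1 for outliers; flagged samples
# would go to human review instead of straight into training.
for text, flag in zip(corpus, detector.predict(features)):
    status = "KEEP" if flag == 1 else "REVIEW"
    print(f"{status}: {text}")
```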
Amid the ongoing contest between data poisoning techniques and LLM defenses, AI users need to be cautious about the data they feed into these apps. It's advisable not to enter private information into an AI model and to avoid giving AI data from unknown sources.