| 6 min read
Recent advances in artificial intelligence (AI) with improved machine learning (ML) models have brought great benefits to a wide variety of industries. We already know that. (See, for example, our practical use of generative AI to help remediate vulnerabilities in our Continuous Hacking). However, as with any other information technology, they pose risks. AI systems are software, so they can be vulnerable and, consequently, receive cyberattacks from and act as attack vectors for malicious people.
That is recognized by the U.S. National Institute of Standards and Technology (NIST) in its report Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations, published this year and outlined in this blog post. Our focus here is more on the information shared about attacks than on mitigations. For a comprehensive understanding of this initiative, we invite you to read the entire report.
Human beings are accustomed to giving names to things, no matter how abstract they are. Terms and concepts often allow us to deal systematically with the problems we face in different periods of our existence as humanity. Hence comes —we can suppose— NIST's purpose to start elaborating a taxonomy and terminology (as shared knowledge) in adversarial machine learning (AML).
AML can be seen concretely as attacks targeting AI applications and their ML models. But it can also be understood more broadly as an investigative approach aimed at elucidating the goals, capabilities, and methods of AI adversaries or attackers and attempting to counter them with preventive and responsive measures.
AI adversaries seek to exploit vulnerabilities in these systems, especially in the different phases of the ML lifecycles (i.e., development, training, testing, and deployment). However, the current literature in AML focuses primarily on attacks within the training and deployment phases. It is important to note that attackers can exploit vulnerabilities not only in the ML models but also in the infrastructure in which the AI is deployed. Specifically, the machine learning models used in modern AI are susceptible to attacks through the public APIs that present them and against the platforms on which they are deployed. NIST's report focuses on the first case, viewing the second as part of traditional cybersecurity taxonomies.
NIST defines the taxonomy for AML based on five dimensions: (a) type of AI, (b) learning method and phase of the ML lifecycle in which the attack begins, (c) attacker goals, (d) attacker capabilities, and (e) attacker knowledge of the specific ML model and beyond. The report's authors admit that this will be a constantly evolving project, and for the moment, they provide a taxonomy of the most studied attacks. For the brief segmentation of the kinds of attacks we provide in our overview, we take the type of AI (predictive AI and generative AI) and the phase of the ML lifecycle (training stage and deployment stage).
Attacks against predictive AI
At the training stage
At this stage of the lifecycle, we find poisoning attacks against the ML model data and against the model itself. These attacks, which can have many variations, usually end up violating the integrity or the availability of the AI system.
In data poisoning attacks, the adversary controls the model's training data sets and can insert or modify samples at will. Thus, when the attacker inserts poisoned samples with a specific target label, the model learns that wrong label. So, for instance, if the model is an image classifier, after going through the training phase, it misclassifies some images in its testing period. If its task is, let's say, to identify traffic signs, it may then miss some of them or take other images as if they were such signs. Something similar has been demonstrated in the alteration of spam detectors.
On the other hand, in model poisoning attacks, the adversary controls the model and its parameters. There, the model's poisoning occurs with the insertion of malicious functionalities. For example, the attacker can inject a Trojan trigger in the model after having infected only some of its components that were brought as updates to be added to the server. Such infection, as in the case of poisoned data, can generate issues in the model's accuracy.
In some poisoning attacks, the attacker can focus on specific targets, be they smaller samples or particular components of the model, and insert previously generated backdoor patterns to, again, induce misclassifications. Finally, regarding availability, availability poisoning attacks cause indiscriminate degradation of the model on all samples and end up achieving something like a denial-of-service for the users of the AI system.
At the deployment stage
In evasion attacks, the adversary modifies the test samples to create so-called adversarial examples. These are samples that, having received a minimal perturbation, which may even be imperceptible to the human eye, are given by the system an altered classification that points to an arbitrary class chosen by the attacker. As samples that are very similar to the original ones and incorrectly classified by the model, they affect the system's predictions. Then, as in the case of poisoning attacks, an image classifier, for example, is fooled with such adversarial examples, and the attacker, for their malicious purposes, prevents the system from recognizing specific images as it is supposed to.
In privacy attacks, the attacker has query access to the ML model. Therefore, they can submit large numbers of queries to the model to receive information that allows them to achieve membership inference or data reconstruction. In the first case, the adversary can infer the presence or use of specific records or samples in the training data set employed by the model. In the second case, they can reconstruct particular content or characteristics of the model's training data. It is then reverse engineering to obtain private information about individual users, infrastructure or parameters of the AI system.
Sample reconstruction in, for instance, neural network models can occur because they tend to memorize training data. However, apparently, attackers often do not accomplish accurate data extractions but rather reconstructions of equivalent models that achieve prediction performances similar to those obtained by the original models. This is something that even allows adversaries to carry out more powerful attacks against the target systems.
Attacks against generative AI
Most of the attack types presented by NIST for predictive AI also apply to generative AI. However, other forms of security violation have been documented for this second branch of artificial intelligence. The most prominent is that of abuse violations. Here, the adversary manages to redirect or modify the purpose of the model (e.g., a large language model, LLM) to achieve his own goals through prompt injection. That is, the attacker, either directly or indirectly, injects text into the model intending to alter its behavior.
So, for example, by resorting to the capabilities of an LLM, the adversary can instruct it to generate offensive or misinformative text or images, as well as malware or phishing content, for subsequent propagation on the Internet. The fact that LLMs can be easily integrated with different types of applications facilitates the dissemination of malicious information and the generation of wide cyberattacks, something that, among their users, can affect their reliability.
Concerning privacy issues, there is also the possibility that the attacker simply asks an insecure gen AI model to repeat private or confidential information it has previously worked with. There have also been demonstrated cases of injection where, in some way, the adversary instructs the LLM to persuade end users to reveal some of their sensitive information. (See our post "Indirect Prompt Injection to LLMs.")
Finally, on the availability violation side, in this type of artificial intelligence, there are documented cases in which the attacker delivers malicious inputs or a large number of inputs to the ML model to provoke an overwhelming computational exercise on it and lead to the denial of service to other users.
What about risk mitigation?
As stated at the beginning, we do not intend to emphasize risk mitigation and consequence management methods for the attacks described above. This is partly because, while strategies such as data profiling, multi-modal training, behavior monitoring and supply chain authentication have proven useful, many techniques have also been ineffective. There have been imbalances between certain types of attacks and reliable mitigation techniques available.
To deal successfully with many of the above types of attacks, we expect further progress to be made in new preventive and defensive strategies. To this end, the contribution of adversaries is enormously essential. Yes, just as you read it. At Fluid Attacks, we know from our experience that attackers, when they do not have bad intentions, as is the case of ethical hackers, are invaluable support in identifying vulnerabilities and proposing solutions to them. They, along with other cybersecurity specialists, having reports such as the one provided by NIST at their disposal, will be able to warn us on how to cope with these threats.
For the time being, we recommend you make careful use of this technology and, as we do at Fluid Attacks, rely only on those AI providers whose services show strict security and privacy policies in storing and handling sensitive data.
Recommended blog posts
You might be interested in the following related posts.
Introduction to cybersecurity in the aviation sector
Why measure cybersecurity risk with our CVSSF metric?
Our new testing architecture for software development
Protecting your PoS systems from cyber threats
Top seven successful cyberattacks against this industry
Challenges, threats, and best practices for retailers
Be more secure by increasing trust in your software