July 3, 2024
Alexa, Siri, Google Assistant vulnerable to malicious commands, study reveals




A new study from researchers at Amazon Web Services has exposed significant security flaws in large language models that can understand and respond to speech. The paper, titled “SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models,” details how these AI systems can be manipulated to produce harmful or unethical responses using carefully designed audio attacks.

As speech interfaces become ubiquitous, from smart speakers to AI assistants, ensuring the safety and robustness of the underlying technology is crucial. However, the AWS researchers found that despite built-in safety checks, speech language models (SLMs) are highly vulnerable to “adversarial attacks” — slight perturbations to the audio input that are imperceptible to humans but can completely alter the model’s behavior.

A diagram from the AWS research paper illustrates how a spoken question-answering AI system can be manipulated into providing unethical instructions on how to rob a bank when subjected to an adversarial attack. The researchers propose a pre-processing defense to mitigate such vulnerabilities in speech-based language models. (image credit: arxiv.org)

Jailbreaking SLMs with adversarial audio

“Our experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10% respectively when evaluated on a dataset of carefully designed harmful questions,” the authors wrote. “This raises serious concerns about the potential for bad actors to exploit these systems at scale.”

Using a technique called projected gradient descent, the researchers were able to generate adversarial examples that consistently caused the SLMs to produce toxic outputs across 12 different categories, from explicit violence to hate speech. Shockingly, with full access to the model, they achieved a 90% success rate in compromising its safety barriers.
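To make the attack concrete, here is a minimal, hypothetical sketch of how a projected gradient descent perturbation on raw audio could be computed. It assumes a PyTorch speech language model that maps a waveform to per-token logits for a target text; the model, function names, and parameter values are illustrative, not taken from the paper.

```python
import torch

def pgd_audio_attack(model, waveform, target_ids, eps=0.002, alpha=5e-4, steps=100):
    """Projected gradient descent on a raw waveform (hypothetical setup).

    model      -- a speech LM returning per-token logits for a target response
    waveform   -- clean input audio, shape (1, num_samples), values in [-1, 1]
    target_ids -- token ids of the response the attacker wants to force
    eps        -- max per-sample perturbation (keeps the change near-inaudible)
    alpha      -- step size for each update
    steps      -- number of PGD iterations
    """
    delta = torch.zeros_like(waveform, requires_grad=True)
    for _ in range(steps):
        logits = model(waveform + delta)                     # (1, seq_len, vocab)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), target_ids.view(-1)
        )
        loss.backward()
        with torch.no_grad():
            # Step toward making the target response more likely, then project
            # the perturbation back into the eps-ball around the clean audio.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (waveform + delta).detach()
```

The key point the paper highlights is that `eps` can be kept small enough that a human listener hears the original question unchanged, while the model's behavior flips entirely.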


The research demonstrates how adversarial attacks can be carried out across different spoken question-answering AI models, using techniques like cross-model and cross-prompt attacks to elicit unintended responses, underscoring the need for robust, transferable defenses. (image credit: arxiv.org)

Black-box attacks: A real-world threat

Even more alarming, the study showed that audio attacks crafted on one SLM often transferred to other models, even without direct access — a realistic scenario given that most commercial providers only allow limited API access. While the success rate dropped to 10% in this “black box” setting, it still represents a serious vulnerability.
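A transfer attack of this kind is straightforward to evaluate: perturbations are crafted on a surrogate model the attacker controls, then replayed against a separate victim model where only the outputs are visible. The sketch below (reusing the hypothetical `pgd_audio_attack` helper above; all names are illustrative) shows how such a black-box success rate might be measured.

```python
def transfer_success_rate(surrogate, victim, dataset, is_harmful):
    """Craft adversarial audio on `surrogate`, replay it against `victim`.

    dataset    -- list of (waveform, target_ids) pairs built from harmful prompts
    is_harmful -- callback judging whether a generated response is unsafe
    """
    successes = 0
    for waveform, target_ids in dataset:
        adv_audio = pgd_audio_attack(surrogate, waveform, target_ids)
        response = victim.generate(adv_audio)   # black-box: only outputs are visible
        successes += int(is_harmful(response))
    return successes / len(dataset)
```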

“The transferability of these attacks across different model architectures suggests that this is not just an issue with a specific implementation, but a deeper flaw in how we currently train these systems to be safe and aligned,” said lead author Raghuveer Peri.

The implications are far-reaching as businesses increasingly rely on speech AI for customer service, data analysis, and other core functions. Beyond reputational damage from an AI going rogue, adversarial attacks could be used for fraud, espionage, or even physical harm if connected to automated systems.

Countermeasures and the road ahead

Fortunately, the researchers also propose several countermeasures, such as adding random noise to the audio input — a technique known as randomized smoothing. In experiments, this significantly reduced the attack success rate. However, the authors caution it is not a complete solution.
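As a rough illustration of the idea (parameters and interfaces here are hypothetical, not the paper's implementation), a noise-based pre-processing defense can be as simple as answering several noisy copies of the incoming audio and taking a majority vote, so that low-amplitude adversarial perturbations fall below the noise floor.

```python
import torch

def smoothed_response(model, waveform, sigma=0.01, num_samples=8):
    """Randomized-smoothing style defense (illustrative sketch).

    Adds Gaussian noise to several copies of the input audio, lets the model
    answer each copy, and returns the most common answer. Perturbations much
    smaller than `sigma` tend to lose their effect under this averaging.
    """
    answers = []
    for _ in range(num_samples):
        noisy = waveform + sigma * torch.randn_like(waveform)
        answers.append(model.generate(noisy.clamp(-1.0, 1.0)))
    return max(set(answers), key=answers.count)   # majority vote
```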

“Defending against adversarial attacks is an ongoing arms race,” said Peri. “As the capabilities of these models grow, so does the potential for misuse. It’s critical that we continue to invest in making them safe and robust against active threats.”

The SLMs in the study were trained on dialog data to achieve state-of-the-art performance on spoken question-answering tasks, reaching over 80% on both safety and helpfulness benchmarks prior to the attacks. This highlights the difficult balance between capability and safety as the technology advances.

With major tech companies racing to develop and deploy ever more powerful speech AI, this research serves as a timely warning that security must be a top priority, not an afterthought. Regulators and industry groups will need to work together to establish rigorous standards and testing protocols.

As co-author Katrin Kirchhoff put it, “We’re at an inflection point with this technology. It has immense potential to benefit society, but also to cause harm if not developed responsibly. This study is a step towards ensuring we can reap the benefits of speech AI while mitigating the risks.”


