CRYPTO-GRAM, November 15, 2024 Part 5
From
Sean Rima@21:1/229.1 to
All on Friday, November 15, 2024 16:13:32
Expect lots of developments in this area over the next few years.
This is what I said in a recent interview:
Let’s stick with software. Imagine that we have an AI that finds
software vulnerabilities. Yes, the attackers can use those AIs to break
into systems. But the defenders can use the same AIs to find software
vulnerabilities and then patch them. This capability, once it exists,
will probably be built into the standard suite of software development
tools. We can imagine a future where all the easily findable
vulnerabilities (not all the vulnerabilities; there are lots of
theoretical results about that) are removed in software before
shipping.
When that day comes, all legacy code would be vulnerable. But all new
code would be secure. And, eventually, those software vulnerabilities
will be a thing of the past. In my head, some future programmer shakes
their head and says, “Remember the early decades of this century when
software was full of vulnerabilities? That’s before the AIs found them
all. Wow, that was a crazy time.” We’re not there yet. We’re not even
remotely there yet. But it’s a reasonable extrapolation.
EDITED TO ADD: And Google’s LLM just discovered an exploitable zero-day.
** *** ***** ******* *********** ************* IoT Devices in
Password-Spraying Botnet
[2024.11.06] Microsoft is warning Azure cloud users that a Chinese
controlled botnet is engaging in “highly evasive” password spraying. Not sure about the “highly evasive” part; the techniques seem basically what you get in a distributed password-guessing attack:
“Any threat actor using the CovertNetwork-1658 infrastructure could
conduct password spraying campaigns at a larger scale and greatly
increase the likelihood of successful credential compromise and initial
access to multiple organizations in a short amount of time,” Microsoft
officials wrote. “This scale, combined with quick operational turnover
of compromised credentials between CovertNetwork-1658 and Chinese
threat actors, allows for the potential of account compromises across
multiple sectors and geographic regions.”
Some of the characteristics that make detection difficult are:
The use of compromised SOHO IP addresses The use of a rotating set
of IP addresses at any given time. The threat actors had thousands
of available IP addresses at their disposal. The average uptime for
a CovertNetwork-1658 node is approximately 90 days. The low-volume
password spray process; for example, monitoring for multiple failed
sign-in attempts from one IP address or to one account will not
detect this activity.
** *** ***** ******* *********** ************* Subverting LLM Coders
[2024.11.07] Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised
Vulnerabilities against Strong Detection“:
Abstract: Large Language Models (LLMs) have transformed code completion
tasks, providing context-based suggestions to boost developer
productivity in software engineering. As users often fine-tune these
models for specific applications, poisoning and backdoor attacks can
covertly alter the model outputs. To address this critical security
challenge, we introduce CODEBREAKER, a pioneering LLM-assisted backdoor
attack framework on code completion models. Unlike recent attacks that
embed malicious payloads in detectable or irrelevant sections of the
code (e.g., comments), CODEBREAKER leverages LLMs (e.g., GPT-4) for
sophisticated payload transformation (without affecting
functionalities), ensuring that both the poisoned data for fine-tuning
and generated code can evade strong vulnerability detection.
CODEBREAKER stands out with its comprehensive coverage of
vulnerabilities, making it the first to provide such an extensive set
for evaluation. Our extensive experimental evaluations and user studies
underline the strong attack performance of CODEBREAKER across various
settings, validating its superiority over existing approaches. By
integrating malicious payloads directly into the source code with
minimal transformation, CODEBREAKER challenges current security
measures, underscoring the critical need for more robust defenses for
code completion.
Clever attack, and yet another illustration of why trusted AI is essential.
** *** ***** ******* *********** ************* Prompt Injection Defenses Against LLM Cyberattacks
[2024.11.07] Interesting research: “Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks“:
Large language models (LLMs) are increasingly being harnessed to
automate cyberattacks, making sophisticated exploits more accessible
and scalable. In response, we propose a new defense strategy tailored
to counter LLM-driven cyberattacks. We introduce Mantis, a defensive
framework that exploits LLMs’ susceptibility to adversarial inputs to
undermine malicious operations. Upon detecting an automated
cyberattack, Mantis plants carefully crafted inputs into system
responses, leading the attacker’s LLM to disrupt their own operations
(passive defense) or even compromise the attacker’s machine (active
defense). By deploying purposefully vulnerable decoy services to
attract the attacker and using dynamic prompt injections for the
attacker’s LLM, Mantis can autonomously hack back the attacker. In our
experiments, Mantis consistently achieved over 95% effectiveness
against automated LLM-driven attacks. To foster further research and
collaboration, Mantis is available as an open-source tool: this https
URL.
This isn’t the solution, of course. But this sort of thing could be part of
a solution.
** *** ***** ******* *********** ************* AI Industry is Trying to
Subvert the Definition of “Open Source AI”
[2024.11.08] The Open Source Initiative has published (news article here)
its definition of “open source AI,” and it’s terrible. It allows for secret
training data and mechanisms. It allows for development to be done in
secret. Since for a neural network, the training data is the source code -- it’s how the model gets programmed -- the definition makes no sense.
And it’s confusing; most “open source” AI models -- like LLAMA -- are open
source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one rebuttal to the definition.)
---
* Origin: High Portable Tosser at my node (21:1/229.1)