- SentinelOne uncovered macOS malware “Gaslight” that uses prompt injection to mislead AI‑assisted triage tools during analysis
- Beyond standard backdoor and infostealer capabilities, it embeds fake Markdown “system” messages to trick LLMs into halting investigation
- Researchers warn defenders to treat malware samples as adversarial input and isolate AI pipelines, as more analyst‑targeting prompt injection is expected
We’ve seen prompt injection in websites and emails, but what about – malware samples? Security researchers SentinelOne recently published an in-depth report on a newly uncovered piece of macOS malware called Gaslight that, as the name suggests, tries to gaslight AI-assisted triage agents into stopping the analysis.
The malware itself is nothing out of the ordinary: it infects the device by whatever means necessary (usually phishing and social engineering), connects to attacker-controlled infrastructure via Telegram, and then executes different commands such as profiling the device, running arbitrary shell commands, stealing files, or terminating processes.
It also delivers a stage-two malware that acts as an infostealer, pulling passwords, sensitive PDFs, cryptocurrency wallet information, and more.
Weaponizing LLM-assisted triage pipelines
But where Gaslight stands out is its defenses against AI-powered malware analysis. According to SentinelOne, the malware contains a large block of fake Markdown-formatted “system” messages designed for AI assistants that security researchers may use during reverse engineering. These messages claim things like “the AI’s authentication token has expired”, “the analysis environment is running out of memory”, “disk space has been exhausted”, “static analysis is unsafe”, and similar.
While a human analyst would definitely recognize these fake messages even at a glance, an LLM that isn’t properly isolated from untrusted input could interpret them as genuine system instructions and refuse to further analyze the malware.
“macOS.Gaslight is noteworthy for its analyst-targeting prompt injection, an attempt to weaponize the LLM-assisted triage pipelines that increasingly sit in the reverse-engineering loop,” SentinelOne explains. “Anyone building such tooling should treat the contents of the samples they triage as adversarial input, never as instructions, and be prepared to keep hostile content out of the model entirely. As LLM-assisted analysis becomes routine, defenders should expect more samples built to exploit it.”
The researchers have published a full list of indicators of compromise on this link.
Via The Hacker News
