Ravie Lakshmanan | Jun 09, 2026 | AI Security / Software Supply Chain
Microsoft on Monday confirmed that it temporarily removed some GitHub repositories in response to a recent security incident that led to 73 of its open-source projects being compromised to inject an information stealer into the code.
“Our priority is to protect customers and the broader ecosystem,” a Microsoft spokesperson told The Hacker News via email. “We temporarily removed some repositories as we investigated potential malicious content. Some of these repos have been restored after review, while others may remain offline while work continues.”
“As part of our investigation, we notified a small number of customers who may have pulled down content from the affected repositories. We will continue to investigate, and if anything further is identified that requires customer action, we will reach out directly through our established support channels.”
The development comes days after the Windows maker cut off access to dozens of its open-source projects hosted on GitHub following reports that they were compromised as part of an ongoing software supply chain campaign codenamed Miasma.
Among the infected projects was “durabletask,” a Python package that was first compromised last month by a cybercrime group known as TeamPCP to deliver an information stealer designed for Linux systems.
Further analysis of the Miasma payload embedded into the projects has uncovered capabilities to trigger automatic code execution when an unsuspecting developer opens the repository in an artificial intelligence (AI)-powered coding tool or integrated development environment (IDE).
The findings are the latest in a sustained software supply chain campaign that has breached widely used open-source packages to plant malware capable of propagating to downstream users and beyond.
This includes a newer PyPI wave tied to the broader Mini Shai-Hulud, Miasma, and Hades waves, infecting an additional set of 23 packages, including some bioinformatics-related libraries used in graph learning, patient phenotyping, phenopacket tooling, and scientific workflows.
Other packages include a set of AI and Model Context Protocol (MCP)-themed packages, such as langchain-core-mcp, and typosquat-style packages such as rsquests, tlask, and rlask that impersonate requests and flask. The complete list of legitimate and bait packages is below –
dreamgen 1.8.1
embiggen 0.11.97
ensmallen 0.8.101
gpsea 0.9.14
instructor-mcp 1.15.2, 1.15.3
langchain-core-mcp 1.4.2, 1.4.3
mem8 6.0.1
mflux-streamlit 0.0.3, 0.0.4
openai-mcp 2.41.1, 2.41.2
orchestr8-platform 3.3.2
phenopacket-store-toolkit 0.1.7
ppkt2synergy 0.1.1
pyphetools 0.9.120
ray-mcp-server 0.2.1
rlask 3.1.7
rsquests 2.34.3
tiktoken-mcp 0.13.1, 0.13.2
tlask 3.1.4
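For readers who want to check a Python environment against the list above, the following is a minimal sketch rather than a vetted detection tool. The function names (`normalize`, `find_flagged`, `installed_distributions`) are illustrative, and the check only compares installed package names and versions with the versions named in this article; it says nothing about packages or versions not listed here.

```python
import re
from importlib import metadata

# Package name -> versions named in the list above.
FLAGGED = {
    "dreamgen": {"1.8.1"},
    "embiggen": {"0.11.97"},
    "ensmallen": {"0.8.101"},
    "gpsea": {"0.9.14"},
    "instructor-mcp": {"1.15.2", "1.15.3"},
    "langchain-core-mcp": {"1.4.2", "1.4.3"},
    "mem8": {"6.0.1"},
    "mflux-streamlit": {"0.0.3", "0.0.4"},
    "openai-mcp": {"2.41.1", "2.41.2"},
    "orchestr8-platform": {"3.3.2"},
    "phenopacket-store-toolkit": {"0.1.7"},
    "ppkt2synergy": {"0.1.1"},
    "pyphetools": {"0.9.120"},
    "ray-mcp-server": {"0.2.1"},
    "rlask": {"3.1.7"},
    "rsquests": {"2.34.3"},
    "tiktoken-mcp": {"0.13.1", "0.13.2"},
    "tlask": {"3.1.4"},
}


def normalize(name):
    """Lowercase a project name and collapse runs of '-', '_' and '.' to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_flagged(installed):
    """Return sorted (name, version) pairs that match the list above.

    `installed` is any iterable of (name, version) pairs.
    """
    hits = []
    for name, version in installed:
        key = normalize(name)
        if version in FLAGGED.get(key, set()):
            hits.append((key, version))
    return sorted(hits)


def installed_distributions():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            yield name, dist.version


if __name__ == "__main__":
    for name, version in find_flagged(installed_distributions()):
        print(f"flagged: {name} {version}")
```

The script needs to be run with the interpreter of each environment being checked, since `importlib.metadata` only sees packages on that interpreter's path. Note that merely running a check inside a compromised environment may already be too late if a startup hook has executed.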
The new cluster employs a new payload delivery mechanism, per Socket, indicating that the threat actors are adapting and actively experimenting with different methods as part of what has been described as a “fast-moving supply chain campaign.”
While the earlier packages used executable .pth startup hooks to bootstrap Bun and run an obfuscated JavaScript stealer, the latest set incorporates different approaches –
Trojanized native .abi3.so extensions that execute the stealer when the package is imported
A .pth startup hook loader variant that searches sys.path for the “_index.js” payload instead of bundling the payload in the same wheel
“That last variant separates the loader from the JavaScript payload, which could make the package look less obviously malicious during static analysis,” Socket told The Hacker News.
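The .pth mechanism mentioned above relies on documented interpreter behavior: at startup, Python's `site` module executes any line in a `.pth` file that begins with `import` followed by a space or tab. The sketch below, which is an illustration and not Socket's tooling, lists such lines so they can be reviewed by hand. The helper names are hypothetical, and a hit is not proof of compromise, because some legitimate packages also ship executable `.pth` lines.

```python
import site
from pathlib import Path


def executable_pth_lines(text):
    """Return the lines of a .pth file that the site module would execute.

    Only lines starting with "import" followed by a space or tab are executed;
    other non-comment lines are treated as paths to add to sys.path.
    """
    return [
        line for line in text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


def scan_dirs(dirs):
    """Map each .pth file under `dirs` to its executable lines, if it has any."""
    findings = {}
    for directory in dirs:
        root = Path(directory)
        if not root.is_dir():
            continue
        for pth in sorted(root.glob("*.pth")):
            try:
                text = pth.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than crash
            lines = executable_pth_lines(text)
            if lines:
                findings[str(pth)] = lines
    return findings


if __name__ == "__main__":
    candidates = list(site.getsitepackages()) + [site.getusersitepackages()]
    for path, lines in scan_dirs(candidates).items():
        print(path)
        for line in lines:
            print("   ", line[:120])
```

Because the loader variant described above looks for its payload elsewhere on `sys.path`, a reviewer would also want to check what any flagged line actually imports or reads, not just whether the `.pth` file looks small and harmless.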
Regardless of the method used, the end result is the same. Once executed, the malware targets developer workstations and CI/CD environments, harvesting high-value secrets and exfiltrating them to a public GitHub repository.
A key capability of the bioinformatics package is its ability to derail and bypass AI-powered scanners and analyst copilots by means of an adversarial prompt injection embedded within a JavaScript block comment, a feature previously detailed by StepSecurity.
“The Hades branch of the Shai-Hulud and Miasma activity is best understood as a fast-moving supply chain campaign, not a single package incident,” Socket researcher Kirill Boychenko said. “The langchain-core-mcp variant goes further by installing a .pth loader that searches sys.path for _index.js, meaning the loader and payload do not need to live in the same wheel.”
Politicians from across the spectrum have called for calm after a stabbing in Belfast, Northern Ireland.
There are fears that there could be widespread disorder after figures on social media including Elon Musk called for people to fill the streets in protest against immigration. The alleged perpetrator of the attack, which was filmed and shared widely online, was revealed today as an asylum seeker from Sudan.
The attack happened at about 10.30pm on Monday outside a block of flats in north Belfast. Graphic video shared on social media showed a man straddling another man on the ground and striking at his head and neck.
The clip showed people intervening to stop the assault, with one man, later named locally as Maitiu Mag Tighearnan, using a hurling stick against the attacker multiple times.
Police said the arrested man was believed to be Sudanese and in his 30s. They had initially said he was thought to be from Somalia. The victim, who has not been named, is in his 40s.
As of Tuesday evening, the suspect was in custody and the victim was in a serious condition in hospital. Police said the victim had serious injuries to his eyes, and serious slash wounds to his back and face.
Figures from across the world have taken to social media to call for protests against immigration. Some posts from accounts in Northern Ireland announce that roads are “closed” for a protest and warn that all businesses in the area should shut at 5.30pm to prepare for disorder.
Stephen Yaxley-Lennon, a far-right agitator who refers to himself as Tommy Robinson, shared the video of the attack on Monday night and posted a call for protests in central London and elsewhere in the UK.
Sudanese business owners on Sandy Row, a loyalist area of central Belfast, closed their stores with steel shutters by 4pm and said they planned to stay at home that night.
The Belfast Islamic Centre cancelled evening prayers and said police advised them that the next 24 hours would be crucial. “We are telling our congregation to go home, don’t go out, look after your children, don’t share rumours and do listen to the authorities,” said Ameer Ibrahim, a project manager who spoke in a personal capacity.
The first minister, Michelle O’Neill, told the public not to be persuaded by social media accounts to start causing disorder. She said: “For all of those people out there who are stoking up tensions in that social media space who are happy to raise tensions, they do not represent us. We are good people and I don’t want to see anybody living in fear.”
The deputy first minister, Emma Little-Pengelly, added she was issuing a “plea for calm”, acknowledging that people would be feeling a “bag of emotions” but added: “don’t allow those people who don’t care about people here to incite hatred and incite fear”.
Rightwing commentators from England and the US, including the MP Rupert Lowe and Elon Musk, the billionaire owner of X, have been posting about the attack. Musk shared a list of potential protest areas in the UK and wrote “Only by protesting REPEATEDLY and LOUDLY will there be any change!!”
Jon Boutcher, the chief constable of PSNI, told a press conference on Tuesday: “We are aware of course of protest activity being planned across Northern Ireland tonight. We understand that people will be feeling enraged with emotions … but please, please let the PSNI do their job unfettered and undistracted from wider concerns there may be about disorder.” He added: “The challenge we face with today’s online toxic nature is that people are incited by people who are faceless and know nothing about this brilliant vibrant place. Do not be fooled or duped by people online.”
There was much discussion and speculation over the alleged perpetrator’s immigration status. He is understood to have a five-year visa after travelling from Dublin to Belfast via bus and claiming asylum. In a social media post, Nigel Farage, the Reform UK leader, said authorities must immediately disclose the suspect’s identity and immigration status.
Little-Pengelly said: “The UK must be able to deport these people much more swiftly than they have thus far. Importantly, people must know their legitimate concerns are listened to. Communities must feel that the protections that are in place are working to keep them safe.”
The assistant chief constable, Ryan Henderson, told reporters the suspect was in the country legally.
It is understood that the PSNI has held emergency meetings to draw up plans for how to deal with any unrest, after far-right figures online called for people to take to the streets wearing masks.
Henderson said there would be an “increased police presence” across Northern Ireland in case of unrest. “People will feel a range of emotions from fear to anger,” he said.
The Northern Ireland secretary, Hilary Benn, said that protests were “not going to help anyone” because they would “stretch police resources”.
Reform UK’s home affairs spokesperson, Zia Yusuf, said: “The horror of what you have seen in Belfast is a direct result of treacherous Tory and Labour immigration policy.
“Reform has already announced a total ban on visas for anyone from Sudan. Enough is enough.”
Keir Starmer called the attack horrific and sickening. “I have absolutely no tolerance for abhorrent scenes of violence like this on our streets. My thoughts are first and foremost with the victim, and I thank the first responders, including members of the public, who intervened.”
Los Angeles Rams offensive lineman Alaric Jackson was arrested Monday night on suspicion of felony domestic violence as the NFL’s offseason domestic violence problem has begun growing to epidemic proportions.
Jackson, 27, joins the growing list of current and former NFL players who this offseason have run afoul of the law and been arrested, charged, tried or sued in civil court over domestic violence allegations.
It is a problem Fox News Digital asked the NFL to address on Tuesday. The league so far has not answered the request for comment.
Los Angeles Rams offensive tackle Alaric Jackson leaves the field after a game against the Arizona Cardinals at SoFi Stadium in Inglewood, Calif., on Jan. 4, 2026.(Jayne Kamin-Oncea/Imagn Images)
Los Angeles Rams offensive lineman Alaric Jackson plays against the Arizona Cardinals at State Farm Stadium in Glendale, Ariz., on Dec. 7, 2025.(Mark J. Rebilas/Imagn Images)
Jackson, the Rams’ starting left tackle the past three seasons, was taken into custody after Los Angeles police assigned to Topanga Area responded to a radio call of a “battery domestic violence” incident on the 7400 block of Cliffside Court in West Hills, the LAPD told Fox News Digital.
Officers learned the player and a woman had gotten into a verbal argument because Jackson believed the woman was recording him with her phone, according to KNBC-TV. The 6-foot-7, 338-pounder took the phone out of her hands.
Police said the woman had scratch marks on her arms.
Jackson was arrested on suspicion of felony domestic violence and bail was set at $50,000. Due to California victim confidentiality requirements related to domestic violence investigations, no further information is being released at this time.
The case will be submitted to the Los Angeles County District Attorney’s Office for filing consideration.
This, of course, is another black eye for the NFL because rather than a narrative of the league’s offseason being about teams improving or preparing for minicamps, it is another example of domestic abuse by large, strong professional athletes against weaker women.
And that has been the story multiple times this offseason.
Denver Broncos linebacker Jonathon Cooper stands on the field before the game at Empower Field at Mile High in Denver, Colo., on Dec. 21, 2025.(Ron Chenoy/Imagn Images)
Denver Broncos linebacker Jonathon Cooper: Arrested June 4 in Colorado on domestic violence/criminal mischief allegations. He apologized on social media and then pleaded not guilty on Monday. Trial is set for July 22.
Packers running back Josh Jacobs: Arrested May 26 on several domestic-abuse-related charges, including felony strangulation. His attorneys denied the allegations while the phone call to police by neighbors alleges an audible disturbance in the player’s home. The District Attorney’s investigation is ongoing.
Atlanta Falcons linebacker James Pearce Jr.: Arrested on Feb. 7 after an alleged domestic dispute involving WNBA player Rickea Jackson, who is his girlfriend. He rammed her vehicle with his as she was driving to the police station in Doral, Florida. He was charged with aggravated battery, aggravated stalking and fleeing/eluding. He entered into the Miami-Dade County pre-trial intervention and diversion program which effectively pauses a conviction until he meets court-ordered conditions.
Kansas City Chiefs star Rashee Rice: His ex-girlfriend filed a civil lawsuit in February alleging repeated domestic violence. While the NFL closed its investigation with no discipline, the suit is ongoing.
New England Patriots defensive lineman Christian Barmore: On March 9, he faced a trial on a misdemeanor domestic assault-and-battery charge but at the hearing prosecutors dropped the charge because the alleged victim told them she had moved out of state and did not wish to return for trial.
Darron Lee of the Kansas City Chiefs walks off the field before a game against the Tennessee Titans at Nissan Stadium in Nashville, Tenn., on Nov. 10, 2019.(Wesley Hitt/Getty Images)
Free agent wide receiver Stefon Diggs: The former Patriots WR was tried in May on felony strangulation and assault charges involving his former chef. The jury found him not guilty. The Patriots cut Diggs in March, ostensibly for salary cap cost reduction purposes. No other team has signed Diggs.
Former New York Jets linebacker Darron Lee: This is the most serious of all the incidents. The former Jets first-round pick was arrested in February in Tennessee after authorities responded to a call at the residence Lee shared with his girlfriend Gabriella Carvalho Perpetuo. She was pronounced dead and Lee was charged with first-degree murder when Perpetuo was found to have suffered severe brain trauma, a broken neck, bruising, bite marks and stab wounds. The potential capital punishment case is pending.
Former Kansas City Chiefs and Miami Dolphins wide receiver Tyreek Hill: He’s accused of domestic violence in court filings and became the subject of an NFL investigation. The allegations arose from divorce proceedings initiated by his estranged wife, Keeta Vaccaro, who filed for divorce in April, alleging eight separate incidents of domestic violence. Hill, through his attorneys, has denied the allegations. No criminal charges have been filed.
Many players and coaches around the league are doing great things in their communities throughout the offseason, but every domestic abuse arrest detracts from that, and instead adds to a concern that athletes paid to play a violent sport are too often bringing that violence home.
House Republicans on Tuesday will seek to pass a $70bn bill to fund the agencies leading Donald Trump’s crackdown on undocumented immigrants through the duration of his term, ending a months-long standoff with Democrats.
The Secure America Act, which passed the Senate last week, allocates $38bn to Immigration and Customs Enforcement (ICE), $26bn to Customs and Border Protection and $5bn more to the Department of Homeland Security (DHS).
It is expected to pass the House of Representatives along party lines, and end a blockade of funding for the agencies that Democrats announced in January after federal agents killed two US citizens in Minneapolis amid a crackdown on undocumented immigrants.
Passing the measure will nonetheless be a tough haul for the speaker, Mike Johnson, who will need all 218 of his Republican-aligned lawmakers in attendance to vote the bill through the lower chamber against what is expected to be unanimous opposition from Democrats.
“House Democrats will be a hard no on the reckless Republican budget reconciliation bill this week,” Hakeem Jeffries, the minority leader, said on Monday.
There may nonetheless be surprises awaiting the bill as House lawmakers begin debating it on Tuesday afternoon. Congressional Republicans remain concerned by Trump’s plan for a nearly $1.8bn “anti-weaponization” fund that would pay out his allies.
The acting attorney general, Todd Blanche, told a House committee last week that the proposal was dead, but the president refused to rule out its creation in an interview broadcast on Sunday.
As the bill was being considered by the Senate last week, a small group of Republicans sought to find bipartisan compromise on an amendment that would bar the fund, without success.
The legislation was also delayed by uproar over an attempt to include $1bn for security improvements related to the ballroom Trump is building at the White House. Senate Republicans eventually agreed to remove those funds, after the chamber’s parliamentarian ruled it could not be included if the measure was to pass using the budget reconciliation procedure to circumvent the Democratic filibuster.
Florida Republican gubernatorial primary candidate James Fishback, who has called abortion “a holocaust,” wants to close every abortion clinic throughout the Sunshine State.
“Ron DeSantis is the most pro-life Governor in America, and I intend to build on his incredible work. As Governor, I will shut down the 53 abortion clinics that remain in Florida and replace every single one with a crisis pregnancy center,” Fishback told Fox News Digital in a statement on Tuesday.
“These centers will offer free ultrasounds, baby food, diapers, and counseling, and even prenatal and postpartum care. Abortion is never the answer. Every expecting mom in Florida deserves real support, and as Governor, I will make sure she gets it,” he added.
James Fishback, a Republican, is a candidate for Florida governor.(Fishback for Florida campaign)
The Guttmacher Institute, which describes itself as “a leading research and policy organization committed to advancing sexual and reproductive health and rights (SRHR) worldwide,” indicated in a report earlier this year that as of December 2025 there were 49 clinics providing abortions in Florida, down from 53 as of March 2024.
Last week in a post on X, Fishback asserted, “100% of abortions are murder. And as Governor, I’ll treat them as such.”
Florida Lt. Gov. Jay Collins and former speaker of the Florida House of Representatives, former state Rep. Paul Renner, are also running in the contest.
U.S. President Donald Trump hosts Sen. Ashley Moody, R-Fla., Rep. Byron Donalds, R-Fla., and others while celebrating the 2025 NCAA men’s basketball Champion Florida Gators in the East Room of the White House on May 21, 2025 in Washington, D.C.(Chip Somodevilla/Getty Images)
The Cybersecurity and Infrastructure Security Agency wants to fundamentally reevaluate how it prioritizes risks and vulnerabilities, both for privately-owned critical infrastructure and within the federal government, acting director Nick Andersen said Tuesday.
The plans include a binding operational directive for federal agencies set to be published Wednesday and getting more specific with critical infrastructure owners and operators about which assets they need to protect most and how, Andersen said while speaking at an event hosted by Axonius in Washington, D.C. and talking with reporters afterwards.
The binding operational directive looks to revise how federal agencies do vulnerability management, he said. “Overall, our approach to date has been ‘A patch is released, apply this patch as quickly as you can,’” he said.
“We’re really asking people to take more of a focus on risk associated with each vulnerability. Is it with an asset that is internet-exposed? Does it align to a KEV entry?” he said, referring to CISA’s list of known exploited vulnerabilities. “Is it automatable in its exploitation? Really, we need to be able to highlight that some patches just aren’t as important as others, and plugging the holes for some vulnerabilities is simply not as important as others.”
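The three questions Andersen raises (internet exposure, KEV listing, automatable exploitation) lend themselves to a simple triage ordering. The sketch below is purely illustrative: the `Finding` class, the weights, and the scoring rule are assumptions made for this example and are not taken from CISA's directive, whose actual criteria had not been published at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One vulnerability on one asset, with the three signals from the quote."""
    ident: str
    internet_exposed: bool
    in_kev: bool
    automatable: bool


# Hypothetical weights, chosen only for illustration (not from any directive).
WEIGHTS = {"in_kev": 3, "internet_exposed": 2, "automatable": 1}


def risk_score(finding):
    """Sum the weights of every signal that is true for this finding."""
    return sum(w for attr, w in WEIGHTS.items() if getattr(finding, attr))


def prioritize(findings):
    """Order findings from highest to lowest score, breaking ties by ident."""
    return sorted(findings, key=lambda f: (-risk_score(f), f.ident))


if __name__ == "__main__":
    backlog = [
        Finding("VULN-A", internet_exposed=True, in_kev=False, automatable=False),
        Finding("VULN-B", internet_exposed=True, in_kev=True, automatable=True),
        Finding("VULN-C", internet_exposed=False, in_kev=True, automatable=False),
    ]
    for f in prioritize(backlog):
        print(f.ident, risk_score(f))
```

A real program would also weigh how important the affected asset is, which is the point Andersen makes about critical infrastructure functions; that dimension is deliberately left out here.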
Andersen said he has made setting the right priorities the focus of his tenure.
“We have to be okay with saying there are some systems that are less important than others, there are some elements of critical infrastructure that are less important than others,” he said. “Those things are very easy for us to rationalize [for] physical crises, but we need to start wrapping our minds around how we’re going to do that during cyber crises.”
Andersen said artificial intelligence-enhanced threats have fueled the directive in part, based on “a recognition that we’re a different dynamic environment with the shorter timeline to weaponization and exploitation,” but the discussions on the directive have been going on for months, before the splashy announcements about frontier AI models and the risks they might deepen. Wednesday’s directive is unrelated to the AI-focused executive order released by the Trump administration last week.
The idea of prioritizing certain potential hacking targets over others isn’t a new one in critical infrastructure, with concepts like “Section 9” designations under a 2013 executive order for entities on which an attack could have catastrophic effects; “systemically important critical infrastructure” designations, as recommended by the Cyberspace Solarium Commission; or the creation of the National Risk Management Center, established during President Donald Trump’s first term but now the subject of proposed budget cuts.
Andersen said past concepts haven’t worked well, citing Section 9 designations as an example.
“We would sit here and say, ‘Congratulations, you’re with this company, and you’re a Section 9 entity, isn’t that fantastic?’” he said. “That’s really not the level of fidelity that we have to be able to get to to have a real measurable conversation about risk. I need to be able to go to a company and say, ‘Here’s the specific function you’re supporting that makes you more critical. Let’s have a conversation about the specific assets that support that function, and how do we get to a measurable level of resilience for those assets?’”
Those discussions need to get down to a “fine grain,” Andersen said.
“If I’ve got a major bank that I’m talking to, is it as important to me that the bank’s process that supports the bulk payment system is resilient, or is it just as important to me that the branch location two blocks away is continuing to operate?” he said. “Those things just are apples and oranges, even though it’s the same entity that might be affected.”
CISA’s capabilities under the Trump administration have drawn considerable scrutiny, given deep budget cuts at the agency, with more planned. The administration is now making moves to hire back personnel.
Andersen said the agency is working to hire 329 people, and will have job offers out to 182 of them by the end of June. He said the emphasis of the first tranche of hires under the hiring sprint is operational capabilities, meaning areas like emergency communications, infrastructure security and regional personnel.
The agency also has had some of its work hampered by the government shutdowns, such as the delay in plans for town-hall meetings about implementation of the Cyber Incident Reporting for Critical Infrastructure Act of 2022, which will require key owners and operators to report major incidents within 72 hours.
Andersen said he couldn’t set a date for finalization of regulations related to the law — which had already been delayed prior to any funding lapses — with those town halls now scheduled to begin next week.
“We could have a lot of comments that come to us and really radically change our way of thinking about what the need is here,” he said. “But our focus is just on what’s the original congressional intent behind CIRCIA, what is the greatest need that we’re going to be able to serve, and how it’s going to be able to further the mission that we have for the nation.”
A spyware firm has been targeting WhatsApp users with malicious links in contravention of a US court order forbidding it from doing so, Meta has said.
In a post, Meta said WhatsApp had “caught and disrupted spear phishing attempts” by NSO Group, which a spokesperson said targeted a handful of users in Jordan and Lebanon. It had also caught the group creating “test accounts and groups” on WhatsApp.
NSO was founded in Israel but has been under US ownership since last year. It built the Pegasus spyware, at the time one of the most powerful surveillance tools ever, which used a vulnerability in WhatsApp to infiltrate users’ phones and harvest all their data: messages, photos, calls and more.
Last year, it lost a court case against Meta for exploiting WhatsApp to target people; Meta was awarded $167m in damages. A later case reduced this to $4m but placed a permanent injunction against NSO barring it from targeting WhatsApp and its users.
Meta said the latest attacks showed NSO had violated this injunction and it asked the court to hold the company in contempt of the order.
“To me, it’s an astonishing signal of hubris that NSO would do this while permanently enjoined from not doing it,” said John Scott-Railton, a senior researcher at the Citizen Lab, which investigates digital threats against civil society.
“It either speaks to the fact that they think they wouldn’t get caught, or to the fact that they believe, rightly or wrongly, they have a special way to not face the consequences of violating a US federal permanent court injunction.”
Since the start of the Trump administration, reporting has suggested that NSO is searching for a way into the US market – and to do so is trying to get off the US commerce department “blacklist”, which bars it from doing business with US companies without specific approval.
It was placed there after the Biden administration determined it had acted “contrary to the foreign policy and national security interests of the US” over the widespread abuse from Pegasus.
The group appointed David Friedman, the US ambassador to Israel from 2017 to 2021 during Donald Trump’s first term, as executive chair last autumn and has engaged a lobbying firm close to the US president.
“They are the poster child for the lawless mercenary spyware industry. If they had chosen to not do this, their big effort to rebrand as an ethical spyware company that wants to make big moves into the US market would be more credible,” said Railton.
Earlier this year, Meta suggested that NSO was linked to a lawsuit brought against the company which alleged Meta could read users’ encrypted WhatsApp messages. The law firm that brought that case was also, at the time, representing NSO.
There have been a handful of cases since that have made similar claims, including one in Israel and another filed by the Texas attorney general, Ken Paxton.
“WhatsApp cannot access people’s encrypted communications and any suggestion to the contrary is false,” a Meta spokesperson, Rachel Holland, wrote in a statement about that lawsuit.
NSO Group did not respond to a request for comment.
We received early access to Mythos Preview for early capability testing a few weeks back. Below are the details on how we tested Mythos Preview, what we found, and what it means.
About three months ago, Anthropic invited us to help them assess the capability of a new model they thought represented a significant shift in capability. So we put it through our security gauntlet. Benchmarks, workflows, interactive use, and integrations.
Today, we can finally share details on how we tested Mythos Preview, what we found, and what it means.
Spoilers: This model is a major advance. It is substantially better than prior models at finding vulnerability candidates, especially when source code is available. It communicates with unusual technical precision, reasons well about code, and shows strong promise in complex domains such as native-code analysis and reverse engineering.
Our takeaway: Mythos Preview is a powerful tool for generating strong vulnerability leads and technically precise analysis. It is especially adept at analyzing source code with a security mindset. It’s not magic, though: a model is a brain without a body.
While source code audits are mostly a brain activity, live site pentests like the ones XBOW performs very much need a body whose skill and control can match the brain’s power.
Testing methodology
The first thing we did was assemble a diverse team of 10 experts from different parts of the company who could assess the model from different directions. We test all models with the same internal benchmarking system we have used to analyze Opus 4.7 and GPT 5.5. In this system, we take open source applications where vulnerabilities were previously discovered, freeze them at the vulnerable version, and run our agents against them.
But this time, we expanded our testing to analyze other angles as well:
The model’s judgment with regard to threat modeling, vulnerability validation, and safety
The model’s ability to read source code versus interact with live systems
Its ability to find exploits we’re not yet looking for in our standard assessments, e.g., native app vulnerabilities
A note on terminology: When people say “Mythos,” they sometimes refer to the raw model. In this evaluation, we explored Mythos Preview both inside Claude Code, and as a raw model, using it via its API as an engine for XBOW’s agents. We separate those cases because orchestration, tools, prompting, and live-site access materially affect outcomes.
Results
Our testers who tried out Mythos Preview in interactive use were quite impressed. “This is a lot closer to `just go and find something` than anything I’ve seen so far,” said one of them. We tried giving it our own source code, and it found weaknesses – nothing truly terrible, thankfully, but there were several items we wanted to repair.
We tried it on open source software, and at the end of week one, we had quite a few new vulnerabilities we had to disclose.
Our testers who tried out Mythos Preview on benchmarks were also quite impressed, but their appreciation was a slightly different kind: impressed _with data_. Their results also laid bare the difference between areas where the model was runaway powerful, and where it presented only a modest advance.
Our key takeaways after analyzing Mythos Preview include:
It’s extremely powerful for source code audits.
It’s good, but less powerful, at validating exploits.
Its judgment is mixed. It can be too literal and conservative, and also tends to overstate the practical relevance of its findings.
It’s strong in native-code vulnerability discovery and reverse engineering.
Next-level vulnerability discovery
Mythos Preview presents a significant step up over all existing models, regardless of provider, on XBOW’s web exploit benchmark.
This benchmark is designed to test whether a model can help XBOW find validated, actionable vulnerabilities in live website environments. A case is counted as passed only when the system finds a validated way to act on the vulnerability (PoC||GTFO) after a series of 80 “actions,” where an action might be a shell or a Python script using standard commands or XBOW’s suite of attack tools.
Note: We haven’t included Opus 4.7 in this chart because that model interacts with our system in a unique way, making this particular stat less relevant for it – we’ve written up the full story here.
Compared to the newest model at the time (Opus 4.6), this was a strong increase:
The number of false negatives was cut by 42%.
In a variation where we gave both models the site’s source code, the cut was even larger: 55%.
This was the first instance of a theme that would surface again and again: Mythos Preview is impressive at writing code, but even more impressive at reading it.
Below are the pass rates of Mythos Preview, Opus 4.6, and GPT-5.5 as a function of the allowed number of actions (executed scripts). Mythos Preview finds vulnerabilities in significantly fewer iterations than Opus 4.6, although the gap to GPT-5.5 is less pronounced.
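To make the idea of a pass rate per action budget concrete, here is a minimal sketch of how such a curve could be computed. The data layout (for each benchmark case, the action index at which a validated exploit was found, or `None` if none was found) and the name `pass_rate_by_budget` are assumptions for illustration, not XBOW’s actual harness, and the sample numbers are made up.

```python
from typing import Optional, Sequence


def pass_rate_by_budget(
    solved_at: Sequence[Optional[int]], budgets: Sequence[int]
) -> dict[int, float]:
    """Return the fraction of cases solved within each action budget.

    solved_at[i] is the (hypothetical) action index at which case i
    produced a validated exploit, or None if it never did.
    """
    if not solved_at:
        raise ValueError("need at least one benchmark case")
    rates = {}
    for budget in budgets:
        # A case passes at this budget if it was solved at or before it.
        passed = sum(1 for n in solved_at if n is not None and n <= budget)
        rates[budget] = passed / len(solved_at)
    return rates


# Hypothetical outcomes for five cases (illustrative only, not XBOW data).
example_cases = [12, 35, None, 78, 8]
curve = pass_rate_by_budget(example_cases, budgets=[10, 40, 80])
print(curve)
```

A model that solves cases in fewer actions shows up as a curve that rises earlier, even if two models end at a similar pass rate for the full budget.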
The difference becomes clearer when adding two considerations:
Models could choose many small steps or few large steps (more details here) – and that shouldn’t matter so much. Instead of giving a budget of actions, let’s consider a budget of output tokens.
Instead of mean pass rate, i.e., the probability of finding a vulnerability, it’s often more instructive to look at the odds for discovery, i.e., what ratio you would bet on the model getting a discovery right. Computationally, that’s the hit rate divided by the miss rate.
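The odds described above follow directly from the pass rate. A minimal sketch, with a hypothetical helper name `odds` and made-up pass rates:

```python
def odds(pass_rate: float) -> float:
    """Convert a pass rate (hit probability) into odds: hit rate / miss rate."""
    if not 0.0 <= pass_rate < 1.0:
        raise ValueError("pass_rate must be in [0, 1); odds are unbounded at 1")
    miss_rate = 1.0 - pass_rate
    return pass_rate / miss_rate


# Hypothetical pass rates (illustrative only, not benchmark results).
even_odds = odds(0.5)     # one hit per miss
three_to_one = odds(0.75) # three hits per miss
print(even_odds, three_to_one)
```

Odds spread out differences near the top of the scale: moving from a 90% to a 95% pass rate looks like a small step, but it halves the miss rate and roughly doubles the odds (from about 9 to about 19).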
Under these considerations, the picture becomes much clearer: Token-for-token, Mythos Preview homes in on the vulnerability with absolutely unprecedented precision.
Live-site validation is the hard part
Mythos Preview is excellent at source-code reasoning, but our evaluation reinforced a practical truth: many exploitable issues do not appear as obvious defects in application source code. They emerge from configuration, dependencies, deployment choices, or the way otherwise safe components are combined.
For instance, a dependency on its own could be safe. The source code on its own could be safe. But the source code uses the dependency in an unsafe way and creates a vulnerability. As Gary McGraw famously declared, you won’t find the majority of defects by “staring at code” alone.
That’s of particular interest to us. XBOW performs pentests, where our target is a live site (the way an attacker sees it), whereas Mythos Preview as used, for example, by Project Glasswing excels at auditing source code (the way a developer sees it).
Interacting with the live site can be very powerful, but it brings a completely new, very delicate dimension into the mix. Does Mythos Preview change the balance here?
Due to the way we harvest our web benchmark set, you can actually find the vulnerability from the code alone on that set. So it’s fair to ask: For these benchmarks, can Mythos Preview find an exploit without being allowed to interact with the live site?
It turns out that even for these benchmarks, where the vulnerability is purely in the code, removing access to the live site hurts performance more than removing access to source code. In many ways, live-site access matters more than source-code access. That, of course, is the XBOW value proposition: it gives frontier models a safe, structured way to interact with real application behavior and prove which findings are actually exploitable.
The results of XBOW powered by Mythos Preview are shown below.
We now have a solid answer to the question, “Can a model find something interesting in code?” Increasingly, the answer will be yes, even though “something” won’t be the same as “everything.”
But even then, the question still looming is, “Which of these findings are exploitable, reproducible, safe to test, and worth fixing?”
The answer lies in combining Mythos Preview’s powerful source code analysis with something like XBOW’s ability to analyze a live site safely, in an orchestrated, validated way.
It’s notable that, even though Mythos Preview suffers greatly from being denied access to the live site, other models suffer even more. Another confirmation that Mythos’ greatest strength is reading source code.
The best results are always, of course, with the combination of access to the live site and source code.
It allows the ideal detection pattern when XBOW orchestrates Mythos Preview: Analyze the source code to find a lead, probe the live site to understand how the weakness is reflected in the deployment, then craft an exploit from it.
Other findings
We also explored the model in terms of judgment, reverse engineering, assessment of native apps, and visual acuity.
Judgment results were mixed
Mythos Preview’s judgment results were more mixed than its discovery results. Across command safety, threat modeling, and trace triage, it was often careful and precise, but also literal and conservative. It rejected false positives better than many predecessors, but sometimes lost true positives when evidence did not formally satisfy its criteria or when the intended rule was broader than the written one.
This makes Mythos Preview valuable, but not self-sufficient: it needs precise prompts, explicit threat models, and validation infrastructure to turn strong reasoning into reliable security outcomes.
One bit that slightly shocked us here was Mythos Preview’s performance on our command safety benchmark, where we ask the models to consider whether a given script is safe to execute without impacting the target site. We hand-labeled a large set of example cases close to the edge of the decision boundary, and Haiku 4.5 delivered 90.1% accuracy.
We also optimized the prompts for Haiku 4.5, so the better comparison is Opus 4.6, which had an accuracy of 81.2% … but Mythos Preview had only 77.8%.
When we probed deeper and looked at its reasoning, it would often have a point. There were cases that technically weren’t against the letter of the rules, but they were against the spirit. Opus 4.6 prioritized the spirit, but Mythos prioritized the letter.
The model is strong in native code and reverse engineering
Beyond web applications, the model showed substantial strength in native-code vulnerability discovery and reverse engineering.
In Chromium-related testing, it found more real bugs with fewer false positives than prior baselines. In V8 sandbox work, it identified true positives in a subtle threat model where previous approaches had produced many findings but no successful true positives. It also proved capable of triaging both its own results and competitor-model findings.
The reverse-engineering results were among the most striking. The model reasoned through unusual firmware and embedded systems contexts, including architectures and operating-system combinations that required more than rote pattern matching.
Browser interaction and visual acuity are strong enough for practical workflows
XBOW’s workflows often require models to interact with live websites through a browser interface. In that setting, visual acuity is important: the model needs to identify the right UI element and click in the right place.
The evaluated model performed extremely well on XBOW’s visual-acuity QA, roughly matching Sonnet 4.6 and dramatically outperforming Opus 4.6. It was not perfectly pixel-accurate when asked for exact coordinates, but it was practically effective at selecting the right browser actions.
We should note that Opus 4.7 also shone at this benchmark. Maybe the real story here isn’t “Mythos Preview is good,” but more: This is a specific area where recent Anthropic models had begun to deteriorate. But now Anthropic has caught that deterioration and reversed it.
Power at a cost
Mythos Preview is not just any new model: it’s a true titan.
But titans are big, and big means expensive. How much money are you willing to spend on how much assurance? Can you spend that same money differently to get better results?
At the time of writing, Mythos Preview is not yet available over public APIs, but Anthropic did mention that it would be 5x as expensive as an Opus model – already one of the more expensive options, token for token. This raises the question:
Could we give an agent powered by a different model more time, and still get more accuracy for less cost?
As it turns out: yes. If we normalize by estimated running cost, the picture is rather clear: Mythos Preview isn’t terribly inefficient, at least if you desire high accuracy, but it’s not best-in-class on our benchmarks either.
This finding lines up with similar comparisons, e.g. Point Estimate’s analysis of the AI Security Institute’s benchmarking of Mythos Preview vs GPT-5.5: Mythos Preview is powerful, but the real choice is to either pay for an agent to use Mythos Preview for a bit, or to use GPT-5.5 for as long as needed. The better option depends on the use case; often, it’s the latter.
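The trade-off between one expensive attempt and several cheaper ones can be sketched with a simple model. This is a back-of-the-envelope illustration, not XBOW’s cost analysis: it assumes attempts are independent with a fixed success probability (real retries are correlated, so this overstates the benefit of retrying), and the probabilities and cost units are hypothetical. Only the 5x price ratio comes from the text, and that ratio is per token, not per attempt.

```python
def success_within(attempts: int, p_single: float) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if attempts < 0 or not 0.0 <= p_single <= 1.0:
        raise ValueError("attempts must be >= 0 and p_single in [0, 1]")
    return 1.0 - (1.0 - p_single) ** attempts


def attempts_for_budget(budget: float, cost_per_attempt: float) -> int:
    """Number of whole attempts that fit into a fixed budget."""
    return int(budget // cost_per_attempt)


# Hypothetical numbers: the expensive model costs 5 units per attempt and
# succeeds 60% of the time; the cheaper one costs 1 unit and succeeds 30%.
budget = 5.0
expensive = success_within(attempts_for_budget(budget, 5.0), p_single=0.6)
cheaper = success_within(attempts_for_budget(budget, 1.0), p_single=0.3)
print(f"expensive model, 1 try:  {expensive:.3f}")
print(f"cheaper model, 5 tries: {cheaper:.3f}")
```

Under these made-up numbers the cheaper model comes out ahead for the same spend, but the result flips if its single-attempt success rate is low enough, which is why the better option depends on the use case.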
XBOW’s evaluation suggests that frontier models have taken a major step forward in vulnerability discovery. Mythos Preview is strong at finding candidate vulnerabilities, especially from source code, and shows impressive ability across web, native-code, and reverse-engineering tasks.
But it needs to be mounted in the right harness and equipped with the right tools to reach its full potential. And even then, it should just be one of the arrows in your quiver – depending on the task, it may be more sensible to let another model try several times than to let Mythos Preview try once.
Such considerations, after all, are one of the reasons XBOW maintains a cadre of models, rather than restricting itself to a single one.
To see XBOW’s powerful vulnerability validation capabilities in practice, please contact us for a demo.
Gunshots, water cannon and tear gas have been used by Kenya’s police in the central town of Nanyuki, where hundreds of protesters lit fires and hurled stones at law enforcement officers as they demonstrated against a quarantine centre for US citizens exposed to Ebola.
Tuesday’s violence came as the proposed quarantine centre at the town’s Laikipia Air Base has caused anger among Kenyans who accuse the United States of shifting the risks of caring for people exposed to the Ebola outbreak in eastern Democratic Republic of the Congo and Uganda onto Kenya.
Kenya has never recorded a case of Ebola, and many residents oppose bringing potential carriers of the virus into the country.
The centre is designed to have 50 isolation beds, run by US staff, and was nearing completion late last week.
Construction has continued despite a temporary halt order from Kenya’s High Court and vocal opposition from local politicians.
President William Ruto’s government has pledged to press ahead with the project, arguing that Kenya owes Washington for years of financial and technical support.
The US has committed $13.5m to support Kenya’s Ebola preparedness efforts.