Summary
Deepfake fraud now scales like phishing, but arrives with executive-level credibility. The economics changed permanently in the last 18 months.
Your controls activate after the decision is already made. Payment systems, fraud detection, and audit logs all sit at the transaction layer, two layers downstream of where the attack succeeds.
Training will not fix this. These attacks are engineered to satisfy the exact signals human judgment relies on. The problem is architectural, not behavioral.
The only coherent defense moves controls upstream to the identity layer, where synthetic trust signals are manufactured. That means simulation and real-time detection, not awareness programs.
Every organization already has a breaking point. The question is whether you find it, or an attacker does.
Identity has become cheaper to fake than to verify. That changes everything.
Not the technology itself, which is improving faster than most security teams are tracking. What changes is the assumption underneath most enterprise defenses: that a familiar voice or a recognizable face on a video call is a signal worth trusting. That assumption is now a liability. The cost of holding it is rising.
In building detection and simulation infrastructure for enterprises facing these attacks, we have come to understand deepfake fraud in a way that differs from most industry analysis. The risk is not primarily technical. It is economic. For the first time, attacks can be both highly personalized and cheap to execute at scale, and that combination forces a rethink of how organizations establish trust.
The organizations most exposed are not the ones that haven't heard of deepfakes. They are the ones that have, and believe their current controls are sufficient.
This is what we call the Synthetic Trust Problem: deepfake fraud understood not as a media authenticity issue but as a structural vulnerability in how enterprise decisions get made. Once you see it that way, the inadequacy of most current defenses becomes obvious: they are designed to catch fraud at the transaction layer, while the Synthetic Trust Problem operates upstream of it entirely.
→ How Deepfake Attacks Actually Work Inside Organizations: the specific patterns, case studies, and where controls fail. Covered in our companion piece.

How did deepfake fraud become an enterprise-scale threat?
The evolution of fraud over the past decade follows a clear trajectory. Seen as a whole, that trajectory makes the Synthetic Trust Problem look inevitable rather than surprising.
Before around 2015, fraud was largely manual. Social engineering required human effort: phone calls, direct impersonation, patience. That limited scale. A well-executed attack could be effective, but repeating it at volume was hard.
From roughly 2015 to the early 2020s, automation changed the math. Phishing kits, botnets, and credential stuffing made it possible to reach enormous numbers of targets with minimal marginal effort. The tradeoff was realism: these attacks were efficient but generic, and they depended on volume to work.
What is emerging now breaks that tradeoff entirely.
With generative AI, scale and customization are no longer in tension: messages can be personalized, voices can be cloned, identities can be simulated. None of it requires the manual effort that used to make targeted attacks expensive. Deepfake fraud has grown by 2,000–3,000% in recent years, a pace of proliferation that places it in a different category from previous fraud waves [BIIA 2026]. The number of detected incidents increased tenfold between 2022 and 2023, and the first half of 2025 saw nearly four times as many incidents as all of 2024 combined [Surfshark 2025].
How cheap has it become to fake someone's identity?
The barrier to entry for deepfake fraud has not just fallen. It has collapsed in a way that fundamentally changes who can mount these attacks, how often, and against whom.
In 2019, a high-quality deepfake video cost between $300 and $20,000 per minute, required specialist expertise and took days to produce [Kaspersky 2023]. The 2019 UK energy CEO voice fraud (€220,000 extracted via a cloned voice) was notable precisely because it required that investment. It was a craft operation.
By 2024, real-time deepfake video had dropped to $50 per clip on dark web markets, voice cloning to $30 [CyberSecureFox 2025]. Commercial APIs now price voice cloning at $0.01–$0.20 per minute [Surfshark 2025]. The deepfake Biden robocall that disrupted New Hampshire's 2024 primary cost $1 and took under 20 minutes [Deepstrike 2025]. IBM puts the average creation cost at $1.33 [IBM 2025]. A 99.99% price compression in five years.
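That compression figure is easy to sanity-check from the two endpoints. A minimal back-of-envelope calculation, using the cited 2019 high end and IBM's 2025 average:

```python
# Back-of-envelope check of the cost compression, using the cited
# endpoints: ~$20,000 per minute in 2019 [Kaspersky 2023] versus
# IBM's $1.33 average creation cost in 2025 [IBM 2025].
cost_2019 = 20_000.00  # USD, high end per minute of deepfake video
cost_2025 = 1.33       # USD, average creation cost

compression = 1 - cost_2025 / cost_2019
print(f"Compression: {compression:.4%}")                          # ~99.99%
print(f"Attempts per 2019 budget: {cost_2019 / cost_2025:,.0f}")  # ~15,000
```

The second number is the operationally important one: the budget that bought a single craft-operation video in 2019 now buys roughly fifteen thousand attempts.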
Deepfakes have existed in some form for years, and for most of that time they were detectable with moderate scrutiny: flickering edges, mismatched lip sync, unnatural blinking, lighting inconsistencies. What has changed in the last twelve to eighteen months is qualitative, not just quantitative.
Real-time face swaps now maintain consistent skin tone across lighting conditions. Voice clones replicate not just pitch and accent, but the micro-patterns of someone's speech: hesitation rhythms, filler words, the way a person trails off at the end of a sentence. Lip sync has reached the point where enterprise-grade video calls are indistinguishable from the real thing even to people who know the target personally. The technology did not merely get cheaper. It crossed a threshold of authenticity that changes what trust means in a live interaction.
It is the combination of the two, near-zero cost and fidelity high enough to satisfy human judgment under real conditions, that is the engine of the Synthetic Trust Problem.
The consequence shows up directly in the loss data. From 2019 to 2023, when costs were still relatively high, cumulative deepfake fraud losses totaled $130 million. In 2024 alone, losses approached $400 million. By the end of 2025, cumulative losses exceeded $1.56 billion, more than $1 billion of it incurred in that single year [Surfshark 2025]. That trajectory tracks almost exactly with the cost compression curve.
The accessibility story compounds this further. Today, as little as three seconds of audio is sufficient to generate an 85% voice match to the original speaker [StationX 2025]. That source material is freely available for most executives: earnings calls, conference presentations, LinkedIn videos, internal town halls. A 2026 study described the resulting phenomenon as operating at industrial scale [The Guardian 2026].
From a systems perspective, the Synthetic Trust Problem sits upstream of most enterprise defenses. Traditional fraud targets systems, credentials, or transactions. Deepfakes target identity itself, before technical safeguards are engaged and before any system has a reason to intervene.
What does a real deepfake attack look like from the inside?
The most documented case is the February 2024 Hong Kong fraud, in which a finance employee at the engineering firm Arup authorized transfers totaling roughly $25 million. The mechanics are worth walking through slowly, because each step is unremarkable until you see them together. The employee received a message about a confidential transaction and was skeptical, so a video conference was arranged to resolve the doubt. On that call, the company's CFO and several colleagues appeared, looking and sounding as they always did. Every one of them was a deepfake. Reassured, the employee executed fifteen transfers to the accounts provided.
This is what makes the case instructive beyond its scale. The attackers did not breach anything. They reconstructed a normal internal workflow convincingly enough that the employee had no reason to step outside it. The fraud was complete before any system was involved.
Other cases follow the same logic at smaller scale. Voice cloning used to impersonate executives in urgent payment requests has caused over $217 million in documented losses globally [Surfshark 2025]. The FBI has investigated cases in which more than 300 companies unknowingly hired impostors using deepfake technology in video interviews [US DOJ 2024].
Across these examples the common element is not the medium. It is the Synthetic Trust Problem in action: the ability to manufacture the signals that make a decision feel legitimate, at the exact moment that decision is about to be made. Momenta built its detection and simulation infrastructure around this specific insight: understanding where trust breaks is the prerequisite for protecting it.
The Synthetic Trust Problem
In every significant enterprise deepfake attack on record, the decisive judgment was made before any control had a reason to activate. The attack did not defeat the system. It operated upstream of it entirely.
Will training employees stop deepfake fraud?
Most deepfake defenses are focused on the wrong layer.
The conventional response is that training is all you need. Teach employees to be skeptical. Run awareness programs. Update policies. The implicit assumption is that people fail because they don't know enough, and that if they knew more, they would behave differently. That framing is incomplete, and the industry has been slow to say so directly.
The problem is not knowledge alone. It is architecture. These attacks are specifically engineered to satisfy the exact signals human judgment is built to trust. A voice that sounds right. A face that matches. A context that fits. Awareness training helps, but no amount of it changes the fact that a sufficiently convincing attack is, by design, indistinguishable from a legitimate interaction at the moment it counts. Training raises awareness. It does not change the architecture, and architecture is where the problem lives.
The Synthetic Trust Problem requires a structural response, not a behavioral one. That means moving controls earlier in the chain and detecting synthetic identity during live interactions, not coaching employees to second-guess their CFO on every call.
A familiar voice in an expected context. A recognizable face. An urgent request from someone with authority.
These are not weaknesses unique to naive employees. They are the signals that enable organizations to function. Deepfakes exploit the infrastructure of trust rather than its absence. That distinction matters: you cannot patch your way out of a problem that is built into how decisions are made.
What we consistently see is that the gap between awareness and readiness is wider than most organizations expect. Knowing that deepfakes are a threat does not translate into knowing how your specific teams, in your specific workflows, behave when a convincing one arrives. That gap is exactly where losses are occurring right now.
The Synthetic Trust Problem cannot be solved by the same controls that were designed before identity was cheap to fake. Updating the underlying assumption is the prerequisite for everything else.
Why are organizations that defend one channel still exposed?
One pattern that emerges consistently in our work with enterprise clients is the underestimation of how attacks combine modalities.
The instinct is to treat voice fraud and video fraud as separate problems requiring separate defenses. In practice, the most effective attacks use both simultaneously. A fabricated voice on a phone call is concerning. A fabricated executive on a video call, with matching voice and behavior, operating inside a familiar meeting workflow, is a structurally different threat, one where each additional signal reduces the space for doubt before a decision is made.
Nearly half of businesses globally (49%) reported experiencing audio or video deepfake incidents by 2024 [European Parliament 2025]. Most enterprise security programs still treat these as distinct threat vectors requiring separate responses. That is the gap attackers are already using.
Effective protection against the Synthetic Trust Problem requires thinking across the full communication surface: not just where an attack starts, but where it is completed.
How large is the deepfake fraud problem, and where is it heading?
The scale of what is coming is not speculative. The Deloitte Center for Financial Services projects that generative AI-enabled fraud losses in the US alone will reach $40 billion by 2027, growing from $12.3 billion in 2023 at a compound annual rate of 32% [Deloitte 2024]. To put that in context: $40 billion is larger than the entire GDP of a number of European nations. It is not a niche fraud problem. It is a macroeconomic one.
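The arithmetic behind that projection is straightforward to reproduce. A minimal sketch from the cited baseline and growth rate:

```python
# Reproducing the Deloitte trajectory: $12.3B in 2023 compounding
# at roughly 32% per year through 2027 [Deloitte 2024].
baseline = 12.3  # USD billions, 2023
cagr = 0.32

for year in range(2023, 2028):
    print(f"{year}: ${baseline * (1 + cagr) ** (year - 2023):.1f}B")

# A flat 32% lands near $37B in 2027; the $40B headline implies a
# CAGR closer to 34%, so the stated growth rate is itself rounded.
```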
That projection was made before the most recent cost and quality data was fully available. It may prove conservative.
The broader implication extends beyond individual enterprises. Identity infrastructure (the systems that verify who someone is across financial services, hiring, access management, and communications) was built on the assumption that live presence and voice are difficult to fabricate. Regulators, identity providers, and platform operators are only beginning to grapple with what it means to rebuild that infrastructure for a world where those assumptions no longer hold. The organizations that move first are not just protecting themselves. They are shaping what the next generation of identity verification looks like.
When attack costs were high, deepfake fraud was episodic: it happened to other companies, in dramatic cases that made the news. As costs approach zero, the profile changes. It stops being episodic and starts being continuous. It stops requiring specialist knowledge and starts being accessible to anyone. It stops being reserved for high-value targets and starts being economically rational against any organization where a single transfer, a single hire, or a single bypassed identity check could generate a return.
Near-zero attack cost breaks detection logic. When an attack costs $1 to attempt, the attacker can afford to fail many times. Detection built on volume-based pattern recognition, the basis of most fraud monitoring, is poorly matched to an adversary who generates a fresh, contextually tailored attack for each target at negligible cost. The average enterprise loss per deepfake incident was nearly $500,000 in 2024 [Eftsure 2025], and that figure is more likely a floor than a ceiling.
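To make that asymmetry concrete, consider the attacker's break-even point. A minimal sketch using the per-attempt and per-incident figures cited above; illustrative arithmetic, not a model of any specific campaign:

```python
# Attacker economics at near-zero cost, using figures cited in the text:
# ~$1 per attempt, ~$500,000 average enterprise loss per incident.
cost_per_attempt = 1.0
payout_per_success = 500_000.0

# Success rate at which attempting the fraud becomes profitable.
break_even = cost_per_attempt / payout_per_success
print(f"Break-even success rate: {break_even:.4%}")  # 0.0002%

# Even a one-in-a-thousand hit rate leaves enormous margin.
success_rate = 0.001
ev = success_rate * payout_per_success - cost_per_attempt
print(f"Expected value per $1 attempt at 0.1% success: ${ev:,.2f}")  # $499.00
```

An adversary who profits at a one-in-half-a-million hit rate has no reason to reuse a pattern that volume-based monitoring could learn.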
What is the most effective defense against deepfake fraud?
If the Synthetic Trust Problem operates at layer one, the identity layer where trust signals are formed, upstream of the decision layer where a human acts on them and the transaction layer where systems record the result, then there is only one coherent structural response: move the defense to layer one as well. That means two things, in sequence.
First, know where your layer one breaks. Not in theory, but in practice, under realistic conditions, in the specific channels your teams use. Simulation is the only way to generate that picture. Not just awareness training, which tells employees that deepfakes exist and asks them to be more careful. Controlled simulation that delivers convincing synthetic attacks through the channels your teams actually use and measures how they respond: what decisions get made, which workflows fail, where verification never happens. Momenta's simulation platform was built specifically to produce that map. You cannot defend a perimeter you have not walked.
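What the resulting map contains can be made concrete. The sketch below is a hypothetical data structure for simulation outcomes; every name and field is an illustrative assumption, not Momenta's actual platform API:

```python
# Hypothetical sketch of what a controlled simulation measures per attack.
# Structure, names, and fields are illustrative assumptions, not
# Momenta's actual platform API.
from dataclasses import dataclass
from enum import Enum

class Channel(Enum):
    VOICE_CALL = "voice_call"
    VIDEO_CALL = "video_call"
    VOICEMAIL = "voicemail"

@dataclass
class SimulatedAttack:
    channel: Channel
    impersonated_role: str      # e.g. "CFO"
    requested_action: str       # e.g. "urgent wire transfer"

@dataclass
class Outcome:
    attack: SimulatedAttack
    complied: bool              # did the target carry out the request?
    verified_out_of_band: bool  # any independent check attempted?
    escalated: bool             # did the target involve anyone else?
    seconds_to_decision: float

def breaking_points(outcomes: list[Outcome]) -> list[Outcome]:
    """The behavioral map the text describes: every interaction where
    a synthetic identity produced compliance with no verification step."""
    return [o for o in outcomes if o.complied and not o.verified_out_of_band]
```

The output that matters is the last function: not an aggregate score, but the specific workflows where trust failed silently.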
Second, detect synthetic identity in real time. Once you know where trust breaks, you need the ability to identify synthetic voices and video during live interactions, not after the fact. This is where most legacy fraud tools fail structurally: they are built for layer three, evaluating transactions after human judgment has already been applied. Momenta's detection engine operates at layer one, analyzing audio and video streams in milliseconds, producing a risk score before a decision is acted on, and triggering security actions during the interaction itself: step-up authentication, transaction holds, call routing.
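In control-flow terms, a layer-one defense looks roughly like the sketch below. The thresholds, score source, and action names are illustrative assumptions, not a description of Momenta's engine:

```python
# Hypothetical control flow for layer-one defense: score the live
# stream, act during the interaction rather than after it.
# Thresholds and action names are illustrative assumptions.
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    STEP_UP_AUTH = "step_up_auth"          # challenge the counterparty's identity
    HOLD_TRANSACTION = "hold_transaction"  # freeze any in-flight payment
    REROUTE_CALL = "reroute_call"          # hand off to a security queue

def decide(risk_score: float) -> Action:
    """Map a per-window synthetic-media risk score (0.0 to 1.0) to an
    action. Crucially, this runs while the call is still in progress."""
    if risk_score < 0.3:
        return Action.ALLOW
    if risk_score < 0.7:
        return Action.STEP_UP_AUTH
    if risk_score < 0.9:
        return Action.HOLD_TRANSACTION
    return Action.REROUTE_CALL

# Example: scores produced per audio/video window as a call proceeds.
for window, score in enumerate([0.12, 0.18, 0.76, 0.93]):
    print(f"window {window}: score={score:.2f} -> {decide(score).value}")
```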
The Synthetic Trust Problem has a specific shape. The defense has to match it.
The Synthetic Trust Problem has a simple, uncomfortable logic. As attack cost approaches zero, volume limits disappear. As fidelity approaches human-indistinguishable, detection by human judgment fails. As identity collapses as a reliable signal, every workflow that treats it as a constant becomes a potential entry point. Most security is built for layer three. The attack happens at layer one.
Every organization already has a breaking point. The question is whether you find it, or an attacker does.
Find your breaking point before an attacker does.
Run a controlled simulation across your actual communication channels. Get a behavioral map of where your organization's trust breaks down, in less than 48 hours.