Summary
Attacks succeed by timing, not by being undetectable. The decisive judgment happens inside a normal-feeling process before any control activates.
Multi-channel reinforcement does the work: voice, then video, then context, then sequence, each signal narrowing the space for doubt before the next arrives.
The workflow is the exploit. Attackers recreate familiar situations deliberately. Single-channel defenses are blind to this by design.
Most analysis of enterprise deepfake fraud focuses on the wrong thing.
It focuses on the technology: how convincing the video is, how accurate the voice clone, how sophisticated the underlying model. Those details matter, but they are not what determines whether an attack succeeds. What determines success is timing. Specifically, whether the synthetic interaction completes before anyone inside the organization has a reason to question it.
In every significant enterprise deepfake attack on record, that timing has favored the attacker. Not because the technology was undetectable (in some cases it would have been, with the right tools in place), but because the attack was designed to fit so naturally inside a familiar workflow that detection was never triggered. The CFO on the video call. The urgent payment request. The candidate in the interview. None of these felt like attacks. That is what made them effective.
How do deepfake attacks actually get inside an organization?
The most effective attacks share one structural feature: they do not create unusual situations. They recreate familiar ones.
A routine video call with senior leadership. An urgent CFO request. A job interview with a credible candidate. They look like Tuesday. The closer an attack mirrors normal operations, the less room exists for doubt before a decision is made. Attackers borrow existing behaviors and redirect where they lead: process simulation, not disruption.
This is why most enterprise controls activate too late. They evaluate transactions and credentials, not the human judgments that authorize them.
What happens when every person on the call is synthetic?
Case 1: Hong Kong, February 2024 · HK$200M lost
The entire meeting was synthetic
A multinational firm in Hong Kong lost HK$200 million after an employee attended a video conference where every participant except themselves was a deepfake. [SCMP, Feb 2024] The scammers reconstructed the CFO and senior executives from publicly available footage. The employee approved a series of transfers.
A CFO request in the expected format does not trigger a verification reflex, because in every previous legitimate interaction, verification wasn't necessary. The attack was calibrated to stay below that threshold.
The control failure
Payment systems assume the human who approved the transfer had a legitimate reason to do so. When that assumption is wrong, the system functions exactly as designed and produces exactly the wrong outcome.
Takeaway: The attack succeeded before verification was ever triggered.
How do attackers build trust across multiple channels?
Case 2: Singapore, March 2025 · US$499,000 · recovered by luck
Five trust signals, one decision
Singapore police reported that a finance director authorized transfers of more than US$499,000 in a fraud that unfolded across several weeks and multiple channels. The funds were ultimately traced and withheld, but only after the decision to transfer had already been made. The recovery was luck.
Signal 01 · WhatsApp
Executive authority established
WhatsApp message impersonating the CFO. Right name, right tone, plausible rationale.
Signal 02 · Document
Legal formality added
A confidentiality agreement. Institutional weight, and a reason not to ask colleagues.
Signal 03 · Phone / Email
Third-party corroboration
A lawyer impersonation added independent validation. The instruction now had legal backing.
Signal 04 · Zoom
Scheduling friction applied
Scheduling friction made the meeting feel earned, not rushed.
Signal 05: Decision point
Deepfake CEO and executives on video call
By this point, four signals had already done their work. The video call was not the moment of deception. It was confirmation of something that already felt legitimate.
Outcome: Transfer authorized
What single-channel defenses miss
An organization that deploys voice detection but not video detection, or that verifies identity at onboarding but not at the moment of authorization, has created exactly the gaps this model exploits. Disruption was possible at several points, but only with controls across all those channels simultaneously.
Takeaway: By the time the video call happened, the decision had already been shaped. Each channel reinforced the last.
What happens when the attacker is hired through the front door?
Case 3: US synthetic hiring, 2025 · 300+ companies affected
The attack that starts on the inside
The FBI issued a warning that North Korean IT workers had been observed using AI and face-swapping technology during video job interviews to obtain positions at US companies, with the goal of gaining access to internal systems, source code, and sensitive data from the inside.
The instinct is to treat this as a hiring problem. It is not. A video interview tests competence and demeanor. It was never designed to verify that the face and voice on screen belong to the person on the application. That gap did not matter before identity could be synthesized in real time.
The same gap exists in any enterprise process where live video presence is used as a proxy for identity authenticity.
Vendor onboarding calls: identity assumed from video presence
Remote access approvals: visual presence used as verification
Executive verification: live video treated as identity proof
KYC checks: video presence as authenticity shortcut
The structural point
When an attack follows your process perfectly and still produces the wrong outcome, the process is the vulnerability. Over 300 companies were affected; most discovered it only after the fact. [FBI IC3] [US DOJ 2024]
Takeaway: The attack didn't exploit a weakness. It followed the process exactly. The process was the weakness.
The pattern connecting all three cases
The workflow is the vulnerability.
Attackers don't find gaps in your defenses. They reconstruct your normal operations, and your defenses never see it coming.
01 · Attack completes before controls activate
The decisive judgment happens in a context that feels normal. Transaction-layer controls activate after it's over.
02 · Multi-modal reinforcement is the mechanism
Voice plus video plus context plus sequence, each signal narrowing doubt before the next arrives. Single-channel detection is insufficient by design.
03 · The workflow is the exploit, not a side effect
Defenses need to be positioned where normal operations happen, not where anomalies appear.
The honest limit
Detection technology is improving, but so is synthesis. No system is absolute. The goal is meaningful friction: raising the cost of an attack high enough that the economics stop working for the attacker.
What does an effective defense actually look like?
Out-of-band verification for payment authorizations
At the decision point itself, not at onboarding, not at login. The moment a transfer or access grant is about to happen.
Real-time detection in live channels
Layered into phone calls, video meetings, and identity verification flows during the interaction. A risk score before a decision is acted on, not after; a minimal sketch of how these first two elements fit together appears after this list.
A tested map of where your trust breaks down
Most organizations believe their controls are adequate, but they have not tested what happens when a convincing synthetic attack arrives through the channels their teams use and trust every day. That untested gap is where these attacks live.
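To make the first two elements concrete, here is a minimal sketch of what a decision-point gate could look like: a live-channel risk score checked before the decision is acted on, and an out-of-band confirmation required even when the score is clean. Every name in it (score_live_channel, confirm_out_of_band, RISK_THRESHOLD) is an illustrative placeholder, not a reference to any specific product or API.

```python
# Minimal sketch, assuming a hypothetical real-time risk-scoring service and
# a hypothetical out-of-band confirmation step. Illustrative only.

from dataclasses import dataclass

RISK_THRESHOLD = 0.3  # assumed tolerance; would be tuned per channel and amount


@dataclass
class PaymentRequest:
    requester: str   # identity claimed on the live channel
    amount: float
    channel: str     # e.g. "video_call", "phone"
    session_id: str  # the live interaction the request arrived on


def score_live_channel(session_id: str) -> float:
    """Placeholder for a real-time detector that scores the live audio/video
    session for synthetic content, returning 0.0 (clean) to 1.0 (synthetic)."""
    raise NotImplementedError("integrate a detection provider here")


def confirm_out_of_band(requester: str, request: PaymentRequest) -> bool:
    """Placeholder for verification on a channel the attacker does not control:
    a callback to a number on file, a signed approval inside the payment
    system, or an in-person check."""
    raise NotImplementedError("integrate a verification workflow here")


def authorize_transfer(request: PaymentRequest) -> bool:
    # 1. Score the live interaction *before* the decision is acted on.
    risk = score_live_channel(request.session_id)
    if risk >= RISK_THRESHOLD:
        return False  # block and escalate; do not trust the live channel

    # 2. Even on a clean score, require confirmation outside the channel the
    #    request arrived on. The workflow itself is the attack surface, so the
    #    check cannot live inside that same workflow.
    return confirm_out_of_band(request.requester, request)
```

The specific threshold value is not the point; the structure is. Both checks sit at the moment of authorization rather than upstream of it, which is where the three cases above show the decisive judgment actually happens.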
The honest question
Do you know where your workflow breaks?
Most organizations haven't tested it. Momenta runs controlled synthetic attacks through your actual channels and maps where your teams make decisions they shouldn't, in less than 48 hours.