Zero Truth: How the ZeroTrue startup challenges the global deepfake/generative AI era
Generative image detection
The digital ecosystem has entered a new phase. Generative AI (GenAI) has reached a critical point: synthetic content is now indistinguishable from human content in how convincing it looks and feels. We already see a “boss’s voice” pushing someone to wire money over Zoom, a “journalist” in a video promoting questionable schemes, a “government official” sending messages from spoofed accounts.

This isn’t a preview; it’s the present. In Hong Kong, an employee on a multi-person video conference was shown familiar “executives” and was pressured into transferring roughly £20 million — video and voice deepfakes working together like clockwork.


The problem grows faster than organizations update their safeguards. Gartner’s data show a structural lag: a majority of IT organizations were still stuck at initial research while only a minority had integrated GenAI into their processes — exactly the kind of gap adversaries love.

“We’ve already crossed the line where a person could spot a fake video or image by eye. Today you cannot unambiguously classify the origin of information.”

Let’s talk about another layer. This isn’t only about money; it’s about manipulating public opinion, propaganda, deception, and even mimicking states and institutions. And yes, it’s about complex, targeted, multi-vector attacks, where an email from the “CFO” is amplified by a call from the “CEO” and then a quick “alignment meeting” over video: one seamless synthetic play. High-profile impersonation attempts via voice cloning are now openly documented; they’re just the visible tip.

In human terms: we’re losing the right to trust our senses. In civic terms: the risk of systematic disinformation in elections and media (including full narrative substitution). Research groups and institutes disagree on scale, but on the ground this already harms people’s ability to separate reality from simulation.

Why we’re doing this

ZeroTrue is our answer to the trust crisis. We’re building a universal filter that automatically classifies synthetic content across six modalities: audio, music, and video (in production); text and code (beta/R&D); and images (next in line). We don’t want to ship “another detector.” We want a separator between honest and harmful uses of AI.

Why universality? Because attackers have been multichannel for a while. A text-only detector misses the “email + phone call” combo. An audio detector won’t notice that the video is fake, too. We have to cover combinations of signals.

“There’s practically no service that covers the full spectrum of modalities and also provides public social proof. We will do that — separately and in chains: images, video, music, voices, text, and their combinations.”

What’s actually happening right now (cases)

  • Multi-person “leadership” video call → funds transferred (~£20M): video and voice deepfakes in one scene.
  • Impersonation of top managers and public figures → pressure on staff and partners using cloned voices and social hooks.
  • “Journalist” video ads → pushing shady products with synthetic faces and scripted lines.
  • AI-voice vishing campaigns → at-scale SMS/voice vectors flagged by the FBI.

Financial and reputational damage go hand in hand. Industry briefings warn that contact centers and real-time comms (Zoom/Teams) are becoming a new front: deepfake-driven fraud risk is rising sharply.

Part 2. Product, Proof, and Roadmap

What already works:
  • Audio, music, video — production.
  • Text and code — beta / R&D.
  • Complex pipelines for multi-vector attacks: an AI-generated video plus face/voice swaps; cloned speech plus AI-written text lightly edited by a human; cross-checks on metadata, timing, and behavioral patterns.
  • Defense against adversarial attacks: data augmentation, saturating datasets with known bypass attempts, regular shadow tests on the newest generators (a small augmentation sketch follows this list).
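
To make “saturating datasets with known bypass attempts” concrete, here is a minimal augmentation sketch; the function name and parameter ranges are illustrative assumptions, not our production pipeline. It takes a clean waveform and produces degraded variants (added noise, clipping, crude band-limiting that mimics heavy compression) used to harden a detector during retraining.

```python
import numpy as np

def augment_waveform(wave: np.ndarray, sr: int, rng: np.random.Generator) -> list[np.ndarray]:
    """Produce degraded variants of a clean waveform for robustness training.

    Illustrative only: real pipelines also add codec round-trips, re-recording, etc.
    """
    variants = []

    # 1. Additive white noise at a random SNR between 10 and 30 dB.
    snr_db = rng.uniform(10, 30)
    noise = rng.normal(0.0, 1.0, wave.shape)
    scale = np.sqrt(np.mean(wave**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    variants.append(wave + scale * noise)

    # 2. Hard clipping, as if the signal had been recorded too hot.
    limit = rng.uniform(0.3, 0.8) * np.max(np.abs(wave))
    variants.append(np.clip(wave, -limit, limit))

    # 3. Crude band-limiting via FFT, mimicking aggressive lossy compression.
    spectrum = np.fft.rfft(wave)
    cutoff_bin = int(rng.uniform(3000, 8000) * len(wave) / sr)
    spectrum[cutoff_bin:] = 0
    variants.append(np.fft.irfft(spectrum, n=len(wave)))

    return variants
```
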
Where the data come from — and how we treat them

We combine sources: we generate samples ourselves, we use external providers, and we collect generative samples from public sources (with licenses respected). If we discover any sample is rights-protected after the fact, we remove it. Some “hard” classes are genuinely difficult to collect, but we keep pushing.

Social proof and transparency

We’re preparing public metrics reports (AUROC/F1, FP/FN, cross-dataset robustness under compression/noise/translation), pilot case studies (anonymized where needed), and expanded report metadata for interpretability. Our stance is simple: detection is probabilistic, but it’s useful. We show confidence scores and explain the verdict. No “magic truth” claims.
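
For context, the headline numbers in those reports are standard and easy to reproduce. A minimal sketch of computing AUROC, F1, and FP/FN counts with scikit-learn (the labels and scores below are placeholder values, not real results):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Placeholder evaluation data: 1 = synthetic, 0 = genuine.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.91, 0.78, 0.12, 0.35, 0.66, 0.48, 0.83, 0.09])

auroc = roc_auc_score(y_true, y_score)      # threshold-free ranking quality
y_pred = (y_score >= 0.5).astype(int)       # a fixed operating threshold
f1 = f1_score(y_true, y_pred)

# FP/FN counts at the chosen threshold, reported per dataset.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"AUROC={auroc:.3f}  F1={f1:.3f}  FP={fp}  FN={fn}")
```

Cross-dataset robustness is then the same computation repeated on held-out sets after compression, noise, or translation.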

Where ZeroTrue is useful today

  1. Email/communications security: anti-phishing with voice validation for incoming “from the boss” calls.
  2. Media and platforms: check uploaded video/audio/images before posting or labeling.
  3. Contact centers and finance: detect synthetic speech during the call and trigger second-factor checks automatically (sketched below).
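
A hedged sketch of the contact-center pattern in item 3 above. The client shapes and threshold are assumptions standing in for the real SDK, which has its own surface; the point is the escalate-don’t-block flow:

```python
from dataclasses import dataclass

# Hypothetical shapes standing in for the ZeroTrue SDK; names and fields
# here are illustrative assumptions, not the published API.
@dataclass
class Verdict:
    synthetic_probability: float

def score_audio_chunk(pcm_chunk: bytes, sample_rate: int) -> Verdict:
    """Stub for a detector call; a real integration would call the ZeroTrue API."""
    return Verdict(synthetic_probability=0.93)  # placeholder score

THRESHOLD = 0.8  # tune per deployment: higher means fewer false alarms

def on_audio_chunk(call_id: str, pcm_chunk: bytes) -> None:
    verdict = score_audio_chunk(pcm_chunk, sample_rate=16000)
    if verdict.synthetic_probability >= THRESHOLD:
        # Escalate instead of blocking: request an out-of-band second factor.
        print(f"[{call_id}] synthetic speech suspected "
              f"(p={verdict.synthetic_probability:.2f}); trigger second factor")

on_audio_chunk("call-42", b"\x00\x00" * 160)  # 10 ms of silent 16 kHz PCM
```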

Integrations and standards

We’re oriented toward C2PA / Content Credentials (provenance metadata), watermarking, and enterprise plumbing: SIEM/SOAR, MTA, CMS/MDM. For a verifiable supply chain of content, this is critical.
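
The natural ordering here is provenance first, detection second: if a file carries valid Content Credentials, trust the signed manifest; only otherwise fall back to probabilistic detection. A sketch under stated assumptions (`read_c2pa_manifest` and `detect_synthetic` are stubs standing in for a real C2PA reader and a detector call):

```python
from typing import Optional

def read_c2pa_manifest(path: str) -> Optional[dict]:
    """Stub: a real integration would parse and verify C2PA Content Credentials
    with a dedicated library. Returning None simulates an asset with no manifest."""
    return None

def detect_synthetic(path: str) -> float:
    """Stub for a detector call; returns a synthetic-probability score."""
    return 0.72  # placeholder

def verify_asset(path: str) -> dict:
    """Provenance first, detection second."""
    manifest = read_c2pa_manifest(path)
    if manifest and manifest.get("signature_valid"):
        # Verifiable provenance: trust the signed claims, no detector needed.
        return {"source": "provenance", "claims": manifest.get("claims", [])}
    # No (valid) Content Credentials: fall back to probabilistic detection.
    return {"source": "detector", "synthetic_probability": detect_synthetic(path)}

print(verify_asset("incoming/video.mp4"))
```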

Privacy and regions

We treat data with care: users select their region, and data stays in-region. We encrypt, limit retention, and delete on request or at the end of the retention term. Users can opt in to allow their data to improve the models.

Minimal compliance commitments (starter)
  • Regional storage and processing; encryption at rest and in transit.
  • Access controls and audit logs; isolated training environments.
  • Incident reporting procedures and human-in-the-loop for disputed cases.
  • Language aligned with upcoming regulatory regimes (EU-AI-Act-friendly wording).

Access model and economics

Freemium with weekly credits renewed for early users; SDK/API for integrations; tiered API packages and a credit model. Clear TCO matters: predictable limits, queue priorities, call-level reporting.

The “Race” and Our Three Key Features
  1. Multi-signal correlation. ZeroTrue never looks at a single modality in isolation. We build a verification graph: we link video frames, audio spectra, speech biometrics, text patterns, metadata, and behavioral cues (tempo, latency, the hidden “edit tree”). The output is a combined confidence score plus an explanation of where the signal “drifts.” This sharply reduces false positives from “narrow” detectors. A fusion sketch follows this list.
  2. Real-Time Call-Guard. A lightweight agent for live calls and conferences: liveness checks, synthetic-speech indicators, “trust-but-verify” triggers, safe words to escalate to manual review. The scenario is obvious: an employee hears the “director”; Guard highlights the risk and proposes verification through a second channel. This pain is very real right now in contact-center operations.
  3. Developer SDK with inline verdicts. Drop-in packages for mail gateways, CMS, call stacks, even Git hooks for code. You get the verdict and a confidence score inline, without wrestling your stack. For complex chains we use webhooks and emit events into your SIEM.
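
As promised above, a minimal sketch of the fusion idea behind multi-signal correlation. The scores and weights are made up for illustration; this shows weighted log-odds fusion, not our production verification graph:

```python
import math

# Illustrative per-modality synthetic-probability scores (0 = genuine, 1 = synthetic).
scores = {"video": 0.91, "audio": 0.87, "text": 0.40, "metadata": 0.78}

# Made-up reliability weights; in practice these would be learned per modality.
weights = {"video": 0.35, "audio": 0.30, "text": 0.15, "metadata": 0.20}

def logit(p: float) -> float:
    p = min(max(p, 1e-6), 1 - 1e-6)  # clamp to avoid infinite log-odds
    return math.log(p / (1 - p))

# Weighted log-odds fusion is more robust than averaging raw probabilities.
fused_logit = sum(weights[m] * logit(scores[m]) for m in scores)
combined = 1 / (1 + math.exp(-fused_logit))

# "Drift" explanation: the modality that disagrees most with the combined verdict.
drift = max(scores, key=lambda m: abs(scores[m] - combined))
print(f"combined confidence={combined:.2f}, largest drift: {drift} ({scores[drift]:.2f})")
```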

Release rhythm

We iterate in a tight loop: fresh sample collection → adversarial augmentation → retraining → release. When new generators spike, we spin up multi-ensemble defenses and run “shadow” experiments. We’ll keep posting these updates in the blog and social channels.

Honesty, limits, expectations

We say it plainly: no detector is absolute. We show probability ranges and explain why. That’s our contract with users, and the base layer for trust. Agencies like the FBI have explicitly warned that AI is boosting fraud; the task isn’t to “spot it by eye,” but to make verification the default.

Closing Manifesto

ZeroTrue is infrastructure for a world where trust is scarce. We don’t promise miracles. We promise a tool that learns faster than the attacker and makes lying more expensive — every single day. Join the pilots, send us hard cases. “Zero Truth” is built with you.


17 Oct 2025
Author: Stanislav N

© All Rights Reserved. ZeroTrue Inc.
e-mail us: [email protected]