Picture a human moderator sitting in a dark room somewhere, headphones on, listening to millions of simultaneous voice chat sessions across Call of Duty, GTA Online, and dozens of other games. Catching slurs, threats, targeted harassment. Flagging the worst of it. Writing up reports.
Obviously that's not what's happening. No human team could cover that volume. The math doesn't work.
What's actually happening is ToxMod, an AI system built by a Boston company called Modulate that has processed over 160 million hours of gaming voice chat and enabled more than 80 million moderator actions across games you're probably playing right now.
If you've ever wondered whether anyone is actually listening when someone loses their mind at you in voice chat, ToxMod is usually the answer. Here's everything you'd want to know about how it works, what it does and doesn't do, and what it means for your actual gaming experience.
The Company Behind It
Modulate was founded in 2017 by Mike Pappas (CEO) and Carter Huffman (CTO), two MIT graduates who started with a different product entirely: AI voice skins that let you alter how your voice sounded in games. The insight that shifted everything was noticing how voice communication created problems that nobody had good tools to solve. The voice skin product was interesting. The moderation problem was massive.
The company raised between $30 and $36 million over several funding rounds and built its reputation almost entirely on one product: ToxMod. Where most content moderation companies started with text and bolted on voice capabilities later, Modulate built ToxMod from the ground up as a voice-first system. That matters technically, which we'll get to.
They're based in Boston with a team that skews heavily toward machine learning engineers and trust-and-safety specialists. Their client roster reads like a who's who of large multiplayer games: Activision (Call of Duty), Rockstar (GTA Online), Rec Room, and now, via a 2026 Discord integration, any game using Discord's Social SDK for in-game voice.
What ToxMod Actually Does
The short version: ToxMod listens to voice chat in real time, analyzes it, and flags potential violations for a human moderator to review. No automated bans. A person makes every enforcement call.
The longer version is more interesting.
It doesn't work like a keyword filter. This is the first thing to understand. A keyword filter is a list of words. If someone says a word on the list, the filter catches it. That sounds reasonable until you realize how many ways people have found to work around it: deliberate misspellings, phonetic substitutions, inventing new slang every few weeks, or just saying terrible things in ways that don't involve the flagged words at all. Keyword filters have been the standard for 20 years, and they've never really worked.
ToxMod doesn't match text. It analyzes speech.
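To make the keyword-filter problem concrete, here's a minimal sketch of one. This is an illustration of the general technique, not anything from Modulate or a real game; the blocklist and the placeholder term "badword" are made up for the example.

```python
import re

# Hypothetical blocklist; "badword" stands in for any banned term.
BLOCKLIST = {"badword"}


def keyword_filter(message: str) -> bool:
    """Return True if any alphanumeric token in the message is on the blocklist."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return any(token in BLOCKLIST for token in tokens)


# An exact match is caught.
print(keyword_filter("you are a badword"))
# A character swap slips through.
print(keyword_filter("you are a b4dword"))
# Splitting the word slips through.
print(keyword_filter("you are a bad word"))
# Harassment that uses no listed word at all slips through.
print(keyword_filter("nobody wants you here, just quit"))
```

Only the first message trips the filter. You can keep adding variants to the list, but the list is always one step behind the people trying to get around it.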
The Ensemble Listening Model. This is what Modulate calls the core architecture of ToxMod: more than 100 individual AI models organized across five analytical layers. Think of it less like a single judge and more like a review panel where every member is a specialist looking at something different.
Some of those component models analyze what you said: the actual words, transcribed from audio. Others analyze how you said it: tone, pitch, pace, emotional coloring, whether your voice is calm or escalating. Others look at context: what happened in the session before this moment, who's speaking to whom, whether this is an exchange between two people who've been trash-talking each other for ten minutes or someone targeting a single player unprompted. The outputs of all those models get combined into a single assessment.
Modulate describes this approach as moving from pattern-matching to actual comprehension of intent. Whether it fully achieves that is debatable. But it's meaningfully more sophisticated than anything the industry had five years ago.
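Modulate hasn't published how the component outputs are combined, so treat the following as a rough sketch of the general ensemble idea rather than ToxMod's method. It assumes each model emits a score from 0 to 1 and that the scores are merged with a weighted average. The model names, scores, and weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    name: str      # hypothetical component model name
    score: float   # 0.0 (benign) to 1.0 (harmful)
    weight: float  # hypothetical relative trust placed in this model


def combine(scores: list[ModelScore]) -> float:
    """Merge component scores into one assessment using a weighted average."""
    total_weight = sum(s.weight for s in scores)
    if total_weight == 0:
        raise ValueError("at least one model needs a non-zero weight")
    return sum(s.score * s.weight for s in scores) / total_weight


# Same harsh words, but calm tone and a long-running two-way exchange.
banter = [
    ModelScore("transcript", 0.7, 1.0),
    ModelScore("tone", 0.2, 1.0),
    ModelScore("context", 0.1, 2.0),
]

# Same harsh words, escalating tone, aimed at one player unprompted.
targeted = [
    ModelScore("transcript", 0.7, 1.0),
    ModelScore("tone", 0.9, 1.0),
    ModelScore("context", 0.9, 2.0),
]

print(round(combine(banter), 3))
print(round(combine(targeted), 3))
```

The transcript score is identical in both cases. The tone and context scores are what pull the two assessments apart, which is the whole point of listening to more than the words.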
Real-time stream processing. ToxMod processes audio as a continuous stream rather than storing full session recordings. The audio isn't saved to a server for a lawyer to dig through someday. It's analyzed as it happens. If nothing gets flagged, the data isn't retained. If something does get flagged, a clip of the relevant section is held for the human review step.
This is a meaningful privacy distinction, but it's worth being clear that "not a full recording" still means the audio was analyzed. If you say something bad enough to trigger the system, a human moderator may hear it.
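Here's one way the "analyze the stream, keep only flagged clips" pattern could look in code. This is a sketch under my own assumptions, not Modulate's implementation: `process_stream`, the `is_flagged` callback, and the three-chunk context window are all invented for illustration.

```python
from collections import deque
from typing import Callable, Iterable


def process_stream(
    chunks: Iterable[bytes],
    is_flagged: Callable[[bytes], bool],
    context_chunks: int = 3,
) -> list[bytes]:
    """Analyze audio chunks as they arrive and keep a clip only when one is flagged."""
    # Rolling window: older chunks fall off automatically and are never stored.
    recent: deque[bytes] = deque(maxlen=context_chunks)
    clips: list[bytes] = []
    for chunk in chunks:
        recent.append(chunk)
        if is_flagged(chunk):
            # Hold the flagged chunk plus the little context still in the window.
            clips.append(b"".join(recent))
            recent.clear()
    return clips


# Stand-in "audio": short byte strings, with a toy detector that flags b"BAD".
session = [b"a", b"b", b"BAD", b"c", b"d", b"e", b"f"]
clips = process_stream(session, lambda chunk: chunk == b"BAD")
print(clips)
```

Everything in the session passes through the detector, but only the flagged moment and its immediate lead-up survive. That's the distinction between "analyzed" and "recorded" in practice.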
Behavioral history integration. ToxMod doesn't treat every session in isolation. The system checks whether the flagged player has previous incidents. A first-time minor violation from a clean account is treated very differently from the same violation from someone who's been flagged three times in the past month. This is part of why the system produces behavior change rather than just removal: enforcement feels proportional because it actually is.
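As a sketch of how history could feed into proportionality, the example below bumps the review priority of a flag based on recent prior incidents. The 30-day window, the 0.15 bump per incident, and the thresholds are numbers I made up, and the output is only a suggestion for a human reviewer, in keeping with the no-automated-bans design.

```python
from datetime import datetime, timedelta


def recent_incidents(history: list[datetime], now: datetime, window_days: int = 30) -> int:
    """Count prior incidents that fall inside the look-back window."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for timestamp in history if timestamp >= cutoff)


def suggested_priority(severity: float, history: list[datetime], now: datetime) -> str:
    """Suggest a review priority from flag severity (0 to 1) and recent history."""
    prior = recent_incidents(history, now)
    # Hypothetical rule: each recent incident raises the effective severity.
    adjusted = min(1.0, severity + 0.15 * prior)
    if adjusted < 0.4:
        return "low"
    if adjusted < 0.7:
        return "medium"
    return "high"


now = datetime(2026, 1, 31)
clean_account = []
repeat_offender = [now - timedelta(days=d) for d in (3, 10, 21)]
old_history = [now - timedelta(days=60)]

print(suggested_priority(0.3, clean_account, now))
print(suggested_priority(0.3, repeat_offender, now))
print(suggested_priority(0.3, old_history, now))
```

The same minor violation lands at a different priority depending on who said it and how recently they've been flagged before.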
The handoff to humans. When ToxMod flags something, it doesn't issue punishment. It creates a package: the relevant audio clip, a transcript, the player's behavioral history, what triggered the flag, and a confidence score. That package goes to a human moderator who reviews it and makes the actual enforcement decision.
This design is intentional and important. Automated bans at scale create false positive disasters. The human review layer exists to catch cases where the AI got it wrong: the trash talk that reads as harassment out of context, the gaming slang the model misclassified, the accent the transcription system didn't parse correctly. Human review is the error-correction mechanism.
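The review package described above maps naturally onto a small data structure. The field names, the list of allowed decisions, and the `record_decision` helper below are hypothetical; they only mirror the items listed in this article, not a real ToxMod schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of outcomes a studio's moderators might choose from.
ALLOWED_DECISIONS = {"no_action", "warning", "mute", "suspension"}


@dataclass
class ReviewPackage:
    player_id: str
    audio_clip: bytes         # the flagged section only, not the whole session
    transcript: str
    prior_incidents: int      # summary of the player's behavioral history
    trigger: str              # what caused the flag
    confidence: float         # the system's confidence, 0.0 to 1.0
    decision: Optional[str] = None     # stays empty until a human decides
    reviewed_by: Optional[str] = None


def record_decision(package: ReviewPackage, moderator_id: str, decision: str) -> ReviewPackage:
    """Attach a human moderator's decision; the system never fills this in itself."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    package.decision = decision
    package.reviewed_by = moderator_id
    return package


package = ReviewPackage(
    player_id="player-123",
    audio_clip=b"...",
    transcript="(flagged exchange)",
    prior_incidents=0,
    trigger="possible targeted harassment",
    confidence=0.62,
)
# The moderator listens, decides it was mutual trash talk, and clears it.
record_decision(package, "mod-7", "no_action")
print(package.decision, package.reviewed_by)
```

The key design choice is that `decision` starts empty and only a person can set it. A confident flag is still just a flag.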
Which Games Are Using It Right Now
Call of Duty is the flagship deployment and the most documented. ToxMod launched in beta for Modern Warfare II in August 2023, went global (minus Asia) with Modern Warfare III in November 2023, and has continued through Black Ops 6. Activision publishes progress reports on how it's performing (we analyzed those in detail here), and the numbers are real: a 43% reduction in toxic voice chat exposure, sustained across two game titles, and 80% of penalized players didn't reoffend.
GTA Online via Rockstar Games is a notable addition because GTA Online has historically had one of the more chaotic voice chat environments in gaming. Rockstar doesn't publish the kind of detailed transparency data that Activision does, so there's no direct comparable dataset. But the deployment is live.
Rec Room is worth mentioning because it's a different use case: a social VR and gaming platform with a much younger user base than Call of Duty. Voice moderation in a space where kids are present has a different urgency level than in a rated-M shooter. ToxMod running in Rec Room is doing work that has direct child safety implications.
Discord Social SDK is the most significant recent development. In January 2026, Modulate integrated ToxMod with Discord's Social SDK, the system game developers use to embed Discord features directly into their games. What that means practically: any game that uses Discord's in-game voice now has a relatively simple path to add ToxMod voice moderation. The engineering barrier dropped from "build it yourself" to "enable an integration."
That's a big deal for mid-size and indie developers who couldn't previously afford or staff an enterprise moderation solution. Expect the list of ToxMod deployments to grow considerably over the next year.
The Privacy Question, Answered Directly
I know this is what some of you jumped to the minute I described what ToxMod does. So let me answer it cleanly.
Is your voice chat being recorded? Not in the traditional sense. ToxMod processes audio as a stream. If nothing gets flagged, the audio data is not retained. You're not building up an archive of your voice chat sessions somewhere.
Can a moderator hear what you said? Yes, if the AI flagged it. The flagging process creates a short clip of the relevant section that goes to a human reviewer. If ToxMod flagged something you said, a person may listen to that clip.
Who has access to that clip? The moderation team at the game studio (or their moderation partner). Modulate processes the audio, but game studios control their own moderation queues.
How long is it kept? This varies by studio and isn't publicly documented in detail. The clip is retained long enough for the review process and any resulting appeal. It's not indefinite storage of everything you've ever said.
Should you be worried? If you're not saying things that violate the game's terms of service, the practical answer is no. The system exists to catch genuine harassment, threats, and targeted hate speech. Trash talk, competitive frustration, and even fairly heated arguments between consenting players don't typically meet the threshold for flagging, let alone enforcement.
If you do say something that gets flagged and you believe the enforcement was wrong, every major studio that uses ToxMod has an appeals process. The human review layer exists precisely because the AI makes mistakes.
Where ToxMod Falls Short
I'd be doing you a disservice if I only covered the upside. Here's where the technology has real limitations.
Gaming slang evolves faster than training data. ToxMod's models are trained on historical data, which means brand-new slang, community-specific insults, and freshly invented workarounds can take time to get incorporated into the system. Bad actors who want to harass people while avoiding detection will keep finding new ways to phrase things. It's an ongoing arms race.
Cultural and regional context is genuinely hard. A phrase that's friendly banter in one gaming community is a targeted insult in another. Tone that reads as aggressive in one cultural context is normal communication style in another. ToxMod's multi-model approach helps with this more than a keyword system would, but it's not solved. The system is better at catching explicit content than it is at catching subtle targeted harassment that relies on community context to understand.
Sarcasm and irony are still imperfect. The same words delivered sarcastically versus sincerely can mean completely opposite things. Audio tone analysis helps (genuine anger and performed sarcasm have different acoustic signatures), but it's nowhere near reliable enough to catch every case.
Asia is not covered. ToxMod's Call of Duty deployment specifically excludes the Asian market. Modulate hasn't published detailed reasons for this, but the most likely factors are the additional linguistic complexity of real-time transcription and analysis across Asian languages, and the different regulatory environments in those markets. Players in those regions are still operating under the old report-and-review model.
False positives happen. Ubisoft publishes its false positive rate for Rainbow Six Siege at 0.1%, and its system is built on similar principles. Applied to a game with tens of millions of players, 0.1% is still a lot of individual wrong calls. Activision doesn't publish its false positive rate, which is a real gap in its transparency. The human review layer catches many of these before enforcement, but not all of them.
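To put a number on "a lot of individual wrong calls," here's the arithmetic as a tiny sketch. The 0.1% rate is the Ubisoft figure quoted above. The volumes are purely illustrative, and I'm assuming the rate applies per moderation decision, which the published figure doesn't spell out.

```python
def expected_false_positives(decisions: int, false_positive_rate: float) -> float:
    """Expected number of wrong calls for a given volume and error rate."""
    return decisions * false_positive_rate


RATE = 0.001  # 0.1%, the published Rainbow Six Siege figure

# Hypothetical decision volumes, chosen only to show how the count scales.
for volume in (100_000, 1_000_000, 10_000_000):
    wrong_calls = expected_false_positives(volume, RATE)
    print(f"{volume:>10,} decisions -> about {round(wrong_calls):,} wrong calls")
```

A rate that sounds negligible still scales linearly with volume. At ten million decisions, it works out to roughly ten thousand players on the wrong end of a mistake.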
The January 2026 Update: What Changed
Modulate announced a significant architecture update in January 2026: the full release of what they're calling Velma 2.0, their next-generation Ensemble Listening Model.
The headline improvement is better detection of intent versus raw content. The older system was better at catching explicit harmful content (slurs, direct threats) than at catching the subtler forms of harassment that can be just as damaging: sustained targeting, exclusion campaigns, gaslighting, coordinated abuse that individually looks borderline but in aggregate is clearly hostile.
The new architecture also improves handling of multi-speaker conversations. In a chaotic voice lobby with five or six people talking over each other, the previous system could struggle to correctly attribute statements to individuals and understand who was speaking to whom. Velma 2.0 does better on this.
The Discord Social SDK integration came in the same announcement window and is arguably the bigger deal for the industry overall. The technology getting better matters. The technology becoming more accessible matters more for how many players it actually protects.
What This Means for Your Gaming Experience
ToxMod is currently running in some of the biggest multiplayer games in the world. If you play Call of Duty or GTA Online, it's running when you're in voice chat. If you play games that use Discord's in-game voice and the developer has enabled the ToxMod integration, it may be running there, too.
The practical impact, based on the data we have:
Voice chat is safer in games that have deployed it. The 43% reduction in Call of Duty isn't a press release claim. It's from Activision's own published data, covering two complete game titles. The effect is real and it's sustained.
Reporting still helps, even with AI running. When you file a report, you add to a player's behavioral record that the AI moderation system uses alongside its own detections. Your report doesn't directly cause enforcement, but it contributes to a pattern file. Combined AI flags and player reports build stronger cases than either alone.
The system catches more than your reports would. ToxMod detected violations proactively, before any report was filed, in the vast majority of cases. The old model, where enforcement only happened when a victim filed a report, has been replaced. Bad actors don't control whether they get caught by controlling whether their targets bother reporting.
Not every game has this. ToxMod is deployed at studios that have the budget and the will to implement it. Plenty of multiplayer games are still running on keyword filters and an understaffed report queue. Our TAG Community Safety Score, coming later this year, will tell you exactly which games have invested in real moderation infrastructure and which haven't.
For now: if you're in a Call of Duty lobby and someone in voice chat decides to make your session miserable, something is more likely to happen about it than at any point in the franchise's history. That's not everything. But it's genuinely not nothing.
Running into situations where you think the AI moderation got it wrong, or got it right when nothing else would have? I want to hear about it. Drop your experience in the comments or find me in the TAG Discord. Real player experiences help us understand how these systems are performing in practice.