Every major game gets reviewed. Graphics. Story. Gameplay feel. Value for money. Hundreds of outlets scoring hundreds of games on whether the combat is satisfying or the writing holds up.
Nobody reviews whether the game will protect you from the people playing it alongside you.
The ESRB rates content. Their descriptor for online multiplayer reads: “Online Interactions Not Rated.” PEGI has an online safety logo that signals online features exist without scoring how well they’re managed. Common Sense Media evaluates privacy policies and age-appropriateness for children but doesn’t assess moderation sophistication for adults. There is no consumer-facing rating system that tells you whether a game has real voice moderation, a meaningful appeals process, published transparency data, or a community safety design philosophy, versus a report button that feeds into a void and a terms of service document nobody reads.
That gap is what the TAG Community Safety Score is built to fill.
Why We Built This
Nine articles into this series, you already know the stakes.
83 million Americans experienced gaming harassment in a single six-month period. 76% of online multiplayer players encounter toxicity. First-time players who experience harassment are 320% more likely to quit a game immediately. Players spend 54% more in communities they perceive as safe. Identity-based harassment against Black, women, and LGBTQ+ gamers is getting worse by a measurable margin even as overall moderation tools improve.
The data is clear. Moderation quality has a direct, documented impact on who stays in gaming, who leaves, and how much anyone spends while they’re there.
What’s been missing is a way for players, the people at the receiving end of all of this, to evaluate moderation quality before they choose a game, before they invest time and money, before they find out the hard way that the community safety infrastructure they assumed existed doesn’t.
Game reviews tell you the product is good. The TAG Community Safety Score tells you whether the environment around that product is worth spending time in.
What the Score Measures
The TAG Community Safety Score rates games on 100 points across eight dimensions. Each dimension is weighted based on its actual impact on player experience and the difficulty of achieving it at scale.
Here’s how the scoring breaks down:
1. Moderation Infrastructure (20 points)
The biggest single dimension, because the tools you deploy determine the ceiling of everything else.
We’re looking at the full spectrum from “no tools” to best-in-class AI-human hybrid systems. A game with nothing but a keyword filter and a report button scores at the bottom. A game with proactive AI voice moderation, text analysis, automated detection that catches violations before they’re reported, and published accuracy metrics scores at the top.
The benchmarks we use: Xbox’s 87% proactive enforcement rate as a platform-level target. ToxMod and GGWP’s documented effectiveness as game-level comparisons. Moderation that requires a player to have already been harmed before anything gets flagged scores significantly lower than moderation that prevents the harm from arriving.
Score range:
- 0-4: No meaningful moderation tools. Keyword filter or nothing.
- 5-9: Basic report system with human review. No AI detection.
- 10-14: AI text moderation. No voice coverage.
- 15-17: AI text and voice moderation with human review layer.
- 18-20: Full hybrid AI-human system, proactive detection, voice and text coverage, published effectiveness data.
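The band boundaries above can be read as a simple lookup table. The following is a minimal sketch, not part of the published methodology: the names `INFRA_BANDS` and `infrastructure_band` are hypothetical, and the descriptions are shortened from the score range above.

```python
from bisect import bisect_right

# Lower bound and shortened description of each Moderation Infrastructure band,
# copied from the score range above. Names here are illustrative assumptions.
INFRA_BANDS = [
    (0, "No meaningful moderation tools"),
    (5, "Basic report system with human review"),
    (10, "AI text moderation, no voice coverage"),
    (15, "AI text and voice moderation with human review"),
    (18, "Full hybrid AI-human system with published data"),
]


def infrastructure_band(points: int) -> str:
    """Return the band description for a 0-20 infrastructure score."""
    if not 0 <= points <= 20:
        raise ValueError("infrastructure score must be between 0 and 20")
    lower_bounds = [low for low, _ in INFRA_BANDS]
    # bisect_right finds how many lower bounds are <= points;
    # the last of those is the band the score falls into.
    return INFRA_BANDS[bisect_right(lower_bounds, points) - 1][1]


print(infrastructure_band(16))
```

The same pattern would apply to the other seven dimensions by swapping in their own boundaries and maximums.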
2. Response and Enforcement (15 points)
Having tools is one thing. What happens after someone gets flagged is another.
This dimension covers how quickly reports get reviewed, what sanctions look like, and whether enforcement is graduated or binary. Games that only issue permanent bans, with no intermediate steps, score lower than games with a full range from warnings through temporary restrictions to permanent removal. Proportional enforcement produces better behavioral outcomes than all-or-nothing systems, and the published data backs this up.
Benchmarks: Activision’s 8-hour appeal response target. The 80% non-reoffend rate from Call of Duty’s ToxMod deployment as an effectiveness signal. Ubisoft’s graduated Exemplary-to-Dishonorable reputation tier system as a design reference.
Score range:
- 0-4: No documented enforcement process. Reports go unacknowledged.
- 5-8: Enforcement exists but is opaque or binary (ban or nothing).
- 9-11: Graduated sanctions with documented timelines.
- 12-13: Fast response with graduated sanctions and measurable outcomes.
- 14-15: Documented response targets, graduated sanctions, published effectiveness data.
3. Communication Controls (15 points)
How much control do you actually have over who can reach you, and how?
Xbox’s system sets the benchmark here: independently configurable controls for video, voice, text, community content, profile visibility, and more, each settable separately for Friends, Friends of Friends, and Everyone. That granularity gives players real tools to manage their exposure rather than a binary mute-everything switch.
Games score points for the breadth of controls available, the ease of accessing them (in-game vs. buried in settings menus), and whether controls can be applied during a session without disrupting gameplay.
Score range:
- 0-4: Mute only. No blocking, no filtering, no chat controls.
- 5-8: Mute and block available. Basic text filter options.
- 9-11: Full mute, block, filter system. Reporting accessible mid-session.
- 12-13: Fine-grained per-category controls (voice, text, profiles separately configurable).
- 14-15: Full Xbox-level independent configuration per communication type. Accessible in-game without friction.
4. Transparency and Accountability (15 points)
Publishing what you’re doing is how you get held accountable for it.
This dimension scores studios on whether they publish moderation data, how detailed that data is, and how frequently they update it. A studio that publishes nothing scores at the bottom regardless of how good their internal systems might be, because “trust us” isn’t accountability.
The gold standard: Xbox’s semi-annual transparency reports with enforcement volumes, proactive-vs-reactive breakdowns, tool descriptions, and trend data over time. Call of Duty’s game-specific progress reports with before-and-after comparisons. Ubisoft’s false positive rate publication.
Score range:
- 0-4: No public moderation data published.
- 5-7: Some public communication about moderation policies, but no data.
- 8-10: Published enforcement policies with some quantitative data.
- 11-13: Regular reports with meaningful statistics. Year-over-year comparisons.
- 14-15: Semi-annual or more frequent reports. Detailed methodology. False positive rates. Trend data.
5. Community Guidelines Quality (10 points)
The rules only matter if players can find them, read them, and understand what they actually prohibit.
We evaluate guidelines on four criteria: clarity (plain language, specific examples of prohibited behavior not just categories), discoverability (linked prominently in-game, not buried three clicks deep in a support site), currency (updated when the platform or community evolves, not a 2019 document on a 2026 game), and completeness (covering voice, text, usernames, profile content, external community spaces, and in-game behavior).
The best guidelines (Riot’s for Valorant and League of Legends, Ubisoft’s for Rainbow Six Siege) explicitly describe both what’s prohibited and what’s encouraged. Prosocial community norms aren’t just implied. They’re stated.
Score range:
- 0-2: No accessible community guidelines or purely legal boilerplate.
- 3-5: Guidelines exist and are publicly accessible. Vague on specifics.
- 6-7: Clear, specific guidelines with examples. Updated within past year.
- 8-9: Plain language, specific prohibited and encouraged behaviors. In-game accessible.
- 10: All of the above plus community input process. Explicit prosocial expectations.
6. Appeals and Due Process (10 points)
If you get banned, correctly or not, can you do anything about it?
This dimension covers whether a formal appeals process exists, whether a human reviews appeals, whether evidence is provided to the person appealing (what they were accused of doing), and how long the process takes. Games that only allow appeals for permanent bans score lower than games where any sanction can be appealed. Games where appeal review takes weeks score lower than games with documented response targets.
The CHI 2023 research finding drives the weight we put on this dimension: “moderation explanation plays a critical role in perceived transparency and fairness.” When players understand why a decision was made, even when the decision stands, they’re significantly more likely to trust the system and less likely to reoffend after sanctions.
Score range:
- 0-2: No formal appeals process. Bans are final.
- 3-5: Appeals available for permanent bans only. Process opaque.
- 6-7: All bans appealable. Human review confirmed. Timeline documented.
- 8-9: Clear process, evidence provided to appellant, multiple possible outcomes.
- 10: All of the above plus documented response targets and published appeal outcomes data.
7. Parental Controls and Age Protections (10 points)
Adults don’t need babysitting. But many adults play with younger family members, and even those who don’t benefit from knowing that a platform has taken the youngest and most vulnerable players seriously.
This dimension scores the breadth and usability of parental control systems: content filters, spending limits, communication restrictions, playtime management, companion app accessibility, and protection systems that apply to younger accounts.
Nintendo’s companion app and default-restrictive approach represent one strong model here. Xbox’s fine-grained per-category account management represents another. We’re not evaluating “most restrictive.” We’re evaluating whether the tools exist and whether they’re usable by parents who aren’t technical experts.
Score range:
- 0-2: No meaningful parental controls beyond platform-level age gating.
- 3-5: Basic content and communication restrictions. In-game settings only.
- 6-7: Companion app controls. Spending limits. Communication restrictions.
- 8-9: Full suite: content, communication, spending, playtime, activity reporting.
- 10: All of the above plus companion app, real-time mobile management, and PIN-protected settings.
8. Proactive Safety Design (5 points)
The smallest dimension by weight, but the one that signals the deepest commitment to community health.
This is about whether safety is designed into the game rather than bolted on after the fact. Behavioral nudges that interrupt players before they send flagged messages. Prosocial reward systems that make positive behavior worth something, such as Riot’s Honor system, Ubisoft’s Exemplary tier, and Deep Rock Galactic’s culture of saluting teammates. Onboarding restrictions for new accounts that limit high-stakes interactions until a player has demonstrated baseline behavior.
Games that have thought about how to build a community worth having, not just how to remove the worst members of the one they got, score here.
Score range:
- 0-1: No proactive safety design elements.
- 2-3: One or more elements (nudges, or rewards, or onboarding restrictions) present.
- 4: Multiple proactive elements integrated into the game experience.
- 5: Full prosocial design philosophy. Safety is a stated design value with documented implementation.
How We Assess Games
The TAG Community Safety Score uses a three-level assessment methodology.
Level 1: Public record review. We examine everything a studio has published: community guidelines, transparency reports, policy updates, developer communications, and enforcement announcements. This is the baseline, and it’s what the transparency dimension scores.
Level 2: Hands-on testing. We test the in-game reporting flows, communication controls, and moderation features ourselves. We file test reports, test appeal processes, evaluate how accessible controls are from inside the game, and document what the actual player experience is, not just what the policy documents say it should be.
Level 3: Community survey data. We collect player perception data. What do the people actually playing each game think about how safe they feel? How effective do they find the reporting system? Do they feel the community standards are enforced? Player experience is a reality check on policy claims.
Current scores are based on Levels 1 and 2. Level 3 survey data will be incorporated into the next scoring cycle. We’ll note clearly in each game’s profile what data sources contributed to its score.
The First Scores
Here are the initial TAG Community Safety Scores for eight major multiplayer titles. These scores represent our editorial assessment based on publicly available data, hands-on testing of in-game systems, and the published research covered throughout this series. They’re our honest evaluation, not a ranking system sponsored by or submitted to the studios being scored.
Each score will have a full detailed breakdown published on its own page. What follows is the summary version.
Deep Rock Galactic, 79/100
The highest-scoring game on this first list, and the one that most demonstrates what proactive safety design looks like in practice.
DRG doesn’t have enterprise-grade AI moderation infrastructure. What it has is cooperative-only gameplay that eliminates most of the structural conditions that produce toxicity, a developer-community relationship that has actively cultivated a culture of inclusion, and a player base that self-enforces community norms. The game scores a perfect 5/5 on Proactive Safety Design. The “Rock and Stone” culture, the saluting, and the humor that signals “we don’t take ourselves seriously enough to target each other” are genuine community design, not accidental.
Where it loses points: no voice moderation infrastructure, limited formal appeals process, no published moderation data. Ghost Ship Games is a smaller studio that has built a remarkable community through design and culture rather than moderation technology. The score reflects both what they’ve built and the infrastructure gaps that come with being a smaller operation.
Breakdown: Infrastructure 11/20, Enforcement 9/15, Controls 11/15, Transparency 7/15, Guidelines 8/10, Appeals 6/10, Parental 7/10, Design 5/5, Weighted total: 79
Valorant, 76/100
Riot’s commitment to community safety is better documented than that of almost any other studio in gaming, and Valorant benefits from that directly.
Voice recording for moderation (pioneered here in 2022), the Honor system, hardware bans for repeat offenders, a code of conduct that explicitly prohibits amplifying discriminatory movements, and the most explicit public stance on player behavior from any named executive in the industry. The transparency data Riot publishes isn’t as detailed as Activision’s game-specific reports, but the behavioral system documentation (the 20x improvement in automated enforcement, the 87% net-neutral flagged player finding) gives you real data to evaluate.
Where it loses points: no regular formal transparency report in the style of Xbox or Activision. The appeals process for permanent bans is limited. The competitive mode environment has a documented toxicity profile significantly worse than quick play, and the scoring reflects the full experience rather than the best-case mode.
Breakdown: Infrastructure 16/20, Enforcement 12/15, Controls 12/15, Transparency 10/15, Guidelines 9/10, Appeals 6/10, Parental 6/10, Design 4/5, Weighted total: 76
Call of Duty (Black Ops 6), 74/100
The game with the best-documented moderation improvement story in gaming. The 43% sustained reduction in toxic voice chat exposure and the public transparency reports give Activision more accountability infrastructure than almost any other publisher.
Strong on infrastructure (ToxMod + Community Sift running simultaneously), strong on transparency (the only franchise publishing game-specific progress reports at this level of detail), meaningful on enforcement. Loses points for the appeal system that only covers permanent bans, the absence of published false positive rates, and the fact that the reports, while excellent, don’t include absolute baselines that would let you fully contextualize the percentage improvements.
Breakdown: Infrastructure 16/20, Enforcement 11/15, Controls 11/15, Transparency 13/15, Guidelines 7/10, Appeals 5/10, Parental 7/10, Design 3/5, Weighted total: 74
Rainbow Six Siege, 72/100
Ubisoft’s most complete community safety implementation. The multi-tier reputation system (Exemplary through Dishonorable), the published false positive rate (0.1%), the 50% reduction in flagged messages after deploying automated text moderation, the 22% reduction in team kills and 35% reduction in griefing: the data package here is real and specific.
The community-based review system, where high-reputation players validate or challenge AI flagging decisions, is the most interesting design experiment in moderation governance in the industry. The Fair Play educational modules take a root-cause approach to behavior change that most studios haven’t attempted.
Where it loses points: the voice moderation infrastructure doesn’t match the text moderation depth. The appeals process documentation is less public than the enforcement data. The transparency data is good but published less frequently than Xbox’s semi-annual reports.
Breakdown: Infrastructure 14/20, Enforcement 13/15, Controls 12/15, Transparency 11/15, Guidelines 9/10, Appeals 6/10, Parental 4/10, Design 4/5, Weighted total: 72
Overwatch 2, 65/100
Real behavioral infrastructure (Defense Matrix, the commendation-and-report feedback loop, the silence penalty for weaponized reporting), but a meaningful transparency gap that prevents the score from going higher.
Blizzard has not published effectiveness data the way Activision and Riot have. The claim that Defense Matrix works is supported by anecdotal community experience and some third-party research but not by Blizzard’s own published reports. The in-game controls are solid and the graduated sanctions system (chat restrictions, then ranked restrictions, then bans) is well-designed. The score would rise significantly if Blizzard committed to publishing moderation effectiveness data.
Breakdown: Infrastructure 13/20, Enforcement 10/15, Controls 12/15, Transparency 6/15, Guidelines 8/10, Appeals 6/10, Parental 6/10, Design 4/5, Weighted total: 65
Fortnite, 63/100
Epic sits in an interesting position: a game with a massive child player base that has driven real investment in parental controls and age protection, but moderation infrastructure that hasn’t been documented at the same level as the competitive titles on this list.
Strong parental controls, among the best on this list. The graduated sanctions approach with one formal appeal per sanction is a reasonable policy. The voice moderation investment driven by child safety obligations is real. What’s missing: published data on moderation effectiveness, a formal transparency report program, and public documentation of how the AI moderation systems interact with the report system.
Breakdown: Infrastructure 12/20, Enforcement 9/15, Controls 11/15, Transparency 5/15, Guidelines 7/10, Appeals 7/10, Parental 9/10, Design 3/5, Weighted total: 63
League of Legends, 62/100
Riot’s GATES system improvement (15x better toxic text detection) and the Instant Feedback Report notifications are genuine differentiators. The Honor system is one of the older and more developed prosocial design implementations in gaming. The data on new player toxicity churn, and Riot’s documented investment in protecting that cohort, is some of the most compelling behavioral research in the industry.
The score reflects a gap between the investment in League’s community safety tools and the transparency about how those tools are performing. No regular formal transparency reports. No published voice moderation data despite the 2022 expansion of voice recording. The competitive environment remains one of the rougher ones in online gaming by player perception data, which is reflected in the community guidelines and enforcement dimensions.
Breakdown: Infrastructure 14/20, Enforcement 10/15, Controls 10/15, Transparency 7/15, Guidelines 8/10, Appeals 5/10, Parental 5/10, Design 3/5, Weighted total: 62
GTA Online, 41/100
The score that most reflects the gap between a game’s potential and its execution on community safety.
GTA Online has ToxMod deployed for voice moderation, a real investment that many games of its size haven’t made. But Rockstar doesn’t publish any data on how it’s performing. The community guidelines are generic. The appeals process is minimal. The in-game communication controls are limited compared to the other titles on this list. The overall experience in public lobbies is documented in player research as among the more hostile in mainstream gaming.
Rockstar gets credit for deploying ToxMod. They lose significant points for the near-total absence of published data, accountability infrastructure, or formal transparency commitments. A studio with Rockstar’s resources should be publishing safety reports. The fact that they’re not is a policy choice.
Breakdown: Infrastructure 10/20, Enforcement 6/15, Controls 7/15, Transparency 3/15, Guidelines 4/10, Appeals 3/10, Parental 4/10, Design 2/5, Weighted total: 41
What These Scores Mean
A few things worth being clear about:
These scores will change. Studios improve their practices. Policies get updated. Transparency reports get published. A game that scores 41 today can score 65 in two years if Rockstar decides to take community safety accountability seriously. We’ll update the scores when meaningful changes occur and do a full re-scoring cycle annually.
A high score isn’t a guarantee of a safe experience. A game can have excellent moderation infrastructure and still have toxic individual sessions. These scores measure what’s in place to address toxicity, not whether toxicity has been eliminated.
A low score isn’t a verdict on the game. GTA Online scoring 41 doesn’t mean the game is bad. It means Rockstar hasn’t built, or published, the community safety infrastructure their player base deserves. That’s about the studio’s choices, not the quality of the game itself.
We’re transparent about our methodology. The eight dimensions, their weights, and the scoring ranges are all published above. If you think we’ve got a score wrong, make the case. We’re not issuing verdicts from a mountaintop. We’re publishing our reasoning and inviting scrutiny.
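For readers who want to check a breakdown against the published caps, here is a minimal sketch. It assumes the total is a plain sum of the eight sub-scores, which is what caps adding up to 100 suggest; `CAPS` and `total_score` are hypothetical names used only for illustration, not TAG’s actual tooling.

```python
# Point caps for the eight dimensions, copied from the rubric in this article.
CAPS = {
    "Infrastructure": 20,
    "Enforcement": 15,
    "Controls": 15,
    "Transparency": 15,
    "Guidelines": 10,
    "Appeals": 10,
    "Parental": 10,
    "Design": 5,
}


def total_score(breakdown: dict[str, int]) -> int:
    """Sum a breakdown after checking each dimension is present and within its cap."""
    if set(breakdown) != set(CAPS):
        raise ValueError("breakdown must cover exactly the eight dimensions")
    for name, points in breakdown.items():
        if not 0 <= points <= CAPS[name]:
            raise ValueError(f"{name} must be between 0 and {CAPS[name]}")
    # Assumption: the weights live in the caps, so no extra multiplier is applied.
    return sum(breakdown.values())


# Overwatch 2 sub-scores as published in the breakdown above.
overwatch_2 = {
    "Infrastructure": 13,
    "Enforcement": 10,
    "Controls": 12,
    "Transparency": 6,
    "Guidelines": 8,
    "Appeals": 6,
    "Parental": 6,
    "Design": 4,
}

print(total_score(overwatch_2))  # 65
```

Swapping in a proposed set of sub-scores for any game gives a quick way to see how a disputed dimension would move the total.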
What Comes Next
The TAG Community Safety Score will update annually, with mid-cycle updates when significant changes occur. Our next additions to the scoring list include Final Fantasy XIV, Sea of Thieves, Apex Legends, and Destiny 2, all games with interesting safety profiles that deserve detailed assessment.
We’re also building out the Level 3 survey component: player perception data collected directly from the communities playing each game. If you play any of the games on this list and want to contribute to the next scoring cycle, watch for our community survey announcement in the TAG Discord.
One last thing. When we started this series nine articles ago, the argument was that no gaming outlet was doing this: systematically evaluating games and platforms on community safety rather than just game quality. That gap exists because the industry hasn’t asked for this kind of accountability and most outlets haven’t bothered to provide it.
We think the 83 million people experiencing harassment in multiplayer games deserve better information about what they’re getting into. We think the studios investing seriously in community safety deserve to have that investment recognized and distinguished from the ones paying lip service. We think the tools and research to do this properly now exist.
That’s what the TAG Community Safety Score is for. It starts here, and it gets better with every cycle.
Disagree with a score? Think we missed something in a game’s safety infrastructure? Want to contribute community survey data for the next scoring cycle? All of it belongs in the comments or in the TAG Discord. This framework is designed to improve through scrutiny. Bring yours.