TFGBV Taxonomy
Mitigation Strategy:

Quarantine borderline content

Last Updated 6/9/25
Definition: Implement Quarantine Systems for Gray-Area Content.
Abuse Types:
Online harassment; Inappropriate content; Deceptive synthetic media
Impact Types:
Psychological & emotional harm; Abuse normalization
Targets:
Public figure; Organization, group, community
Responsible Organizations:
Digital platform; Non-governmental organization (NGO) / Third-party tool; AI generation organization

The information on this page is adapted with permission from Prevention by Design by lead authors Lena Slachmuijlder and Sofia Bonilla.

Use quarantine systems to temporarily hold flagged content in a separate review area, allowing platform moderators from the Trust & Safety team, together with affected users, to examine it without immediate public exposure. Quarantine systems balance the need for safety with respect for freedom of expression. By providing a structured way to handle ambiguous cases, platforms can address concerns about over-censorship and ensure appropriate content management while preventing TFGBV.
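
As a rough illustration of that workflow, the sketch below shows how a triage step might route only the ambiguous middle band of flagged content into a quarantine queue. The names, scores, and thresholds are hypothetical and do not describe any specific platform's pipeline.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Disposition(Enum):
    PUBLISH = auto()      # clearly benign: goes live immediately
    QUARANTINE = auto()   # gray area: held out of public view pending review
    REMOVE = auto()       # clear violation: blocked outright

@dataclass
class FlaggedItem:
    content_id: str
    abuse_score: float    # hypothetical classifier confidence, 0.0-1.0
    user_reported: bool   # whether another user flagged the content

# Hypothetical thresholds; a real platform would tune these per abuse type.
REMOVE_THRESHOLD = 0.9
QUARANTINE_THRESHOLD = 0.5

def triage(item: FlaggedItem) -> Disposition:
    """Route content into publish / quarantine / remove buckets.

    Only the ambiguous middle band is quarantined: it is withheld from
    public surfaces and queued for Trust & Safety review (and user input)
    rather than being silently removed or ignored.
    """
    if item.abuse_score >= REMOVE_THRESHOLD:
        return Disposition.REMOVE
    if item.abuse_score >= QUARANTINE_THRESHOLD or item.user_reported:
        return Disposition.QUARANTINE
    return Disposition.PUBLISH
```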

The quarantining approach promotes transparency: a dedicated "quarantine" area pulls potentially harmful content out of public view and invites users to collaborate with the Trust & Safety team in the flagging and triage process, giving them greater agency. A dedicated UI element of this kind, rather than customer support email threads or unattended chatbots, signals that the platform takes these issues seriously while also empowering users. In terms of protection, it ensures that content which might otherwise be removed entirely (which could be seen as censorship) or ignored (which could be seen as negligence) receives appropriate attention. This is especially relevant in content moderation “gray areas,” where the Trust & Safety team may not be certain whether the content is problematic; instead of simply removing swaths of content, the team can bring the affected user into the process and approach the matter thoughtfully.

Examples

  • Instagram Restrict Feature: The “restrict” feature prevents flagged comments from appearing publicly until approved by the user, reducing exposure to harassment. Direct messages from restricted accounts move to Message Requests and do not trigger notifications. Users can view them, but the sender won’t see read receipts or activity. This empowers users to manage exposure to specific users without alerting the restricted party (a simplified sketch of this visibility rule follows the list).
  • Instagram and TikTok Sensitive Content Flags: These flags offer users greater control over what they see while maintaining freedom of expression.
  • TikTok Shadowbanning: According to the TikTok Safety Center, accounts that repeatedly share content deemed unsuitable for the For You feed may have their posts removed from the feed and become less visible in search results.
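
For illustration only, here is a minimal sketch of the visibility rule behind a restrict-style feature. The data model and names are hypothetical and do not reflect Instagram's implementation.

```python
# Hypothetical mapping: post owner's id -> set of account ids they have restricted.
restricted_by: dict[str, set[str]] = {}

def is_publicly_visible(comment_author: str, post_owner: str, owner_approved: bool) -> bool:
    """Comments from restricted accounts stay hidden from the public
    (visible only to their author) until the post owner approves them."""
    if comment_author in restricted_by.get(post_owner, set()):
        return owner_approved
    return True

# Usage: once user "a" restricts user "b", b's new comments on a's posts
# are held back until a approves them; everyone else's comments show normally.
restricted_by["a"] = {"b"}
assert is_publicly_visible("b", "a", owner_approved=False) is False
assert is_publicly_visible("b", "a", owner_approved=True) is True
assert is_publicly_visible("c", "a", owner_approved=False) is True
```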

References

  • Rowa, J. Y., & Saltman, E. (2022). The Contextuality of Lone Wolf Algorithms: An Examination of (Non)Violent Extremism in the Cyber-Physical Space. In Global Internet Forum to Counter Terrorism (GIFCT). https://gifct.org/wp-content/uploads/2022/09/GIFCT-22WG-ContextualityIntros-1.1.pdf
  • Slachmuijlder, L., & Bonilla, S. (2025). Prevention by design: A roadmap for tackling TFGBV at the source. https://techandsocialcohesion.org/wp-content/uploads/2025/03/Prevention-by-Design-A-Roadmap-for-Tackling-TFGBV-at-the-Source.pdf

Limitations

  • What constitutes “borderline” content? Who defines that term within the company and how inclusive (or vague) is it?
    • “There is no overarching agreement between different sectors or geographies on what borderline content is” (Rowa & Saltman, 2022, p. 11).