The information on this page is adapted with permission from Prevention by Design by lead authors Lena Slachmuijlder and Sofia Bonilla.
Integrate real-time “nudge” prompts that encourage users to pause and reconsider harmful language before posting.
These kinds of nudges are a subtle intervention technique that deters harmful behavior before it occurs, reminding users of the platform’s policies and standards and encouraging constructive interaction. Nudges reduce the amount of online abuse and harassment posted, improving the overall health of content online.
Nudged users were also less likely to receive offensive replies themselves. “By cutting off one offensive comment at the start, you reduce the likelihood of a toxic back-and-forth,” notes researcher Matt Katsaros, whose study of Twitter’s prompts is described below.
Examples
Snapchat: The expanded in-app warning feature warns teens when they receive chat messages from someone who has been blocked or reported by others, or whose account is from a region where the teen’s network isn’t typically located – signs that the person may be a scammer or otherwise suspicious.
Instagram: The company has reported that, during one week in which it sent approximately one million nudges to users, recipients deleted or amended their comment as a result 50% of the time. The reduction in hurtful comments posted is also long-lasting, according to Instagram’s research on what it calls “repeat hurtful commenters” — people who leave multiple offensive comments within a window of time.
Twitter: In 2022, Twitter embedded prompts asking users to reconsider potentially harmful tweets before posting. Matt Katsaros studied the effect of this feature and found that 9% of users decided not to post and 22% edited their tweet. The effect also persisted: recipients were less likely to post offensive content in the following weeks. “The nudge changes the behavior in the moment, but more importantly, it has a lasting impact. People are more likely to rethink their approach in future interactions,” explained Katsaros at a 2024 symposium on comment sections.
Machine learning models could be used to detect a greater range of nuance, allowing platforms to prompt users based on context-aware language patterns. Existing models, such as Perspective API’s toxicity classifiers, offer clients a foundation for language detection and reconsideration prompts that can then be adjusted to fit more specific use cases.
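As a rough illustration, the sketch below scores a draft comment with Perspective API’s TOXICITY attribute and triggers a reconsideration prompt when the score crosses a threshold. The endpoint and response shape follow Perspective’s public REST API, but the API key, the 0.8 threshold, and the maybe_nudge helper are placeholders; a production trigger would be tuned per language and surface.

```python
import requests

# Perspective API's public analyze endpoint (key issued via Google Cloud).
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
API_KEY = "YOUR_API_KEY"   # placeholder
NUDGE_THRESHOLD = 0.8      # assumed threshold; tune per surface and language

def toxicity_score(text: str) -> float:
    """Return Perspective's TOXICITY summary score (0.0-1.0) for a draft comment."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": API_KEY}, json=body)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def maybe_nudge(draft: str) -> bool:
    """True if the client should pause posting and show a reconsideration prompt."""
    return toxicity_score(draft) >= NUDGE_THRESHOLD
```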
Machine learning models can also detect inappropriate images at upload time: as someone is about to share a photo containing nudity, the platform can prompt them to reconsider whether it is appropriate for that surface.
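A minimal sketch of that upload-time check, assuming a hypothetical nudity_score classifier; the function, threshold, and prompt copy are invented for illustration and do not reflect any platform’s actual API. Note that the nudge interrupts the flow but does not block the upload; the choice stays with the user.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NUDITY_THRESHOLD = 0.85  # assumed; would be tuned per surface (e.g. DMs vs. public feed)

@dataclass
class UploadDecision:
    allow: bool
    prompt: Optional[str] = None  # reconsideration copy shown to the user, if any

def check_upload(image_bytes: bytes,
                 nudity_score: Callable[[bytes], float]) -> UploadDecision:
    """Run a (hypothetical) nudity classifier before a photo is shared.

    `nudity_score` returns a 0.0-1.0 confidence that the image contains nudity.
    A high score interrupts the flow with a prompt rather than blocking it.
    """
    if nudity_score(image_bytes) >= NUDITY_THRESHOLD:
        return UploadDecision(
            allow=True,
            prompt="This photo may contain nudity. Are you sure you want to share it here?",
        )
    return UploadDecision(allow=True)
```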
However, these models are often trained on biased sample sets and are likely to make more mistakes where regional and linguistic context is thin. As data sets and models improve, we can expect them to become more effective detection tools.
Limitations
While these nudges can be effective, they are limited by the accuracy of what triggers them. An overly tentative trigger will miss significant amounts of abusive content, while an overly aggressive one will lead users to distrust and ignore the warnings, negating their effectiveness.
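To make the tradeoff concrete, the toy sweep below checks three trigger thresholds against a handful of hand-labeled scores (all invented for illustration): a lower threshold misses less abuse at the cost of more false alarms, and it is the false alarms that teach users to ignore the prompt.

```python
# (classifier score, actually abusive?) pairs -- invented labels for illustration only
labeled = [(0.95, True), (0.7, True), (0.6, False), (0.4, True), (0.35, False), (0.1, False)]

for threshold in (0.9, 0.5, 0.3):
    fired = [(s, y) for s, y in labeled if s >= threshold]          # comments that get nudged
    missed_abuse = sum(y for s, y in labeled if s < threshold)      # abuse below the trigger
    false_alarms = sum(not y for _, y in fired)                     # benign comments nudged
    print(f"threshold={threshold}: nudges={len(fired)}, "
          f"missed abuse={missed_abuse}, false alarms={false_alarms}")
```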
It also matters what kind of perpetrator is seeing the prompt. A nudge is far more likely to be effective on an individual on the borderline of stepping into technology-facilitated gender-based violence, or TFGBV (e.g. an impressionable teen), than on a dedicated, well-resourced bad actor intent on causing harm.
Trust & Safety teams are limited by where they have regional experts: people who not only speak the language but also understand the idioms and evolving terminology used in those regions.
There are ways to skirt the filters. Some of the most common are to use emojis or "algospeak" (small spelling changes to flagged words that are still human-readable).
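As a small illustration of why naive keyword filters miss algospeak, the sketch below normalizes common character substitutions before matching. The substitution map and word list are invented examples; real evasion evolves constantly, and production systems rely on far more robust matching (classifiers, embeddings) than this.

```python
# Invented, minimal substitution map; real algospeak evolves constantly.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})
FLAGGED = {"idiot"}  # placeholder word list

def normalize(text: str) -> str:
    return text.lower().translate(SUBSTITUTIONS)

def naive_match(text: str) -> bool:
    return any(word in text.lower() for word in FLAGGED)

def normalized_match(text: str) -> bool:
    return any(word in normalize(text) for word in FLAGGED)

print(naive_match("what an 1d10t"))       # False: the algospeak slips past
print(normalized_match("what an 1d10t"))  # True: caught after normalization
```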
On platforms that permit consensual adult nudity, these nudges can appear to censor legitimate content. It is also difficult to repurpose adult-content filters for intimate image abuse, since in many cases the harm stems from the lack of consent, not the depiction of nudity itself.