Chatbots: (S)elected Moderation

Measuring the Moderation of Election-Related Content Across Chatbots, Languages and Electoral Contexts

Project Overview

AI Forensics previously exposed that Microsoft Copilot's answers to simple election-related questions contained factual errors 30% of the time. In collaboration with Nieuwsuur, we uncovered how chatbots can recommend and support the dissemination of disinformation as a campaign strategy. Following those investigations, as well as a request for information from the European Commission, Microsoft and Google introduced “moderation layers” to their chatbots so that the chatbots refuse to answer election-related prompts.

This report evaluates and compares the effectiveness of these safeguards across scenarios. In particular, we investigate the consistency with which electoral moderation is triggered, depending on (i) the chatbot, (ii) the language of the prompt, (iii) the electoral context, and (iv) the interface (web or API). We find significant discrepancies:

  • The effectiveness of the moderation safeguards deployed by Copilot, ChatGPT, and Gemini varies widely. Gemini's moderation was the most consistent, with a moderation rate of 98%. For the same sample, Copilot's rate was around 50%, while the web version of OpenAI's ChatGPT applied no additional election-related moderation.
  • Moderation is strictest in English and highly inconsistent across languages. When prompting Copilot about the EU elections, the moderation rate was highest for English (90%), followed by Polish (80%), Italian (74%), and French (72%). It fell below 30% for Romanian, Swedish, Greek, and Dutch, and even for German (28%), despite German being the EU’s second most spoken language.
  • For a given language, the moderation rate can vary substantially between analogous prompts about the EU and the US elections, further confirming the inconsistency of the moderation process (a minimal sketch of how such rates can be estimated follows this list).
  • Moderation is inconsistent between the web and API versions. The electoral safeguards on the web version of Gemini have not been implemented on the API version of the same tool.
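For illustration, a moderation rate like those reported above can be estimated by sending a fixed set of election-related prompts to a chatbot and counting how many answers are refusals. The Python sketch below is a minimal illustration under that assumption, not the methodology used in the report; the ask_chatbot callable, the ask_copilot example, and the refusal markers are hypothetical placeholders.

    # Minimal sketch (not the report's methodology): estimating a "moderation rate"
    # for one chatbot and one prompt set. ask_chatbot and the refusal markers are
    # hypothetical placeholders for whichever interface (web or API) is under test.
    from typing import Callable, Iterable

    REFUSAL_MARKERS = [
        "i can't help with election",          # hypothetical refusal phrasings
        "i'm not able to discuss elections",
        "please consult official sources",
    ]

    def looks_moderated(answer: str) -> bool:
        # Heuristic: does the answer read as an election-related refusal?
        lowered = answer.lower()
        return any(marker in lowered for marker in REFUSAL_MARKERS)

    def moderation_rate(prompts: Iterable[str], ask_chatbot: Callable[[str], str]) -> float:
        # Fraction of prompts whose answers trigger the refusal heuristic.
        answers = [ask_chatbot(p) for p in prompts]
        return sum(looks_moderated(a) for a in answers) / len(answers)

    # Example: the same prompt set, translated, run against the same chatbot,
    # makes per-language or per-interface rates directly comparable.
    # rate_en = moderation_rate(prompts_en, ask_copilot)
    # rate_de = moderation_rate(prompts_de, ask_copilot)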

As chatbots become a primary interface for accessing online knowledge, it is crucial for their moderation layers to be consistent, transparent, and accountable.

This work was supported by a grant from NGI Search. The final report is available here.