large language model

Exploring ChatGPT guardrails: from protected to forgotten countries (Part 1)

When ChatGPT writes negative poems about some countries but not others.

Jean-Matthieu Schertzer

OpenAI has built guardrails into ChatGPT to limit the generation of negative content. How do these guardrails behave when the target is a country? Spoiler: there are gaps and disparities in ChatGPT's safety mechanisms. Based on 24,100 ChatGPT queries, this blog post explores how ChatGPT responds when prompted to generate negative content about a country.