Credit: David Baillot/University of California San Diego
Toxic prompts cloaked in benign language can be detected far better by models trained on ToxicChat, a new benchmark developed by University of California San Diego computer scientists, than by models trained on previous toxicity benchmarks.