Evaluating AI Safety Through Local Policy: Findings from the UbuntuGuard Benchmark
A new paper introducing the UbuntuGuard benchmark questions whether strong results on English-language safety tests reliably translate into responsible use elsewhere. Built from policies developed by 155 African domain experts working across ten languages and six countries, UbuntuGuard's framework assesses whether AI tools comply with the norms that shape public services in non-Western contexts. The findings suggest that institutions, wherever they operate, need the capacity to define their own safety standards before deploying these tools to improve public-sector outcomes.