Anthropic Offers Payments for AI Jailbreak Discoveries

Posted 8 Aug 2024
Anthropic, an AI development company, broadened its bug bounty program to reward those who discover vulnerabilities in AI models. Researchers can earn up to $15,000 for finding a universal jailbreak that can circumvent most restrictions on current models.

Testing will be carried out on a new version of the security system that has not yet been publicly released, in a fully isolated virtual environment. Anthropic is particularly interested in vulnerabilities in high-risk areas such as chemical, biological, radiological, and nuclear threats, as well as cybersecurity.
“This initiative aligns with commitments we’ve signed onto with other AI companies for developing responsible AI such as the Voluntary AI Commitments announced by the White House and the Code of Conduct for Organizations Developing Advanced AI Systems developed through the G7 Hiroshima Process,” Anthropic stated.
The program is run in partnership with cybersecurity company HackerOne, which will also handle the rewards for successful researchers. Participation is currently by invitation only, following an application, but Anthropic plans to streamline and expand the process soon. The current recruitment phase runs until August 16.
