OpenAI Creates Superalignment Team to Control Rogue AI

OpenAI, the creator of ChatGPT, announced that it is forming a new Superalignment team to keep rogue AI systems under control. OpenAI is dedicating 20% of the compute it has secured to date to solving this problem within four years.

“Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI,” reads OpenAI’s blog post, authored by co-founders of the new team Ilya Sutskever and Jan Leike. “But humans won’t be able to reliably supervise AI systems much smarter than us.”

AI alignment refers to the process of creating AI systems that understand ethical concepts, societal standards, and positive human objectives and act in accordance with them. The alignment practices currently used to control GPT-4, OpenAI’s most advanced model to date, rely on reinforcement learning from human feedback. But this approach won’t be sufficient to control future AI systems that outsmart humans.

OpenAI believes superintelligence, while still a thing of the future, could become a reality “this decade.” It’s expected to be the most impactful technology humanity has invented, but its power is a double-edged sword. It could solve humanity’s biggest problems or lead to “disempowerment of humanity or even human extinction.”

To manage those risks, OpenAI calls for aligning these AI systems with human values and creating governance bodies that will control them. OpenAI intends to create a “human-level automated alignment researcher” and scale its efforts from there. The ultimate goal is to build AI systems that can conduct alignment research faster and better than humans.

“Human researchers will focus more and more of their effort on reviewing alignment research done by AI systems instead of generating this research by themselves,” Leike and teammates John Schulman and Jeffrey Wu said in another blog post.

OpenAI stresses that superintelligence alignment is primarily a machine learning problem rather than an engineering one. The AI startup has encouraged accomplished machine learning experts to join its team in solving the pressing alignment problem.

OpenAI’s CEO Sam Altman has been outspoken about AI dangers since ChatGPT started an AI revolution. In May, Altman met with US politicians to discuss government AI regulations and said AI can cause “significant harm to the world.” Meanwhile, OpenAI lobbied for weaker AI regulations across the European Union.