“This work is an important step in the right direction,” says Douwe Kiela, a researcher at Hugging Face, an artificial intelligence company working on open-source language models. He suggests that the feedback-driven training process could be repeated over many rounds, improving the model further. Leike says OpenAI could do this by drawing on feedback from its customers.
InstructGPT still makes simple mistakes, sometimes producing irrelevant or nonsensical answers. For example, if it is given a prompt that contains a falsehood, it will accept that falsehood as true. And because it has been trained to do what people ask, InstructGPT will produce far more toxic language than GPT-3 if directed to do so.
Ehud Reiter, who works on text-generation AI at the University of Aberdeen, UK, welcomes any technique that reduces the amount of misinformation language models produce. But he notes that for some applications, such as an AI that gives medical advice, no amount of falsehood is acceptable. Reiter questions whether large language models based on black-box neural networks can ever guarantee user safety. For that reason, he favors a combination of neural networks and symbolic AI, in which hard-coded rules constrain what the model can and cannot say.
Whichever approach prevails, there is still much work to be done. “We are not even close to solving this problem yet,” says Kiela.