[Interesting content] InstructGPT, RLHF and SFT


Arize invited Long Ouyang and Ryan Lowe to their podcast to talk about InstructGPT, the model ChatGPT is based on, and the whole content is 🔥.

Key takeaways:

  • The concept of alignment (a term popularised by Stuart Russell of UC Berkeley; I link an interview with him in the comments).

  • InstructGPT is based on GPT-3, but it is aware that it is being given instructions, while the older model could only be "tricked" into performing them.

  • What a Reward Model is, and how it is incorporated into the training process (a minimal sketch follows this list).

  • The difference between RLHF (reinforcement learning from human feedback) and SFT (supervised fine-tuning): RLHF makes fine-grained adjustments, while SFT causes a more significant shift in model behaviour (see the second sketch below).

  • Regarding prompts, the major improvement is that older models had to be prompted in a specific, "almost coded" language, whereas the new ones can be prompted intuitively: instead of priming the model with a hand-crafted pattern of examples, you can simply write "Summarise this article in two sentences." The models are less sensitive to prompt wording but still steerable and, most importantly, "naturally" steerable.
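
To make the Reward Model bullet concrete, here is a minimal PyTorch sketch, my own illustration rather than OpenAI's code: the RM maps prompt-plus-completion tokens to a single scalar and is trained on human comparisons with the pairwise loss -log σ(r(x, y_w) − r(x, y_l)) described in the InstructGPT paper. The tiny GRU backbone is a stand-in so the snippet runs on its own; the real model is a GPT-3-style transformer with a scalar head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: an encoder plus a scalar head.

    In InstructGPT the backbone is a pretrained transformer whose
    unembedding layer is replaced by a scalar output; a tiny GRU
    stands in here so the sketch is self-contained.
    """

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.scalar_head = nn.Linear(hidden, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tokens of prompt + completion
        h, _ = self.encoder(self.embed(token_ids))
        # one scalar "how good is this completion" score per sequence
        return self.scalar_head(h[:, -1]).squeeze(-1)

def pairwise_loss(rm, chosen_ids, rejected_ids):
    """-log sigmoid(r(x, y_w) - r(x, y_l)): the human-preferred
    completion should out-score the rejected one."""
    return -F.logsigmoid(rm(chosen_ids) - rm(rejected_ids)).mean()

# Dummy batch: 4 comparison pairs of tokenised prompt + completion.
rm = RewardModel()
chosen = torch.randint(0, 1000, (4, 32))
rejected = torch.randint(0, 1000, (4, 32))
loss = pairwise_loss(rm, chosen, rejected)
loss.backward()  # gradients are ready for an ordinary optimiser step
print(f"pairwise RM loss: {loss.item():.3f}")
```

If I recall the paper correctly, labellers rank several completions per prompt, and all pairwise comparisons from one prompt are trained together, which is cheaper and less prone to overfitting than shuffling the pairs independently.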

(Full disclosure: I am an advisor of Arize AI.)
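
The RLHF-versus-SFT distinction is easiest to see in the two objectives. Below is a hedged sketch of both, with stand-in tensors instead of a real language model: SFT is plain cross-entropy on human demonstrations, while PPO-based RLHF optimises a learned reward minus a per-token KL penalty against the reference model, which is why it acts as the gentler, fine-grained nudge. The `beta` coefficient and helper names are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def sft_loss(policy_logits, demo_ids):
    """Supervised fine-tuning: plain next-token cross-entropy on human
    demonstrations, a direct and comparatively heavy push on behaviour."""
    vocab = policy_logits.size(-1)
    return F.cross_entropy(
        policy_logits[:, :-1].reshape(-1, vocab),  # predict token t+1 from t
        demo_ids[:, 1:].reshape(-1),
    )

def rlhf_token_rewards(rm_score, policy_logprobs, ref_logprobs, beta=0.02):
    """RLHF (PPO-style) reward shaping: the scalar reward-model score is
    paid out on the final token, and every token pays a KL-style penalty
    for drifting from the reference (SFT) policy. The penalty is what
    keeps RLHF a fine-grained nudge rather than a wholesale shift.
    beta is an illustrative coefficient, not the paper's value."""
    rewards = -beta * (policy_logprobs - ref_logprobs)  # (batch, seq)
    rewards[:, -1] = rewards[:, -1] + rm_score          # RM score at the end
    return rewards  # per-token rewards handed to the PPO update

# Shape-only demo with random stand-ins for a causal LM and a trained RM.
B, T, V = 2, 16, 1000
logits = torch.randn(B, T, V)
demos = torch.randint(0, V, (B, T))
print(f"SFT loss: {sft_loss(logits, demos).item():.3f}")

policy_lp = -torch.rand(B, T)   # per-token log-probs under the policy
ref_lp = -torch.rand(B, T)      # per-token log-probs under the SFT reference
rm_score = torch.randn(B)       # one scalar RM score per completion
print("RLHF rewards shape:", rlhf_token_rewards(rm_score, policy_lp, ref_lp).shape)
```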

Further material

One interesting takeaway from the paper is the training cost of fine-tuning: it reports about 4.9 petaflop/s-days to train the 175B SFT model and about 60 petaflop/s-days for the 175B PPO-ptx (RLHF) model, versus roughly 3,640 petaflop/s-days to pretrain GPT-3, so the entire alignment step costs less than 2% of the pretraining compute.
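
A quick sanity check on those figures (the numbers are the petaflop/s-days reported in the paper; the script itself is just arithmetic):

```python
# Compute figures from the InstructGPT paper, in petaflop/s-days.
PRETRAIN_GPT3 = 3640
SFT_175B = 4.9
PPO_PTX_175B = 60

alignment_total = SFT_175B + PPO_PTX_175B
print(f"fine-tuning compute: {alignment_total} pfs-days")
print(f"share of pretraining: {alignment_total / PRETRAIN_GPT3:.2%}")
# -> roughly 1.8% of the compute that went into pretraining GPT-3
```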

Due to the huge interest in ChatGPT, I plan to post regularly about it, so consider subscribing.

Or follow me on LinkedIn.


