Tom Lu · Aug 8 · 4 min

Chai-GPT. RLHF Part I: Reward Modelling

A deep-dive into Reinforcement Learning with Human Feedback (RLHF) to improve user retention.