Reinforcement Learning from Human Feedback (RLHF)

by Jerry Kurata

In this course we explore one corner of the expanding AI universe, and review some of the basic principles found in reinforcement learning from human feedback (RLHF), the technology underlying great AI tools such as ChatGPT, Bard, and more.

Course info

Level: Beginner
Updated: Oct 31, 2023
Duration: 40m

What you'll learn

Have you ever wondered how tools like ChatGPT and Bard are able to generate great responses to the questions we pose? How they can respond to a prompt like “Plan a trip to Italy this fall and suggest great things to see,” and produce a response containing a full itinerary with places to see, the best time to visit, and the sites you shouldn't miss?

In this course, Reinforcement Learning from Human Feedback (RLHF), you’ll gain the ability to understand what is going on behind the scenes to create responses to your prompts.

First, you’ll explore why having all the information available is not enough to create a great response.

Next, you’ll discover how we teach a machine learning model to handle all that data and craft a response that people like.

Finally, you’ll learn how none of it is magic, just some really great engineering by some bright people.

When you’re finished with this course, you’ll have the skills and knowledge of reinforcement learning from human feedback needed to understand how this engineering works and produces its results.

Course Overview (1min)
  Course Overview 2m

Understanding Text-generative Applications (6mins)
  Understanding Text-generative Applications 7m

What Is Wrong with the Pre-trained GPT Model? (5mins)
  What Is Wrong with the Pre-trained GPT Model? 5m

Supervised Fine-tuning (4mins)
  Supervised Fine-tuning 5m

Reward Model Training (11mins)
  Reinforcement Learning Components 5m
  Applying Reinforcement Learning 7m

Fine-tuning via Reinforcement Learning (5mins)
  Fine-tuning via Reinforcement Learning 5m

Challenges and Limitations of RLHF (5mins)
  Challenges and Limitations of RLHF 5m

About the author

Jerry Kurata

Jerry has Bachelor of Science degrees in Geology and Physics. His plans to work in the oil exploration industry were sidetracked when he discovered that he preferred working with computers on simulation and data processing to reading mud and core samples in the North Sea. His love of computers and tech led him to spend many additional hours working on computers while earning his Master’s degree in Computer Science. His current areas of interest include Machine Learning, Big Data,...
