1999. Its essence is to use the

Self-hosted database solution offering control and scalability.
rriiffaatt77
Posts: 5
Joined: Mon Dec 23, 2024 4:09 pm


Post by rriiffaatt77 »

MCTS uses the action probability distribution provided by the policy network and the position evaluation provided by the value network to guide the search.

Strategy update: based on the results of self-play, reinforcement learning updates the neural network parameters so that the model gradually learns better strategies.

Self-play learning and RLHF

Ilya Sutskever believes that reinforcement learning and self-play are among the most critical methods on the path to AGI. He summarized reinforcement learning in one sentence: let the AI try new tasks using random paths; if the result exceeds expectations, update the neural network weights so that the AI remembers this success and uses it more often, then start the next attempt.
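The one-sentence summary above can be sketched on a toy problem. This is a minimal illustration, not anyone's actual training code: it assumes a multi-armed bandit instead of a real task, a softmax policy over a small list of preferences in place of neural network weights, and a running average reward as the "expectation". The names train_bandit, prefs and baseline are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn preference scores into action probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(true_means, steps=5000, lr=0.1, seed=0):
    """Gradient-bandit update: try an action at random, compare to expectation, adjust."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)  # stand-in for the network weights
    baseline = 0.0                   # running average reward, the "expectation"
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Try something: sample an action from the current policy.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(true_means[action], 1.0)
        # Positive when the result exceeds expectations, negative otherwise.
        advantage = reward - baseline
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])
        # Update the expectation before the next attempt.
        baseline += (reward - baseline) / t
    return softmax(prefs)


final_probs = train_bandit([0.0, 1.0, 3.0])
best_action = final_probs.index(max(final_probs))
print(best_action, [round(p, 3) for p in final_probs])
```

When an action pays more than the running average, its probability goes up; when it pays less, its probability goes down. Over many attempts the policy concentrates on the action with the highest average reward.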



Difference between traditional reinforcement learning and self-play: the biggest difference between traditional reinforcement learning and today's self-play learning is scale. The model in a traditional reinforcement learning algorithm (such as AlphaZero) is a neural network with tens of millions of parameters, while today's language models differ from that by orders of magnitude.

Difference between self-play learning and RLHF: the purpose of RLHF is not to acquire machine intelligence but to align machines with humans, so that the AI behaves more like a human. It cannot, however, surpass humans and become a superintelligence.



Simply put: RLHF, like humans, prefers things that are easy to understand over content that is more logical. The goal of self-play learning is to improve logical ability and absolute strength, even surpassing the strongest humans and experts.

The essence of RLHF is to train a language model through reinforcement learning. Because the necessary reward function is missing, it has to be learned by collecting feedback from humans.

Reinforcement learning is not a model but a complete system made up of many components. First, reinforcement learning involves an agent, and the agent is a model.
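The point above about learning the reward function from human feedback can be sketched as follows. This is only an illustration under stated assumptions: the reward is a linear function of two made-up features (clarity and logic), the human feedback is a handful of invented pairwise comparisons, and the model is fitted with a Bradley-Terry style pairwise loss, which is one common choice rather than the only one. The names fit_reward_model and example_preferences are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Linear reward model: a weighted sum of the response features."""
    return sum(w * x for w, x in zip(weights, features))


def fit_reward_model(preferences, dim, epochs=200, lr=0.5):
    """Fit weights so that the human-preferred response scores higher.

    Each item in preferences is (chosen_features, rejected_features).
    The loss per pair is -log(sigmoid(reward(chosen) - reward(rejected))).
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in preferences:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient-descent step on the pairwise loss.
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Invented feedback: features are [clarity, logic]; labelers picked the clearer answer.
example_preferences = [
    ([0.9, 0.2], [0.3, 0.8]),
    ([0.8, 0.5], [0.4, 0.5]),
    ([0.7, 0.1], [0.2, 0.3]),
]
weights = fit_reward_model(example_preferences, dim=2)
print([round(w, 2) for w in weights])
```

Because the invented labelers always chose the clearer answer, the fitted reward weights clarity above logic. That is the behavior the post describes: the learned reward reflects what people prefer, not what is most logical.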