  1. Proximal Policy Optimization — Spinning Up documentation

    PPO is an on-policy algorithm. PPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of PPO supports parallelization …

  2. Proximal Policy Optimization - OpenAI

    Jul 20, 2017 · We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches …

  3. Deep Deterministic Policy Gradient — Spinning Up documentation …

    Deep Deterministic Policy Gradient (DDPG) is an algorithm which concurrently learns a Q-function and a policy. It uses off-policy data and the Bellman equation to learn the Q-function, and uses …

  4. Part 3: Intro to Policy Optimization — Spinning Up documentation …

    In this section, we’ll discuss the mathematical foundations of policy optimization algorithms, and connect the material to sample code. We will cover three key results in the theory of policy …

  5. Twin Delayed DDPG — Spinning Up documentation - OpenAI

    Twin Delayed DDPG (TD3) is an algorithm that addresses this issue by introducing three critical tricks: Trick One: Clipped Double-Q Learning. TD3 learns two Q-functions instead of one …

  6. Soft Actor-Critic — Spinning Up documentation - OpenAI

    Soft Actor Critic (SAC) is an algorithm that optimizes a stochastic policy in an off-policy way, forming a bridge between stochastic policy optimization and DDPG-style approaches.

  7. Trust Region Policy Optimization — Spinning Up documentation

    TRPO is an on-policy algorithm. TRPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of TRPO supports parallelization …

  8. OpenAI Baselines: DQN

    May 24, 2017 · We’re open-sourcing OpenAI Baselines, our internal effort to reproduce reinforcement learning algorithms with performance on par with published results. We’ll …

  9. Key Papers in Deep RL — Spinning Up documentation - OpenAI

    Contribution: Systematic analysis of parallelization in deep RL across algorithms.

  10. Running Experiments — Spinning Up documentation - OpenAI

    Substitute ppo with ppo_tf1 for the TensorFlow version. clip_ratio, hid, and act are flags to set some algorithm hyperparameters. You can provide multiple values for hyperparameters to run …
