An Entity of Type: Thing, from Named Graph: http://dbpedia.org, within Data Space: dbpedia.org

Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity. This is achieved by using a different objective function, which constrains each policy update by clipping the probability ratio between the new and old policies.
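The clipped objective mentioned above can be illustrated with a short sketch. The following is a minimal, hypothetical example in PyTorch, not the reference implementation from the OpenAI paper; the tensor names (log_probs_new, log_probs_old, advantages) and the clip coefficient of 0.2 are assumptions for illustration.

    import torch

    def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = torch.exp(log_probs_new - log_probs_old)
        unclipped = ratio * advantages
        # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive to move
        # the policy far from the old policy in a single update.
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # Pessimistic (elementwise minimum) bound; negate to get a loss to minimize.
        return -torch.min(unclipped, clipped).mean()

In practice this surrogate term is combined with a value-function loss and an entropy bonus, as in the original PPO paper.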

Property Value
dbo:abstract
  • Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity. This is achieved by using a different objective function, which constrains each policy update by clipping the probability ratio between the new and old policies. (en)
dbo:wikiPageExternalLink
dbo:wikiPageID
  • 70774614 (xsd:integer)
dbo:wikiPageLength
  • 1786 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID
  • 1113497752 (xsd:integer)
dbo:wikiPageWikiLink
dbp:date
  • October 2022 (en)
dbp:reason
  • Both sources currently in the article are from OpenAI. The first is a paper by researchers at OpenAI; the second is OpenAI's website. What developments have been published since 2017? (en)
dbp:wikiPageUsesTemplate
dcterms:subject
rdfs:comment
  • Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity. This is achieved by using a different objective function, which constrains each policy update by clipping the probability ratio between the new and old policies. (en)
rdfs:label
  • Proximal Policy Optimization (en)
owl:sameAs
prov:wasDerivedFrom
foaf:isPrimaryTopicOf
is dbo:wikiPageDisambiguates of
is dbo:wikiPageWikiLink of
is foaf:primaryTopic of
This content was extracted from Wikipedia and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License