Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods: they search the space of policies directly rather than assigning values to state-action pairs. They retain some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and have better sample complexity. They achieve this by optimizing a different objective function, most commonly a clipped surrogate objective that discourages large changes to the policy in a single update.
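As a rough illustration of what a "different objective function" can look like, the sketch below computes the clipped surrogate objective, the most widely used PPO variant, for a small batch. It is a minimal sketch rather than a full PPO implementation: the function name `clipped_surrogate`, its parameters, and the default `epsilon=0.2` are assumptions chosen for this example, and the log-probabilities and advantage estimates are assumed to have been computed elsewhere.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops growing once the ratio exceeds `1 + epsilon`, so a single update gains nothing from moving the policy further. This clipping stands in for the explicit trust-region constraint that TRPO enforces.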