About: Proximal Policy Optimization

An Entity of Type: Thing, from Named Graph: http://dbpedia.org, within Data Space: dbpedia.org

Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms but are simpler to implement, more general, and have better sample complexity. They achieve this with a different objective function: instead of TRPO's explicit constraint on how far each update may move the policy, PPO optimizes a surrogate objective that clips (or, in a variant, penalizes) large changes in the policy's action probabilities.
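
As a rough illustration of that objective: in the clipped variant of PPO, the quantity maximized is L^CLIP(θ) = E_t[min(r_t(θ)·A_t, clip(r_t(θ), 1 - ε, 1 + ε)·A_t)], where r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio between the updated policy and the policy that collected the data, and A_t is an advantage estimate. The sketch below only evaluates this value with NumPy for a batch of sampled actions. It assumes log-probabilities and advantages have already been computed; the function name ppo_clip_objective and the default ε = 0.2 are illustrative choices, not taken from this page, and a practical implementation would maximize the objective with an automatic-differentiation library rather than merely evaluate it.

```python
import numpy as np


def ppo_clip_objective(new_logp, old_logp, advantages, epsilon=0.2):
    """Evaluate the clipped surrogate objective L^CLIP (to be maximized).

    Hypothetical helper for illustration only.
    new_logp:   log pi_theta(a_t | s_t) under the policy being updated
    old_logp:   log pi_theta_old(a_t | s_t) under the data-collecting policy
    advantages: advantage estimates A_t for the same timesteps
    epsilon:    clip range (0.2 is an assumed, illustrative default)
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio r_t = pi_new / pi_old, computed in log space for stability.
    ratio = np.exp(new_logp - old_logp)

    # Ordinary (unclipped) policy-gradient surrogate term.
    unclipped = ratio * advantages

    # Same term with the ratio confined to [1 - epsilon, 1 + epsilon].
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages

    # Element-wise minimum gives a pessimistic (lower) bound; average over the batch.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For example, with ε = 0.2, a ratio of 1.5 on an action with advantage +1 contributes 1.2 rather than 1.5, so the update gains nothing from pushing that action's probability further; a ratio of 0.5 on an action with advantage -1 contributes the more pessimistic -0.8. When the two policies are identical (every ratio equals 1), the objective reduces to the mean advantage.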

Property	Value
dbo:abstract	Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms but are simpler to implement, more general, and have better sample complexity. They achieve this by using a different objective function. (en)
dbo:wikiPageExternalLink	https://github.com/openai/baselines/tree/master/baselines/ https://openai.com/blog/openai-baselines-ppo/
dbo:wikiPageID	70774614 (xsd:integer)
dbo:wikiPageLength	1786 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1113497752 (xsd:integer)
dbo:wikiPageWikiLink	dbc:Reinforcement_learning dbr:Game_theory dbr:Temporal_difference_learning dbc:Machine_learning_algorithms dbr:Reinforcement_learning dbr:Model-free_(reinforcement_learning) dbr:OpenAI dbr:Policy_gradient_method
dbp:date	October 2022 (en)
dbp:reason	Both sources currently in the article are from OpenAI. The first is a paper by researchers at OpenAI; the second links to OpenAI's website. What developments have been published since 2017? (en)
dbp:wikiPageUsesTemplate	dbt:More_citations_needed dbt:Reflist dbt:Short_description dbt:Compu-AI-stub dbt:Machine_learning
dcterms:subject	dbc:Reinforcement_learning dbc:Machine_learning_algorithms
rdfs:comment	Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms developed at OpenAI in 2017. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs. PPO algorithms have some of the benefits of trust region policy optimization (TRPO) algorithms but are simpler to implement, more general, and have better sample complexity. They achieve this by using a different objective function. (en)
rdfs:label	Proximal Policy Optimization (en)
owl:sameAs	wikidata:Proximal Policy Optimization https://global.dbpedia.org/id/GXCj7
prov:wasDerivedFrom	wikipedia-en:Proximal_Policy_Optimization?oldid=1113497752&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Proximal_Policy_Optimization
is dbo:wikiPageDisambiguates of	dbr:PPO
is dbo:wikiPageWikiLink of	dbr:OpenAI_Five dbr:PPO dbr:Reinforcement_learning dbr:Model-free_(reinforcement_learning)
is foaf:primaryTopic of	wikipedia-en:Proximal_Policy_Optimization