About: Model-free (reinforcement learning)

Property	Value
dbo:abstract	In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use the transition probability distribution or the reward function associated with the Markov decision process (MDP) that, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm. An example of a model-free algorithm is Q-learning. (en) In reinforcement learning, a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use the transition probability distribution and the reward function associated with the Markov decision process (MDP) representing the problem to be solved. The transition probability distribution (or transition model) and the reward function are usually referred to collectively as the "model" of the environment (or MDP), hence the name "model-free". A model-free reinforcement learning algorithm can be regarded as an "explicit" trial-and-error algorithm. Q-learning is one example of a model-free algorithm. (uk)
dbo:wikiPageID	60008386 (xsd:integer)
dbo:wikiPageLength	2656 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1009400678 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Q-learning dbc:Reinforcement_learning dbr:Proximal_Policy_Optimization dbr:Reinforcement_learning dbr:Trial_and_error dbr:Markov_decision_process dbr:Asynchronous_Advantage_Actor-Critic_Algorithm dbr:Deep_Deterministic_Policy_Gradient dbr:Model-based_(reinforcement_learning) dbr:Soft_Actor-Critic dbr:Trust_Region_Policy_Optimization dbr:Twin_Delayed_Deep_Deterministic_Policy_Gradient
dbp:wikiPageUsesTemplate	dbt:Refimprove dbt:Short_description dbt:Machine_learning_bar
dcterms:subject	dbc:Reinforcement_learning
rdfs:comment	In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use the transition probability distribution or the reward function associated with the Markov decision process (MDP) that, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm. An example of a model-free algorithm is Q-learning. (en) In reinforcement learning, a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use the transition probability distribution and the reward function associated with the Markov decision process (MDP) representing the problem to be solved. The transition probability distribution (or transition model) and the reward function are usually referred to collectively as the "model" of the environment (or MDP), hence the name "model-free". A model-free reinforcement learning algorithm can be regarded as an "explicit" trial-and-error algorithm. Q-learning is one example of a model-free algorithm. (uk)
rdfs:label	Model-free (reinforcement learning) (en) Model-free learning (uk)
owl:sameAs	wikidata:Model-free (reinforcement learning) dbpedia-uk:Model-free (reinforcement learning) dbpedia-vi:Model-free (reinforcement learning) https://global.dbpedia.org/id/9GPJT
prov:wasDerivedFrom	wikipedia-en:Model-free_(reinforcement_learning)?oldid=1009400678&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Model-free_(reinforcement_learning)
is dbo:wikiPageWikiLink of	dbr:Q-learning dbr:Glossary_of_artificial_intelligence dbr:Proximal_Policy_Optimization dbr:Temporal_difference_learning dbr:Sample_complexity
is foaf:primaryTopic of	wikipedia-en:Model-free_(reinforcement_learning)
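
The abstract names Q-learning as an example of a model-free algorithm. A minimal sketch of that idea follows, assuming a hypothetical five-state chain environment that is not part of this record. The names `step`, `train`, `greedy_policy`, and all hyperparameter values are illustrative assumptions. The point it tries to show is that the learner never reads a transition table or a reward function; it only calls `step` and updates its estimates from the sampled outcome, which is the trial-and-error behaviour the abstract describes.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Black-box environment. The learner only observes sampled outcomes."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length (an assumption)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties between equal estimates at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # The update uses only the sampled (state, action, reward, next_state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned estimates."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
print(greedy_policy(q_table))
```

A model-based method would instead learn or be given the transition and reward functions behind `step` and plan with them; here those functions stay hidden inside the environment.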