About: State–action–reward–state–action

An Entity of Type: owl:Thing, within Data Space: dbpedia.org, associated with source document(s)

State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was only mentioned as a footnote.

Attributes	Values
rdfs:label	SARSA (fr) SARSA (it) State–action–reward–state–action (en) Алгоритм SARSA (uk) SARSA算法 (zh)
rdfs:comment	In artificial intelligence, and more precisely in reinforcement learning, SARSA is a learning algorithm. Its name is the acronym of State-Action-Reward-State-Action. It is an on-policy algorithm: it uses the policy currently being learned to update the learned internal values. (fr) SARSA is a reinforcement learning algorithm in the field of machine learning, named after the initials of State–Action–Reward–State–Action. It was first proposed by G.A. Rummery and M. Niranjan in 1994, under the name "Modified Connectionist Q-Learning". An alternative name, SARSA, was proposed for it. SARSA differs from Q-learning mainly in how the expected-reward Q-value is updated. SARSA performs its update with the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}), where s, a, and r are the state, action, and reward of the Markov decision process (MDP), and t and t+1 denote the current step and the next step. (zh) State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was only mentioned as a footnote. (en) State–action–reward–state–action (SARSA) is an algorithm for learning a policy function for Markov decision processes, used in the areas of reinforcement learning and machine learning. It was proposed by Rummery and Niranjan under the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative acronym SARSA, by which the algorithm is better known today, was proposed by Rich Sutton. (it) SARSA is an algorithm for finding a policy for a Markov decision process, used in the reinforcement learning area of machine learning. The algorithm was proposed by Gavin Rummery and Mahesan Niranjan in a paper titled "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Richard Sutton, was mentioned only as a footnote. (uk)
dcterms:subject	Machine learning algorithms
Wikipage page ID	10584297 (xsd:integer)
Wikipage revision ID	1119455586 (xsd:integer)
Link from a Wikipage to another Wikipage	Q-learning; Algorithm; Richard S. Sutton; Prefrontal cortex basal ganglia working memory; Constructing skill trees; Machine learning; Temporal difference learning; Machine learning algorithms; Learning rate; Reinforcement learning; Markov decision process; Sammon Mapping
sameAs	State–action–reward–state–action
dbp:wikiPageUsesTemplate	dbt:! dbt:Math dbt:Other_uses dbt:Reflist dbt:Machine_learning dbt:Differentiable_computing
has abstract	In artificial intelligence, and more precisely in reinforcement learning, SARSA is a learning algorithm. Its name is the acronym of State-Action-Reward-State-Action. It is an on-policy algorithm: it uses the policy currently being learned to update the learned internal values. (fr) State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was only mentioned as a footnote. This name reflects the fact that the main function for updating the Q-value depends on the current state of the agent "S1", the action the agent chooses "A1", the reward "R" the agent gets for choosing this action, the state "S2" that the agent enters after taking that action, and finally the next action "A2" the agent chooses in its new state. The acronym for the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}) is SARSA. Some authors use a slightly different convention and write the quintuple (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}), depending on which time step the reward is formally assigned to. The rest of the article uses the former convention. (en) State–action–reward–state–action (SARSA) is an algorithm for learning a policy function for Markov decision processes, used in the areas of reinforcement learning and machine learning. It was proposed by Rummery and Niranjan under the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative acronym SARSA, by which the algorithm is better known today, was proposed by Rich Sutton. The acronym indicates that the main function for updating the Q-values depends solely on the current state s_t, the action a_t the agent chooses, the reward r_t, the state s_{t+1} entered after performing a_t, and the action a_{t+1} the agent chooses in the new state; formally, therefore, SARSA stands for the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}). (it) SARSA is an algorithm for finding a policy for a Markov decision process, used in the reinforcement learning area of machine learning. The algorithm was proposed by Gavin Rummery and Mahesan Niranjan in a paper titled "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Richard Sutton, was mentioned only as a footnote. The name means that the update of the Q-function depends on the agent's current state S1, the action A1 the agent chooses, the reward R the agent receives for choosing that action, the state S2 the agent moves to after performing that action, and finally the next action A2 the agent chooses from its new state. The initials of (s_t, a_t, r_t, s_{t+1}, a_{t+1}) give the name SARSA. Some authors use a different convention and write the tuple as (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}), depending on which step the reward is formally assigned to. The rest of the article uses the first convention. (uk) SARSA is a reinforcement learning algorithm in the field of machine learning, named after the initials of State–Action–Reward–State–Action. It was first proposed by G.A. Rummery and M. Niranjan in 1994, under the name "Modified Connectionist Q-Learning". An alternative name, SARSA, was proposed for it. SARSA differs from Q-learning mainly in how the expected-reward Q-value is updated. SARSA performs its update with the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}), where s, a, and r are the state, action, and reward of the Markov decision process (MDP), and t and t+1 denote the current step and the next step. (zh)
prov:wasDerivedFrom	wikipedia-en:State–action–reward–state–action?oldid=1119455586&ns=0
page length (characters) of wiki page	5612 (xsd:nonNegativeInteger)
foaf:isPrimaryTopicOf	wikipedia-en:State–action–reward–state–action
is Link from a Wikipage to another Wikipage of	Q-learning; List of algorithms; Prefrontal cortex basal ganglia working memory; Constructing skill trees; State-action-reward-state-action; Reinforcement learning; State-Action-Reward-State-Action; Sammon mapping; Outline of machine learning
is Wikipage redirect of	State-action-reward-state-action; State-Action-Reward-State-Action
is foaf:primaryTopic of	wikipedia-en:State–action–reward–state–action
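
The abstract in the table above says that the Q-value update depends on the quintuple (s_t, a_t, r_t, s_{t+1}, a_{t+1}), and that SARSA differs from Q-learning mainly in how that update is computed. The page itself does not give the update rule, so the following is only a minimal sketch of the standard tabular form. The function names, the dictionary-based Q-table, the learning rate alpha, the discount factor gamma, and all numeric values are assumptions made for illustration.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One tabular SARSA step on the quintuple (s, a, r, s_next, a_next).

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values taken from the page.
    """
    # On-policy target: uses the action the agent actually chose in s_next.
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step, shown only for contrast."""
    # Off-policy target: uses the best-valued action in s_next,
    # regardless of which action the agent takes next.
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


# Hypothetical two-action example with made-up state names and values.
actions = ["left", "right"]
q = defaultdict(float)           # unseen (state, action) pairs start at 0.0
q[("s2", "left")] = 1.0
q[("s2", "right")] = 3.0

# The agent was in s1, chose "right", received reward 2.0, landed in s2,
# and then chose "left" there.
new_value = sarsa_update(q, "s1", "right", 2.0, "s2", "left")
```

With these made-up numbers, the SARSA step moves Q(s1, right) from 0 to about 1.45, because its target uses the value of the action actually chosen next ("left", valued 1.0). The Q-learning step on the same transition would instead use the larger value 3.0 of "right" and arrive at about 2.35. This is the on-policy versus off-policy distinction that the abstract describes.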
