About: Multi-armed bandit

Property	Value
dbo:abstract	The slot machine problem can be outlined as follows: you are facing two slot machines. One of them is in working order, so it returns 1 euro per token with a known probability. The other is broken, so it returns 1 euro per token with an unknown probability. A number of tokens is available. What should be done to reasonably maximize the winnings? (ca)
In probability theory, the multi-armed bandit problem (also called the N- or K-armed bandit problem) is a problem in which a player facing a row of slot machines (also known as "one-armed bandits") has to decide which machines to play and in which order. When played, each slot machine returns a random reward drawn from a probability distribution specific to that machine. The player's objective is to maximize the sum of the rewards obtained over a sequence of machine plays. In early versions of this problem, the player has no initial information about the machines. The essential tradeoff the player must address at each iteration is then between exploiting the machine with the highest expected reward and exploring the others to obtain more information about their expected rewards. This tradeoff between exploration and exploitation also appears in the context of reinforcement learning. Data: Q2882343 (es)
In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed, limited set of resources must be allocated between competing (alternative) choices in a way that maximizes the expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (sometimes known as "one-armed bandits") who has to decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue with the current machine or try a different one. The multi-armed bandit problem also falls into the broad category of stochastic scheduling. In the problem, each machine provides a random reward from a probability distribution specific to that machine, which is not known a priori. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls. The crucial tradeoff the gambler faces at each trial is between "exploitation" of the machine that has the highest expected payoff and "exploration" to get more information about the expected payoffs of the other machines. The tradeoff between exploration and exploitation is also faced in machine learning. In practice, multi-armed bandits have been used to model problems such as managing research projects in a large organization, like a science foundation or a pharmaceutical company. In early versions of the problem, the gambler begins with no initial knowledge about the machines. Herbert Robbins in 1952, realizing the importance of the problem, constructed convergent population selection strategies in "Some aspects of the sequential design of experiments". A theorem, the Gittins index, first published by John C. Gittins, gives an optimal policy for maximizing the expected discounted reward. (en)
In mathematics, more precisely in probability theory, the one-armed bandit problem (which generalizes to the K-armed or N-armed bandit problem) can be pictured as follows: a user (an agent) facing slot machines must decide which machines to play. Each machine gives an average reward that the user does not know a priori. The objective is to maximize the user's cumulative gain. It is an example of reinforcement learning. Typically, the user's policy alternates between exploitation (using the machine that it has learned pays well) and exploration (trying another machine in the hope of winning more). The bandit problem can be seen as a Markov decision process with a single state. (fr)
In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed, limited set of resources must be allocated among competing choices so as to maximize the expected gain. The properties of each choice are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice. It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (also called one-armed bandits) who must decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue with the current machine or try a different one. The multi-armed bandit problem also falls into the broad category of stochastic scheduling. (ja)
In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is the problem of allocating a limited set of resources between competing alternatives in a way that maximizes the expected gain, when the properties of each option are only partially known at the time of the decision and may become better understood as time passes or as resources are allocated to the option. It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from an imagined gambler at a row of slot machines (often called "one-armed bandits") who has to decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue with the current machine or try another. The multi-armed bandit problem also falls into the broad category of stochastic scheduling. In this problem, each machine provides a random reward according to a probability distribution specific to that machine. The gambler's goal is to maximize the sum of rewards earned through the chosen sequence of lever pulls. The fundamental difficulty the gambler faces at each step is the choice between "exploitation" of the machine that has the highest expected payoff and "exploration" to get more information about the expected payoffs of the other machines. The tradeoff between exploration and exploitation also arises in machine learning. In practice, multi-armed bandits have been used to model problems such as managing research projects in a large organization, like a science foundation or a pharmaceutical company. In early formulations of the problem, the gambler starts with no initial knowledge about the machines. Herbert Robbins in 1952, realizing the importance of the problem, constructed convergent population selection strategies in "Some aspects of the sequential design of experiments". The Gittins index theorem, first published by John C. Gittins, gives an optimal policy for maximizing the expected discounted reward. (uk)
dbo:thumbnail	wiki-commons:Special:FilePath/Las_Vegas_slot_machines.jpg?width=300
dbo:wikiPageExternalLink	https://mpatacchiola.github.io/blog/2017/08/14/dissecting-reinforcement-learning-6.html http://homes.di.unimi.it/~cesabian/Pubblicazioni/banditSurvey.pdf http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html https://pavlov.tech/2019/03/02/animated-multi-armed-bandit-policies/ https://web.archive.org/web/20131211192714/http:/webdocs.cs.ualberta.ca/~sutton/book/the-book.html https://feynmanlectures.caltech.edu/info/exercises/Feynmans_restaurant_problem.html http://techtalks.tv/talks/54451/ http://techtalks.tv/talks/54455/ https://mloss.org/software/view/415/ http://bandit.sourceforge.net https://archive.today/20121212095047/http:/www.cs.washington.edu/research/jair/volume4/kaelbling96a-html/node6.html https://arxiv.org/abs/1508.03326 https://github.com/Nth-iteration-labs/contextual https://github.com/fmr-llc/mabwiser https://github.com/jkomiyama/banditlib http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html https://semanticscholar.org/paper/e4fe28113fed71999a0db30a930e0b42d3ce55f1
dbo:wikiPageID	2854828 (xsd:integer)
dbo:wikiPageLength	69336 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1124037510 (xsd:integer)
dbo:wikiPageWikiLink	dbc:Sequential_experiments dbc:Sequential_methods dbr:Probability_distribution dbr:Bayes'_theorem dbr:Annals_of_Applied_Probability dbr:Peter_Whittle_(mathematician) dbr:Ridge_regression dbr:Thompson_sampling dbr:Open_source dbr:Search_theory dbr:Gambler dbr:Germany dbr:Concept_drift dbr:Optimal_stopping dbr:Machine_learning dbr:Slot_machines dbr:Clinical_trial dbr:Pharmaceutical_industry dbr:Portfolio_(finance) dbc:Machine_learning dbr:Gittins_index dbr:Adaptive_routing dbr:Medical_ethics dbr:Nonparametric_regression dbr:Probability_theory dbr:Random_forest dbr:Regret_(decision_theory) dbr:Reinforcement_learning dbr:Herbert_Robbins dbc:Stochastic_optimization dbr:Asymptotic dbr:John_C._Gittins dbr:Collaborative_filtering dbr:Bulletin_of_the_American_Mathematical_Society dbr:Softmax_function dbr:Greedy_algorithm dbr:Michael_Katehakis dbr:R_(programming_language) dbr:World_War_II dbr:Markov_decision_process dbr:Stochastic_scheduling dbr:Iterated_prisoner's_dilemma dbr:Singular-value_decomposition dbr:Open-Source dbr:Condorcet_winner dbr:Voting_paradoxes dbr:File:Las_Vegas_slot_machines.jpg dbr:File:Framework_of_UCB-ALP_for_Constrained_Contextual_Bandits.jpg dbr:File:The_Jet_Propulsion_Laboratory_(9416811752).jpg dbr:Wikt:one-armed_bandit
dbp:wikiPageUsesTemplate	dbt:Citation dbt:Citation_needed dbt:See dbt:Short_description dbt:Scholia dbt:Differentiable_computing
dcterms:subject	dbc:Sequential_experiments dbc:Sequential_methods dbc:Machine_learning dbc:Stochastic_optimization
gold:hypernym	dbr:Problem
rdf:type	yago:WikicatSequentialExperiments yago:WikicatSequentialMethods yago:Ability105616246 yago:Abstraction100002137 yago:Act100030358 yago:Activity100407535 yago:Cognition100023271 yago:Event100029378 yago:Experiment100639556 yago:Investigation100633864 yago:Know-how105616786 yago:Method105660268 yago:PsychologicalFeature100023100 yago:Research100636921 yago:Work100575741 yago:YagoPermanentlyLocatedEntity dbo:Disease yago:ScientificResearch100641820
rdfs:comment	The slot machine problem can be outlined as follows: you are facing two slot machines. One of them is in working order, so it returns 1 euro per token with a known probability. The other is broken, so it returns 1 euro per token with an unknown probability. A number of tokens is available. What should be done to reasonably maximize the winnings? (ca)
In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed, limited set of resources must be allocated among competing choices so as to maximize the expected gain. The properties of each choice are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice. It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (also called one-armed bandits) who must decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue with the current machine or try a different one. The multi-armed bandit problem also falls into the broad category of stochastic scheduling. (ja)
In probability theory, the multi-armed bandit problem (also called the N- or K-armed bandit problem) is a problem in which a player facing a row of slot machines (also known as "one-armed bandits") has to decide which machines to play and in which order. When played, each slot machine returns a random reward drawn from a probability distribution specific to that machine. The player's objective is to maximize the sum of the rewards obtained over a sequence of machine plays. Data: Q2882343 (es)
In mathematics, more precisely in probability theory, the one-armed bandit problem (which generalizes to the K-armed or N-armed bandit problem) can be pictured as follows: a user (an agent) facing slot machines must decide which machines to play. Each machine gives an average reward that the user does not know a priori. The objective is to maximize the user's cumulative gain. (fr)
In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed, limited set of resources must be allocated between competing (alternative) choices in a way that maximizes the expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (sometimes known as "one-armed bandits") who has to decide which machines to play, how many times to play each machine, in which order to play them, and whether (en)
In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is the problem of allocating a limited set of resources between competing alternatives in a way that maximizes the expected gain, when the properties of each option are only partially known at the time of the decision and may become better understood as time passes or as resources are allocated to the option. It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from an imagined gambler at a row of slot machines (often called "one-armed bandits") who has to decide which machines to play, how many times to play each machine, in which order to play them, and whether to continue (uk)
rdfs:label	El problema de la màquina escurabutxaques (ca) Bandido multibrazo (es) Bandit manchot (mathématiques) (fr) Multi-armed bandit (en) 多腕バンディット問題 (ja) Багаторукий бандит (uk)
owl:sameAs	freebase:Multi-armed bandit yago-res:Multi-armed bandit wikidata:Multi-armed bandit dbpedia-ca:Multi-armed bandit dbpedia-es:Multi-armed bandit dbpedia-fr:Multi-armed bandit dbpedia-ja:Multi-armed bandit dbpedia-uk:Multi-armed bandit https://global.dbpedia.org/id/2g5DN
prov:wasDerivedFrom	wikipedia-en:Multi-armed_bandit?oldid=1124037510&ns=0
foaf:depiction	wiki-commons:Special:FilePath/Framework_of_UCB-ALP_for_Constrained_Contextual_Bandits.jpg wiki-commons:Special:FilePath/The_Jet_Propulsion_Laboratory_(9416811752).jpg wiki-commons:Special:FilePath/Las_Vegas_slot_machines.jpg
foaf:isPrimaryTopicOf	wikipedia-en:Multi-armed_bandit
is dbo:knownFor of	dbr:Michael_Katehakis
is dbo:wikiPageDisambiguates of	dbr:Bandit_(disambiguation) dbr:Mab
is dbo:wikiPageRedirects of	dbr:Collaborative_bandit dbr:Approximate_solutions_of_the_multi-armed_bandit_problem dbr:Epsilon-greedy_strategy dbr:E-greedy_strategy dbr:Multiarmed_bandit dbr:Multi–armed_bandit dbr:Two-armed_bandit dbr:Two_armed_bandit dbr:Bandit_(machine_learning) dbr:Bandit_model dbr:Bandit_problem dbr:Bandit_process dbr:K_armed_bandit dbr:Contextual_bandit_algorithm dbr:Adversarial_bandit dbr:N-armed_bandit dbr:K-armed_bandit dbr:Multi-arm_bandit dbr:Multi-armed_bandit_problem dbr:Multi-armed_bandits dbr:Multi_armed_bandit dbr:N_armed_bandit
is dbo:wikiPageWikiLink of	dbr:Bayesian_optimization dbr:Metalearning_(neuroscience) dbr:Probabilistic_numerics dbr:Peter_Whittle_(mathematician) dbr:Vowpal_Wabbit dbr:Design_of_experiments dbr:Dynamic_treatment_regime dbr:Thompson_sampling dbr:Online_machine_learning dbr:Search_theory dbr:Collaborative_bandit dbr:Glossary_of_artificial_intelligence dbr:Convergent_thinking dbr:Creativity dbr:Approximate_solutions_of_the_multi-armed_bandit_problem dbr:Emilie_Kaufmann dbr:John_Langford_(computer_scientist) dbr:Medoid dbr:A/B_testing dbr:Dual_control_theory dbr:Gittins_index dbr:Adaptive_design_(medicine) dbr:Bretagnolle–Huber_inequality dbr:Nicolò_Cesa-Bianchi dbr:History_of_statistics dbr:Epsilon-greedy_strategy dbr:Reinforcement_learning dbr:Herbert_Robbins dbr:Bandit_(disambiguation) dbr:E-greedy_strategy dbr:Greedy_algorithm dbr:Michael_Katehakis dbr:Recommender_system dbr:Mab dbr:Slot_machine dbr:UCB dbr:Nerd_sniping dbr:List_of_statistics_articles dbr:Wisdom_of_the_crowd dbr:Multiarmed_bandit dbr:Multi–armed_bandit dbr:Tournament_solution dbr:Randomized_weighted_majority_algorithm dbr:Reward-based_selection dbr:Outline_of_machine_learning dbr:Stochastic_scheduling dbr:Tsetlin_machine dbr:Two-armed_bandit dbr:Two_armed_bandit dbr:Bandit_(machine_learning) dbr:Bandit_model dbr:Bandit_problem dbr:Bandit_process dbr:K_armed_bandit dbr:Contextual_bandit_algorithm dbr:Adversarial_bandit dbr:N-armed_bandit dbr:K-armed_bandit dbr:Multi-arm_bandit dbr:Multi-armed_bandit_problem dbr:Multi-armed_bandits dbr:Multi_armed_bandit dbr:N_armed_bandit
is foaf:primaryTopic of	wikipedia-en:Multi-armed_bandit
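
Illustrative sketch (not part of the DBpedia record): the exploration–exploitation tradeoff described in dbo:abstract can be made concrete with an epsilon-greedy policy, a strategy this record lists among its redirects (dbr:Epsilon-greedy_strategy). The code below is a minimal sketch under stated assumptions: rewards are Bernoulli (0 or 1), and the function name `epsilon_greedy` and the parameters `arm_probs`, `n_pulls`, `epsilon` and `seed` are hypothetical names chosen for this example. It is one simple heuristic, not the optimal Gittins index policy mentioned in the abstract.

```python
import random


def epsilon_greedy(arm_probs, n_pulls, epsilon=0.1, seed=0):
    """Play Bernoulli slot machines with an epsilon-greedy policy.

    arm_probs holds the true success probability of each machine; the
    policy never reads it directly, it only observes sampled rewards.
    Returns (total_reward, pull_counts, estimated_means).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # number of pulls per machine
    means = [0.0] * n_arms   # running average reward per machine
    total = 0

    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Exploration: pick a machine uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the machine with the best estimate,
            # breaking ties at random.
            best = max(means)
            arm = rng.choice([i for i, m in enumerate(means) if m == best])

        # Assumed reward model: 1 with probability arm_probs[arm], else 0.
        reward = 1 if rng.random() < arm_probs[arm] else 0

        # Incremental update of the running average for the pulled machine.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward

    return total, counts, means


if __name__ == "__main__":
    total, counts, means = epsilon_greedy([0.2, 0.5, 0.7], n_pulls=5000)
    print("total reward:", total)
    print("pulls per machine:", counts)
    print("estimated means:", [round(m, 3) for m in means])
```

A larger `epsilon` spends more pulls gathering information about every machine, while a smaller one commits sooner to the machine that currently looks best. That is the tradeoff the abstract describes.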