In statistics, data-snooping bias is a form of statistical bias generated by the misuse of data mining techniques which can lead to bogus results in scientific research. Although data-snooping biases can occur in any field that uses data mining, data snooping biases are a particular concern in finance and medical research, both of which make heavy use of data mining techniques.
| Property | Value |
| dbpprop:abstract
|
- In statistics, data-snooping bias is a form of statistical bias generated by the misuse of data mining techniques which can lead to bogus results in scientific research. Although data-snooping biases can occur in any field that uses data mining, data snooping biases are a particular concern in finance and medical research, both of which make heavy use of data mining techniques. In the process of data mining, huge numbers of hypotheses about a single data set can be tested in a very short time, by exhaustively searching for combinations of variables that might show a correlation. Because conventional tests of statistical significance are based on the probability that an observation arose by chance, it is reasonable to expect that 5% of randomly chosen hypotheses will turn out to be significant at the 5% level, 0.1% will turn out to be significant at the 0.1% significance level, and so on, simply by chance. Thus, given enough hypotheses tested, it is virtually certain that some of them will appear to be highly statistically significant, even on a data set with no real correlations at all. Researchers who are using data mining techniques can be easily misled by these apparently significant results, even though they are merely chance artifacts. Data-snooping bias most commonly occurs when researchers have not formed an hypothesis in advance, and therefore are open to any hypothesis suggestions presented by the data; or when researchers narrow the data used in order to reduce the probability of the sample refuting a specific hypothesis.
|
| dbpprop:date
| |
| dbpprop:hasPhotoCollection
| |
| dbpprop:reference
| |
| dbpprop:wikiPageUsesTemplate
| |
| rdfs:comment
|
- In statistics, data-snooping bias is a form of statistical bias generated by the misuse of data mining techniques which can lead to bogus results in scientific research. Although data-snooping biases can occur in any field that uses data mining, data snooping biases are a particular concern in finance and medical research, both of which make heavy use of data mining techniques.
|
| rdfs:label
| |
| owl:sameAs
| |
| skos:subject
| |
| foaf:page
| |
| is dbpprop:disambiguates
of | |
| is dbpprop:redirect
of | |