An Entity of Type: Thing, from Named Graph: http://dbpedia.org, within Data Space: dbpedia.org

The simple matching coefficient (SMC), or Rand similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. Given two objects, A and B, each with n binary attributes, SMC is defined as:

$$\mathrm{SMC} = \frac{\text{number of matching attributes}}{\text{number of attributes}} = \frac{M_{00} + M_{11}}{M_{00} + M_{01} + M_{10} + M_{11}}$$

where $M_{00}$ is the total number of attributes where A and B both have a value of 0; $M_{11}$ is the total number of attributes where A and B both have a value of 1; $M_{01}$ is the total number of attributes where the attribute of A is 0 and the attribute of B is 1; and $M_{10}$ is the total number of attributes where the attribute of A is 1 and the attribute of B is 0.
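As a rough illustration of the definition above, the following Python sketch tallies the four counts (both 0; A is 0 and B is 1; A is 1 and B is 0; both 1) for two equal-length binary vectors and derives the SMC from them. The names `match_counts` and `smc` are hypothetical helpers invented for this example, not part of any library.

```python
from typing import Sequence, Tuple


def match_counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (m00, m01, m10, m11) for two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of attributes")
    m00 = m01 = m10 = m11 = 0
    for x, y in zip(a, b):
        if x not in (0, 1) or y not in (0, 1):
            raise ValueError("attributes must be 0 or 1")
        if x == 0 and y == 0:
            m00 += 1  # mutual absence
        elif x == 0 and y == 1:
            m01 += 1  # A is 0, B is 1
        elif x == 1 and y == 0:
            m10 += 1  # A is 1, B is 0
        else:
            m11 += 1  # mutual presence
    return m00, m01, m10, m11


def smc(a: Sequence[int], b: Sequence[int]) -> float:
    """Simple matching coefficient: matching attributes / all attributes."""
    m00, m01, m10, m11 = match_counts(a, b)
    n = m00 + m01 + m10 + m11
    if n == 0:
        raise ValueError("need at least one attribute")
    return (m00 + m11) / n


a = [1, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0]
print(match_counts(a, b))  # (2, 1, 1, 1)
print(smc(a, b))           # 0.6
```

Here three of the five attributes agree (two mutual absences and one mutual presence), which gives 3/5.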

Property Value
dbo:abstract
  • The simple matching coefficient (SMC), or Rand similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. Given two objects, A and B, each with n binary attributes, SMC is defined as:

    $$\mathrm{SMC} = \frac{\text{number of matching attributes}}{\text{number of attributes}} = \frac{M_{00} + M_{11}}{M_{00} + M_{01} + M_{10} + M_{11}}$$

    where $M_{00}$ is the total number of attributes where A and B both have a value of 0; $M_{11}$ is the total number of attributes where A and B both have a value of 1; $M_{01}$ is the total number of attributes where the attribute of A is 0 and the attribute of B is 1; and $M_{10}$ is the total number of attributes where the attribute of A is 1 and the attribute of B is 0.

    The simple matching distance (SMD), which measures dissimilarity between sample sets, is given by $\mathrm{SMD} = 1 - \mathrm{SMC}$. SMC is linearly related to Hamann similarity: $\mathrm{SMC} = (\mathrm{Hamann} + 1)/2$. Also, $\mathrm{SMC} = 1 - D^2/n$, where $D^2$ is the squared Euclidean distance between the two objects (binary vectors) and n is the number of attributes.

    The SMC is very similar to the more popular Jaccard index. The main difference is that the SMC has the term $M_{00}$ in its numerator and denominator, whereas the Jaccard index does not. Thus, the SMC counts both mutual presences (an attribute is present in both sets) and mutual absences (an attribute is absent in both sets) as matches and compares them to the total number of attributes in the universe, whereas the Jaccard index counts only mutual presences as matches and compares them to the number of attributes that have been chosen by at least one of the two sets.

    In market basket analysis, for example, the baskets of two consumers whom we wish to compare might contain only a small fraction of all the products available in the store, so the SMC will usually return very high similarity values even when the baskets bear very little resemblance, which makes the Jaccard index a more appropriate measure of similarity in that context. For example, consider a supermarket with 1000 products and two customers. The basket of the first customer contains salt and pepper, and the basket of the second contains salt and sugar. Here $M_{11} = 1$, $M_{10} = M_{01} = 1$ and $M_{00} = 997$, so the similarity between the two baskets as measured by the Jaccard index is $1/(1 + 1 + 1) = 1/3$, but it becomes $(997 + 1)/1000 = 0.998$ using the SMC.

    In other contexts, where 0 and 1 carry equivalent information (symmetry), the SMC is a better measure of similarity. For example, vectors of demographic variables stored as dummy variables, such as binary gender, are better compared with the SMC than with the Jaccard index, since the impact of gender on similarity should be the same regardless of whether male is coded as 0 and female as 1 or the other way around. However, with symmetric dummy variables, one could replicate the behaviour of the SMC by splitting each dummy into two binary attributes (in this case, male and female), thus transforming them into asymmetric attributes and allowing the use of the Jaccard index without introducing any bias. With this trick, the Jaccard index can be regarded as making the SMC a fully redundant metric. The SMC remains, however, more computationally efficient for symmetric dummy variables, since it does not require adding extra dimensions.

    The Jaccard index is also more general than the SMC and can be used to compare data types other than vectors of binary attributes, such as probability measures. (en)
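  • The simple matching coefficient (SMC), also known as the Rand similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. Given two objects, A and B, each with n binary attributes, SMC is defined as: $\mathrm{SMC} = \frac{M_{00} + M_{11}}{M_{00} + M_{01} + M_{10} + M_{11}}$, where $M_{11}$ is the number of attributes for which A and B both have the value 1; $M_{01}$ is the number of attributes for which A is 0 and B is 1; $M_{10}$ is the number of attributes for which A is 1 and B is 0; and $M_{00}$ is the number of attributes for which A and B both have the value 0. Similarly, the simple matching distance (SMD) can be defined as $\mathrm{SMD} = 1 - \mathrm{SMC}$; it measures the dissimilarity between sample sets. SMC is linearly related to Hamann similarity: $\mathrm{SMC} = (\mathrm{Hamann} + 1)/2$. Its relation to the Euclidean distance is $\mathrm{SMC} = 1 - D^2/n$, where $D^2$ is the squared Euclidean distance and n is the total number of attributes. SMC is also very similar to the Jaccard index; the difference is that the definition of the Jaccard index has no $M_{00}$ term in either the numerator or the denominator. (translated from zh)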
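The supermarket example in the English abstract above can be checked numerically. The Python sketch below is only an illustration under stated assumptions: `basket_similarities` is a hypothetical helper written for this example, baskets are modelled as sets of product names, and the Hamann similarity is taken to be (matches minus mismatches) divided by n, which the abstract itself does not spell out.

```python
import math


def basket_similarities(basket_a, basket_b, n_products):
    """Return (smc, jaccard) for two baskets drawn from n_products items."""
    a, b = set(basket_a), set(basket_b)
    m11 = len(a & b)   # products in both baskets
    m10 = len(a - b)   # only in the first basket
    m01 = len(b - a)   # only in the second basket
    m00 = n_products - (m11 + m10 + m01)  # products in neither basket
    if m00 < 0:
        raise ValueError("baskets contain more distinct items than n_products")
    if m11 + m10 + m01 == 0:
        raise ValueError("Jaccard index needs at least one chosen item")
    smc = (m00 + m11) / n_products
    jaccard = m11 / (m11 + m10 + m01)
    return smc, jaccard


smc, jaccard = basket_similarities({"salt", "pepper"}, {"salt", "sugar"}, 1000)
print(smc)                # 0.998
print(round(jaccard, 3))  # 0.333

# Relations stated in the abstract, checked for this example.
n, mismatches = 1000, 2                       # pepper and sugar differ
smd = 1 - smc                                 # simple matching distance
hamann = ((n - mismatches) - mismatches) / n  # assumed definition of Hamann
d_squared = mismatches                        # squared Euclidean distance of 0/1 vectors
assert math.isclose(smd, mismatches / n)
assert math.isclose(smc, (hamann + 1) / 2)
assert math.isclose(smc, 1 - d_squared / n)
```

The 997 products that neither customer bought dominate the SMC, while the Jaccard index ignores them, which is why the two measures disagree so strongly here.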
dbo:wikiPageID
  • 45040494 (xsd:integer)
dbo:wikiPageLength
  • 4845 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID
  • 1108599533 (xsd:integer)
rdfs:label
  • Simple matching coefficient (en)
  • 简单匹配系数 (zh)
This content was extracted from Wikipedia and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License