A w-shingling is a set of unique "shingles"—contiguous subsequences of tokens in a document—that can be used to gauge the similarity of two documents. The w denotes the number of tokens in each shingle in the set.

Property Value
p:abstract
  • A w-shingling is a set of unique "shingles" (contiguous subsequences of tokens in a document) that can be used to gauge the similarity of two documents. The w denotes the number of tokens in each shingle in the set. The document "a rose is a rose is a rose" can be tokenized as follows: (a,rose,is,a,rose,is,a,rose). Sliding a window of 4 tokens across this sequence yields every contiguous 4-token subsequence (N-grams, here 4-grams), duplicates included: [ (a,rose,is,a), (rose,is,a,rose), (is,a,rose,is), (a,rose,is,a), (rose,is,a,rose) ]. Removing the duplicate elements gives the 4-shingling: { (a,rose,is,a), (rose,is,a,rose), (is,a,rose,is) } (en)
p:hasPhotoCollection
p:reference
rdfs:comment
  • A w-shingling is a set of unique "shingles"—contiguous subsequences of tokens in a document—that can be used to gauge the similarity of two documents. The w denotes the number of tokens in each shingle in the set. (en)
rdfs:label
  • W-shingling (en)
skos:subject
foaf:page
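
The 4-shingling example in the abstract can be reproduced with a short sketch. The function names `shingles` and `jaccard` are illustrative and do not come from the source. Whitespace tokenization is an assumption. The Jaccard coefficient is one common way to compare two shinglings; the source does not name a specific similarity measure.

```python
from typing import Hashable, Sequence, Set, Tuple

Shingle = Tuple[Hashable, ...]


def shingles(tokens: Sequence[Hashable], w: int) -> Set[Shingle]:
    """Return the w-shingling: the set of unique contiguous w-token runs."""
    if w <= 0:
        raise ValueError("w must be a positive integer")
    # A document with n tokens has n - w + 1 windows; the set drops repeats.
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Assumed similarity measure: |a & b| / |a | b| (not named in the source)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty shinglings are identical
    return len(a & b) / len(a | b)


# Assumption: split on whitespace to obtain tokens.
doc = "a rose is a rose is a rose".split()
four_shingling = shingles(doc, 4)
print(len(four_shingling))  # 5 windows collapse to 3 unique shingles
```

Storing shingles as tuples keeps them hashable, so deduplication is a side effect of building the set and comparing two documents reduces to ordinary set operations.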