Correlation clustering operates in a scenario where the relationship between the objects is known instead of the actual representation of the objects. For example, given a fully connected graph <math>G=(V,E)</math> where the edge label indicates whether two nodes are similar (+) or different (&minus), the task is to cluster the vertices such that similar objects are grouped together.

PropertyValue
dbpprop:abstract
  • Correlation clustering operates in a scenario where the relationship between the objects is known instead of the actual representation of the objects. For example, given a fully connected graph <math>G=(V,E)</math> where the edge label indicates whether two nodes are similar (+) or different (&minus), the task is to cluster the vertices such that similar objects are grouped together. Unlike other clustering algorithms this doesn't require to choose the number of clusters <math>k</math> to be specified because the objective is to minimize the disagreements and is independent of the number of clusters. Notice that, it may not be possible to find a perfect clustering i.e. all the similar items are in a cluster while all dissimilar vertices are in different clusters. If the graph indeed admits a perfect clustering, then simply deleting all the negative edges and finding the connected components in the remaining graph will return the required clusters. But, in general a graph may not have a perfect clustering. For example, given nodes {a,b,c} such that (a,b) and (a,c) are similar while (b,c) are dissimilar a perfect clustering is not possible. In such cases, the task is to find a clustering that maximizes the number of agreements (number of + edges inside clusters plus the number of - edges between clusters) or minimizes the number of disagreements (the number of - edges inside clusters plus the number of + edges between clusters). This problem of maximizing the agreements is NP-complete (Multiway cut problem reduces to maximizing weighted agreements and the problem of partitioning into triangles can be reduced to unweighted version) Bansal et al. discusses the NP-completeness proof and also presents both a constant factor approximation algorithm and polynomial-time approximation scheme to find the clusters in this setting. Ailon et al. propose a randomized 3-approximation algorithm for the same problem. CC-Pivot(G=) Pick random pivot i ∈ V Set <math>C=\{i\}</math>, V'=Ø For all j ∈ V, j ≠ i; If (i,j) ∈ E then Add j to C Else (If ∈ E) Add j to V' Let G' be the subgraph induced by V' Return clustering C,CC-Pivot(G') The authors show that the above algorithm is a 3-approximation algorithm for correlation clustering.
rdfs:comment
  • Correlation clustering operates in a scenario where the relationship between the objects is known instead of the actual representation of the objects. For example, given a fully connected graph <math>G=(V,E)</math> where the edge label indicates whether two nodes are similar (+) or different (&minus), the task is to cluster the vertices such that similar objects are grouped together.
rdfs:label
  • Correlation clustering
skos:subject
foaf:page