ABSTRACT
Data may often contain multiple plausible clusterings. To discover a clustering that is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or from a previously applied clustering algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework which uses cluster ensemble methods to solve this problem. A key advantage of this approach is that it treats the base clustering method as a black box, allowing the practitioner to select the clustering method most appropriate for the domain. Experimental results on synthetic and text data establish the competitiveness of this framework.
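As a rough illustration of the idea (not the paper's exact algorithm), the sketch below conditions on the undesired clustering by running a black-box base clusterer — here a minimal k-means standing in for whatever method suits the domain — inside each undesired cluster, then combines the resulting local partitions with a deliberately simplified consensus step: clustering the local centroids. A full cluster-ensemble consensus function would replace that last step in practice; all function and variable names are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal Lloyd k-means with a deterministic farthest-point init
    (a stand-in for an arbitrary black-box base clusterer)."""
    C = [X[np.argmin(((X - X.mean(0)) ** 2).sum(1))]]       # start near the mean
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[np.argmax(d)])                           # farthest point so far
    C = np.array(C)
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, C

def conditional_ensemble(X, undesired, k):
    """Run the base clusterer inside each undesired cluster, then merge the
    local partitions by clustering their centroids (an assumed, simplified
    consensus step in place of a full cluster-ensemble consensus method)."""
    centres, gid = [], np.empty(len(X), dtype=int)
    for s in np.unique(undesired):
        idx = np.where(undesired == s)[0]
        lab, C = kmeans(X[idx], k)
        gid[idx] = len(centres) + lab    # global id of each point's local cluster
        centres.extend(C)
    meta, _ = kmeans(np.array(centres), k)   # consensus over local centroids
    return meta[gid]

# Synthetic demo: four Gaussian blobs on a grid. The known (undesired)
# clustering splits the data left/right; the conditional ensemble should
# instead recover the orthogonal top/bottom structure.
rng = np.random.default_rng(0)
means = [(0, 0), (0, 6), (5, 0), (5, 6)]
X = np.vstack([m + 0.4 * rng.standard_normal((30, 2)) for m in means])
undesired = (X[:, 0] > 2.5).astype(int)      # known left/right clustering
novel = conditional_ensemble(X, undesired, k=2)
```

On this toy data the recovered labels follow the vertical (top/bottom) split and cut across the known left/right clustering: each novel cluster mixes both undesired clusters roughly evenly, which is the "orthogonality" the abstract refers to.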
Index Terms
- Non-redundant clustering with conditional ensembles