DOI: 10.1145/1081870.1081882

Non-redundant clustering with conditional ensembles

Published: 21 August 2005

ABSTRACT

Data may often contain multiple plausible clusterings. To discover a clustering that is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or from a preceding clustering algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework which makes use of cluster ensemble methods to solve this problem. One key advantage of this approach is that it treats the base clustering method as a black box, allowing the practitioner to select the most appropriate clustering method for the domain. We present experimental results on synthetic and text data that establish the competitiveness of this framework.
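The abstract describes the framework only at a high level. As a rough illustration of the idea, the sketch below is one plausible reading, not the paper's exact algorithm: it assumes scikit-learn's KMeans as the black-box base clusterer, and it substitutes a simple centroid-merging step for the cluster-ensemble consensus functions the paper builds on. The function name conditional_ensemble and its parameters are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

def conditional_ensemble(X, undesired, k, seed=0):
    """Illustrative sketch: find a k-way clustering of X that avoids
    redundancy with a known, undesired partition `undesired`.

    Step 1 runs the black-box base clusterer (KMeans here) separately
    within each group of the undesired partition, so any structure it
    finds is structure the undesired partition does not already
    explain. Step 2 combines the local clusterings into one consensus
    partition. Assumes each group contains at least k points.
    """
    local = np.empty(len(X), dtype=int)   # per-point local cluster id
    centroids = []                        # centers of all local clusters
    offset = 0
    for g in np.unique(undesired):
        idx = np.flatnonzero(undesired == g)
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx])
        local[idx] = km.labels_ + offset  # keep ids distinct across groups
        centroids.append(km.cluster_centers_)
        offset += k
    # Simple consensus: merge the local clusters by clustering their
    # centroids, then relabel every point with its meta-cluster.
    # (A stand-in for the ensemble consensus functions cited in the paper.)
    meta_labels = KMeans(n_clusters=k, n_init=10,
                         random_state=seed).fit_predict(np.vstack(centroids))
    return meta_labels[local]

On data with two crossing structures, say documents that can be grouped either by topic or by source, passing the source labels as undesired steers the consensus toward the topic structure; normalized mutual information against the undesired labels offers one way to check that the result is in fact non-redundant.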


Published in

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005, 844 pages
ISBN: 1-59593-135-X
DOI: 10.1145/1081870
Copyright © 2005 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
