ACTIVE LEARNING FOR CONSTRAINT SELECTION BASED ON DENSITY PEAK
Abstract
Semi-supervised clustering, which integrates auxiliary information to improve clustering performance, has attracted considerable attention in the research community. It relies on two main types of side information: seeds (labeled data points) and pairwise constraints (must-link and cannot-link relationships). By incorporating knowledge from users or domain experts, semi-supervised clustering can produce partitions that better match user expectations. However, the quality of the result depends heavily on the side information provided. Different inputs can lead to different partitions, and poorly chosen side information can even degrade clustering performance. This paper addresses the problem of selecting high-quality constraints for semi-supervised clustering algorithms. We propose an active learning approach to constraint selection that combines a min-max strategy with density peak estimation to decide which pairs of points to query. Experiments on real-world datasets from the UCI Machine Learning Repository show that the selected constraints yield significant improvements in clustering performance.
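The abstract names two ingredients, density peaks and a min-max strategy, without giving the selection procedure. The Python sketch below shows one plausible way they can fit together. It is not the authors' algorithm. The local density ρ (a cut-off kernel count) and the separation δ (distance to the nearest denser point) follow Rodriguez and Laio's density peak definitions, and γ = ρ·δ scores candidate peaks. The functions `density_peaks` and `select_queries`, the cut-off `d_c`, the rule that only points at least as dense as average may be queried, and the `oracle` callback standing in for the user or domain expert are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Constraint = Tuple[int, int, str]  # (i, j, "must-link" | "cannot-link")


def density_peaks(points: List[Point], d_c: float) -> Tuple[List[int], List[float]]:
    """Local density rho (cut-off kernel) and separation delta, after Rodriguez and Laio."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cut-off distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # Points ranked "denser" than i; equal densities are broken by index
        higher = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        # The top-ranked point has no denser neighbor, so it takes its largest distance
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def select_queries(points: List[Point], d_c: float, n_queries: int,
                   oracle: Callable[[int, int], str]) -> List[Constraint]:
    """Hypothetical min-max query loop seeded by the strongest density peak."""
    rho, delta = density_peaks(points, d_c)
    gamma = [r * d for r, d in zip(rho, delta)]  # density peak score
    mean_rho = sum(rho) / len(rho)
    # Assumption: only points at least as dense as average are queried (skips outliers)
    pool = [i for i in range(len(points)) if rho[i] >= mean_rho]
    selected = [max(pool, key=lambda i: gamma[i])]
    constraints: List[Constraint] = []
    while len(constraints) < n_queries:
        rest = [i for i in pool if i not in selected]
        if not rest:
            break
        # Min-max: pick the candidate whose nearest selected point is farthest away
        nxt = max(rest, key=lambda i: min(math.dist(points[i], points[s]) for s in selected))
        # Ask the oracle about the new point and its closest already-selected point
        nearest = min(selected, key=lambda s: math.dist(points[nxt], points[s]))
        constraints.append((nxt, nearest, oracle(nxt, nearest)))
        selected.append(nxt)
    return constraints


if __name__ == "__main__":
    # Two well-separated toy clusters; the oracle answers from known labels
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    ask = lambda i, j: "must-link" if truth[i] == truth[j] else "cannot-link"
    print(select_queries(pts, d_c=2.0, n_queries=2, oracle=ask))
    # [(7, 0, 'cannot-link'), (3, 0, 'must-link')]
```

Seeding the loop at the highest-γ point and then always querying the farthest remaining dense point tends to cross cluster boundaries early, which is why the first answer on this toy data is a cannot-link. A real implementation would also need a principled choice of `d_c` and a stopping rule tied to the query budget.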
Keywords
Clustering, semi-supervised clustering, constraint, density peak, active learning, min-max method.