ACTIVE LEARNING FOR CONSTRAINT SELECTION BASED ON DENSITY PEAK

Gia Bao Pham1, Quoc Viet Dinh1, Tuan Linh Le1, Van Linh Phung1, Khanh Ha1, Việt Vũ Vũ2, Doan Vinh Tran1, Viet Thang Vu1
1 CMC University
2 CMC University

Abstract

Semi-supervised clustering, which uses side information to improve clustering performance, has attracted considerable attention in the research community. The side information takes two main forms: seeds (labeled data points) and pairwise constraints (must-link and cannot-link relationships between points). By incorporating knowledge from users or domain experts, semi-supervised clustering can produce results that align more closely with user expectations. However, the quality of the clustering depends heavily on the side information provided: different inputs can lead to different results, and poorly chosen side information can even degrade clustering performance. This paper addresses the problem of selecting high-quality constraints for semi-supervised clustering algorithms. We propose an active learning approach to constraint selection that combines a min-max strategy with density peak estimation to decide which constraints to query. Experiments on real-world datasets from the UCI Machine Learning Repository show that the proposed method improves clustering performance.
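
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the density-peak quantities of Rodriguez and Laio (2014): a local density rho (neighbors within a cutoff `d_c`) and delta (the distance to the nearest point of higher density). It also assumes that "min-max" means a farthest-first choice among high-scoring candidates, and that an oracle answers pairwise queries. The function names, the cutoff value, the candidate count, and the toy data are all hypothetical.

```python
import itertools
import numpy as np

def density_peaks(X, d_c):
    """Return (rho, delta, D): local density, distance to denser point, distances."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < d_c).sum(axis=1) - 1  # neighbors within d_c, excluding the point itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        # Points with no denser neighbor get their largest distance by convention.
        delta[i] = D[i, higher].min() if higher.size else D[i].max()
    return rho, delta, D

def min_max_select(D, candidates, k):
    """Farthest-first: each pick maximizes its minimum distance to earlier picks."""
    chosen = [candidates[0]]
    while len(chosen) < min(k, len(candidates)):
        rest = [c for c in candidates if c not in chosen]
        chosen.append(max(rest, key=lambda c: D[c, chosen].min()))
    return chosen

def query_constraints(points, same_cluster):
    """Ask the oracle about every pair; split answers into must/cannot-link."""
    must, cannot = [], []
    for a, b in itertools.combinations(points, 2):
        (must if same_cluster(a, b) else cannot).append((a, b))
    return must, cannot

# Toy data (assumed): two 3x3 grids of points centered at (0, 0) and (10, 10).
offsets = np.array([(x, y) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)])
X = np.vstack([offsets, offsets + 10.0])
labels = np.array([0] * 9 + [1] * 9)  # ground truth stands in for the human expert

rho, delta, D = density_peaks(X, d_c=0.6)
gamma = rho * delta  # high gamma marks likely density peaks
candidates = [int(i) for i in np.argsort(-gamma)[:4]]
chosen = min_max_select(D, candidates, k=2)
must, cannot = query_constraints(chosen, lambda a, b: labels[a] == labels[b])
```

In this sketch the oracle is simulated with ground-truth labels; in an interactive setting the same callback would be answered by a user or domain expert, and the resulting must-link and cannot-link pairs would be passed to a constrained clustering algorithm.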
