A Two-Phase Educational Data Clustering Method Based On Transfer Learning And Kernel K-Means


  • Phùng Hứa Nguyễn, HCMC University Of Technology
  • Châu Võ Thị Ngọc


Educational data clustering, k-means, transfer learning, unsupervised domain adaptation, kernel-induced Euclidean distance


In this paper, we propose a two-phase educational data clustering method based on transfer learning and the kernel k-means algorithm. The method addresses the task of clustering student data on a small target data set associated with a target program when a much larger source data set associated with another (source) program is available. A small target data set may be insufficient for clustering in a high-dimensional space. Therefore, in the first phase, our method defines a transfer learning process that exploits both the unlabeled target and source data sets to find a more meaningful representation of the instances in the target domain. In this phase, a number of new features are derived by applying spectral clustering to the domain-independent and domain-specific features of both the target and source domains. These new features are used to enhance the target data space. In the second phase, our method runs the kernel k-means algorithm to form clusters of students in the enhanced target feature space. As a result, the clusters can be arbitrarily shaped in the enhanced target data space, with greater compactness and separation. Compared with existing work in educational data mining, our clustering task and its corresponding method are novel in grouping similar students according to their study performance at the program level. In addition, experimental results and statistical tests on real data sets confirm the effectiveness of our method, which produces clusters of higher quality than those of the other approaches.
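
To make the second phase concrete, the following is a minimal sketch of kernel k-means using the kernel-induced squared distance d^2(x_i, c) = K_ii - (2/|c|) * sum_{j in c} K_ij + (1/|c|^2) * sum_{j,l in c} K_jl. It is an illustration only, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the round-robin initialization, and the names `gaussian_kernel` and `kernel_kmeans` are all assumptions, and the transfer learning phase that builds the enhanced feature space is not shown.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and sigma are assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_kmeans(points, k, sigma=1.0, max_iter=100, init_labels=None):
    """Cluster points with kernel k-means and return one label per point.

    Centroids are never computed explicitly: distances to them in the
    kernel-induced feature space are derived from the kernel matrix alone.
    """
    n = len(points)
    # Precompute the full n x n kernel matrix.
    K = [[gaussian_kernel(points[i], points[j], sigma) for j in range(n)]
         for i in range(n)]
    # Hypothetical default initialization: assign points round-robin.
    if init_labels is not None:
        labels = list(init_labels)
    else:
        labels = [i % k for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Third term of the distance: mean pairwise kernel value per cluster.
        within = [
            sum(K[j][l] for j in m for l in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in m) / len(m)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels
```

In the setting described above, `points` would hold the student vectors in the enhanced target feature space, that is, the original target features extended with the new features derived in the first phase. The result depends on the initialization and on `sigma`, so in practice several initializations would typically be tried.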