Comparison of Naive Bayes Method, K-NN (K-Nearest Neighbor) and Decision Tree for Predicting the Graduation of ‘Aisyiyah University Students of Yogyakarta

Tikaridha Hardiani

doi:10.31101/ijhst.v2i1.1829

Authors

Tikaridha Hardiani, Department of Information Technology, Science and Technology Faculty, University ‘Aisyiyah Yogyakarta, Indonesia

DOI:

https://doi.org/10.31101/ijhst.v2i1.1829

Keywords:

data mining, prediction, student, graduation, decision tree, naive bayes, K-NN

Abstract

The number of students at Universitas ‘Aisyiyah Yogyakarta (UNISA) has been increasing, including in the Faculty of Health Sciences. In 2016 the total number of UNISA students was 1851. The yearly increase in students leads to a large volume of data stored in the university database. These data provide useful information for the university to predict the student study period, that is, whether students graduate on time with a study period of 4 years or late with a study period of more than 4 years. The prediction can be made with a data mining technique, namely classification. Classification requires data on students who have graduated as training data and data on students who are still studying at the university as testing data. The training data consisted of 501 records with 10 goals, and the testing data consisted of 428 records. The data mining process followed the Cross-Industry Standard Process for Data Mining (CRISP-DM). The algorithms used in this study were Naive Bayes, K-Nearest Neighbor (K-NN) and Decision Tree. The three algorithms were compared on accuracy using RapidMiner software. Based on accuracy, the K-NN algorithm was the best at predicting student graduation, with an accuracy of 91.82%. The K-NN algorithm predicted that 100% of the students in the Nursing study program of Universitas ‘Aisyiyah Yogyakarta will graduate on time.
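The study itself was carried out in RapidMiner. As a rough illustration of how a K-NN classifier labels a student as graduating on time or late, and how accuracy is computed, the following minimal Python sketch may help. It is not the author's implementation: the two features (grade point average and credits per semester), the toy records, and the function names `knn_predict` and `accuracy` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k nearest neighbours and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy records: (grade point average, credits per semester).
# These values are invented and do not come from the UNISA data set.
train_X = [(3.6, 22), (3.4, 21), (3.8, 24), (2.1, 14), (2.4, 15), (2.0, 12)]
train_y = ["on_time", "on_time", "on_time", "late", "late", "late"]

# Held-out records with known outcomes, used only to measure accuracy.
test_X = [(3.5, 23), (2.2, 13)]
test_y = ["on_time", "late"]

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(predictions, accuracy(test_y, predictions))
```

In this sketch the features are left unscaled, so the attribute with the larger numeric range dominates the distance. With real data, normalizing each attribute before computing distances is usually advisable.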

