Design of a Plagiarism Detection Application Using TF-IDF and Cosine Similarity
DOI: https://doi.org/10.30998/nz32fq32
Keywords: Python, Cosine Similarity, TF-IDF, Natural Language Processing, Plagiarism Detection
Abstract
Plagiarism is a serious challenge in academia that threatens scientific integrity. Access to commercial plagiarism detection tools such as Turnitin is often limited by their high cost, while free tools impose word-count limits and raise data privacy concerns. This research develops an offline plagiarism detection application based on natural language processing, using term frequency-inverse document frequency (TF-IDF) weighting and cosine similarity. The application is designed to run without an internet connection and without word-count limits, while keeping document data secure. It is built with Python 3.13, using the PySide6 framework for the user interface and SQLite as the database. Testing with 1 test document and 9 comparison documents produced an overall similarity score of 9.38% (SkLearn method). The corresponding processing time was 8.55 seconds. The application is intended as an alternative for students and educational institutions to detect plagiarism independently and securely.
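To make the TF-IDF and cosine similarity pipeline concrete, the following is a minimal sketch, not the authors' code. It assumes scikit-learn's default `TfidfVectorizer` weighting (lowercasing, tokens of two or more word characters, raw term counts, smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalisation). With those defaults, `TfidfVectorizer` combined with `sklearn.metrics.pairwise.cosine_similarity` should compute the same per-document scores. The function names `tokenize`, `tfidf_vectors`, `cosine` and `similarity_report` are hypothetical. The abstract does not describe how the per-document scores are combined into the reported overall score, so the sketch stops at one percentage per comparison document.

```python
import math
import re
from collections import Counter

# Assumption: same default token pattern as scikit-learn's TfidfVectorizer
# (words of two or more word characters).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one L2-normalised TF-IDF vector (term -> weight) per document.

    Uses raw term counts and the smoothed IDF ln((1 + n) / (1 + df)) + 1,
    assumed here to mirror scikit-learn's default weighting.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df_t)) + 1 for t, df_t in df.items()}

    vectors = []
    for c in counts:
        vec = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Empty documents keep an empty vector instead of dividing by zero.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two L2-normalised sparse vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def similarity_report(test_doc: str, comparison_docs: list[str]) -> list[float]:
    """Return the similarity (in percent) of the test document to each comparison document."""
    vectors = tfidf_vectors([test_doc, *comparison_docs])
    return [round(100 * cosine(vectors[0], v), 2) for v in vectors[1:]]


if __name__ == "__main__":
    scores = similarity_report(
        "plagiarism detection with tf-idf and cosine similarity",
        [
            "cosine similarity for plagiarism detection",
            "offline desktop application built with pyside6",
        ],
    )
    print(scores)
```

The IDF is computed over the test document and all comparison documents together. As a result, a term that appears in every document contributes less weight than a term that is distinctive to a few documents. Because each vector is L2-normalised, the cosine similarity reduces to a dot product over the terms the two documents share.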
License
Copyright (c) 2026 Shandy Arkan, Heru Sulistiono, Irawan Setiadi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.





