Designing a Plagiarism Detection Application with TF-IDF and Cosine Similarity

Authors

S. Arkan, H. Sulistiono, I. Setiadi

DOI:

https://doi.org/10.30998/nz32fq32

Keywords:

Python, Cosine Similarity, TF-IDF, Natural Language Processing, Plagiarism Detection

Abstract

Plagiarism is a serious challenge in academia that threatens scientific integrity. Access to commercial plagiarism detection tools such as Turnitin is often limited by high costs, while free tools impose word-count limits and raise data privacy concerns. This research develops an offline plagiarism detection application based on natural language processing, using term frequency-inverse document frequency (TF-IDF) weighting and cosine similarity. The application is designed to run without an internet connection, to impose no word-count limit, and to keep data secure. It is built with Python 3.13, using the PySide6 framework for the user interface and SQLite as the database. In a test with 1 test document and 9 comparison documents, the application reported an overall detection score of 9.38% using the scikit-learn (SkLearn) method. The processing time for each was 8.55 seconds. The application is expected to offer students and educational institutions an alternative way to detect plagiarism independently and safely.
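
The TF-IDF and cosine similarity comparison named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `similarity_scores`), the simple regex tokenizer, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 are all assumptions chosen for illustration, and the application itself may preprocess and weight text differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption of this sketch): ln((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_scores(test_document, comparison_documents):
    """Score one test document against each comparison document."""
    vectors = tfidf_vectors([test_document] + list(comparison_documents))
    return [cosine_similarity(vectors[0], vec) for vec in vectors[1:]]


if __name__ == "__main__":
    # Hypothetical sample texts, not the documents used in the study.
    test_doc = "plagiarism detection with tf idf and cosine similarity"
    corpus = [
        "plagiarism detection with tf idf and cosine similarity",
        "a recipe for fried rice",
        "cosine similarity measures the angle between document vectors",
    ]
    for index, score in enumerate(similarity_scores(test_doc, corpus), start=1):
        print(f"comparison document {index}: {score:.2%}")
```

A score near 100% means the two documents use nearly the same weighted vocabulary, while 0% means they share no terms. How the application aggregates per-document scores into its overall score is not stated in the abstract, so no aggregation step is shown here.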

Published

2026-04-15

How to Cite

Arkan, S., Sulistiono, H., & Setiadi, I. (2026). Perancangan Aplikasi Deteksi Plagiarisme dengan TF-IDF dan Cosine Similarity. Jurnal Riset Dan Aplikasi Mahasiswa Informatika (JRAMI), 7(02), 230-239. https://doi.org/10.30998/nz32fq32