PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA | Chung | TNU Journal of Science and Technology

About this article

Received: 12/10/23; Revised: 06/11/23; Published: 06/11/23

Authors

1. Nguyen Van Chung, Vinh Phuc Technology - Economic College
2. Nguyen Van Tao, TNU - University of Information and Communication Technology

Abstract


Data mining can reveal sensitive information about individuals or organizations, thereby violating their privacy. The main purpose of privacy preserving data mining is to develop techniques that extract valuable knowledge while keeping sensitive data and private information with its owners. Many solutions have been proposed so far; however, they either have low efficiency or do not ensure privacy. This article builds a privacy preserving Naive Bayes classifier for the multi-member, horizontally partitioned data scenario, based on the secure sum protocol. The proposed protocol is assessed as offering good privacy, accuracy, and efficiency in comparison with contemporary solutions. To confirm its effectiveness, the experimental part uses the Python programming language to visualize the results. Specifically, a privacy preserving Naive Bayes classifier is built for a spam message detection model. Experimental results show that the proposed solution has good applicability in practice.
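To make the idea concrete, the following is a minimal sketch of how Naive Bayes counts could be combined across members that each hold different rows of the same spam dataset. It is not the protocol proposed in the article: it uses a textbook additive-sharing secure sum in the semi-honest setting, and the names MODULUS, make_shares, secure_sum, local_counts, aggregate_counts and classify, the toy messages, and the assumption that the label set and vocabulary are public are all illustrative.

```python
import math
import secrets
from collections import Counter

# Hypothetical modulus; it only needs to exceed any true count total.
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Textbook additive-sharing secure sum (assumed semi-honest parties)."""
    n = len(private_values)
    # Row i holds the shares made by party i; column j is what party j receives.
    share_matrix = [make_shares(v, n) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(row[j] for row in share_matrix) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

def local_counts(rows):
    """Count labels and (label, word) pairs in one party's own rows."""
    class_counts, word_counts = Counter(), Counter()
    for label, text in rows:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[(label, word)] += 1
    return class_counts, word_counts

def aggregate_counts(parties, labels, vocabulary):
    """Combine per-party counts so that only the global totals are revealed."""
    per_party = [local_counts(rows) for rows in parties]
    class_totals = {c: secure_sum([cc[c] for cc, _ in per_party]) for c in labels}
    word_totals = {(c, w): secure_sum([wc[(c, w)] for _, wc in per_party])
                   for c in labels for w in vocabulary}
    return class_totals, word_totals

def classify(text, class_totals, word_totals, vocabulary):
    """Multinomial Naive Bayes with Laplace smoothing on the aggregated counts."""
    n_rows = sum(class_totals.values())
    best_label, best_score = None, -math.inf
    for c, n_c in class_totals.items():
        total_words = sum(word_totals[(c, w)] for w in vocabulary)
        score = math.log((n_c + 1) / (n_rows + len(class_totals)))
        for w in text.lower().split():
            if w in vocabulary:
                score += math.log((word_totals[(c, w)] + 1)
                                  / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Toy data: three members, each holding different rows with the same schema.
parties = [
    [("spam", "win free prize now"), ("ham", "see you at lunch")],
    [("spam", "free cash prize"), ("ham", "lunch meeting moved")],
    [("ham", "call me after the meeting"), ("spam", "claim free cash now")],
]
labels = ["ham", "spam"]
# Simplification: the vocabulary is derived here for brevity; in a real
# deployment it would have to be agreed on publicly beforehand.
vocabulary = sorted({w for rows in parties for _, t in rows for w in t.split()})

class_totals, word_totals = aggregate_counts(parties, labels, vocabulary)
print(classify("free prize", class_totals, word_totals, vocabulary))
print(classify("lunch meeting", class_totals, word_totals, vocabulary))
```

Because the aggregated counts equal the counts a single party would compute on the pooled data, the resulting classifier matches the centralized one; the privacy and efficiency of the article's own protocol depend on details not given in this abstract.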

Keywords


Partitioned data; Horizontally partitioned; Privacy; Accuracy; Semi-honest model

DOI: https://doi.org/10.34238/tnu-jst.8980

TNU Journal of Science and Technology