Open Access Open Access  Restricted Access Subscription or Fee Access

Frequent Itemset Mining Using Normalized FP-Growth Algorithm

J Anitha, T Nithya, A Deepika, A Jeeva

Abstract


The growing interest in data storage has made the data size to be exponentially improved, hampering the process of knowledge discovery from these large volumes of high-dimensional and heterogeneous data. In recent years, many efficient algorithms for mining data associations have been proposed, facing up time and main memory requirements. However, this mining process could still become tough when the number of items and records is enormously high. In this paper, the goal is not to propose new efficient algorithms but a new data structure that could be used by a variety of existing algorithms without modifying its original schema. Thus, our goal is to speed up the association rule mining process regardless the algorithm used to this end, enabling the performance of efficient implementations to be enhanced. The structure simplifies, reorganizes, and speeds up the data access by sorting data by means of a shuffling strategy based on the hamming distance, which attain similar values to be closer, and considering both an inverted index mapping and a run length encoding compression. In the experimental study, we explore the bounds of the algorithms’ performance by using a wide number of data sets that comprise either thousands or millions of both items and records. The results demonstrate the utility of the proposed data structure in enhancing the algorithms’ runtime orders of magnitude, and substantially reducing both the auxiliary and the main memory requirements.

Full Text:

PDF

References


R. Agrawal, R. Srikant. Fast algorithms for mining association rules in large databases, In: Proc 20th Int’l Conf. Very Large Data Bases (VLDB). 1994, 487–99p.

R. Agrawal, R. Srikant. Privacy-preserving data mining, In: Proc. ACM SIGMOD Conf. 2000, 439–50p.

D. Beaver, S. Micali, P. Rogaway. The round complexity of secure protocols, In: Proc. 22nd Ann. ACM Symp. Theory of Computing (STOC). 1990, 503–13p.

M. Bellare, R. Canetti, H. Krawczyk. Keying hash functions for message authentication, In: Proc. 16th Ann. Int’l Cryptology Conf. Advances in Cryptology (Crypto). 1996, 1–15p.

A. Ben-David, N. Nisan, B. Pinkas. FairplayMP – a system for secure multi-party computation, In: Proc. 15th ACM Conf. Computer and Comm. Security (CCS). 2008; 257–66p.

J.C. Benaloh. Secret sharing homomorphisms: keeping shares of a secret secret, Proc Adv Cryptol. 1986, 251–60p.

J. Brickell, V. Shmatikov. Privacy-preserving graph algorithms in the semi-honest model, In: Proc. 11th Int’l Conf. Theory and Application of Cryptology and Information Security (ASIACRYPT). 2005, 236–52p.

D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, Y. Fu. A fast distributed algorithm for mining association rules, In: Proc. Fourth Int’l Conf. Parallel and Distributed Information Systems (PDIS). 1996, 31–42p.

D.W.L Cheung, V.T.Y. Ng, A.W.C. Fu, Y. Fu. Efficient mining of association rules in distributed databases, IEEE Trans Knowled Data Eng. 1996; 8(6).

T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans Inform Theory. 1985; IT-31(4).

A.V. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke. Privacy preserving mining of association rules, In: Proc. Eighth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD). 2002, 217–28p.

R. Fagin, M. Naor, P. Winkler. Comparing Information without Leaking It, Comm ACM. 1996; 39: 77–85p.

M. Freedman, Y. Ishai, B. Pinkas, O. Reingold. Keyword search and oblivious pseudorandom functions, In: Proc. Second Int’l Conf. Theory of Cryptography (TCC). 2005; 303–24p.

M.J. Freedman, K. Nissim, B. Pinkas. Efficient private matching and set intersection, In: Proc. Int’l Conf. Theory and Applications of Cryptographic Techniques (EUROCRYPT). 2004; 1–19p.

O. Goldreich, S. Micali, A. Wigderson. “How to play any mental game or a completeness theorem for protocols with honest majority, In: Proc. 19th Ann. ACM Symp. Theory of Computing (STOC). 1987, 218–29p.

H. Grosskreutz, B. Lemmen, S. R€uping. Secure distributed subgroup discovery in horizontally partitioned data, Trans Data Privacy. 2011; 4(3): 147–65p.

W. Jiang, C. Clifton. A secure distributed framework for achieving k-anonymity, VLDB J. 2006; 15: 316–33p.

M. Kantarcioglu, C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data, IEEE Trans Knowled Data Eng. 2004; 16(9): 1026–37p.

M. Kantarcioglu, R. Nix, J. Vaidya. An efficient approximate protocol for privacy-preserving association rule mining, In: Proc. 13th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD). 2009; 515–24p.

L. Kissner, D.X. Song. Privacy-preserving set operations, In: Proc. 25th Ann. Int’l Cryptology Conf. (CRYPTO). 2005; 241–57p.

T. Nithya. M. Vigneshwaran, M. Suganya, M. Thiyagarajan, B.Tech (IT), Velalar College of Engineering and Technology, Web image re-ranking using query-specific semantic signatures, Int J Eng Technol Sci. 2015; 2(3).

T. Nithya, S. Gayathri, E. Krishma. Mining user queries with Markov chain: application to content based image retrieval system, Int J Innov Emerg Res Eng. 2016; 3(2).

T. Nithya, V. Pavithra, S.V. Shreekeerthana, S. Yogapriya. Search and safe exchange of real-time video on mobile cloud, Int J Comput Sci Trends Technol. 2016; 4(1)




DOI: https://doi.org/10.37628/ijods.v3i1.249

Refbacks

  • There are currently no refbacks.