Journal of Information Science
A new sampling technique for association rule mining

Basel A. Mahafzah

King Abdullah School for Information Technology, University of Jordan, Jordan, b.mahafzah{at}ju.edu.jo

Amer F. Al-Badarneh

School of Computer and Information Technology, Jordan University of Science & Technology, Jordan

Mohammed Z. Zakaria

School of Computer and Information Technology, Jordan University of Science & Technology, Jordan

Association Rule Mining (ARM) is a data mining technique used to extract hidden knowledge from datasets, which an organization's decision makers can use to improve overall profit. However, performing ARM requires repeated passes over the entire database. For large databases, the input/output overhead of scanning the database is very significant. A popular way to speed up ARM is to apply the mining algorithm to a sample instead of the entire database. In this paper, a parameterized sampling algorithm for ARM is presented. This algorithm extracts sample datasets based on three parameters: transaction frequency, transaction length and transaction frequency-length. To evaluate its performance and accuracy, it is compared against a two-phase sampling-based algorithm using real and synthetic datasets. The experimental results show that the proposed sampling algorithm in some cases outperforms the two-phase sampling algorithm, and achieves up to 98% accuracy.
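To make the idea of parameter-driven sampling concrete, the following is a minimal Python sketch, not the authors' algorithm. The abstract names the three parameters but does not define how they are computed or applied, so everything here is an assumption: each transaction is given a weight proportional to how often the identical transaction occurs ("frequency"), to its number of items ("length"), or to the product of the two ("frequency-length"), and the sample is drawn with replacement. The function names `weight`, `parameterized_sample` and `support` are hypothetical.

```python
import random
from collections import Counter


def weight(transaction, freq, mode):
    """Assumed weighting scheme; the paper's real definitions may differ."""
    key = frozenset(transaction)
    if mode == "frequency":
        return freq[key]                # occurrences of this exact transaction
    if mode == "length":
        return len(key)                 # number of distinct items
    if mode == "frequency-length":
        return freq[key] * len(key)     # assumed combination: simple product
    raise ValueError(f"unknown mode: {mode}")


def parameterized_sample(transactions, k, mode, seed=0):
    """Draw k transactions with replacement, weighted by the chosen parameter.

    Assumes at least one transaction has a positive weight.
    """
    rng = random.Random(seed)
    freq = Counter(frozenset(t) for t in transactions)
    weights = [weight(t, freq, mode) for t in transactions]
    return rng.choices(transactions, weights=weights, k=k)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= frozenset(t))
    return hits / len(transactions)


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["a", "c", "d"], ["b"]]
    sample = parameterized_sample(db, k=3, mode="frequency-length")
    # Compare itemset support on the full database and on the sample.
    print(support(["a"], db), support(["a"], sample))
```

Comparing `support` on the full database with `support` on the sample gives a rough feel for how much accuracy a given sample preserves; the accuracy measure actually used in the paper is not stated in the abstract.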

Key Words: sampling • parameterized sampling • data reduction • data mining • association rule mining • information retrieval

This version was published on June 1, 2009

Journal of Information Science, Vol. 35, No. 3, 358-376 (2009)
DOI: 10.1177/0165551508100382

