Cross Validation Method in Frequent Itemset Mining
Last modified: 2011-10-04
Abstract
We propose a new method for frequent itemset mining based on the well-known cross-validation technique from artificial intelligence and machine learning. In the non-optimized version, we partition the database into two subsets. First, we use one subset for training and the other for testing: we mine the frequent itemsets from the training subset, then use the testing subset to compute each itemset's support in the whole database. We then swap the roles of the subsets, so that the previous training set becomes the test set and vice versa. Again we mine all frequent itemsets from the training subset and use the other subset to compute their supports in the whole database. In this approach each record is used exactly once for training and exactly once for testing, so the database is read only twice. The optimized version reuses everything learned about itemsets in the first step when running the second step, which reduces the number of itemsets that must be considered.
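The two-step procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are:

* The database is an in-memory list of transactions, split into two halves.
* The minimum support is a fraction `min_ratio`, applied both to each subset and to the whole database.
* A simple Apriori-style level-wise miner stands in for whatever miner is run on the training subset.
* The names `mine_frequent`, `count_support`, `cross_validation_mining` and `reuse_first_pass` are hypothetical.

The `reuse_first_pass` flag is one assumed reading of the optimized version. Any itemset counted in the first step already has a known whole-database support, so the sketch does not recount it in the second step. Applying the same relative threshold to both subsets guarantees that no frequent itemset is missed. An itemset whose count reaches `min_ratio` of the whole database must reach `min_ratio` of at least one of the two subsets.

```python
from itertools import combinations


def mine_frequent(transactions, min_ratio):
    """Apriori-style level-wise miner (a stand-in, assumed for illustration).

    Returns {itemset: count} for every itemset whose count in `transactions`
    is at least min_ratio * len(transactions).
    """
    threshold = min_ratio * len(transactions)
    sets = [frozenset(t) for t in transactions]
    result = {}
    candidates = {frozenset([item]) for t in sets for item in t}  # 1-itemsets
    k = 1
    while candidates:
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= threshold}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-itemsets and prune any candidate
        # that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return result


def count_support(transactions, itemsets):
    """One pass over the test subset: count each given itemset."""
    sets = [frozenset(t) for t in transactions]
    return {c: sum(1 for t in sets if c <= t) for c in itemsets}


def cross_validation_mining(db, min_ratio, reuse_first_pass=True):
    """Two-fold scheme: mine one half, count in the other half, then swap."""
    half = len(db) // 2
    part_a, part_b = db[:half], db[half:]
    supports = {}  # itemset -> support in the whole database
    for train, test in ((part_a, part_b), (part_b, part_a)):
        local = mine_frequent(train, min_ratio)  # supports in the training half
        if reuse_first_pass:
            # Assumed optimization: skip itemsets whose whole-database
            # support is already known from the first step.
            todo = [c for c in local if c not in supports]
        else:
            todo = list(local)
        test_counts = count_support(test, todo)
        for c in todo:
            supports[c] = local[c] + test_counts[c]
    threshold = min_ratio * len(db)
    return {c: n for c, n in supports.items() if n >= threshold}
```

With `reuse_first_pass=False` the sketch behaves like the non-optimized version: it recounts every locally frequent itemset in the test subset. Both settings return the same itemsets. The flag only changes how many supports are counted in the second step.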