Abstract
High utility itemset mining is a well-studied data mining task for analyzing customer transactions. The goal is to find all high utility itemsets, that is, sets of items purchased together that generate a profit equal to or greater than a user-defined minimum utility threshold. However, a limitation of traditional high utility itemset mining algorithms is that item categories (e.g., drinks, dairy products) are ignored. Recently, two algorithms were designed to find multi-level and cross-level high utility itemsets that reveal relationships between items and/or categories of items. This is achieved by considering a product taxonomy, in which items are organized into a hierarchy. Although these algorithms can reveal interesting patterns, setting the minimum utility threshold is not intuitive, and the threshold greatly influences both the number of patterns found and the algorithms' performance. If the user sets the threshold too low, a huge number of patterns is found and runtimes can be very long, while if the threshold is set too high, few patterns are found. Hence, a user often has to run an algorithm numerous times to find a threshold value that yields just enough patterns. This paper addresses this issue by presenting a novel algorithm called TKC (Top-K Cross-level high utility itemset miner), which lets the user directly set the number of patterns k to be discovered. TKC performs a depth-first search and includes search-space pruning techniques and an optimization to enhance its performance. Experiments were performed on retail data with taxonomy information. Results indicate that the algorithm is efficient and that the optimization improves its performance.
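To make the problem setting concrete, the sketch below shows what "top-k cross-level high utility itemsets" means on a tiny, hypothetical dataset. It is not TKC: it enumerates candidate itemsets by brute force instead of TKC's depth-first search with pruning, and it assumes the common cross-level convention that a category's utility in a transaction is the sum of the utilities of its purchased descendant items, and that an itemset may not contain both a node and one of its ancestors. The data, the taxonomy, the function names and the `max_len` cap are all illustrative assumptions. The min-heap shows the top-k idea: the user supplies k, and the smallest utility in the heap acts as an internal threshold that rises as better itemsets are found.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its utility (profit).
TRANSACTIONS = [
    {"milk": 4, "cheese": 6, "cola": 2},
    {"milk": 2, "juice": 3},
    {"cheese": 5, "cola": 4, "juice": 1},
    {"milk": 3, "cheese": 2, "juice": 2},
]

# Hypothetical taxonomy: child -> parent (leaves are items, inner nodes are categories).
TAXONOMY = {
    "milk": "dairy", "cheese": "dairy",
    "cola": "drinks", "juice": "drinks",
}


def ancestors(node, taxonomy):
    """Return the set of all ancestors of a node in the taxonomy."""
    result = set()
    while node in taxonomy:
        node = taxonomy[node]
        result.add(node)
    return result


def node_utility(node, transaction, taxonomy):
    """Utility of an item or category in one transaction (assumed definition):
    the sum over purchased items that are the node itself or its descendants.
    Returns None if the node does not occur in the transaction."""
    total, found = 0, False
    for item, util in transaction.items():
        if item == node or node in ancestors(item, taxonomy):
            total += util
            found = True
    return total if found else None


def itemset_utility(itemset, transactions, taxonomy):
    """Sum the itemset's utility over the transactions that contain all its nodes."""
    total = 0
    for t in transactions:
        utils = [node_utility(n, t, taxonomy) for n in itemset]
        if all(u is not None for u in utils):
            total += sum(utils)
    return total


def top_k_cross_level(transactions, taxonomy, k, max_len=3):
    """Brute-force top-k search (illustrative only, not TKC's search strategy).
    A min-heap keeps the k best itemsets; its smallest utility is the
    internal threshold that a new candidate must beat."""
    nodes = sorted({n for t in transactions for i in t
                    for n in {i} | ancestors(i, taxonomy)})
    heap = []  # entries are (utility, itemset)
    for size in range(1, max_len + 1):
        for itemset in combinations(nodes, size):
            # Skip itemsets that contain both a node and one of its ancestors.
            if any(a in itemset for n in itemset for a in ancestors(n, taxonomy)):
                continue
            util = itemset_utility(itemset, transactions, taxonomy)
            if len(heap) < k:
                heapq.heappush(heap, (util, itemset))
            elif util > heap[0][0]:  # heap[0][0] is the current threshold
                heapq.heapreplace(heap, (util, itemset))
    return sorted(heap, reverse=True)
```

On this toy data, `top_k_cross_level(TRANSACTIONS, TAXONOMY, k=3)` returns `(dairy, drinks)` with utility 34, followed by `(dairy)` and the cross-level pattern `(cheese, drinks)`, both with utility 22. No utility threshold had to be guessed; a real miner such as TKC would additionally use the rising threshold to prune the search space rather than scoring every candidate.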
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020 |
| Editors | Giuseppe Di Fatta, Victor Sheng, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu |
| Publisher | IEEE Computer Society |
| Pages | 673-682 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781728190129 |
| DOIs | |
| State | Published - Nov 2020 |
| Externally published | Yes |
| Event | 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020 - Virtual, Sorrento, Italy. Duration: 17 Nov 2020 → 20 Nov 2020 |
Publication series
| Name | IEEE International Conference on Data Mining Workshops, ICDMW |
|---|---|
| Volume | 2020-November |
| ISSN (Print) | 2375-9232 |
| ISSN (Electronic) | 2375-9259 |
Conference
| Conference | 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020 |
|---|---|
| Country/Territory | Italy |
| City | Virtual, Sorrento |
| Period | 17/11/20 → 20/11/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- cross-level high utility itemsets
- hierarchy
- high utility itemsets
- taxonomy
ASJC Scopus subject areas
- Software
- Computer Science Applications
Fingerprint
Dive into the research topics of 'TKC: Mining Top-K Cross-Level High Utility Itemsets'. Together they form a unique fingerprint.