[关键词]
[摘要]
针对大数据中频繁项集挖掘方法存在的问题,提出一种基于fg-growth算法的大数据频繁项集挖掘方法。该方法首先使用最大最小标准化线性函数,对大数据进行离散化、归一化、异常值检测等相关性处理;然后,基于贪心策略均衡节点负载和通信量,完成节点优化;最后,经过数据库和 TID 表格之间的转化得到高频项集数据库用于数据供给,以确定最大合并候选项目阶次和产生候选项目,并对其进行分类累加,将结果与支持度之间进行判断,从而得到高频项集。实验结果表明:在相同实验条件下,设计方法的 F1值、可扩展性以及算法运行时间均优于传统方法。
[Keywords]
[Abstract]
To address the problems of existing frequent itemset mining methods for big data, a frequent itemset mining method for big data based on the fg-growth algorithm is proposed. First, a min-max normalization linear function is used to preprocess the data, including discretization, normalization, and outlier detection. Then, node load and communication volume are balanced using a greedy strategy to complete node optimization. Finally, the database is converted into a TID table to obtain a high-frequency itemset database for data supply. This database is used to determine the maximum order of merged candidate items and to generate candidate items, which are then classified and accumulated; the accumulated counts are compared against the support threshold to obtain the high-frequency itemsets. Experimental results show that, under the same experimental conditions, the F1 score, scalability, and running time of the proposed method are better than those of traditional methods.
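The final step (conversion to a TID table, candidate generation up to a maximum order, and comparison against the support threshold) can be illustrated with a minimal sketch. It is not the fg-growth method itself: it enumerates candidates by brute force, and the names `to_tid_table`, `frequent_itemsets`, `min_support`, and `max_order` are assumptions made for this example only.

```python
from itertools import combinations


def to_tid_table(transactions):
    """Convert a horizontal database (list of item sets) into item -> set of TIDs."""
    tid_table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_table.setdefault(item, set()).add(tid)
    return tid_table


def frequent_itemsets(transactions, min_support, max_order):
    """Return {itemset: support count} for itemsets of at most max_order items.

    Illustrative brute-force enumeration (an assumption of this sketch),
    not the candidate generation used by the proposed method.
    """
    tid_table = to_tid_table(transactions)
    # Only items that are frequent on their own can appear in a frequent itemset.
    frequent_items = sorted(
        item for item, tids in tid_table.items() if len(tids) >= min_support
    )
    result = {}
    for order in range(1, max_order + 1):
        for candidate in combinations(frequent_items, order):
            # Support = number of transactions that contain every item in the candidate.
            tids = set.intersection(*(tid_table[item] for item in candidate))
            if len(tids) >= min_support:
                result[candidate] = len(tids)
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = frequent_itemsets(transactions, min_support=3, max_order=2)
```

With this toy database each single item has support 4 and each pair has support 3, so all six candidates of order 1 and 2 meet the threshold. The triple `("a", "b", "c")` has support 2 and would be rejected if `max_order` were raised to 3.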
[CLC number]
TP301
[Foundation item]