A Context-Based Association Rules Mining Method for Multiple Event Sequences

He Zhanjun; Deng Min; Cai Jiannan; Liu Qiliang

doi:10.13203/j.whugis20150616

A Context-Based Association Rules Mining Method for Multiple Event Sequences

Abstract

Abstract: Association rule mining of event sequences aims to discover dependencies among different events that occur within a neighboring time domain, and it plays an important role in understanding the interaction mechanisms between events. However, existing methods ignore the distribution characteristics of events in the sequences, and the support and confidence thresholds are difficult to set, which leads to redundant or missing rules in the mining results. This paper takes the inherent distribution characteristics of events in the sequences fully into account, defines new rule measure indexes, and presents a context-based association rule mining algorithm for multiple event sequences. Results of both a simulated experiment and practical cases show that, compared with the classic MOWCATL algorithm, the proposed method produces more accurate results and better consistency among the rule measure indexes, which eases the selection of generated rules and effectively reduces redundant or missing rules. The method was then applied to the multiple sequences of PM2.5 concentration and meteorological factors in Beijing during the winter of 2013. The results indicate that PM2.5 concentration is most closely associated with relative humidity, and that an environment of high humidity, low temperature, and weak wind is most likely to lead to high PM2.5 concentrations.
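To make the support and confidence measures mentioned in the abstract more concrete, the following is a minimal sketch of how they might be computed for a single rule "antecedent => consequent" with a time lag. It is a generic illustration only: it is neither the MOWCATL algorithm nor the measure indexes defined in the paper. The function name, the parameters `max_lag` and `n_steps`, the integer time steps, and the sample data are all assumptions made for the example. The `baseline` value is one simple way to see why the distribution of the consequent matters: if the consequent is frequent, an arbitrary time step is often followed by it anyway, so a high confidence alone says little.

```python
from bisect import bisect_right


def rule_measures(antecedent_times, consequent_times, max_lag, n_steps):
    """Return (support_count, confidence, baseline) for 'antecedent => consequent'.

    A hit is an antecedent at time t with a consequent in (t, t + max_lag].
    All names and the integer time-step model are illustrative assumptions.
    """
    consequent_times = sorted(consequent_times)

    def followed(t):
        # Index of the first consequent strictly after time t.
        i = bisect_right(consequent_times, t)
        return i < len(consequent_times) and consequent_times[i] <= t + max_lag

    # Support count: antecedent occurrences followed by the consequent in time.
    hits = sum(followed(t) for t in antecedent_times)
    confidence = hits / len(antecedent_times) if antecedent_times else 0.0
    # Baseline: share of all time steps that are followed by the consequent,
    # regardless of whether the antecedent occurred there.
    baseline = sum(followed(t) for t in range(n_steps)) / n_steps
    return hits, confidence, baseline


# Hypothetical event times (for example, hours) for two event types.
high_humidity = [1, 5, 9]
high_pm25 = [2, 6, 12]
hits, confidence, baseline = rule_measures(high_humidity, high_pm25, max_lag=2, n_steps=14)
print(hits, round(confidence, 3), round(baseline, 3))
```

Comparing `confidence` with `baseline` in this way is only a stand-in for the idea of taking event distributions into account; the paper's own indexes should be consulted for the actual definitions.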
