A Context-Based Association Rule Mining Method for Multiple Event Sequences

Abstract: Association rule mining of event sequences aims to discover the interdependencies among different events that occur within neighboring time windows. It plays an important role in understanding how events interact. However, most existing methods ignore the distribution characteristics of events in the sequences, and suitable support and confidence thresholds are difficult to choose. As a result, the mined rules are often redundant, or interesting rules are missed. This paper takes the inherent distribution characteristics of events in the sequences into account, defines new rule measures, and proposes a context-based (background-knowledge-aware) association rule mining algorithm for multiple event sequences. Results on both simulated and real data show that, compared with the classic MOWCATL algorithm, the proposed method produces more accurate results and effectively reduces both redundant and missed rules. Moreover, the new measures are more consistent with one another, which makes it easier to select among the generated rules. Finally, the method was applied to mine association rules between PM2.5 concentration and several meteorological factors in Beijing during the winter of 2013. The results indicate that relative humidity is the meteorological factor most closely associated with PM2.5 concentration. They also indicate that high humidity, low temperature, and weak wind are the conditions most conducive to high PM2.5 concentrations.
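The abstract does not define the paper's new measures. The following Python sketch illustrates why fixed support and confidence thresholds can mislead when a consequent event is common on its own. It scores a single-event rule A → B with a time lag in a simplified, MOWCATL-like way: one antecedent event, one consequent event, and no episode windows. It then adds a lift-style ratio that divides confidence by the chance that any window of the same length contains B.

The function names (`followed_within`, `rule_stats`), the integer time axis, the toy data, and the lift ratio are all assumptions made for illustration. They are not the authors' algorithm or measures.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class RuleStats:
    support: int        # antecedent occurrences followed by the consequent within the lag
    confidence: float   # support / number of antecedent occurrences
    baseline: float     # share of all windows (t, t + lag] that contain the consequent
    lift: float         # confidence / baseline (assumed measure; >1 means A raises the chance of B)


def followed_within(times_b, t, lag):
    """True if some consequent timestamp falls in (t, t + lag]; times_b must be sorted."""
    i = bisect_right(times_b, t)  # first consequent strictly after t
    return i < len(times_b) and times_b[i] <= t + lag


def rule_stats(times_a, times_b, lag, horizon):
    """Score the hypothetical rule A -> B on integer time steps 0..horizon-1."""
    if lag <= 0 or horizon <= lag:
        raise ValueError("need 0 < lag < horizon")
    times_b = sorted(times_b)
    hits = sum(followed_within(times_b, t, lag) for t in times_a)
    confidence = hits / len(times_a) if times_a else 0.0
    # Background distribution of B: how often a window of the same length,
    # started at any time step, contains B regardless of A.
    starts = range(horizon - lag)
    baseline = sum(followed_within(times_b, t, lag) for t in starts) / len(starts)
    # If B never appears in any window there is nothing to associate with.
    lift = confidence / baseline if baseline > 0 else 0.0
    return RuleStats(hits, confidence, baseline, lift)


if __name__ == "__main__":
    high_pm25 = [2, 3, 6, 7, 10, 11, 14, 15, 18, 19]  # toy consequent, half of all steps
    humid = [1, 5, 9, 13, 17]                        # toy antecedent that always precedes it
    noise = [0, 3, 6, 9, 12]                         # toy antecedent with no special timing
    for name, a in [("humid", humid), ("noise", noise)]:
        s = rule_stats(a, high_pm25, lag=2, horizon=20)
        print(name, s.support, round(s.confidence, 2), round(s.lift, 2))
        # humid -> 5 1.0 1.29
        # noise -> 4 0.8 1.03
```

In this toy data the "noise" antecedent reaches a confidence of 0.8, which would pass a typical confidence threshold. Its lift is still close to 1, because the consequent fills about 78% of all windows anyway. Comparing a rule against the consequent's own distribution is one simple way to use the kind of background information that, according to the abstract, plain support and confidence ignore.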

     
