An Interactive Individual Spatiotemporal Trajectory Extraction and Quality Evaluation Method for COVID-19 Cases

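  • Abstract: Because text descriptions of the individual spatiotemporal trajectories of coronavirus disease 2019 (COVID-19) cases are highly unstructured, an interactive trajectory extraction method assisted by natural language processing (NLP) is proposed to improve the efficiency and quality of trajectory extraction. An interactive workflow for trajectory extraction and quality evaluation is designed, and key technologies, including an address segmentation and combination algorithm and a trajectory quality evaluation algorithm, are studied and implemented. Taking locally transmitted clustered COVID-19 cases in Heilongjiang as an example, comparative experiments on extraction efficiency and quality verify the effectiveness and practicality of the method. The results show that, compared with extraction without NLP assistance, the method significantly improves extraction efficiency. Moreover, according to the quantitative trajectory credibility evaluation, the human-computer interactive extraction method can also effectively resolve problems such as missing trajectory points and wrong locations that occur in fully automatic extraction.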

     

    Abstract: With the coronavirus disease 2019 (COVID-19) epidemic brought under control in China, scientific research on the transmission patterns of the virus has become essential to disease control, and demand for precise, structured trajectories of individual cases is growing. However, the spatiotemporal trajectory source text retrieved from the official website is highly unstructured, so neither manual extraction nor a fully automated algorithm can obtain precise trajectories efficiently. To resolve this tension between efficiency and precision, a human-computer interactive (HCI) trajectory extraction and validation approach based on natural language processing (NLP) is proposed. The source text is first analyzed by NLP, and coarse trajectories are identified and extracted automatically; a user then confirms or edits them; finally, other users vote on whether each trajectory is correct. The key technologies of the approach are also investigated, including the trajectory location segmentation and combination algorithm, the trajectory quality evaluation algorithm, and the trajectory extraction and validation workflow. A comparative experiment using the local clustered cases in Harbin during April as a case study was conducted to evaluate the effectiveness and practicability of the approach. The results show that the extraction efficiency of the proposed approach is about double that of extraction without NLP. The trajectory credibility evaluation also suggests that, compared with trajectories extracted automatically by NLP alone, the HCI method reduces missing locations and positioning errors by 26.34%. Furthermore, the validation results indicate that 92.63% of trajectories were assessed as reliable, and that the incorrect trajectory nodes were created mainly by the NLP algorithm rather than by manual extraction. According to the experimental results, the proposed approach effectively improves the efficiency and quality of trajectory extraction. In addition, the prototype system could serve as a tool for epidemiological investigations to assist doctors or patients.
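
    The voting-based validation step can be illustrated with a minimal sketch. The abstract does not specify the trajectory quality evaluation algorithm, so everything below is an assumption made for illustration: the `TrajectoryPoint` fields, the smoothed agreement score in `credibility`, and the 0.5 threshold in `reliable_share` are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrajectoryPoint:
    """One extracted trajectory node (hypothetical structure)."""
    time: str
    place: str
    source: str  # "nlp" if auto-extracted, "manual" if user-entered
    votes: List[bool] = field(default_factory=list)  # True = validator says correct


def credibility(point: TrajectoryPoint, prior_agree: int = 1, prior_total: int = 2) -> float:
    """Smoothed share of validators who judged the point correct.

    The prior (assumed here) keeps unvoted points at 0.5 instead of
    dividing by zero or reporting false certainty.
    """
    agree = sum(point.votes)
    return (agree + prior_agree) / (len(point.votes) + prior_total)


def reliable_share(points: List[TrajectoryPoint], threshold: float = 0.5) -> float:
    """Fraction of points whose credibility exceeds the assumed threshold."""
    if not points:
        return 0.0
    reliable = sum(1 for p in points if credibility(p) > threshold)
    return reliable / len(points)


points = [
    TrajectoryPoint("04-09 10:00", "Hospital A", "nlp", [True, True, False]),
    TrajectoryPoint("04-10 14:00", "Market B", "manual"),
]
scores = [credibility(p) for p in points]
share = reliable_share(points)
```

    Grouping the scores by `source` would support the kind of comparison the abstract reports between nodes created by the NLP algorithm and nodes entered manually.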

     
