
Story Teller: Detecting and Tracking Hot Topics to Enhance Search Engine Performance

Demo Video

Introduction

In this project, we develop a method to detect and track hot topics in click-through data so that the performance of current search engines can be enhanced. Doing this effectively and efficiently is challenging because click-through datasets are large, noisy, and sparse. Existing methods that cluster the whole dataset suffer from efficiency problems, which makes them difficult to incorporate into current web search engine systems. We study how to extract topics from the data in a local fashion, then detect events within each topic and infer the relationships between events in order to draw an event evolution graph efficiently.
Motivation

Detecting topics from click-through data has recently attracted many researchers' attention because the data reflect not only web content but also user activity. However, click-through data is huge, sparse, and very noisy, which makes the task very challenging. In this project, we propose a novel Local Expansion method for Topic Detection (LETD). The major steps of this method are illustrated in Fig. 1. We first develop an importance measure to choose the URLs that are most likely to describe a real-life topic, and then, starting from these URLs, use local expansion to sift out other topic-related URLs, considering both link and content information. To overcome the sparseness problem, we define a new concept, context, to describe a URL. The local expansion method uses a divide-and-conquer strategy to analyze the huge dataset; compared with existing global clustering methods, it is therefore more efficient and more flexible. We implemented the algorithm and conducted experiments on click-through data provided by Microsoft. The results demonstrate the algorithm's efficiency and effectiveness. To facilitate the topic detection process, we also studied similarity computation and proposed a novel way to efficiently compute the similarity score for a pair of nodes based on link information.
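
The publication list on this page suggests that the link-based similarity measure is SimRank. As a rough illustration only, the sketch below implements the textbook iterative SimRank over all node pairs, not the fast single-pair algorithm proposed in the paper. The function name `simrank`, the decay factor of 0.8, and the iteration count are assumptions made for this example.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Textbook iterative SimRank over all node pairs (illustrative only).

    in_neighbors maps every node to the list of nodes that link to it.
    Two nodes are similar when the nodes linking to them are similar.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no in-links is similar to nothing but itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim
```

For a toy graph in which one query `q` links to two URLs `u1` and `u2`, calling `simrank({"q": [], "u1": ["q"], "u2": ["q"]})` gives the pair `("u1", "u2")` a score equal to the decay factor, because both URLs are reached from the same query. Computing all pairs this way is expensive on large graphs, which is the cost a single-pair method aims to avoid.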

Fig. 1 Major steps of the LETD method for topic and event detection
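
The seed-then-expand idea behind LETD can be sketched as follows. This page does not define the importance measure, the context concept, or the expansion criterion, so everything concrete here is an assumption for illustration: importance is taken to be the number of distinct queries that led to a URL, a URL's context is taken to be that set of queries, and a candidate joins a topic when the Jaccard similarity between its context and the seed's context reaches a hypothetical `threshold`. Content information, which the actual method also uses, is ignored.

```python
from collections import defaultdict


def build_contexts(clicks):
    """Index (query, url) click records in both directions."""
    url_queries = defaultdict(set)  # assumed "context" of a URL
    query_urls = defaultdict(set)
    for query, url in clicks:
        url_queries[url].add(query)
        query_urls[query].add(url)
    return url_queries, query_urls


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_topic(seed, url_queries, query_urls, threshold):
    """Grow one topic outward from a seed URL, visiting only its neighborhood."""
    topic = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        # Candidates are URLs that share at least one query with the current URL.
        for query in url_queries[current]:
            for candidate in query_urls[query]:
                if candidate in topic:
                    continue
                if jaccard(url_queries[seed], url_queries[candidate]) >= threshold:
                    topic.add(candidate)
                    frontier.append(candidate)
    return topic


def detect_topics(clicks, threshold=0.5):
    """Pick seeds by assumed importance and expand each one locally."""
    url_queries, query_urls = build_contexts(clicks)
    # Assumed importance measure: number of distinct queries per URL.
    ranked = sorted(url_queries, key=lambda u: (-len(url_queries[u]), u))
    assigned, topics = set(), []
    for seed in ranked:
        if seed in assigned:
            continue
        topic = expand_topic(seed, url_queries, query_urls, threshold) - assigned
        topics.append(topic)
        assigned |= topic
    return topics
```

Because each expansion only walks the URLs reachable from its seed through shared queries, no step needs the full pairwise similarity matrix that a global clustering pass would require. That is the sense in which a local method can be cheaper on large, sparse click data.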

Research Work
Based on the detected hot topics and events and on the evolution graph of events, we can improve the performance of web search engines and enhance users' search experience. To show the application scenarios, we implemented a prototype system called StoryTeller. In this system, when a user submits a query related to a hot topic, we recommend queries by presenting an event evolution graph of the topic. In this graph, related web pages are clustered into nodes. In this way, users not only get a global picture of how the event is developing, but can also browse the well-organized web pages by clicking any event node in the graph. The system has five major functions: hot topic detection, event detection, event evolution graph discovery, hot query ranking and visualization, and query-related topic search. The major processes of the system are shown in Fig. 2.

Fig. 2 Major processes of StoryTeller

Key Techniques:

LETD Algorithm for Topic Detection
Analyzing Burst Pattern for Event Detection
Extracting Web Page Content for Summarization and Image Display
Events Relevance Calculation
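
This page does not describe how burst patterns are analyzed, so the following is only a minimal sketch of one common approach, not the project's method. It assumes a topic's clicks have been aggregated into daily counts, flags a day as a burst when its count exceeds the trailing mean by `k` standard deviations, and merges consecutive burst days into candidate events. The names `find_bursts` and `group_into_events`, the window length, and `k` are all hypothetical.

```python
from statistics import mean, pstdev


def find_bursts(daily_counts, window=7, k=2.0):
    """Return indices of days whose count far exceeds the trailing window."""
    bursts = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        baseline = mean(history)
        # Floor the spread at 1.0 so a flat history does not flag tiny changes.
        spread = max(pstdev(history), 1.0)
        if daily_counts[day] > baseline + k * spread:
            bursts.append(day)
    return bursts


def group_into_events(burst_days):
    """Merge runs of consecutive burst days into (start, end) intervals."""
    events = []
    for day in burst_days:
        if events and day == events[-1][1] + 1:
            events[-1] = (events[-1][0], day)
        else:
            events.append((day, day))
    return events
```

With the counts `[10, 12, 11, 9, 10, 11, 10, 50, 60, 12, 11]`, days 7 and 8 are flagged and grouped into a single event spanning those two days. A trailing window keeps the baseline adaptive, but a long burst inflates its own baseline, so real systems usually need a more robust estimate.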

Grant
Microsoft Research Asia IFP Funded Project

Publications

1) Pei Li, Hongyan Liu, Jeffrey Yu, Jun He, Xiaoyong Du. Fast Single-Pair SimRank Computation. SIAM International Conference on Data Mining (SDM 2010). April 29-May 1, 2010. Columbus, Ohio, USA. (Accepted, Best Paper Candidate)
2) Yingqin Gu, Jianwei Cui, Hongyan Liu, Xuan Jiang, Jun He, Xiaoyong Du, Mengxia Jiang, Zhixu Li. Detecting Hot Events from Web Search Logs. (Submitted to WAIM 2010)
3) StoryTeller: A System with Hot Topic and Event Detection Capability Based on Click-through Data (Poster). (Submitted to SIGIR 2010)
4) Detecting Topics from Click-through Data Efficiently by Local Expansion. (Submitted to SIGKDD 2010)

Project Directors

Prof. Xiaoyong Du, Principal Investigator, School of Information, Renmin University
Prof. Jun He, Director, School of Information, Renmin University

Members

Cui Jianwei, MSc
Gu Yingqin, MSc
Jiang Xuan, MSc
Li Pei, MSc
Zhixu Li (MSc, Graduated)

Copyright © RUC DB-IIR Lab, All rights reserved.