Big Data Analytics @DBIIR

Get what you want from big data.

Overview

Research areas: (1) crowdsourced big data management and analytics; (2) big data analytics for social advertising

Faculty homepages: 范举 (Ju Fan)   杜小勇 (Xiaoyong Du)

Email: fanj@ruc.edu.cn

Student members:

  • PhD students: 董兆安 杨婧如
  • Master's students: 孟庆飞 汪弘洋 韩涵 黄文韬 柴茗珂
  • Undergraduates: 刘同禹 张真苗 费楠益 刘怡灿

Current research topics

System development

  • Octopus
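    The Octopus system provides social network users and researchers with a set of useful social influence analysis services. Its main contributions are:
    1. Octopus provides a user-friendly interactive interface that lets users explore influence through intuitive keywords.
    2. Octopus offers three influence analysis functions: keyword-based influential user discovery, personalized influential keyword recommendation, and interactive influence path exploration. While identifying influential users, it also visualizes how influence propagates through the social network.
    3. Octopus answers online queries in real time, meeting end users' real-time query needs.
    Building on these points, we implemented and deployed Octopus and demonstrated its efficiency and effectiveness on an ACM citation dataset and a QQ group dataset.
    [Figure: system architecture]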
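  • Crowd-intelligence-based rule learning
    Rule-based data integration is being adopted by more and more practitioners because rules are easy to interpret and support efficient interactive debugging. However, generating high-quality rules for data integration is challenging. Rules handcrafted by domain experts are reliable but do not scale: covering as much of the data as possible takes a great deal of manual effort and time. Automatically generated weak-supervision rules (such as distant supervision rules), on the other hand, can cover far more data, but they introduce considerable noise and many erroneous results.
    To address this problem, this project proposes a crowd-intelligence-based rule learning method that learns rules with both high coverage and high quality. The method first generates a set of candidate rules and uses a generative adversarial network (GAN) to learn a confidence score for each rule. It then refines the rules with a game-based crowdsourcing framework, using a budget-constrained crowdsourcing algorithm so that rule refinement stays within an affordable cost. Finally, the refined rules are applied to produce high-quality data integration results.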
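
The budget-constrained refinement step described in the rule learning item can be illustrated with a small sketch. This is not the project's algorithm: it assumes each candidate rule already carries a confidence score (learned by a GAN in the project, supplied as a plain number here), replaces the game-based crowdsourcing framework with a single yes/no crowd check per rule, and uses a simple greedy priority. The names `Rule`, `refine_rules`, `ask_crowd`, and `accept_threshold` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Rule:
    name: str
    covered: FrozenSet[int]  # ids of the data items the rule applies to
    confidence: float        # estimated precision in [0, 1] (assumed given)


def refine_rules(
    rules: List[Rule],
    ask_crowd: Callable[[Rule], bool],
    budget: int,
    accept_threshold: float = 0.8,
) -> List[Rule]:
    """Return the rules to keep, using at most `budget` crowd checks."""

    # Hypothetical priority: prefer rules that cover many items and whose
    # confidence is close to 0.5, i.e. where a crowd answer helps the most.
    def priority(rule: Rule) -> float:
        uncertainty = 1.0 - abs(2.0 * rule.confidence - 1.0)
        return uncertainty * len(rule.covered)

    to_check = sorted(rules, key=priority, reverse=True)[: max(budget, 0)]
    verdicts = {rule.name: ask_crowd(rule) for rule in to_check}

    kept = []
    for rule in rules:
        if rule.name in verdicts:
            # The crowd verdict overrides the learned confidence.
            if verdicts[rule.name]:
                kept.append(rule)
        elif rule.confidence >= accept_threshold:
            # Unchecked rules survive only if the model is already confident.
            kept.append(rule)
    return kept


# Toy example: the crowd is simulated by a lookup table of ground truth.
candidates = [
    Rule("r1", frozenset({1, 2, 3, 4}), 0.55),
    Rule("r2", frozenset({5, 6}), 0.95),
    Rule("r3", frozenset({7}), 0.50),
    Rule("r4", frozenset({8, 9, 10}), 0.30),
]
truth = {"r1": True, "r2": True, "r3": False, "r4": False}
kept = refine_rules(candidates, lambda rule: truth[rule.name], budget=2)
print([rule.name for rule in kept])  # ['r1', 'r2']
```

With a budget of two checks, the sketch spends them on `r1` and `r4`, the two rules with the highest priority, and falls back on the confidence score for the rest. A real system would also have to aggregate noisy worker answers, which this sketch ignores.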

Publications

  • Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng:
    Crowd-Powered Data Mining.
    KDD 2018. [website]
  • Ju Fan, Guoliang Li:
    Human-in-the-loop Rule Learning for Data Integration.
    IEEE Data Engineering Bulletin, Volume 41. [paper]
  • Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, K.Y. Ngiam, Beng Chin Ooi:
    Fine-grained Concept Linking using Neural Networks in Healthcare.
    ACM SIGMOD 2018. [paper]
  • Yuchen Li, Ju Fan*, Yanhao Wang, Kian-Lee Tan:
    Influence Maximization on Social Graphs: A Survey.
    IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear. [paper]
  • Chengliang Chai, Ju Fan*, Guoliang Li:
    Incentive-Based Entity Collection using Crowdsourcing.
    34th IEEE International Conference on Data Engineering (ICDE), 2018. [paper]
  • Ju Fan, Jiarong Qiu, Yuchen Li, Qingfei Meng, Dongxiang Zhang, Guoliang Li, Kian-Lee Tan, Xiaoyong Du:
    OCTOPUS: An Online Topic-Aware Influence Analysis System for Social Networks (demo).
    34th IEEE International Conference on Data Engineering (ICDE), 2018. [paper] [video]
  • Guoliang Li, Chengliang Chai, Ju Fan, et al.:
    CDB: Optimizing Queries with Crowd-Based Selections and Joins.
    ACM SIGMOD, 2017, to appear. [paper]
  • Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng:
    Crowdsourced Data Management: Overview and Challenges (tutorial).
    ACM SIGMOD, 2017. [paper] [slides]
  • Dongxiang Zhang, Yuchen Li, Ju Fan, Lianli Gao, Fumin Shen, Heng Tao Shen:
    Processing Long Queries Against Short Text: Top-k Advertisement Matching in News Stream Applications.
    ACM Transactions on Information Systems (TOIS), to appear. [paper]
  • Yuchen Li, Ju Fan, Dongxiang Zhang, Kian-Lee Tan:
    Discovering Your Selling Points: Personalized Social Influential Tag Exploration.
    ACM SIGMOD, 2017, to appear. [paper]
  • Ju Fan, Zhewei Wei, Dongxiang Zhang, Jingru Yang, and Xiaoyong Du:
    Distribution-Aware Crowdsourced Entity Collection.
    IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear. [paper]
  • Ju Fan, Meihui Zhang, Stanley Kok, Meiyu Lu, and Beng Chin Ooi:
    CrowdOp: Query Optimization for Declarative Crowdsourcing Systems (Extended Abstract).
    32nd IEEE International Conference on Data Engineering (ICDE), 2016. [paper]
  • Xiaojie Lin, Ye Gu, Rui Zhang, Ju Fan:
    Linking News and Tweets.
    ADC 2016: 467-470. [paper]
  • Ju Fan, Meihui Zhang, Stanley Kok, Meiyu Lu, and Beng Chin Ooi:
    CrowdOp: Query Optimization for Declarative Crowdsourcing Systems.
    IEEE Transactions on Knowledge and Data Engineering (TKDE), 2015. [paper]
  • Ju Fan, Guoliang Li, Beng Chin Ooi, Kian-Lee Tan, Jianhua Feng:
    iCrowd: An Adaptive Crowdsourcing Framework.
    ACM SIGMOD, 2015: 1015-1030. [paper] [slides]
  • Shuo Chen, Ju Fan, Guoliang Li, Jianhua Feng:
    Online Topic-Aware Influence Maximization.
    41st International Conference on Very Large Data Bases (VLDB), 2015: 666-677. [paper] [slides]
  • Kuang Mao, Lidan Shou, Ju Fan, Gang Chen, Mohan S. Kankanhalli:
    Competence-Based Song Recommendation: Matching Songs to One's Singing Skill.
    IEEE Transactions on Multimedia (TMM) 17(3), 2015: 396-408. [paper]
  • Ju Fan, Meiyu Lu, Beng Chin Ooi, Wang-Chiew Tan, Meihui Zhang:
    A Hybrid Machine-Crowdsourcing System for Matching Web Tables.
    30th IEEE International Conference on Data Engineering (ICDE), 2014: 976-987. [paper] [slides]
  • Kuang Mao, Ju Fan, Lidan Shou, Gang Chen and Mohan Kankanhalli:
    Song Recommendation for Social Singing Community.
    ACM Multimedia 2014: 127-136. [paper]
  • Zheng Jye Ling, Quoc Trung Tran, Ju Fan, Gerald C.H. Koh, Thi Nguyen, Chuen Seng Tan, James W. L. Yip, Meihui Zhang:
    GEMINI: An Integrative Healthcare Analytics System.
    40th International Conference on Very Large Data Bases (VLDB), 2014: 1766-1771. [paper]