Ju Fan
Professor
|
Open Positions:
If you are a highly motivated student interested in working with me on big data analytics and crowdsourcing, please email me your resume and a short statement of your research interests, with the following subject: [PhD/Master/Visiting Application] Name+Major+School.
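Interested students are welcome to join the DBAI research group. The group currently focuses on data preparation and, more broadly, on data governance techniques and systems; please see the DBAI group introduction.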
|
My research interest is in AI4DB, with a special focus on developing deep learning algorithms and systems for Data Preparation.
Data preparation, which is the process of turning big data into good data, is a crucial step of data science and machine learning. A well-known statistic is that data scientists spend at least 80% of their time on data preparation. Recently, there have been many studies on data preparation using deep learning models.
Please refer to the slides for more details.
|
22 Sep, 2022
|
Three research papers, one tutorial paper and one demo paper have been accepted at SIGMOD 2023. These papers focus on developing deep-learning-based solutions for data preparation. Please find them below.
|
21 July, 2022
|
Two papers on domain adaptation for entity resolution (ER) have been accepted: "Domain Adaptation for Deep Entity Resolution" (SIGMOD 2022, research) and "DADER: Hands-Off Entity Resolution with Domain Adaptation" (VLDB 2022, demo). These two papers explore how to reuse multiple well-labeled source ER datasets to train a DL-based ER model for a new target ER dataset in zero-shot or few-shot settings.
|
21 Jun, 2022
|
I gave a talk entitled "Tailoring Pretrained Transformer-Based Models for Data Preparation" at the Gray Systems Lab (GSL) of Microsoft Research.
Please find the slides here.
|
|
2023
|
-
Jianhong Tu, Ju Fan*, Nan Tang, Peng Wang, Guoliang Li, Xiaoyong Du, Xiaofeng Jia, Song Gao:
Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration.
SIGMOD 2023. [paper]
[Github Repo]
-
Sibei Chen, Nan Tang, Ju Fan*, Xuemi Yan, Chengliang Chai, Guoliang Li, Xiaoyong Du:
HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation.
SIGMOD 2023. [paper]
[Github Repo]
-
Zihui Gu, Ju Fan*, Nan Tang, Lei Cao, Bowen Jia, Sam Madden, Xiaoyong Du:
Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning.
SIGMOD 2023. [paper]
-
Chengliang Chai, Nan Tang, Ju Fan*, Yuyu Luo:
Demystifying Artificial Intelligence for Data Preparation.
SIGMOD 2023 (Tutorial). [paper]
-
Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, Xiaoman Zhao, Ju Fan, Xiaoyong Du:
Pay “Attention” to Chart Images for What You Read on Text.
SIGMOD 2023 (Demo). [paper]
|
2022
|
-
Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du:
PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training.
EMNLP 2022. [paper]
[Github Repo]
-
Jianhong Tu, Ju Fan*, Nan Tang, Peng Wang, Chengliang Chai, Guoliang Li, Ruixue Fan, Xiaoyong Du:
Domain Adaptation for Deep Entity Resolution.
SIGMOD 2022. [paper]
[Github Repo]
[Video]
[Slides]
-
Jianhong Tu, Xiaoyue Han, Ju Fan*, Nan Tang, Chengliang Chai, Guoliang Li, Xiaoyong Du:
DADER: Hands-Off Entity Resolution with Domain Adaptation.
VLDB, 2022 (Demo). [paper]
[PyPI]
[Video]
[Datasets]
-
Zihui Gu, Ruixue Fan, Xiaoman Zhao, Meihui Zhang, Ju Fan, Xiaoyong Du:
OpenTFV: An Open Domain Table-Based Fact Verification System.
SIGMOD 2022 (Demo). [paper]
[Video]
[Slides]
-
Ziyue Zhong, Meihui Zhang, Ju Fan, Chenxiao Dou:
Semantic Driven Embedding Learning for Effective Entity Alignment.
ICDE 2022. [paper]
|
2021
|
-
Tongyu Liu, Ju Fan*, Yinqing Luo, Nan Tang, Guoliang Li, Xiaoyong Du:
Adaptive Data Augmentation for Supervised Learning over Missing Data.
VLDB 2021. [paper]
[Github Repo]
[Slides]
-
Nan Tang, Ju Fan*, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Samuel Madden, Mourad Ouzzani:
RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation.
VLDB, 2021 (Vision). [paper]
[Slides]
-
Jingru Yang, Xiaoman Zhao, Ju Fan*, Gong Chen, Chong Peng, Sheng Yao, Xiaoyong Du:
A Human-in-the-loop Approach to Social Behavioral Targeting.
ICDE 2021. [paper]
-
Yanhao Wang, Yuchen Li, Ju Fan*, Chang Ye, Mingke Chai:
A survey of typical attributed graph queries.
World Wide Web 2021. [paper]
-
Mingke Chai, Zihui Gu, Xiaoman Zhao, Ju Fan, Xiaoyong Du:
TFV: A Framework for Table-Based Fact Verification.
IEEE Data Engineering Bulletin, Volume 44. [paper]
-
Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo:
CrowdChart: Crowdsourced Data Extraction From Visualization Charts.
IEEE Transactions on Knowledge and Data Engineering (TKDE). [paper]
|
2020
|
-
Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, Xiaoyong Du:
Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration.
VLDB 2020. [paper] [video]
[Github Repo]
-
Xiaoman Zhao, Ju Fan, Iccha Basnyat, Baijing Hu:
Online Health Information Seeking Using "#COVID-19 Patient Seeking Help" on Weibo in Wuhan, China: Descriptive Study.
Journal of Medical Internet Research (JMIR) 2020. [paper] [A data portrait of 1,183 help seekers: not the weak, but you and me (in Chinese)]
-
Jingru Yang, Ju Fan*, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du:
A game-based framework for crowdsourced data labeling.
VLDB Journal, 2020. [paper]
-
Hongyang Wang, Qingfei Meng, Ju Fan*, Yuchen Li, Laizhong Cui, Xiaoman Zhao, Chong Peng, Gong Chen, Xiaoyong Du:
Social Influence Does Matter: User Action Prediction for In-Feed Advertising.
AAAI 2020. [paper] [code]
-
Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo:
Crowdsourcing-based Data Extraction from Visualization Charts.
ICDE 2020 (Demo). [paper]
-
Wentao Huang, Yuchen Li, Yuan Fang, Ju Fan*, Hongxia Yang:
BiANE: Bipartite Attributed Network Embedding.
SIGIR 2020. [paper] [slides] [code]
[Introduction in Chinese]
-
Mingke Chai, Ju Fan*, Xiaoyong Du:
Learned Database Systems: Challenges and Opportunities (in Chinese).
Journal of Software (软件学报) 2020. [paper]
|
2019
|
- Jingru Yang, Ju Fan*, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du:
Cost-Effective Data Annotation using Game-Based Crowdsourcing.
VLDB 2019. [paper] [slides]
[datasets & code]
- Ju Fan, Zhewei Wei, Dongxiang Zhang, Jingru Yang, and Xiaoyong Du:
Distribution-Aware Crowdsourced Entity Collection.
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019. [paper]
- Tongyu Liu, Jingru Yang, Ju Fan*, Zhewei Wei, Guoliang Li, Xiaoyong Du:
CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling.
SIGMOD 2019 (Demo). [paper]
- Yuchen Li, Ju Fan, George V. Ovchinnikov, Panagiotis Karras:
Maximizing Multifaceted Network Influence.
ICDE 2019. [paper] [slides]
- Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng:
Crowdsourcing Database Systems: Overview and Challenges.
ICDE 2019. [paper]
|
2018
|
- Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng:
Crowd-Powered Data Mining.
KDD 2018. [website]
- Ju Fan, Guoliang Li:
Human-in-the-loop Rule Learning for Data Integration.
IEEE Data Engineering Bulletin, Volume 41. [paper]
- Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, K.Y. Ngiam, Beng Chin Ooi:
Fine-grained Concept Linking using Neural Networks in Healthcare.
ACM SIGMOD 2018. [paper]
- Yuchen Li, Ju Fan*, Yanhao Wang, Kian-Lee Tan:
Influence Maximization on Social Graphs: A Survey.
IEEE Transactions on Knowledge and Data Engineering (TKDE). [paper]
- Chengliang Chai, Ju Fan*, Guoliang Li:
Incentive-Based Entity Collection using Crowdsourcing.
34th IEEE International Conference on Data Engineering (ICDE), 2018.
[paper]
- Ju Fan, Jiarong Qiu, Yuchen Li, Qingfei Meng, Dongxiang Zhang, Guoliang Li, Kian-Lee Tan, Xiaoyong Du:
OCTOPUS: An Online Topic-Aware Influence Analysis System for Social Networks (demo).
34th IEEE International Conference on Data Engineering (ICDE), 2018.
[paper] [video]
|
2017
|
- Guoliang Li, Chengliang Chai, Ju Fan, et al.:
CDB: Optimizing Queries with Crowd-Based Selections and Joins.
ACM SIGMOD, 2017. [paper]
- Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng:
Crowdsourced Data Management: Overview and Challenges (tutorial).
ACM SIGMOD, 2017. [paper] [slides]
- Dongxiang Zhang, Yuchen Li, Ju Fan, Lianli Gao, Fumin Shen, Heng Tao Shen:
Processing Long Queries Against Short Text: Top-k Advertisement Matching in News Stream Applications.
ACM Transactions on Information Systems (TOIS). [paper]
- Yuchen Li, Ju Fan, Dongxiang Zhang, Kian-Lee Tan:
Discovering Your Selling Points: Personalized Social Influential Tag Exploration.
ACM SIGMOD, 2017. [paper]
|
2016
|
- Ju Fan, Meihui Zhang, Stanley Kok, Meiyu Lu, and Beng Chin Ooi:
CrowdOp: Query Optimization for Declarative Crowdsourcing Systems (Extended Abstract).
32nd IEEE International Conference on Data Engineering (ICDE), 2016.
[paper]
- Xiaojie Lin, Ye Gu, Rui Zhang, Ju Fan:
Linking News and Tweets.
ADC 2016: 467-470.
[paper]
|
2015
|
- Ju Fan, Meihui Zhang, Stanley Kok, Meiyu Lu, and Beng Chin Ooi:
CrowdOp: Query Optimization for Declarative Crowdsourcing Systems.
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2015.
[paper]
- Ju Fan, Guoliang Li, Beng Chin Ooi, Kian-Lee Tan, Jianhua Feng:
iCrowd: An Adaptive Crowdsourcing Framework.
ACM SIGMOD, 2015: 1015-1030.
[paper] [slides]
- Shuo Chen, Ju Fan, Guoliang Li, Jianhua Feng:
Online Topic-Aware Influence Maximization.
41st International Conference on Very Large Data Bases (VLDB), 2015: 666-677.
[paper] [slides]
- Kuang Mao, Lidan Shou, Ju Fan, Gang Chen, Mohan S. Kankanhalli:
Competence-Based Song Recommendation: Matching Songs to One's Singing Skill.
IEEE Transactions on Multimedia (TMM) 17(3), 2015: 396-408.
[paper]
|
2014
|
- Ju Fan, Meiyu Lu, Beng Chin Ooi, Wang-Chiew Tan, Meihui Zhang:
A Hybrid Machine-Crowdsourcing System for Matching Web Tables.
30th IEEE International Conference on Data Engineering (ICDE), 2014: 976-987.
[paper] [slides]
- Kuang Mao, Ju Fan, Lidan Shou, Gang Chen and Mohan Kankanhalli:
Song Recommendation for Social Singing Community.
ACM Multimedia 2014: 127-136.
[paper]
- Zheng Jye Ling, Quoc Trung Tran, Ju Fan, Gerald C.H. Koh, Thi Nguyen, Chuen Seng Tan, James W. L. Yip, Meihui Zhang:
GEMINI: An Integrative Healthcare Analytics System.
40th International Conference on Very Large Data Bases (VLDB), 2014: 1766-1771.
[paper]
|
2013
|
- Jun Han, Ju Fan, Lizhu Zhou:
Crowdsourcing-Assisted Query Structure Interpretation.
23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013: 2092-2098.
[paper]
- Yang Cao, Ju Fan, Guoliang Li:
A User-Friendly Patent Search Paradigm.
IEEE Transactions on Knowledge and Data Engineering (TKDE) 25(6), 2013: 1439-1443.
[paper]
- Guoliang Li, Nan Zhang, Ruicheng Zhong, Sitong Liu, Weihuang Huang, Ju Fan, Kian-Lee Tan, Lizhu Zhou, Jianhua Feng:
TsingNUS: a location-based service system towards live city.
ACM SIGMOD 2013: 957-960 (Demo).
[paper]
|
2012
|
- Ju Fan, Guoliang Li, Lizhu Zhou, Shanshan Chen, Jun Hu:
SEAL: Spatio-Textual Similarity Search.
38th International Conference on Very Large Data Bases (VLDB), 5(9), 2012: 824-835.
[paper]
- Ruicheng Zhong, Ju Fan, Guoliang Li, Kian-Lee Tan, Lizhu Zhou:
Location-aware Instant Search.
21st ACM International Conference on Information and Knowledge Management (CIKM), 2012: 385-394.
[paper]
|
2011
|
- Ju Fan, Guoliang Li, and Lizhu Zhou:
Interactive SQL Query Suggestion: Making Databases User-Friendly.
27th IEEE International Conference on Data Engineering (ICDE), 2011: 351-362.
[paper]
- Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, and Jianhua Feng:
DBease: Making Databases User-Friendly and Easily Accessible.
5th Biennial Conference on Innovative Data Systems Research (CIDR), 2011: 45-56.
[paper]
- Ju Fan, Guoliang Li, and Lizhu Zhou:
An Effective Approach for Searching Closest Sentence Translations from The Web.
Database Systems for Advanced Applications (DASFAA), 2011: 47-57.
[paper]
|
|
-
Human-in-the-Loop Data Preparation,
Symposium on Frontiers in Database (SiftDB), May 29 2021. [slides]
-
Synthetic Data Generation: Challenges and Techniques, Nov 28 2020. [slides]
-
Influence Maximization on Big Social Graphs,
Tsinghua University, Jan 7 2017. [slides]
|
- Associate Managing Editor of Data Science and Engineering
- Proceedings co-chair of VLDB 2023
- Web and information chair of SIGMOD 2021
- Program committee member in
  - SIGMOD 2023, 2022, 2021, 2020
  - VLDB 2023, 2021, 2020, 2018
  - ICDE 2023, 2022, 2021
  - KDD 2022, 2021, 2020, 2019
|
- Introduction to Data Science
- Distributed Database Systems
- Introduction to Programming
|