Professor
Research Area: Database Systems, Big Data Analytics
Open Positions: If you are a highly motivated student and interested in working with me on big data analytics and crowdsourcing, please email me your resume and a short statement of your research interests, with the following subject:
[PhD/Master/Visiting Application] Name+Major+School
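Interested students are welcome to join the DBAI research group. The group currently focuses on data preparation and, more broadly, on data governance techniques and systems; please see the DBAI group introduction.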
My research interests are in AI4DB, with a special focus on developing deep learning algorithms and
systems for data preparation. Data preparation, the process of turning big data into good
data, is a crucial step in data science and machine learning. A well-known statistic is that data scientists
spend at least 80% of their time on data preparation. Recently, there have been many studies on data
preparation using deep learning models. Please refer to the slides for more details.
Jan 2, 2025
The deep learning-based database testing and query generation research project has been selected as an excellent case in the first CCF Industry-Academia Cooperation Fund project evaluation.
Dec 28, 2024
Professor Ju Fan Re-elected as Vice Chairman of YOCSEF AC.
Nov 11, 2024
The paper “Automatic Database Configuration Debugging using Retrieval-Augmented Language Models” has been accepted to SIGMOD 2025.
2024
- Controllable Tabular Data Synthesis Using Diffusion Models.
Tongyu Liu, Ju Fan*, Nan Tang, Guoliang Li, Xiaoyong Du
SIGMOD 2024
- Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations.
Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri
SIGMOD 2024
- CodeS: Towards Building Open-source Language Models for Text-to-SQL.
Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen
SIGMOD 2024
- MisDetect: Iterative Mislabel Detection using Early Loss.
Chengliang Chai, Lei Cao, Nan Tang, Jiayi Wang, Ju Fan, Ye Yuan, Guoren Wang
VLDB 2024
- Improving Graph Compression for Efficient Resource-Constrained Graph Analytics.
Qian Xu, Juan Yang, Feng Zhang, Zheng Chen, Jiawei Guan, Kang Chen, Ju Fan, Youren Shen, Ke Yang, Yu Zhang, Xiaoyong Du
VLDB 2024
- Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL.
Ju Fan, Zihui Gu, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Samuel Madden, Xiaoyong Du, Nan Tang
VLDB 2024
- Unicorn: A Unified Multi-Tasking Matching Model.
Ju Fan, Jianhong Tu, Guoliang Li, Peng Wang, Xiaoyong Du, Xiaofeng Jia, Song Gao, Nan Tang
SIGMOD 2024
- Tabular data synthesis with generative adversarial networks: design space and optimizations.
Tongyu Liu, Ju Fan*, Guoliang Li, Nan Tang, Xiaoyong Du
VLDB Journal 2024
- DINGO: Towards Diverse and Fine-Grained Instruction-Following Evaluation.
Zihui Gu, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Chengzhong Xu, Ju Fan*
AAAI 2024
- VerifAI: Verified Generative AI
Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Y. Halevy
CIDR 2024
- Representation Learning for Entity Alignment in Knowledge Graph: A Design Space Exploration.
Peng Huang, Meihui Zhang, Ziyue Zhong, Chengliang Chai, Ju Fan
ICDE 2024
- Mitigating Data Scarcity in Supervised Machine Learning Through Reinforcement Learning Guided Data Generation.
Chengliang Chai, Kaisen Jin, Nan Tang, Ju Fan*, Lianpeng Qiao, Yuping Wang, Yuyu Luo, Ye Yuan, Guoren Wang
ICDE 2024
- A Multi-Task Learning Framework for Reading Comprehension of Scientific Tabular Data.
Xu Yang, Meihui Zhang, Ju Fan, Zeyu Luo, Yuxin Yang
ICDE 2024
- IDE: A System for Iterative Mislabel Detection.
Yuhao Deng, Deng Qiyan, Chengliang Chai, Lei Cao, Nan Tang, Ju Fan, Jiayi Wang, Ye Yuan, Guoren Wang
SIGMOD 2024
- Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration.
Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du
ICDE 2024
- ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions.
Sibei Chen, Hanbing Liu, Waiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan*, Xiaoyong Du, Nan Tang
SIGMOD 2024
2023
- Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration.
Jianhong Tu, Ju Fan*, Nan Tang, Peng Wang, Guoliang Li, Xiaoyong Du, Xiaofeng Jia, Song Gao
SIGMOD 2023
- HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation.
Sibei Chen, Nan Tang, Ju Fan*, Xuemi Yan, Chengliang Chai, Guoliang Li, Xiaoyong Du
SIGMOD 2023
- Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning.
Zihui Gu, Ju Fan*, Nan Tang, Lei Cao, Bowen Jia, Sam Madden, Xiaoyong Du
SIGMOD 2023
- GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data.
Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, Jiayi Wang, Yuyu Luo, Guoliang Li
SIGMOD 2023
- Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes.
Zui Chen, Zihui Gu, Lei Cao, Ju Fan, Samuel Madden, Nan Tang
CIDR 2023
- Demystifying Artificial Intelligence for Data Preparation.
Chengliang Chai, Nan Tang, Ju Fan*, Yuyu Luo
SIGMOD 2023
- Pay "Attention" to Chart Images for What You Read on Text.
Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, Xiaoman Zhao, Ju Fan, Xiaoyong Du
SIGMOD 2023
2022
- DADER: Hands-Off Entity Resolution with Domain Adaptation.
Jianhong Tu, Xiaoyue Han, Ju Fan*, Nan Tang, Chengliang Chai, Guoliang Li, Xiaoyong Du
VLDB 2022
- PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training.
Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du
EMNLP 2022
- Semantics Driven Embedding Learning for Effective Entity Alignment.
Ziyue Zhong, Meihui Zhang, Ju Fan, Chenxiao Dou
ICDE 2022
- Local Clustering over Labeled Graphs: An Index-Free Approach.
Yudong Niu, Yuchen Li, Ju Fan, Zhifeng Bao
ICDE 2022
- Domain Adaptation for Deep Entity Resolution.
Jianhong Tu, Ju Fan*, Nan Tang, Peng Wang, Chengliang Chai, Guoliang Li, Ruixue Fan, Xiaoyong Du
SIGMOD 2022
- OpenTFV: An Open Domain Table-Based Fact Verification System.
Zihui Gu, Ruixue Fan, Xiaoman Zhao, Meihui Zhang, Ju Fan, Xiaoyong Du
SIGMOD 2022
2021
- TFV: A Framework for Table-Based Fact Verification.
Mingke Chai, Zihui Gu, Xiaoman Zhao, Ju Fan, Xiaoyong Du
IEEE Data Engineering Bulletin, Volume 44.
- Adaptive Data Augmentation for Supervised Learning over Missing Data.
Tongyu Liu, Ju Fan*, Yinqing Luo, Nan Tang, Guoliang Li, Xiaoyong Du
VLDB 2021
- RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation.
Nan Tang, Ju Fan*, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li
VLDB 2021
- CrowdChart: Crowdsourced Data Extraction From Visualization Charts.
Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo
IEEE Transactions on Knowledge and Data Engineering (TKDE) 2021
- A survey of typical attributed graph queries.
Yanhao Wang, Yuchen Li, Ju Fan*, Chang Ye, Mingke Chai
World Wide Web 2021
- A Human-in-the-loop Approach to Social Behavioral Targeting.
Jingru Yang, Xiaoman Zhao, Ju Fan*, Gong Chen, Chong Peng, Sheng Yao, Xiaoyong Du
ICDE 2021
2020
- Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration.
Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, Xiaoyong Du
VLDB 2020
- A game-based framework for crowdsourced data labeling.
Jingru Yang, Ju Fan*, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du
VLDB Journal 2020
- Social Influence Does Matter: User Action Prediction for In-Feed Advertising.
Hongyang Wang, Qingfei Meng, Ju Fan*, Yuchen Li, Laizhong Cui, Xiaoman Zhao, Chong Peng, Gong Chen, Xiaoyong Du
AAAI 2020
- Crowdsourcing-based Data Extraction from Visualization Charts.
Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo
ICDE 2020
- BiANE: Bipartite Attributed Network Embedding.
Wentao Huang, Yuchen Li, Yuan Fang, Ju Fan*, Hongxia Yang
SIGIR 2020
- Online Health Information Seeking Using "#COVID-19 Patient Seeking Help" on Weibo in Wuhan, China: Descriptive Study.
Xiaoman Zhao, Ju Fan, Iccha Basnyat, Baijing Hu
Journal of Medical Internet Research (JMIR) 2020
2019
- Distribution-Aware Crowdsourced Entity Collection.
Ju Fan*, Zhewei Wei, Dongxiang Zhang, Jingru Yang, Xiaoyong Du
IEEE Transactions on Knowledge and Data Engineering (TKDE) 2019
- Maximizing Multifaceted Network Influence.
Yuchen Li, Ju Fan, George V. Ovchinnikov, Panagiotis Karras
ICDE 2019
- Crowdsourcing Database Systems: Overview and Challenges.
Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng
ICDE 2019
- CrowdGame: A Game-Based Crowdsourcing System for Cost-Effective Data Labeling.
Tongyu Liu, Jingru Yang, Ju Fan*, Zhewei Wei, Guoliang Li, Xiaoyong Du
SIGMOD 2019
2018
- Human-in-the-loop Rule Learning for Data Integration.
Ju Fan, Guoliang Li
IEEE Data Engineering Bulletin, Volume 41.
- Trajectory Simplification: An Experimental Study and Quality Analysis.
Dongxiang Zhang, Mengting Ding, Dingyu Yang, Yi Liu, Ju Fan, Heng Tao Shen
VLDB 2018
- CDB: A Crowd-Powered Database System.
Guoliang Li, Chengliang Chai, Ju Fan, Xueping Weng, Jian Li, Yudian Zheng, Yuanbing Li, Xiang Yu, Xiaohang Zhang, Haitao Yuan
VLDB 2018
- Cost-Effective Data Annotation using Game-Based Crowdsourcing.
Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu, Xiaoyong Du
VLDB 2018
- Influence Maximization on Social Graphs: A Survey.
Yuchen Li, Ju Fan*, Yanhao Wang, Kian-Lee Tan
IEEE Transactions on Knowledge and Data Engineering (TKDE) 2018
- Incentive-Based Entity Collection Using Crowdsourcing.
Chengliang Chai, Ju Fan*, Guoliang Li
ICDE 2018
- OCTOPUS: An Online Topic-Aware Influence Analysis System for Social Networks.
Ju Fan, Jiarong Qiu, Yuchen Li, Qingfei Meng, Dongxiang Zhang, Guoliang Li, Kian-Lee Tan, Xiaoyong Du
ICDE 2018
- Fine-grained Concept Linking using Neural Networks in Healthcare.
Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, Kee Yuan Ngiam, Beng Chin Ooi
SIGMOD 2018
2017
- Processing Long Queries Against Short Text: Top-k Advertisement Matching in News Stream Applications.
Dongxiang Zhang, Yuchen Li, Ju Fan, Lianli Gao, Fumin Shen, Heng Tao Shen
ACM Transactions on Information Systems (TOIS) 2017
- Discovering Your Selling Points: Personalized Social Influential Tags Exploration.
Yuchen Li, Ju Fan, Dongxiang Zhang, Kian-Lee Tan
SIGMOD 2017
- CDB: Optimizing Queries with Crowd-Based Selections and Joins.
Guoliang Li, Chengliang Chai, Ju Fan, Xueping Weng, Jian Li, Yudian Zheng, Yuanbing Li, Xiang Yu, Xiaohang Zhang, Haitao Yuan
SIGMOD 2017
- Crowdsourced Data Management: Overview and Challenges.
Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng
SIGMOD 2017
2016
2015
- Online Topic-Aware Influence Maximization.
Shuo Chen, Ju Fan, Guoliang Li, Jianhua Feng, Kian-Lee Tan, Jinhui Tang
VLDB 2015
- CrowdOp: Query Optimization for Declarative Crowdsourcing Systems.
Ju Fan, Meihui Zhang, Stanley Kok, Meiyu Lu, Beng Chin Ooi
IEEE Transactions on Knowledge and Data Engineering (TKDE) 2015
- Competence-Based Song Recommendation: Matching Songs to One's Singing Skill.
Kuang Mao, Lidan Shou, Ju Fan, Gang Chen, Mohan S. Kankanhalli
IEEE Transactions on Multimedia (TMM) 2015
- iCrowd: An Adaptive Crowdsourcing Framework.
Ju Fan, Guoliang Li, Beng Chin Ooi, Kian-Lee Tan, Jianhua Feng
SIGMOD 2015
2014
- GEMINI: An Integrative Healthcare Analytics System.
Zheng Jye Ling, Quoc Trung Tran, Ju Fan, Gerald Choon Huat Koh, Thi Nguyen, Chuen Seng Tan, James Wei Luen Yip, Meihui Zhang
VLDB 2014
- A hybrid machine-crowdsourcing system for matching web tables.
Ju Fan, Meiyu Lu, Beng Chin Ooi, Wang-Chiew Tan, Meihui Zhang
ICDE 2014
- Song Recommendation for Social Singing Community.
Kuang Mao, Ju Fan, Lidan Shou, Gang Chen, Mohan S. Kankanhalli
ACM Multimedia 2014
2013
- A User-Friendly Patent Search Paradigm.
Yang Cao, Ju Fan, Guoliang Li
IEEE Transactions on Knowledge and Data Engineering (TKDE) 2013
- Crowdsourcing-Assisted Query Structure Interpretation.
Jun Han, Ju Fan, Lizhu Zhou
International Joint Conference on Artificial Intelligence (IJCAI) 2013
- TsingNUS: a location-based service system towards live city.
Guoliang Li, Nan Zhang, Ruicheng Zhong, Sitong Liu, Weihuang Huang, Ju Fan, Kian-Lee Tan, Lizhu Zhou, Jianhua Feng
SIGMOD 2013
2012
2011
- Associate Managing Editor of Data Science and Engineering
- Associate Editor of ICDE 2025 and ICDE 2024
- Proceedings co-chair of VLDB 2023 and VLDB 2024
- Web and Information Chair of SIGMOD 2021
- Program committee member of
- SIGMOD 2023, 2022, 2021, 2020
- VLDB 2026, 2025, 2024, 2023, 2021, 2020, 2018
- ICDE 2023, 2022, 2021
Research Awards
- CCF Industry-Academia Cooperation Fund Excellent Project Award, 2025
- World Artificial Intelligence Conference Youth Outstanding Paper Nomination Award, 2024
- ACM SIGMOD Research Highlight Award, 2024
- DataSpace Summit Excellent Technological Achievement Award, 2024
- CCF-Huawei Populus Grove Fund Excellent Project Award, 2023
- Best of SIGMOD Award, 2023
- ACM China Rising Star Award, 2018
- Sa Shixuan Outstanding Student Paper Award (NDBC), 2018
- CCF-Tencent Rhino-Bird Fund, 2017
Teaching Awards
- Outstanding Lecturer of Specialized (Public) Courses in Beijing Higher Education Institutions, 2024
- Baosteel Outstanding Teacher Award, 2023
- Excellent University Computer Science Teacher Award, 2023
- Renmin University of China Teaching Model Award, 2020
- Renmin University of China Undergraduate "Excellence in Course Teaching Award", 2020
- Second Prize of Renmin University of China Teaching Achievement Award, 2021
- First Prize for Outstanding Undergraduate Thesis (Design), 2022
Student Mentorship Awards
- Wu Yuzhang Scholarship, 2023
- Champion of the OceanBase Database Competition, 2021
- Meritorious Winner in the MCM, 2018
Service Awards
- Outstanding Service as Publication Editor, 2024
- Distinguished Program Committee Member, 2024
- Outstanding Meta-Reviewer for ICDE, 2024
RUC-DataLab