Guodong Jin

PhD candidate in CS at Renmin University of China.

Bio. I am a second-year Ph.D. candidate in the DBIIR Lab at Renmin University of China (RUC), working on database systems. I am supervised by Professor Yueguo Chen. I earned my bachelor's degree from Sichuan University in 2015, then joined RUC's successive master-doctor program and spent two years as a master's student in computer science. During those two years, I investigated Hadoop and popular SQL-on-Hadoop systems, such as Hive, Spark SQL, and Presto, and I built Paraflow.

My current work investigates how to build an adaptive column store for big tables, including adaptive physical layout optimization and columnar caching. In the future, I plan to continue my research with a focus on investigating cross-model storage and computation integrating big relational tables and large dynamic graphs.

Besides my daily research, I enjoy coding and playing with open source projects.

Moreover, I'm the founder and co-organizer of DBIIR Weekly Meetup.

RESEARCH AGENDA

The challenge of Big Data has shifted the design of data analytical systems from single machines to large-scale distributed systems. My research focuses on key techniques for improving the performance of distributed analytical systems over big data.

Nowadays, many big data analysis systems share HDFS (Hadoop Distributed File System) as their common underlying storage, and relational tables are stored as columnar files to speed up query execution. The physical layout of these columnar files largely determines I/O performance, which in turn is critical to the overall performance of analytical systems on HDFS. My current work investigates how to optimize the physical layouts of columnar files so that they adapt to various workloads and storage devices.

In the future, I plan to continue my research in the field of big data analytics, with a focus on building analytical systems that support cross-model storage and computation integrating big relational tables and large dynamic graphs, and that exploit the potential benefits of emerging hardware (such as NVM, GPU, and FPGA). In my dissertation work, I hope to build an open source analytical system that is well optimized and ready to use.

PUBLICATIONS

Towards Real-Time Analysis of ID-Associated Data. International Conference on Conceptual Modeling (Demonstration Track), Oct 2018.
Guodong JIN, Yixuan Wang, Xiongpai QIN, Yueguo CHEN, Xiaoyong DU. [paper] [poster]
Rainbow: Adaptive Layout Optimization for Wide Tables. ICDE, International Conference on Data Engineering (Demonstration Track), Apr 2018.
Haoqiong BIAN, Youxian TAO, Guodong JIN, Yueguo CHEN, Xiongpai QIN, Xiaoyong DU. [paper] [poster] [code]
Entity Fiber based Partitioning, No Loss Staging and Fast Loading of Log Data. PDCAT, Parallel and Distributed Computing, Applications and Technologies, Dec 2016.
Xiongpai QIN, Yueguo CHEN, Guodong JIN, Yang LIU, Yiming CONG, Xiaoyong DU. [paper] [code]
No Loss Staging and Fast Loading of Log Data (Written in Chinese). NDBC'16 Demo
Xiongpai QIN, Guodong JIN, Yang LIU, Yiming CONG, Xiaoyong DU. [code]

PROJECTS

Pixels. A flexible column storage format with adaptive optimization techniques embedded.
This project will be open-sourced soon.
Rainbow. A data layout optimization framework for wide tables stored on HDFS.

Paraflow. A real-time analytical system for ID-associated data.

Pard. A parallel database running like a leopard. This is a course project of Distributed Database Systems.
This project is under active development.
Claims. A distributed in-memory database system, which I worked on during my internship at InfoSys, Bangalore.

SKILLS

Good understanding of Java as a systems development language.

Familiar with the source code of Facebook Presto and Apache ORC.

Good communication skills both in English (TOEFL 103) and Mandarin Chinese.

Good at cooking, still improving.

Good teamwork spirit and considerable system design experience.

TA

Principles and Design of Database System (for graduate students). 2017.09 - 2018.01
A hard-core course on the principles of database systems for graduate students. During the course, each group of students is required to implement a toy DBMS.
The Practice of Programming (for undergraduate students). 2016.09 - 2017.01
An introductory course for undergraduate students to learn about programming languages and practice them! JavaScript and PHP are covered.

NEWS

Nov 2018: Great talk by Feng Zhang at Weekly Meetup 3!

Nov 2018: Weekly Meetup 2 is ON! Jingru Yang gave an excellent talk on her newly accepted VLDB paper!

Nov 2018: Weekly Meetup 1 is ON! Great talk by Jun Chen, sharing lots of research experience with us.

Nov 2018: Our first meetup was organized successfully. Thanks to our organizing committee members!

Oct 2018: I'm demonstrating Towards Real-Time Analysis of ID-Associated Data at ER 2018 (Xi'an, China).

Oct 2018: Excited to attend NDBC'2018 in Dalian.

Apr 2018: I'm demonstrating Rainbow: Adaptive Layout Optimization for Wide Tables at ICDE 2018 (Paris, France).

Sep 2017: I'm TA'ing the Principles and Design of Database System (for graduate students). Working hard!

Jul 2017: We started a new project called Pixels.

Jul 2017: Attending Strata Beijing 2017. Happy to meet new friends!

Dec 2016: I'm joining InfoStep (at Bangalore, India), an internship program hosted by InfoSys, to develop the distributed in-memory database system (called Claims). Three months in India!

Aug 2016: We placed seventeenth in the second round of the Middleware Development Performance Challenge hosted by Alibaba Group.
The contest requires developing a system from scratch (with no library dependencies other than the JDK) that loads a 100GB relational dataset and answers queries over it as efficiently as possible on a single cheap server with only 4GB of memory. Java is the only permitted programming language.
Aug 2016: I'm attending Strata+Hadoop World Beijing. Excited to meet Doug Cutting, and learn about excellent open source projects.