
Naiqing Guan

PhD Student

University of Toronto

gnaiqing@gmail.com


About Myself

Hello, I am a PhD student in the Department of Computer Science at the University of Toronto. My supervisor is Prof. Nick Koudas. I conduct research in the areas of data management, big data analysis, and applied machine learning. In particular, I am interested in applying mathematical models to analyze data management problems that arise in machine learning pipelines. You can find a list of my publications here.

I received my B.S. degree with honors from Peking University in 2020. During my undergraduate years, I was fortunate to work with Prof. Yun Liang, Prof. Wenfei Fan, and Prof. Lei Zou.

In my spare time, I like reading novels and traveling. My favorite writer is Italo Calvino; among his works, I enjoy The Baron in the Trees the most.


Projects

Robust Data Programming
Nov 2021 - Present

Current machine learning models are powerful, but they require massive training data sets, which are costly to label manually. Data programming is an emerging framework that lets users write program snippets (called labeling functions) to label data automatically, and then leverages ground truth inference techniques to aggregate the resulting noisy labels. I am working on improving the robustness of the data programming framework.


Effective Accuracy Estimation for Machine Learning Pipelines
Mar 2021 - Nov 2021

When a machine learning model is deployed in practice, it is important to evaluate its accuracy to make sure it works as expected. However, we usually do not have labels at hand for serving data, so we may direct the predictions to humans to assess whether they are correct. In this project, we designed an effective sampling method for selecting the predictions to be assessed by humans, and a novel estimator for evaluating model accuracy. Our paper has been accepted by SIGMOD 2022. Paper and code will be available soon.


Tensor Dataflow Modeling Framework Based on Relation-centric Notation
Jan 2020 - Aug 2020

Accelerating tensor applications on spatial architectures provides high performance and energy efficiency, but requires accurate performance models for evaluating various dataflow alternatives. In this project, we propose TENET, a framework that models the hardware dataflow of tensor applications based on a novel notation named relation-centric notation. The relation-centric notation is more expressive than previous notations and supports accurate estimation of metrics. Our paper has been accepted by ISCA 2021. [Paper Link] [Code Link]


Education

University of Toronto
2020 - 2025 (expected)

PhD student - Computer Science


Peking University
2016 - 2020

Bachelor's Degree - Computer Science (with honors)