Naiqing Guan
PhD Student
University of Toronto
gnaiqing@gmail.com
About Myself
Hello, I am a PhD student in the Department of Computer Science at the University of Toronto.
My supervisor is Prof. Nick Koudas.
I conduct research in the areas of data management, big data analysis and applied machine learning.
In particular, I am interested in applying mathematical models to analyze
data management problems that arise in machine learning pipelines. You can find a list of my publications
here.
I received my B.S. degree with honors from Peking University in 2020. During my undergraduate years,
I was fortunate to collaborate with Prof. Yun Liang, Prof. Wenfei Fan
and Prof. Lei Zou.
In my spare time, I like reading novels and traveling. My favorite writer is Italo Calvino; among his works, I enjoy The Baron in the Trees the most.
Current machine learning models are powerful, but they require massive training data sets, which are costly to label manually. Data programming is an emerging framework that lets users write program snippets (called labeling functions) to label data automatically, and then leverages ground truth inference techniques to aggregate the resulting noisy labels. I am working on improving the robustness of the data programming framework.
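To make the idea concrete, here is a minimal sketch of data programming on a toy spam-detection task. Everything in it is an assumption made for illustration: the three label functions, the `ABSTAIN` convention, and the function names are hypothetical, and a plain majority vote stands in for the ground truth inference step, which in practice is usually a learned model rather than a vote.

```python
from collections import Counter

# Hypothetical label values for a toy spam task (assumptions for illustration).
ABSTAIN = -1
HAM = 0
SPAM = 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a URL, otherwise abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Vote HAM for very short messages, otherwise abstain."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


LABEL_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]


def majority_vote(votes):
    """Aggregate noisy votes; abstain when there are no votes or a tie.

    This is a simple stand-in for ground truth inference, not a learned model.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(text):
    """Apply every label function to one example and aggregate the votes."""
    return majority_vote([lf(text) for lf in LABEL_FUNCTIONS])
```

Calling `weak_label` on each unlabeled example yields a training label or an abstention. The tie and abstain cases show why aggregation matters: label functions can conflict or stay silent, and a naive vote has no way to weigh a reliable function above an unreliable one.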
When a machine learning model is deployed in practice, it is important to evaluate its accuracy to make sure it works as expected. However, we usually do not have labels at hand for serving data, so we may send the predictions to humans to assess whether they are correct. In this project, we designed an effective sampling method for selecting the predictions to be assessed by humans, and a novel estimator for evaluating model accuracy. Our paper has been accepted by SIGMOD 2022. Paper and code will be available soon.
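For context, the simplest baseline for this problem is uniform random sampling: pick a fixed number of predictions at random, ask a human whether each one is correct, and report the fraction judged correct together with a normal-approximation confidence interval. The sketch below shows only that baseline, not the sampling method or estimator from the paper. The function name, the `human_is_correct` callback, and the `budget` parameter are hypothetical.

```python
import math
import random


def estimate_accuracy(predictions, human_is_correct, budget, seed=0):
    """Estimate accuracy from `budget` uniformly sampled human assessments.

    `human_is_correct(index, prediction)` is a hypothetical callback that
    returns True when a human judges the prediction to be correct.
    Returns the point estimate and a 95% normal-approximation interval.
    """
    if not 0 < budget <= len(predictions):
        raise ValueError("budget must be between 1 and len(predictions)")

    rng = random.Random(seed)
    # Sample indices without replacement so no prediction is assessed twice.
    sampled = rng.sample(range(len(predictions)), budget)

    correct = sum(1 for i in sampled if human_is_correct(i, predictions[i]))
    p_hat = correct / budget

    # Normal-approximation interval; it is crude for small budgets or for
    # accuracies close to 0 or 1.
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / budget)
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))
```

The width of the interval shrinks only with the square root of the budget, so halving the uncertainty takes roughly four times as many human assessments. That cost is what motivates more careful sampling.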
Accelerating tensor applications on spatial architectures provides high performance and energy efficiency, but requires accurate performance models for evaluating various dataflow alternatives. In this project, we propose TENET, a framework that models the hardware dataflow of tensor applications based on a novel relation-centric notation. The relation-centric notation is more expressive than previous notations and supports accurate metrics estimation. Our paper has been accepted by ISCA 2021. [Paper Link] [Code Link]
PhD student - Computer Science
Bachelor's Degree - Computer Science (with honors)