
Kevyn Collins-Thompson


Associate Professor of Information
Associate Professor of Electrical Engineering and Computer Science
University of Michigan
School of Information and College of Engineering (affiliate)
Phone: +1-734-615-2132
Fax: +1-734-615-3587
Email: kevynct AT umich . edu

Mailing Address:
School of Information
3344 North Quad
105 S. State Street
Ann Arbor, MI 48109-1285


I am an associate professor with tenure at the University of Michigan (Ann Arbor), with appointments in the School of Information and College of Engineering, Dept. of Electrical Engineering and Computer Science (affiliate, CSE Division). I am also an affiliate faculty member of the Artificial Intelligence Lab and the Michigan Institute for Data Science (MIDAS).

As of June 1, 2022, I am also the academic director of the Master of Applied Data Science program, the largest online degree program at the University of Michigan.

Since 2019, I have been a regular visiting researcher at INRIA, France's national research institute for computer science, first in Bordeaux and then also at Université Côte d'Azur, SophiaTech campus.

Spanning 20 years of academic and industry research, my work blends information retrieval, machine learning, natural language processing, and large-scale data mining to optimally connect people with information, especially to help them learn and discover.

If you're interested in joining my lab as a graduate student, please apply through the regular admissions process (in either Information or Computer Science) and mention my name and your connection to my lab's research areas in your graduate school application. Unfortunately, I cannot respond to individual emails.


Older news

Major contributions

Research directions

My general research approach focuses on human-centered scenarios in machine learning, like systems that can help improve a child's literacy skills. These human-centered problems are often not well solved by traditional machine learning frameworks, whose optimization objectives ignore (for example) human cognitive abilities in processing information. To address this, my collaborators and I formalize new types of learning optimization problems that can also account for important external goals, constraints, or sources of difficulty or uncertainty. We then develop practical, tractable, and deployable algorithms that solve the theoretical problem approximately. Finally, we conduct user lab studies or large-scale online experiments to assess the real-world success of the approach, iterating if necessary based on what we learn.

Example products of my research include search engines that can deliver the right kind of personalized information at the right time, and intelligent tutoring systems that learn when and how to be most helpful in teaching a particular student. Building effective, reliable systems like these requires new theoretical, algorithmic, and methodological advances in multiple research areas, including machine learning, optimization, information retrieval, and human-computer interaction. My current research is centered on education, but I'm also interested in mobile and health-related applications.

Brief bio

Before joining the University of Michigan in Fall 2013, I was a researcher at Microsoft Research, in the Context, Learning, and User Experience for Search (CLUES) Group.

My Ph.D. is from the School of Computer Science at Carnegie Mellon University, where my advisor was Jamie Callan and I was a member of the Language Technologies Institute. My undergraduate degree is a B.Math. in Computer Science from the University of Waterloo. Apparently, I'm not the only one who thinks that CMU and Waterloo are a great combination!


Other Activities


  • WWW 2020 reading/QA study factoid question list. Four sets of manual and auto-generated factoid questions corresponding to four topics covered in our reading study. If you use this data, please cite: Rohail Syed, Kevyn Collins-Thompson, Paul Bennett, Mengqiu Teng, Shane Williams, Wendy Tay and Shamsi Iqbal. Improving Learning Outcomes with Gaze Tracking and Automatic Question Generation. Proceedings of The Web Conference 2020 (WWW 2020). Taipei, Taiwan.

  • WSDM 2013 Crowdsourced Pairwise Preferences for Readability (.csv file, 9.1Mb): 13857 judged pairs (trusted and untrusted), ~50-word text passages, grades 1-12. Column descriptions are here.
    If you use this dataset, please cite: X. Chen, P.N. Bennett, K. Collins-Thompson, E. Horvitz. Pairwise Ranking Aggregation in a Crowdsourced Setting. Proceedings of WSDM 2013. 193-202.

  • HLT 2004 Readability: Unigram Language Models. Because a significant part of the corpus used for this paper (Web pages or text passages labeled by grade level) contains licensed copyrighted content, we are unable to redistribute the dataset in its original form. However, this folder contains files with frequency counts computed on the entire dataset for the twelve categories of labeled documents used in the HLT 2004 paper, corresponding to material at each of the U.S. elementary school grades 1 through 12 (indexed as 0 to 11). There is also a background English model file. With these raw unigram counts, plus some smoothing as described in the paper, the grade-level language models used for the classifier can be derived. If you use this data, please cite:
    K. Collins-Thompson and J. Callan. A language modeling approach to predicting reading difficulty. Proceedings of HLT / NAACL 2004, Boston, USA, May 2004. 193-200.

Other Links

  • Context, Learning, and User Experience for Search Group's home page.

  • DBLP Bibliography
