Department of Informatics
Date 31 January 2014
Time 3:00 PM - 5:00 PM
Location 6011 Donald Bren Hall
Contact Adriana Avina

Padhraic Smyth

Professor, Department of Computer Science 

Director, Center for Machine Learning and Intelligent Systems

University of California, Irvine



Title: Statistical Machine Learning with Count Data 


 Abstract: Data represented in the form of sets of counts is easy to acquire and can be surprisingly useful in practice. For example, a simple way to represent a set of documents is as a "bag of words" where each document is represented just by the counts of words that occur in the document, a representation that has been the basis for many successful applications of machine learning to text data. In this talk we will review some important developments over the past 10 years in modeling data represented in the form of counts, combining ideas from statistics and machine learning. The talk will describe the general principles involved and then illustrate how these ideas can be applied to text documents, email communications, and social networks, including recent work in my research group. The talk will conclude with some speculative comments on future directions.
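
As a small illustration of the "bag of words" representation mentioned in the abstract, the following Python sketch turns two toy documents into word-count vectors. The example sentences, the simple tokenization rule, and the helper name bag_of_words are assumptions made for illustration only; they are not taken from the talk.

```python
from collections import Counter
import re


def bag_of_words(text):
    """Return word counts for one document (hypothetical helper).

    Tokenization is deliberately crude: lowercase the text and keep
    runs of letters and apostrophes. Word order is discarded.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Toy documents, invented for this example.
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

# One Counter of word counts per document.
bags = [bag_of_words(d) for d in docs]

# Shared vocabulary, sorted so that column order is reproducible.
vocab = sorted(set().union(*bags))

# Document-by-word count matrix: one row per document, one column per word.
# A Counter returns 0 for words that do not occur in a document.
matrix = [[bag[w] for w in vocab] for bag in bags]

for row in matrix:
    print(row)
```

Each row of the resulting matrix is the count vector for one document, which is the kind of input that count-based models operate on.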


Bio: Padhraic Smyth is a Professor in the Department of Computer Science (with a joint appointment in Statistics) and Director of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. His research interests include machine learning, data mining, pattern recognition, and applied statistics. He received a first-class honors degree in Electronic Engineering from University College Galway (National University of Ireland) in 1984, and the MSEE and PhD degrees from the Electrical Engineering Department at the California Institute of Technology in 1985 and 1988, respectively. From 1988 to 1996 he was a Technical Group Leader at the Jet Propulsion Laboratory, Pasadena, and he has been on the faculty at UC Irvine since 1996. Dr. Smyth is an ACM Fellow and an AAAI Fellow, and he received the ACM SIGKDD Innovation Award in 2009. He is co-author of two well-known research texts in data mining: Modeling the Internet and the Web: Probabilistic Methods and Algorithms (with Pierre Baldi and Paolo Frasconi, 2003) and Principles of Data Mining (with David Hand and Heikki Mannila, MIT Press, August 2001). He has served in editorial positions for journals such as the Journal of the American Statistical Association, the IEEE Transactions on Knowledge and Data Engineering, and the Journal of Machine Learning Research. His research has been funded by a variety of government agencies such as NSF, NIH, ONR, DARPA, and DOE, as well as by companies such as Google, IBM, Microsoft, and Yahoo!. In addition to his academic research, he is also active in consulting, working with companies such as Samsung, Netflix, eBay, Oracle, Microsoft, Yahoo!, Nokia, and AT&T.