Invited Speakers
Title
Optimal Online Prediction in Adversarial Environments
(slides in pdf)
Abstract
In many prediction problems, including those that arise in computer
security and computational finance, the process generating the data is
best modeled as an adversary with whom the predictor competes. The
predictor's aim is to minimize the regret, or the difference between the
predictor's performance and the best performance among some comparison
class, whereas the adversary aims to maximize the predictor's regret.
Even decision problems that are not inherently adversarial can be
usefully modeled in this way, since the assumptions are sufficiently
weak that effective prediction strategies for adversarial settings are
very widely applicable.
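As a reminder of the standard setup (the notation here is mine, not taken from the talk), the regret after T rounds is

```latex
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \ell(\hat{y}_t, z_t) \;-\; \min_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell(f, z_t)
```

where \(\ell\) is the loss, \(\hat{y}_t\) the predictor's decision, \(z_t\) the adversary's choice at round \(t\), and \(\mathcal{F}\) the comparison class; the predictor seeks strategies whose regret grows sublinearly in \(T\).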
The first part of this talk presents an example of an online decision
problem of this kind: a resource allocation problem from computational
finance. We describe an efficient strategy with near-optimal
performance.
The second part of the talk presents results on the regret of optimal
strategies. These results are closely related to finite sample analyses
of prediction strategies for probabilistic settings, where the data are
chosen iid from an unknown probability distribution. In particular, we
show that the optimal online regret is closely related to the behavior
of empirical minimization in a probabilistic setting, but with a non-iid
stochastic process generating the data. This allows the application of
techniques from the analysis of the performance of empirical
minimization in an iid setting, which relates the optimal regret to a
measure of complexity of the comparison class that is similar to the
Rademacher averages that have been studied in the iid setting.
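For reference, the complexity measure alluded to here is, in the iid setting, the Rademacher average (this is the standard definition, not a formula from the talk):

```latex
\mathcal{R}_n(\mathcal{F}) \;=\; \mathbb{E}\left[\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, f(x_i) \right]
```

where the \(\sigma_i\) are independent uniform \(\pm 1\) random variables; the talk's results concern an analogous quantity with the iid sample replaced by an adversarially chosen, non-iid process.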
Biography
Peter Bartlett is a professor in the Computer Science Division and the
Department of Statistics at the University of California at Berkeley. He
is the coauthor, with Martin Anthony, of the book Neural Network
Learning: Theoretical Foundations, has edited three other books, and has
coauthored many papers in the areas of machine learning and statistical
learning theory. He has served as an associate editor of the journals
Machine Learning, Mathematics of Control, Signals, and Systems, the
Journal of Machine Learning Research, the Journal of Artificial
Intelligence Research, and the IEEE Transactions on Information Theory,
as a member of the editorial boards of Machine Learning, the Journal of
Artificial Intelligence Research, and Foundations and Trends in Machine
Learning, and as a member of the steering committees of the Conference
on Computational Learning Theory and the Algorithmic Learning Theory
Workshop. He has consulted for a number of organizations, including
General Electric, Telstra, Polaris Wireless and SAC Capital Advisors. In
2001, he was awarded the Malcolm McIntosh Prize for Physical Scientist
of the Year in Australia, for his work in statistical learning theory.
He was a Miller Institute Visiting Research Professor in Statistics and
Computer Science at U.C. Berkeley, a fellow, senior fellow and professor
in the Research School of Information Sciences and Engineering at the
Australian National University's Institute for Advanced Studies, and an
honorary professor in the School of Information Technology and
Electrical Engineering at the University of Queensland. His research
interests include machine learning, statistical learning theory, and
adaptive control.
Title
Learning without Search
(slides in pdf)
Abstract
Machine learning is classically conceived as search through a
hypothesis space for a hypothesis that best fits the training data. In
contrast, naive Bayes performs no search, extrapolating an estimate of
a high-order conditional probability by composition from lower-order
conditional probabilities. In this talk I show how this searchless
approach can be generalised, creating a family of learners that
provide a principled method for controlling the bias/variance
tradeoff. At one extreme, very low variance can be achieved, as
appropriate for small data. Bias can be decreased with larger data in
a manner that ensures Bayes-optimal asymptotic error. These algorithms
have the desirable properties of
- training time that is linear with respect to training set size,
- support for parallel and anytime classification,
- support for incremental learning,
- direct prediction of class probabilities,
- direct handling of missing values, and
- robust handling of noise.
Despite being generative, they deliver classification accuracy competitive
with state-of-the-art discriminative techniques.
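The search-free idea starts from naive Bayes itself, which composes the class posterior from low-order counted estimates rather than searching a hypothesis space. A minimal sketch of that baseline (class and variable names, and the Laplace smoothing choice, are mine, not from the talk):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal categorical naive Bayes: no search over hypotheses,
    just counting and composing low-order conditional probabilities."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        self.n_features = len(X[0])
        # feature_counts[j][(c, v)] = count of value v for feature j in class c
        self.feature_counts = [defaultdict(int) for _ in range(self.n_features)]
        self.feature_values = [set() for _ in range(self.n_features)]
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.feature_counts[j][(c, v)] += 1
                self.feature_values[j].add(v)
        return self

    def predict(self, x):
        best, best_lp = None, float("-inf")
        for c, cc in self.class_counts.items():
            # log P(c), with Laplace smoothing
            lp = math.log((cc + 1) / (self.n + len(self.class_counts)))
            for j, v in enumerate(x):
                # log P(x_j | c), again Laplace-smoothed
                num = self.feature_counts[j][(c, v)] + 1
                den = cc + len(self.feature_values[j])
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Toy usage on a four-example categorical dataset.
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
y = ["no", "no", "yes", "yes"]
model = NaiveBayes().fit(X, y)
print(model.predict(("rainy", "mild")))  # → yes
```

Training is a single counting pass, which is where the linear-time, incremental, and missing-value properties listed above come from; the talk's contribution is a family of generalisations of this composition that trade bias against variance in a principled way.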
Biography
Geoff Webb holds a research chair in the Faculty of Information
Technology at Monash University, where he heads the Centre for
Research in Intelligent Systems. Prior to Monash he held appointments
at Griffith University and then Deakin University, where he received a
personal chair. His primary research areas are machine learning, data
mining, and user modelling. He is known for the development of
numerous methods, algorithms and techniques for machine learning, data
mining and user modelling. His commercial data mining software,
Magnum Opus, incorporates many techniques from his association
discovery research. Many of his learning algorithms are included in
the widely used Weka machine learning workbench. He is
editor-in-chief of the highest-impact data mining journal, Data Mining
and Knowledge Discovery, co-editor of the Encyclopedia of Machine
Learning (to be published by Springer), a member of the advisory board
of Statistical Analysis and Data Mining and a member of the editorial
boards of Machine Learning and ACM Transactions on Knowledge Discovery
from Data.
Title
Kernel Method for Bayesian Inference
(slides in pdf)
Abstract
Since the proposal of the support vector machine, various kernel methods
have been extensively developed as nonlinear extensions, or
"kernelizations," of classical linear methods. More recently, however, it
has become clear that a potentially more far-reaching use of kernels is a
linear way of dealing with higher-order statistics: embedding
distributions as means in reproducing kernel Hilbert spaces (RKHS) and
considering linear operators among them.
This talk will present how general Bayesian inference can be realized
on the basis of this recent view of the kernel method. First, I will
explain how conditional probabilities can be expressed with the kernel
covariance operators of the distributions. Second, it will be shown
that the general Bayes' rule, the center of Bayesian inference, can be
realized by operations on the kernel expression of the conditional
probability and on the prior represented as a mean in the RKHS. The
kernel mean of the posterior is obtained by Gram matrix computations
that realize the two steps of Bayes' rule: constructing the joint
probability and normalizing it. The rate of convergence of the
empirical kernel estimate to the true posterior is also derived.
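For context, the RKHS mean embedding at the heart of this construction is the standard one (the notation here is not taken from the slides):

```latex
\mu_P = \mathbb{E}_{X \sim P}\left[ k(\cdot, X) \right] \in \mathcal{H}_k,
\qquad
\hat{\mu}_P = \frac{1}{n} \sum_{i=1}^{n} k(\cdot, x_i)
```

for a positive-definite kernel \(k\) with RKHS \(\mathcal{H}_k\). For characteristic kernels the map \(P \mapsto \mu_P\) is injective, which is what lets operations on means and covariance operators stand in for operations on the distributions themselves.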
As an application, I will discuss a kernel nonparametric HMM, in which
the conditional probabilities that define the HMM are neither given in a
specific form nor estimated with a parametric model, but are given only
in the form of finite samples. By sequential application of the kernel
Bayes' rule, it will be shown through experiments that the hidden states
can be estimated sequentially and nonparametrically.
Biography
Kenji Fukumizu is a professor in the Department of Statistical Modeling at
The Institute of Statistical Mathematics, where he serves as director of the
Research Innovation Center. Before joining the institute, he worked as a
researcher in the Research and Development Center, Ricoh Co., Ltd. and the
Institute of Physical and Chemical Research (RIKEN). He was a visiting
scholar at the Department of Statistics, UC Berkeley, and a Humboldt fellow
at the Max Planck Institute for Biological Cybernetics. He serves as an
associate editor of the journals Annals of the Institute of Statistical
Mathematics, Neural Networks, and Foundations and Trends in Machine
Learning. His research interests include machine learning and mathematical
statistics. He has coauthored a book on singular statistical models, and
has authored a book on kernel methods (to be published in 2010).
