20th International Conference on
COMPUTATIONAL STATISTICS (COMPSTAT 2012)
27-31 August 2012, Amathus Beach Hotel, Limassol, Cyprus



TUTORIALS

The tutorials will take place during the conference and in parallel with the invited, organized and contributed sessions.



TUTORIAL by ARS of IASC: Bayesian computing and applications
Cathy W.S. Chen, Feng Chia University, Taiwan.

The objective of this tutorial is to introduce the Bayesian approach to statistical inference with applications and to describe effective approaches for Bayesian modeling and computation. Modern Bayesian inference relies heavily on computational algorithms, and hence a substantial part of the tutorial will be focused on posterior simulation. We describe Markov chain Monte Carlo (MCMC) methods in detail. Commonly used posterior simulators such as Gibbs sampling and random walk and independent kernel Metropolis-Hastings algorithms will be briefly reviewed. MCMC diagnostics will also be discussed, including several approaches to monitoring convergence. We will illustrate Bayesian estimation with some popular models, such as regression models with change points, threshold autoregressive models, and GARCH models. All material will be illustrated by example datasets and computer code.
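
As a rough illustration of the random walk Metropolis-Hastings simulator mentioned above, the following is a minimal sketch and not material from the tutorial itself. The function name, the standard normal target, the step size and the seed are all assumptions chosen for illustration.

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Sample from a 1-D density known up to a constant (illustrative sketch)."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), on the log scale.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def log_standard_normal(x):
    # Assumed toy target: standard normal log-density up to an additive constant.
    return -0.5 * x * x


samples, acceptance_rate = random_walk_metropolis(
    log_standard_normal, x0=0.0, n_steps=20000, step=1.0
)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

The acceptance rate and running summaries such as `sample_mean` are the kind of quantities one would typically monitor when assessing convergence; in practice several chains from different starting points would be compared.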



TUTORIAL: Numerical methods and optimization in statistical finance
Manfred Gilli, University of Geneva, Switzerland.

Many optimization problems in theoretical and applied science are difficult to solve: they exhibit multiple local optima or are not well-behaved in other ways (e.g., have discontinuities in the objective function). The still-prevalent approach to handling such difficulties -- other than ignoring them -- is to adjust or reformulate the problem until it can be solved with standard numerical methods. Unfortunately, this often involves simplifications of the original problem; thus we obtain solutions to a model that may or may not reflect our initial problem. But there is yet another approach: the application of optimization heuristics like Simulated Annealing or Genetic Algorithms. These methods have been shown to be capable of handling non-convex optimization problems with all kinds of constraints, and should thus be ideal candidates for many optimization problems. In this talk we motivate the use of such methods by first presenting some examples from finance for which optimization is required, and where standard methods often fail. We briefly review some heuristics, and look into their application to finance problems. We will also discuss the stochastics of the solutions obtained from heuristics; in particular, we compare the randomness generated by the optimization methods with the randomness inherent in the problem.
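
To make the idea of an optimization heuristic concrete, here is a minimal sketch of Simulated Annealing on a one-dimensional objective with many local optima. It is not taken from the tutorial: the Rastrigin test function, the geometric cooling schedule and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def rastrigin_1d(x):
    # Assumed test objective with many local minima; global minimum 0 at x = 0.
    return 10.0 + x * x - 10.0 * math.cos(2.0 * math.pi * x)


def simulated_annealing(objective, x0, n_steps=5000, step=0.5,
                        t0=5.0, cooling=0.999, seed=0):
    """Minimise `objective`, returning the best point found and its value."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    temperature = t0
    for _ in range(n_steps):
        candidate = x + rng.gauss(0.0, step)
        f_candidate = objective(candidate)
        delta = f_candidate - fx
        # Always accept improvements; accept a deterioration with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, fx = candidate, f_candidate
            if fx < best_f:
                best_x, best_f = x, fx
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return best_x, best_f


# Restarting with different seeds gives a distribution of objective values,
# which is one way to look at the randomness introduced by the heuristic.
results = [simulated_annealing(rastrigin_1d, x0=4.0, seed=s)[1] for s in range(10)]
```

The spread of `results` across seeds reflects the randomness generated by the method itself, as opposed to randomness in the data of the underlying problem.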



TUTORIAL by IFCS: Mixture models for high-dimensional data
Geoff McLachlan, University of Queensland, Australia.

Finite mixture distributions are being increasingly used to model heterogeneous data and to provide a clustering of such data. For multivariate (continuous) data attention is focussed on mixtures of normal distributions and extensions of the latter, including mixtures of t-distributions for data with longer tails than the multivariate normal. For clustering purposes, the fitting of a g-component mixture model provides a probabilistic clustering of the data into g clusters in terms of the estimated posterior probabilities of component membership of the mixture for the individual data points. An outright clustering is obtained by assigning each data point to the component to which it has the highest (estimated) posterior probability of belonging. There has been a proliferation of applications in which the number of experimental units n is comparatively small but the underlying dimension p is extremely large as, for example, in microarray-based genomics and other high-throughput experimental approaches. Hence there has been increasing attention given not only in bioinformatics and machine learning, but also in mainstream statistics, to the analysis of complex data in this situation where n is small relative to p. In this tutorial, we focus on the clustering of high-dimensional data, using normal mixture models. Their use in this context is not straightforward, as the normal mixture model is a highly parameterized one with each component-covariance matrix consisting of p(p+1)/2 distinct parameters in the unrestricted case. Hence some restrictions must be imposed and/or a variable selection method applied beforehand. We shall focus on the use of factor models that reduce the number of parameters in the specification of the component-covariance matrices, usually after some initial dimension reduction. The proposed methods are to be demonstrated in their application to some high-dimensional data sets from the bioinformatics literature.
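
The two quantitative points above can be sketched in a few lines: the outright clustering rule based on posterior probabilities, and the parameter count per component-covariance matrix. This is an illustration only, not the tutorial's software. The univariate two-component mixture with fixed parameters is an assumed toy case, and the factor-analytic count assumes the common form of a covariance matrix as B B' + D with q factors, which gives pq + p - q(q-1)/2 free parameters.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, weights, means, sds):
    """Posterior probabilities of component membership for one observation."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assignment(x, weights, means, sds):
    """Index of the component with the highest posterior probability."""
    post = posterior_probabilities(x, weights, means, sds)
    return max(range(len(post)), key=post.__getitem__)


def unrestricted_cov_params(p):
    # Distinct elements of a symmetric p x p covariance matrix.
    return p * (p + 1) // 2


def factor_cov_params(p, q):
    # Assumed factor-analytic form B B' + D with q factors (q is hypothetical).
    return p * q + p - q * (q - 1) // 2


# Assumed toy mixture: equal weights, means 0 and 4, unit standard deviations.
weights, means, sds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
post = posterior_probabilities(1.0, weights, means, sds)
label = hard_assignment(1.0, weights, means, sds)

# With p = 1000 variables, compare the two covariance parameter counts.
n_full = unrestricted_cov_params(1000)
n_factor = factor_cov_params(1000, 5)
```

For p = 1000 and q = 5 the unrestricted count is 500500 per component against 5990 under the factor model, which shows why some restriction is needed when n is small relative to p.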



TUTORIAL: Knowledge extraction through predictive path modeling and probabilistic networks
Vincenzo Esposito Vinzi, ESSEC Business School, France.

This tutorial will initially focus on the predictive modelling of relationships between latent variables in a multi-block framework by referring to the component-based approach of Partial Least Squares Path Modelling and its most recent variants and alternatives. We will consider several theoretical and methodological issues related to each modelling step, from measurement and structural model specification to model estimation, from the assessment of model quality to the interpretation of results. We will also provide some insights into the statistical criteria optimized by the presented approaches and their practical relevance, enriched by a specific discussion on the outer weights and the dimensionality of latent variables. The difficulty of analysis is often due to the complexity of the network of hypothesized (but often hidden) and presumably causal (but mostly predictive) relationships between tangible phenomena or intangible concepts. Therefore, we will discuss the problem of extracting knowledge from uncertain models as compared to modelling the uncertainty in a specific model defined on a priori information. Finally, we will show how further knowledge may be extracted if induction by automatic learning is merged with the evaluation of probabilistic networks. All material will be illustrated by examples and software to understand the results in practice.