Saturday 11 February 2012

Curious new word: Modellisation (Modelization)

I came across what I thought was a strange new word: modellisation (AmE, modelization).

The word appeared in a BBC News article and was used by one of the programme managers of the European Space Agency, who said: "Of course, we understand more about [the way rockets perform today] - we have more modellisation capability, computers, etc, ...". As far as I can discern, modellisation is being used where we would more commonly use the word modelling (AmE, modeling).

The word modellisation has certainly not made it into the statistics and machine learning literature. A quick trawl through the Oxford English Dictionary (OED), Fowler, and the British National Corpus (BNC) turned up nothing. But there is one entry in the Corpus of Contemporary American English (COCA), for a paper in the Journal of Sports Behaviour entitled 'Modelization of table tennis exchanges'. Another similar paper title (found using Google) is 'Modellisation of the car design process'.

Modellisation is the French word for modelling, which might explain its appearance and usage. But for now, I suspect we won't be making much use of it. Another curious word (and this one is in the OED): modellising (AmE, modelizing) ...

Wednesday 8 February 2012

A stick-breaking likelihood for categorical data analysis

We have a new paper appearing in the forthcoming AISTATS2012 - you can get the paper and code here:
M. E. Khan, S. Mohamed, B. M. Marlin and K. P. Murphy. A stick-breaking likelihood for categorical data analysis with latent Gaussian models, AISTATS, April 2012.
In this paper we look at building models for the analysis of categorical (multi-class) data. We try to be as general as possible, looking at both multi-class Gaussian process classification and categorical factor analysis. Emtiyaz will soon be on the post-doc trail, so you might hear about this live in a lab near you soon. Existing models use probit and logit link functions; here we look at a third, new likelihood function, which we call the stick-breaking likelihood (related to the stick-breaking you know from Bayesian non-parametrics). We combine this likelihood with variational inference and show convincing results in its favour. One of the key messages is that this likelihood, in combination with the variational EM algorithm we propose, gives better correspondence between the marginal likelihood and the prediction error: choosing hyperparameters by optimising the marginal likelihood will then also give good prediction accuracy, which is not the case with other approaches. The paper has all the details, and all the Matlab code is online as well, so feel free to play around with it and let us know what you think.
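
To give a quick flavour of the construction, here is a minimal numpy sketch of the stick-breaking mapping from K-1 latent (Gaussian) values to K category probabilities. The variable names are illustrative only and are not those used in the paper or the released Matlab code.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def stick_breaking_probs(eta):
        # Map K-1 latent values eta to K probabilities:
        # p_k = sigmoid(eta_k) * prod_{j<k} (1 - sigmoid(eta_j)),
        # with the final category taking whatever stick is left over.
        v = sigmoid(np.asarray(eta, dtype=float))
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
        return np.append(v, 1.0) * remaining

    # three latent values -> four category probabilities that sum to one
    print(stick_breaking_probs([0.5, -1.0, 2.0]))

In the model the latent values themselves get a Gaussian treatment (a GP or factor-analysis prior), and inference is done variationally; see the paper for the actual bounds and the Matlab implementation.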

Tuesday 7 February 2012

A spectral parameterisation of log-linear models

We have a new paper appearing in the forthcoming AISTATS2012 - you can get the paper here:
D. Buchmann, M. Schmidt, S. Mohamed, D. Poole, N. de Freitas. On Sparse, Spectral and Other Parameterizations of Binary Probabilistic Models. AISTATS, April 2012.
David has done some great work here, and the paper provides a nice new way of studying the natural statistics of binary data, in the same way that we study the natural statistics of other data, such as images. The paper shows a neat spectral representation of log-linear models and some useful results. It also provides a nice empirical argument for using lower-order potentials in such models.
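
As a rough illustration of what 'spectral' means in this setting (my own toy sketch, not David's code): for d binary variables coded as ±1, a strictly positive distribution can be written as a log-linear model over all subsets of variables, and the coefficients of that full parameterisation can be read off in the Walsh (parity) basis by solving a linear system.

    import itertools
    import numpy as np

    def walsh_features(X):
        # one feature per subset S of variables: prod_{i in S} x_i,
        # for +/-1 coded binary states X of shape (n, d)
        n, d = X.shape
        subsets = [s for r in range(d + 1)
                   for s in itertools.combinations(range(d), r)]
        Phi = np.ones((n, len(subsets)))
        for j, s in enumerate(subsets):
            for i in s:
                Phi[:, j] *= X[:, i]
        return Phi, subsets

    d = 3
    X = np.array(list(itertools.product([-1, 1], repeat=d)))  # all 2^d states
    p = np.random.default_rng(0).dirichlet(np.ones(2 ** d))   # a positive distribution
    Phi, subsets = walsh_features(X)
    theta = np.linalg.solve(Phi, np.log(p))  # log p(x) = sum_S theta_S prod_{i in S} x_i
    print(np.allclose(Phi @ theta, np.log(p)))  # the full basis represents p exactly

Restricting the sum to small subsets is the 'lower-order potentials' choice mentioned above; the paper looks at these and other parameterisations in detail.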