The Spectator.

Saturday, 11 February 2012

Curious new word: Modellisation (Modelization)

I came across what I thought was a strange new word: modellisation (AmE, modelization).

The word appeared in a BBC News article, where it was used by one of the programme managers of the European Space Agency, who said: "Of course, we understand more about [the way rockets perform today] - we have more modellisation capability, computers, etc, ...". As far as I can discern, modellisation is being used where we would more commonly use the word modelling (AmE, modeling).

The word modellisation has certainly not made it into the statistics and machine learning literature. A quick trawl through the Oxford English Dictionary (OED), Fowler, and the British National Corpus (BNC) didn't come up with anything. But there is one entry in the Corpus of Contemporary American English (COCA), for a paper in the journal of sports behaviour entitled 'Modelization of table tennis exchanges'. Another similar paper title (found using Google) is 'Modellisation of the car design process'.

Modélisation is the French word for modelling, and this might explain the word's appearance and usage. But for now, I don't expect we will be making much use of it. Another curious word (though this one is in the OED): modellising (AmE, modelizing) ...

Wednesday, 8 February 2012

A stick-breaking likelihood for categorical data analysis

We have a new paper appearing in the forthcoming AISTATS 2012 - you can get the paper and code here:

M. E. Khan, S. Mohamed, B. M. Marlin and K. P. Murphy. A stick-breaking likelihood for categorical data analysis with latent Gaussian models, AISTATS, April 2012.

In this paper we look at building models for the analysis of categorical (multi-class) data -- we try to be as general as possible, and look at both multi-class Gaussian process classification and categorical factor analysis. Emtiyaz will soon be on the post-doc trail, so you might hear about this live in a lab near you soon. Existing models use probit and logit link functions, and here we look at a third, new likelihood function, which we call the stick-breaking likelihood (related to the stick-breaking you know from Bayesian non-parametrics). We combine this likelihood with variational inference and show convincing results in favour of our new likelihood. One of the key messages is that this likelihood, in combination with the variational EM algorithm proposed, gives better correspondence between the marginal likelihood and the prediction error. Thus choosing hyperparameters by optimising the marginal likelihood will also give good prediction accuracy, whereas this is not the case with other approaches. The paper has all the details - all the Matlab code is online as well, so feel free to play around with it and let us know what you think.
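For a rough feel of the stick-breaking idea, here is a minimal Python sketch (not the Matlab code from the paper). It assumes the generic logistic stick-breaking construction: each of K-1 real-valued scores breaks off a logistic fraction of whatever stick remains, and the last category takes what is left. The function name stick_breaking_probs and the argument eta are made up for this illustration, and the exact parameterisation in the paper may differ.

```python
import math


def stick_breaking_probs(eta):
    """Map K-1 real-valued scores to K category probabilities.

    Illustrative sketch only: assumes a logistic stick-breaking
    construction, which may differ in detail from the paper.
    """
    probs = []
    remaining = 1.0  # length of the stick not yet assigned
    for e in eta:
        frac = 1.0 / (1.0 + math.exp(-e))  # logistic function of the score
        probs.append(remaining * frac)     # break off this fraction
        remaining *= 1.0 - frac            # what is left for later categories
    probs.append(remaining)                # last category takes the remainder
    return probs


if __name__ == "__main__":
    # Two scores give three category probabilities.
    print(stick_breaking_probs([0.0, 0.0]))
```

By construction the probabilities are non-negative and sum to one, and the ordering of the categories matters, unlike in the logit (softmax) case.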

A spectral parameterisation of log-linear models

We have a new paper appearing in the forthcoming AISTATS 2012 - you can get the paper here:

D. Buchmann, M. Schmidt, S. Mohamed, D. Poole, N. de Freitas. On Sparse, Spectral and Other Parameterizations of Binary Probabilistic Models. AISTATS, April 2012.

David has done some great work and the paper provides a nice new way of studying the natural statistics of binary data, in much the same way that we study the natural statistics of other data, such as images. The paper shows a neat spectral representation of log-linear models and some useful results. It also provides a nice empirical argument for using lower order potentials in such models.

Research Visits to the CSIR and UJ

Part of my trip home this year included research visits and talks at the University of Johannesburg, and the Council for Scientific and Industrial Research (CSIR).

I visited the University of Johannesburg on the 12th, and met with Tshilidzi Marwala, who is the Dean of Engineering and is involved in research in various areas, including missing data imputation, the analysis of interstate conflict, and condition monitoring. I also met with Bhekisipho Twala, who works in machine learning and AI in areas including tree-based methods, missing data imputation, and applications in credit risk and biometric analysis.

The Council for Scientific and Industrial Research (CSIR), located on an impressive and leafy campus in Pretoria, is the arm of the government of South Africa mandated to conduct research and development that promotes socio-economic growth in the country. I visited the Modelling and Digital Science division and gave a talk on probabilistic models for multi-class classification (look out for the paper on this soon). The talk was well attended (I was happy about this, especially since it's December) and I had the opportunity to chat to people working in Robotics and Biometrics. I even had the good fortune of meeting some of the directors of the CSIR at lunch.

The robotics research group is currently looking to hire an additional post-doc (details on the CSIR's website). Overall a great set of meetings - and hopefully not the last with these particular groups.

Sunday, 20 November 2011

A Review of The Canadian Science Policy Conference

Parliament Hill, Ottawa

This past week I had the opportunity to attend the 3rd Canadian Science Policy Conference, which was held in Ottawa. I attended to learn about science policy and to explore what a career in science policy might look like.

Overall the conference was well run and very well attended. The speakers/attendees included the Minister of State for Science and Technology, the chief scientists of Australia and Quebec, the former Premier of British Columbia, other leading politicians with backgrounds in science, the heads of research councils such as CIHR, NSERC and MITACS, people from various Canadian agencies (health, environment, natural resources, atomic energy), journalists, and of course a good representation of academics (with quite a few postdocs) from various institutions.

The conference looked at various topics, like the current Science, Technology and Innovation policy (STIC) in Canada, the recently released Jenkins report on research and development

Welcome to The Spectator.

I have been wanting to write a blog for quite some time, and thought I'd finally take the plunge and get something going. I hope I will add posts regularly, mainly as a mechanism to exercise my writing skills, and to share my thoughts on various aspects of machine learning and statistics research, science communication, and reports on any other interesting events. Hopefully anyone who finds this will find something of interest.

As an aside, I use the title The Spectator after the popular London newsletter that circulated in 1711-12. I think it's a great read, but I'll let Wikipedia tell you more.