autoclass for Debian
--------------------
AutoClass and Multimix are both clustering programs that have been
packaged for Debian. Here is a comparison written by the Multimix
authors:
"AutoClass is a Bayesian clustering program developed by Peter
Cheeseman and colleagues at NASA Ames Research Center. The models
fitted by AutoClass are very similar to those fitted by Multimix,
although both programs were developed independently. Two obvious
differences are
"1. AutoClass has automated the process of model selection as well as
that of parameter estimation but Multimix leaves model-specification
to the user;
"2. AutoClass uses Maximum Posterior estimation in place of Maximum
Likelihood estimation.
"In fact the first is the more crucial difference, because the EM
algorithm at the basis of both programs accommodates both ML and MAP
estimation. AutoClass compares different models by calculating an
approximation to the marginal density of the observed data after the
model parameters have been integrated out. In usual EM language the
approximation used is analogous to taking observed data likelihood to
be proportional to complete data likelihood with the constant of
proportionality to be evaluated at the maximum likelihood estimates.
"The models currently available in AutoClass for attributes within a
component are as follows. Categorical attributes are modelled by
general discrete distributions (multi-category Bernoulli) as in
Multimix. Continuous attributes may be taken to have uniform or
normal distributions, possibly after transformation. Poisson
distributions are available for count attributes. Cheeseman and Stutz
report that von Mises-Fisher distributions for circular and spherical
attributes are under development. At present it appears that
AutoClass does not offer facilities for modelling within-cluster
dependencies, that is, all models assume within-cluster independence of
attributes. Missing values are treated as a special kind of value in
some attribute models, but there has been no implementation of the
Little and Rubin methodology for data missing at random."
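As an illustration of the points in the comparison quoted above (this
is not part of the quotation, and it is not code from AutoClass or
Multimix), here is a minimal Python sketch of one EM iteration for a
mixture of categorical attributes under within-cluster independence.
All function and parameter names are hypothetical. The pseudo_count
parameter is an assumption chosen for illustration: 0 gives maximum
likelihood estimates, and a positive value gives MAP estimates under a
symmetric Dirichlet(1 + pseudo_count) prior, which is not necessarily
the prior AutoClass uses.

```python
import math


def estimate_categorical(weighted_counts, pseudo_count=0.0):
    """Category probabilities from (possibly fractional) counts.

    pseudo_count == 0 -> maximum likelihood estimate.
    pseudo_count > 0  -> MAP estimate under a symmetric
                         Dirichlet(1 + pseudo_count) prior (assumed prior).
    """
    k = len(weighted_counts)
    total = sum(weighted_counts) + k * pseudo_count
    return [(c + pseudo_count) / total for c in weighted_counts]


def e_step(rows, mix, theta):
    """Posterior class memberships for each row.

    theta[j][a][v] is P(attribute a takes value v | class j).
    Within-cluster independence means the class-conditional probability
    of a row is the product over its attributes.
    """
    memberships = []
    for row in rows:
        joint = [
            mix[j] * math.prod(theta[j][a][v] for a, v in enumerate(row))
            for j in range(len(mix))
        ]
        norm = sum(joint)
        memberships.append([p / norm for p in joint])
    return memberships


def m_step(rows, memberships, n_values, pseudo_count=0.0):
    """Re-estimate mixing proportions and per-class attribute models.

    n_values[a] is the number of categories of attribute a.
    The same update serves ML and MAP; only pseudo_count differs.
    """
    n_classes = len(memberships[0])
    class_weights = [sum(m[j] for m in memberships) for j in range(n_classes)]
    mix = estimate_categorical(class_weights, pseudo_count)
    theta = []
    for j in range(n_classes):
        per_attribute = []
        for a, k in enumerate(n_values):
            counts = [0.0] * k
            for row, m in zip(rows, memberships):
                counts[row[a]] += m[j]  # fractional count for class j
            per_attribute.append(estimate_categorical(counts, pseudo_count))
        theta.append(per_attribute)
    return mix, theta


# Toy data: five cases, two binary attributes coded as 0/1.
rows = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]
mix = [0.5, 0.5]
theta = [[[0.7, 0.3], [0.7, 0.3]], [[0.3, 0.7], [0.3, 0.7]]]
memberships = e_step(rows, mix, theta)
mix, theta = m_step(rows, memberships, n_values=[2, 2], pseudo_count=0.5)
```

Because the ML and MAP updates differ only in the pseudo-count added
in estimate_categorical, the same E and M steps serve both kinds of
estimation. One practical effect of a positive pseudo-count is that no
category probability is estimated as exactly zero.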
For more details, including references and comparison with Snob and
Mclust, see these articles:
  "Mixture Model Clustering with the Multimix Program" by Jorgensen
  and Hunt, in /usr/share/doc/PPAPER.ps.gz.
  "Mixture Model Clustering using the Multimix Program" by Hunt and
  Jorgensen, in /usr/share/doc/paper.ps.
For an example problem solved by both programs, see
  /usr/share/doc/multimix/examples/simple.*
  /usr/share/doc/autoclass/examples/simple.*
-- James R. Van Zandt, Sun Dec 9 15:19:50 EST 2001