autoclass for Debian
----------------------

AutoClass and multimix are both clustering programs which have been
packaged for Debian.  Here is a comparison written by the Multimix
authors:

  "AutoClass is a Bayesian clustering program developed by Peter
  Cheeseman and colleagues at NASA Ames Research Center.  The models
  fitted by AutoClass are very similar to those fitted by Multimix,
  although both programs were developed independently.  Two obvious
  differences are

  "1. AutoClass has automated the process of model selection as well
  as that of parameter estimation but Multimix leaves
  model-specification to the user;

  "2. AutoClass uses Maximum Posterior estimation in place of Maximum
  Likelihood estimation.

  "In fact the first is the more crucial difference, because the EM
  algorithm at the basis of both programs accommodates both ML and MAP
  estimation.  AutoClass compares different models by calculating an
  approximation to the marginal density of the observed data after the
  model parameters have been integrated out.  In usual EM language the
  approximation used is analogous to taking observed data likelihood
  to be proportional to complete data likelihood with the constant of
  proportionality to be evaluated at the maximum likelihood estimates.

  "The models currently available in AutoClass for attributes within a
  component are as follows.  Categorical attributes are modelled by
  general discrete distributions (multi-category Bernoulli) as in
  Multimix.  Continuous attributes may be taken to have uniform or
  normal distributions, possibly after transformation.  Poisson
  distributions are available for count attributes.  Cheeseman and
  Stutz report that von Mises-Fisher distributions for circular and
  spherical attributes are under development.  At present it appears
  that AutoClass does not offer facilities for modelling
  within-cluster dependencies, that is, all models assume
  within-cluster independence of attributes.  Missing values are
  treated as a special kind of value in some attribute models, but
  there has been no implementation of the Little and Rubin methodology
  for data missing at random."

For more details, including references and comparison with Snob and
Mclust, see these articles:

  "Mixture Model Clustering with the Multimix Program" by Jorgensen
  and Hunt, in /usr/share/doc/PPAPER.ps.gz.

  "Mixture Model Clustering using the Multimix Program" by Hunt and
  Jorgensen, in /usr/share/doc/paper.ps.

For an example problem solved by both programs, see

  /usr/share/doc/multimix/examples/simple.*
  /usr/share/doc/autoclass/examples/simple.*

 -- James R. Van Zandt, Sun Dec 9 15:19:50 EST 2001
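
Illustration (not part of either package): the comparison quoted above
notes that the EM algorithm accommodates both ML and MAP estimation.
The Python sketch below shows one way this can look for categorical
attributes under within-cluster independence.  It is not taken from
AutoClass or Multimix.  The function names, the symmetric Dirichlet
prior, and the parameter alpha are assumptions made for illustration
only, and the mixture weights are estimated by plain ML to keep the
example short.

```python
import math
import random


def m_step_probs(weighted_counts, alpha=1.0):
    """Estimate category probabilities from responsibility-weighted counts.

    alpha == 1.0 gives the maximum likelihood estimate.  alpha > 1.0 gives
    the MAP estimate (posterior mode) under a symmetric Dirichlet(alpha)
    prior.  The prior is an assumption of this sketch, not necessarily the
    one AutoClass uses.
    """
    k = len(weighted_counts)
    total = sum(weighted_counts) + k * (alpha - 1.0)
    return [(c + alpha - 1.0) / total for c in weighted_counts]


def em(data, n_levels, n_clusters, alpha=1.0, n_iter=50, seed=0):
    """EM for a mixture with independent categorical attributes per cluster.

    data is a list of rows of integer category codes, and n_levels[j] is
    the number of categories of attribute j.
    """
    rng = random.Random(seed)
    n_attrs = len(n_levels)
    weights = [1.0 / n_clusters] * n_clusters
    # theta[c][j][v] = P(attribute j takes value v | cluster c), random start
    theta = [
        [m_step_probs([rng.random() + 0.5 for _ in range(n_levels[j])])
         for j in range(n_attrs)]
        for _ in range(n_clusters)
    ]
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each row
        resp = []
        for row in data:
            joint = [
                weights[c] * math.prod(theta[c][j][v] for j, v in enumerate(row))
                for c in range(n_clusters)
            ]
            norm = sum(joint)
            resp.append([p / norm for p in joint])
        # M-step: only this step differs between ML and MAP estimation
        for c in range(n_clusters):
            weights[c] = sum(r[c] for r in resp) / len(data)
            for j in range(n_attrs):
                counts = [0.0] * n_levels[j]
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                theta[c][j] = m_step_probs(counts, alpha)
    return weights, theta
```

For example, m_step_probs([3, 0, 1]) assigns probability zero to the
unobserved category, while m_step_probs([3, 0, 1], alpha=2.0) keeps
every category strictly positive.  The E-step is the same in both
cases.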