http://www.cs.washington.edu/homes/segal/brute.html
Brute
Principal Investigators:
Richard Segal and
Oren Etzioni .
Overview
Brute is an inductive system for performing both data mining and
classification tasks. At the core of Brute, is an algorithm for
efficiently searching the space of conjunctive rules up to a user specified
depth. When used for data mining, this core search can identify rules that
meet a specific search criteria. When used for classification, the rules
found by Brute's core search must be combined to form a classifier. Brute
supports several mechanisms for building a classifier.
Brute differs from existing data mining and classification algorithms in
that it uses massive search rather than greedy search. Massive search can
avoid many of the pitfalls of greedy search, albeit at additional cost. Empirical analysis shows that brute
performs much better than greedy algorithms for data mining and has similar
performance when used for classification. Surprisingly, Brute's running time is often quite reasonable.
Efficiency
Brute's efficiency is important because of its reliance on massive search.
Brute was implemented in C to achieve maximum search speed. Brute can
process 100,000 rules per second on a 500 item database when run a SPARC-10
processor. Brute's running time grows linearly with the number of training examples.
Brute uses several pruning axioms to reduce the size of the space it must
search. These axioms are sound in that they only remove portions of the
search space guaranteed not to contain useful rules. Brute can commonly
reduce the search space by a factor of a 1,000 or more.
Flexibility
Brute is a highly parameterized algorithm that allows the exploration of
several different learning algorithms. Brute's options allow it to simulate algorithms such
as BruteDL, Greedy3, Kdl, 1R, and many variations. This flexibility makes
Brute a valuable tool for learning research.
Brute supports a wide variety of data formats. Brute can be used with
minimal effort on databases from the UCI
repository, C4.5 databases, and IND databases. A program is provided
for automatically creating attribute description files that makes it easy
to use Brute on new data sets.
Availability
Brute is available for both research and commercial uses. Please contact
Oren Etzioni (etzioni@cs.washington.edu) for licensing details.
References
P. Riddle, O. Etzioni, C. Pearson, and R. Segal.
Process improvement through automated feedback (preliminary report). In
Proceedings of the Machine Learning Workshop on Integrated Learning in
Real-World Domains, July 1992.
P. Riddle, R. Segal, and O. Etzioni.
Representation design and brute-force induction in a Boeing manufacturing
domain. Applied Artificial Intelligence, 8:125-147, 1994.
R. Segal and O. Etzioni. Learning
decision lists using homogeneous rules. In Proceedings of the
Twelfth National Conference on Artificial Intelligence, July, 1994.
segal@cs.washington.edu