
ESOP Minimization for Machine Learning

In the past, EXORCISM-mv-2 was tested very successfully on industrial circuits in which up to 80% of the values are don't cares [24]. Comparisons with EXMIN [23] and MINT [13] demonstrate that EXORCISM either finds the best solution of the three or one close to the best. EXMIN does not allow don't cares, so it is of little use for machine learning, and MINT is slower than EXORCISM; moreover, MINT's approach to handling don't cares is very similar to that of an older variant of EXORCISM. One can therefore assume that the following critical remarks about EXORCISM apply to all current ESOP minimizers.

Testing EXORCISM on ``machine learning'' applications, we found that its performance on larger learning benchmarks is still unsatisfactory. Analyzing these cases, we came to the conclusion that the inferior behavior of the approach is caused predominantly by the extremely high percentage of don't cares typical of machine learning benchmarks, which can be as high as 99.99%. Espresso beats EXORCISM on functions with such high percentages of don't cares.

However, EXORCISM is able to find the pattern of an EXOR (parity) or similar function even when it is corrupted by ``unknowns'', which is a difficult problem in machine learning. To explain this case with an example, let us assume that we want to recognize the even/odd parity function. For a completely specified function with a relatively small number of variables (fewer than 16), the AFD minimizer finds the exact minimum result (the EXOR of the input variables) quite fast. When we add some ``unknowingness'' to this function by replacing some ones and zeros with don't cares, we should still be able to find the EXOR-of-inputs solution, since the underlying function did not change; only its pattern has been corrupted, ``hidden'' by the unknown values. This seemed to work, but as the percentage of don't cares and the number of variables increase, the average error grows and the method yields poor results. First it ceases to recognize the ``EXOR of variables'' pattern, and then, on 96-variable functions, it finds no EXORs at all and loses track of any pattern (as would a human in this case). As seen in our tests, EXORCISM deals better with these kinds of problems, but it is still unable to solve the 96-variable problem.
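The corrupted-parity benchmark described above can be sketched as follows. This is a minimal illustration, not part of EXORCISM or the AFD minimizer; the function name and the choice of random corruption are our own. The key observation is that replacing entries with don't cares never makes the EXOR-of-inputs hypothesis inconsistent with the remaining specified entries:

```python
import itertools
import random

def parity_with_dont_cares(n_vars, dc_fraction, seed=0):
    """Build the truth table of the n-variable even-parity (EXOR) function,
    then replace a fraction of its entries with don't cares ('-')."""
    rng = random.Random(seed)
    table = {}
    for bits in itertools.product((0, 1), repeat=n_vars):
        if rng.random() < dc_fraction:
            table[bits] = '-'          # value is now "unknown"
        else:
            table[bits] = sum(bits) % 2  # EXOR of all inputs
    return table

# Even with 90% don't cares, every specified entry still agrees with
# the EXOR of the inputs: the pattern is hidden, not changed.
table = parity_with_dont_cares(8, 0.90)
specified = {b: v for b, v in table.items() if v != '-'}
assert all(sum(b) % 2 == v for b, v in specified.items())
```

A minimizer that recovers the ``EXOR of variables'' pattern from the specified entries alone would solve this benchmark regardless of the corruption level; the difficulty reported above is that, as the don't-care percentage and variable count grow, current heuristics stop finding it.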

Another positive property of EXORCISM is the simultaneous classification of patterns into more than two categories (one wants not only to distinguish a ``friend'' airplane from a ``foe'', but also to learn its orientation, speed, etc.). In terms of logic synthesis, this property corresponds to the concurrent minimization of switching functions with many outputs. Currently FLASH operates on single-output functions, but EXORCISM works with multi-output functions. There are many decomposers for multi-output functions, but all of them have been designed for circuit design. What is needed is a minimizer for strongly unspecified, multiple-valued-input, multi-output functions. Both EXORCISM and Espresso satisfy this requirement, but both can be improved for strongly unspecified functions.
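The benefit of concurrent multi-output minimization can be shown on a toy example (our own illustration, not taken from EXORCISM): when two outputs share a product term, minimizing them together lets that cube be generated once and reused, which separate single-output minimization cannot exploit.

```python
from itertools import product

# Two outputs whose ESOP forms share the product term a&b:
#   f0 = (a & b) ^ c        f1 = (a & b) ^ d
def f0(a, b, c, d): return (a & b) ^ c
def f1(a, b, c, d): return (a & b) ^ d

# Concurrent minimization counts the shared cube a&b only once.
for a, b, c, d in product((0, 1), repeat=4):
    shared = a & b                  # cube computed once, fed to both outputs
    assert f0(a, b, c, d) == shared ^ c
    assert f1(a, b, c, d) == shared ^ d
```

In the machine-learning reading, each output is one category (friend/foe, orientation class, etc.), and shared cubes correspond to features useful for several classification questions at once.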

What is also missing in both ``industrial circuit'' and ``machine learning'' decomposing systems is the decomposition of multiple-valued-input, multiple-valued-output functions.

Why is this important? In theory (and this is also the approach of the PT group), any multiple-valued variable can be encoded by a vector of binary variables. What happens in learning situations, however, is that the learning system infers rules that depend on how the multiple-valued variables are encoded with binary variables. To give an example, if the system inferred from a large set of data that people who live close to power lines develop cancer, we would perhaps treat such an ``invention'' with due care. If, however, the system inferred that people whose third bit of encoded distance from the line is 1 develop cancer, we would treat such an inference as a ``coding-related'' artifact. Therefore, the best approach for a learning system is not to use coding at all, but to perform the inference on the variables that are natural for each feature: either binary (man, woman) or multiple-valued (distance in yards). EXORCISM has this property.
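The ``third bit'' artifact is easy to make concrete. In this small sketch of ours (the helper and the bit numbering are our own assumptions, not a construct from the paper), a rule conditioned on one bit of a binary-encoded distance selects a set of distances that is non-contiguous and entirely dependent on the chosen encoding:

```python
def bit(value, i):
    """i-th least-significant bit of a binary-encoded feature value."""
    return (value >> i) & 1

# Distances (in arbitrary units, 0..15) whose "third bit" is 1:
selected = [d for d in range(16) if bit(d, 2) == 1]
# selected == [4, 5, 6, 7, 12, 13, 14, 15]: a non-contiguous,
# encoding-dependent set, not a natural range such as "close to the line".
```

A rule over the natural multiple-valued variable (e.g. ``distance < 100 yards'') is meaningful regardless of encoding; the bit-level rule above is meaningful only relative to one particular code.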



Marek Perkowski
Tue Nov 11 17:11:29 PST 1997