
Towards Improved Approaches to Logic Minimizers for Machine Learning

Although the AFD approach of FLASH gives very robust results, it is slow, which essentially restricts its practical use to functions of about 20 or fewer variables. It is therefore important to be able to compare the decompositional logic approach of FLASH to other logic approaches that use the same DFC measure but introduce a restricted bias resulting from an assumed network structure. Such approaches are faster and can be applied to functions of more variables. In this respect, the well-known circuit minimizer Espresso and the standard machine-learning program C4.5 were tested together with EXORCISM. These programs have the following structure/gate-type biases: Espresso assumes a two-level AND-OR network, EXORCISM assumes a two-level AND-EXOR network, and C4.5 assumes an ordered decision tree (the input variables can be multiple-valued).
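
To make the AND-OR versus AND-EXOR bias concrete, consider the parity function, a classical worst case for SOP minimizers. The sketch below is our own illustration (not output of any of the tested programs): it evaluates the same 4-variable parity both as an AND-OR cover, which needs 2^(n-1) = 8 product terms, and as an AND-EXOR cover, which needs only n = 4 single-literal terms.

    from itertools import product

    def eval_sop(cubes, assignment):
        """OR of ANDs: each cube maps a variable index to its required literal value."""
        return any(all(assignment[v] == lit for v, lit in cube.items()) for cube in cubes)

    def eval_esop(cubes, assignment):
        """XOR of ANDs: the same cube format, combined with exclusive-or instead of or."""
        result = False
        for cube in cubes:
            result ^= all(assignment[v] == lit for v, lit in cube.items())
        return result

    n = 4
    # SOP cover of parity: one cube per odd-weight minterm -> 2**(n-1) = 8 cubes.
    sop = [dict(enumerate(bits)) for bits in product([0, 1], repeat=n) if sum(bits) % 2]
    # ESOP cover of parity: n single-literal cubes, i.e. x1 xor x2 xor x3 xor x4.
    esop = [{i: 1} for i in range(n)]

    for bits in product([0, 1], repeat=n):
        a = dict(enumerate(bits))
        assert eval_sop(sop, a) == eval_esop(esop, a) == bool(sum(bits) % 2)
    print(f"parity on {n} vars: SOP cubes = {len(sop)}, ESOP cubes = {len(esop)}")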

Several questions arise: how much robustness (DFC) is lost due to these structural biases, and how much speed is gained in return?

Another important question that must be faced while developing improved minimizers for machine learning applications is the following: can the performance of the FLASH system be improved by orders of magnitude without sacrificing much of its robustness (DFC)?

This question is very important practically. Such a speedup is required to make FLASH useful for important military applications such as High Resolution Radar.
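
For concreteness, the following sketch tallies a DFC-style cost for a decomposed network. It assumes the usual convention that a block with k inputs and m outputs contributes m * 2^k, and that the network cost is the sum over blocks; this convention should be checked against the DFC definition given earlier in this report, and the block sizes below are hypothetical.

    def block_dfc(num_inputs: int, num_outputs: int = 1) -> int:
        # Assumed convention: a k-input, m-output block costs m * 2**k.
        return num_outputs * 2 ** num_inputs

    def network_dfc(blocks):
        """blocks: iterable of (num_inputs, num_outputs) pairs."""
        return sum(block_dfc(k, m) for k, m in blocks)

    # An undecomposed 10-input function versus a (hypothetical)
    # decomposition into three 4-input blocks:
    print(network_dfc([(10, 1)]))        # 1024
    print(network_dfc([(4, 1)] * 3))     # 48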

The data used in learning and algorithm-design applications by the PT group are arbitrary switching functions. Thus, the standard and generally applicable minimization procedures of ``logic synthesis'' can be applied. An extremely important observation is that these functions have quite different properties from the data taken from industrial companies on which programs are tested in the logic-synthesis community (the MCNC benchmarks). In theory, an algorithm should work well on any type of data. However, since all practical network minimization problems are NP-hard, all practical algorithms are by necessity heuristic in nature and are therefore very dependent on the type of data. Taking the data characteristics into account (such as closeness to unate Boolean functions) was, in principle, the main reason for the commercial success of two-level logic minimizers in circuit-design applications.
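
As a toy illustration of such a data characteristic, the sketch below (our own example, not benchmark data) tests whether a completely specified function, given as a full truth table, is unate, i.e., whether flipping each variable from 0 to 1 can move the output in only one direction everywhere.

    from itertools import product

    def is_unate(truth, n):
        """truth: dict mapping each n-tuple of 0/1 inputs to an output 0/1."""
        for v in range(n):
            rising = falling = False
            for bits in product((0, 1), repeat=n):
                if bits[v] == 0:
                    flipped = bits[:v] + (1,) + bits[v + 1:]
                    if truth[bits] < truth[flipped]:
                        rising = True
                    elif truth[bits] > truth[flipped]:
                        falling = True
            if rising and falling:
                return False  # binate in variable v
        return True

    n = 3
    and_or = {b: (b[0] and b[1]) or b[2] for b in product((0, 1), repeat=n)}  # unate
    parity = {b: sum(b) % 2 for b in product((0, 1), repeat=n)}               # binate
    print(is_unate(and_or, n), is_unate(parity, n))  # True False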

What is it that distinguishes the machine learning data from the circuit design data? Our preliminary answer is the following:

  1. ML problems have an extremely high percentage of don't cares (don't cares are combinations of argument values for which the function value is not specified); missing data can be represented as don't cares. A rough illustration is sketched after this list.
  2. Arguments (variables) in ML problems are naturally multiple-valued or continuous.
  3. ML problems involve missing values in inputs (missing fields) and conflicting data, for discrete as well as continuous fields.
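
The following back-of-the-envelope sketch (toy numbers, not measured data) shows why point 1 dominates: a training set of s distinct samples over n binary attributes specifies only s of the 2^n truth-table rows; every other row is a don't care.

    def dont_care_fraction(num_vars: int, num_samples: int) -> float:
        # Each distinct sample pins down one row of the 2**num_vars-row table.
        total = 2 ** num_vars
        return (total - num_samples) / total

    # e.g. 200 training examples over 20 binary attributes:
    print(f"{dont_care_fraction(20, 200):.6%} don't cares")  # 99.980927% don't cares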
