## Quantitative Methods for Performance Evaluation

When the responses of an expert system can be expressed numerically, statistical tools may be employed to evaluate its performance. Generally, a confidence interval is constructed for one or more performance measures, and the performance of the expert system with respect to the expert is ascertained at a given level of confidence (commonly 95% or 99%). If the confidence interval obtained at that level is satisfactory, the system requires no adjustment; otherwise the knowledge base and the inference engine of the system need to be retuned. An alternative formulation is to define a hypothesis $H_0$ and test whether it should be accepted or rejected with respect to a predefined performance range. The hypothesis $H_0$ may take the following form.

$H_0$: The expert system is valid for the acceptable performance range under the prescribed input domain.

Among the quantitative methods, the most common are the paired t-test and Hotelling's one-sample $T^2$-test.

Paired t-test: Suppose $x_i \in X$ and $y_i \in Y$ are members of the sets of random numbers $X$ and $Y$, where $X$ and $Y$ are the inferences derived by the expert system and by the expert, respectively. Also assume that for each $x_i$ there exists a unique corresponding $y_i$. The paired t-test computes the deviation $d_i = x_i - y_i$, $\forall i$, $1 \le i \le n$, and evaluates the mean $\bar{d}$ and the standard deviation $S_d$ of the $n$ samples of $d_i$. A confidence interval for the mean deviation is then constructed as follows:

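$$\bar{d} - t_{n-1,\alpha}\,\frac{S_d}{\sqrt{n}} \;\le\; \mu_d \;\le\; \bar{d} + t_{n-1,\alpha}\,\frac{S_d}{\sqrt{n}}$$

where $\mu_d$ denotes the true mean deviation and $t_{n-1,\alpha}$ is the critical value of the t-distribution with $n-1$ degrees of freedom at the level of confidence $\alpha$. The hypothesis $H_0$ is accepted if $\mu_d = 0$ lies in the above interval.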
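The interval check for the paired t-test can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names, the sample data, and the choice to pass the critical value in from a t-table are all assumptions made for the example. The standard error $S_d/\sqrt{n}$ is the usual textbook form for the paired t-test, and the value 2.262 is the two-sided 95% table entry for 9 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t_interval(system_outputs, expert_outputs, t_critical):
    """Confidence interval for the mean deviation d_i = x_i - y_i.

    t_critical is the t-table value for n - 1 degrees of freedom at the
    chosen confidence level; it is supplied by the caller (assumption).
    """
    if len(system_outputs) != len(expert_outputs):
        raise ValueError("each x_i needs exactly one corresponding y_i")
    n = len(system_outputs)
    if n < 2:
        raise ValueError("at least two paired cases are required")

    deviations = [x - y for x, y in zip(system_outputs, expert_outputs)]
    d_bar = mean(deviations)   # mean deviation
    s_d = stdev(deviations)    # sample standard deviation (divides by n - 1)
    half_width = t_critical * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width


def accept_h0(interval):
    """H0 is accepted when zero lies inside the confidence interval."""
    lower, upper = interval
    return lower <= 0.0 <= upper


# Hypothetical data: one quantified output for n = 10 test cases.
system = [7.2, 6.8, 8.1, 5.9, 7.5, 6.4, 8.0, 7.1, 6.6, 7.3]
expert = [7.0, 7.1, 8.0, 6.2, 7.4, 6.5, 7.7, 7.3, 6.5, 7.2]

T_CRIT_95_DF9 = 2.262  # two-sided 95% t-table value for 9 degrees of freedom
interval = paired_t_interval(system, expert, T_CRIT_95_DF9)
print(interval, accept_h0(interval))
```

Passing the critical value as a parameter keeps the sketch free of third-party dependencies; a statistics library could supply the same value programmatically.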

One main difficulty in applying the t-test is that the output variables of the system have to be quantified properly, and for many systems such quantification is difficult. Secondly, the paired t-test should be applied only to expert systems having a single output variable $x_i$ (and $y_i$ for the expert), obtained for $n$ cases. For multivariate responses (i.e., systems having more than one output variable), the t-test should not be used, because the variables may be correlated, and the judgement about performance may then be erroneous. Hotelling's one-sample $T^2$-test may be useful for evaluating the performance of multivariate expert systems.

Hotelling's one-sample $T^2$-test: Suppose that the expert system has $m$ output variables. Thus for $k$ sets of input variables, there must be $k$ output vectors, each having $m$ scalar components. The expert, against whom the performance of the system will be evaluated, should also generate $k$ output vectors, each having $m$ components. Let the output vectors of the expert system be $[X_i]_{m \times 1}$, $1 \le i \le k$, and those generated by the expert be $[Y_i]_{m \times 1}$, $1 \le i \le k$. We now compute the error vectors $E_i = X_i - Y_i$, $\forall i$, $1 \le i \le k$, and their mean $\bar{E}$. The one-sample $T^2$-test is then employed to check whether $\bar{E}$ is significantly different from the null vector $\mathbf{0}$. If the difference is not significant, then the hypothesis $H_0$ is accepted.
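The multivariate check can be sketched as follows. The text does not give the test statistic, so this example assumes the standard textbook form $T^2 = k\,\bar{E}^{T} S^{-1} \bar{E}$, where $S$ is the sample covariance matrix of the error vectors, together with the usual conversion $F = \frac{k-m}{m(k-1)}\,T^2$, which follows an $F$-distribution with $(m, k-m)$ degrees of freedom when the errors are approximately multivariate normal. The function name, the sample data, and the use of NumPy are assumptions made for illustration; the critical value 4.46 is the 5% $F$-table entry for $(2, 8)$ degrees of freedom.

```python
import numpy as np


def hotelling_t2_one_sample(system_outputs, expert_outputs):
    """Return (T^2, F) for testing whether the mean error vector is zero.

    Both arguments are k x m arrays: k test cases, m output variables.
    """
    x = np.asarray(system_outputs, dtype=float)
    y = np.asarray(expert_outputs, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("expected two k x m arrays of equal shape")
    k, m = x.shape
    if k <= m:
        raise ValueError("need more test cases (k) than output variables (m)")

    errors = x - y                          # E_i = X_i - Y_i, one row per case
    e_bar = errors.mean(axis=0)             # mean error vector
    # Sample covariance of the errors (divides by k - 1), forced to m x m.
    s = np.atleast_2d(np.cov(errors, rowvar=False))

    # T^2 = k * e_bar^T S^{-1} e_bar, using a linear solve instead of an inverse.
    t2 = float(k * (e_bar @ np.linalg.solve(s, e_bar)))
    f_stat = (k - m) / (m * (k - 1)) * t2   # compare with F(m, k - m)
    return t2, f_stat


# Hypothetical data: k = 10 test cases, m = 2 output variables.
system = [[7.2, 3.1], [6.8, 2.9], [8.1, 3.6], [5.9, 2.4], [7.5, 3.3],
          [6.4, 2.8], [8.0, 3.5], [7.1, 3.0], [6.6, 2.7], [7.3, 3.2]]
expert = [[7.0, 3.0], [7.1, 3.1], [8.0, 3.4], [6.2, 2.5], [7.4, 3.4],
          [6.5, 2.7], [7.7, 3.6], [7.3, 2.9], [6.5, 2.8], [7.2, 3.1]]

F_CRIT_05_2_8 = 4.46  # 5% F-table value for (2, 8) degrees of freedom
t2, f_stat = hotelling_t2_one_sample(system, expert)
print(t2, f_stat, "accept H0" if f_stat <= F_CRIT_05_2_8 else "reject H0")
```

Because the covariance matrix enters the statistic, correlations among the output variables are taken into account, which is what separate t-tests on each variable would miss.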