RULES EXTRACTION WITH AN IMMUNE ALGORITHM

In this paper, a method of extracting rules from information systems with an immune algorithm is proposed. The design of the immune algorithm is based on a sharing mechanism for rule extraction: the principle of sharing and competing for resources in the sharing mechanism is consistent with the relationships of sharing and rivalry among rules. In order to extract rules efficiently, a new concept of flexible confidence and a corresponding rule measure are introduced. Experiments demonstrate that the proposed method is effective.


INTRODUCTION
Classification rule extraction is an important aspect of data mining. Usually, a rule extraction method is based on sample learning, using some classification method to obtain the classification rules. The techniques used in classification rule methods mainly include decision trees, neural networks, and genetic algorithms. However, the decision tree method has difficulty mining large data sets, and the meaning of rules produced by neural network methods is often hard to understand. The standard genetic algorithm can exhibit degeneration during the evolution process, is unable to express the complementary and competitive relationships among rules, and, when dealing with multi-peak search problems, often converges to a sub-optimal solution. In order to extract rules better, a method based on an immune algorithm with flexible confidence is proposed in this article. Experimental results confirm the validity of the new algorithm.

SIGNIFICANCE MEASURE METHOD OF THE RULES
In this article, a rule is expressed in the form A ⇒ B, where A is a conjunction of condition attribute values, called the rule's antecedent, and B is the value of a decision attribute, called the consequent. If some condition attribute is unimportant to the decision, that is, whatever value the condition attribute takes does not affect the decision, its position in the antecedent is expressed with a zero.
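The encoding above can be sketched as follows (function and variable names are illustrative, not from the paper):

```python
# A rule's antecedent is a vector over the condition attributes; a value
# of 0 means "any value" (the attribute does not affect the decision).

def matches(antecedent, sample):
    """Return True if the sample satisfies the rule antecedent A."""
    return all(a == 0 or a == s for a, s in zip(antecedent, sample))

# Example: three condition attributes; the second one is irrelevant.
rule_antecedent = (2, 0, 1)   # A: attr1 = 2, attr3 = 1
rule_consequent = 1           # B: decision class 1

print(matches(rule_antecedent, (2, 3, 1)))  # True
print(matches(rule_antecedent, (2, 3, 2)))  # False
```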
When extracting rules, it is extremely important to select an appropriate rule measure, and many rule measure methods have been proposed. Let |·| denote the cardinal number of a set, let A* denote the set of samples satisfied by the antecedent A, and let B* denote the set of samples belonging to the decision class B. In the literature (Khoo & Zhai, 2001), a genetic algorithm was used to extract rules, with a squared term in the rule measure to ensure rapid convergence; He et al. (2005) proposed another rule measure. When evaluating the significance of a rule, we consider the following factors.
Completeness. The completeness of the rule, i.e. the proportion of the samples satisfied by both the antecedent A and the consequent B to the samples satisfied by the consequent, is given by x1 = |A* ∩ B*| / |B*|. When the decision class distribution is imbalanced, the rules of a small class are frequently neglected because they cover few samples; this factor enables the rules of a small class to be extracted. The bigger x1 is, the more important the rule is to class B.
Coverage. The coverage of the rule, i.e. the proportion of the samples satisfied by the rule antecedent A to the total number of samples M, is given by x2 = |A*| / M. It indicates the induction ability of the rule over the sample set: the bigger x2 is, the stronger the induction ability, and the more important the rule.
The scale of rules. The scale of a rule is the number of samples satisfied by the rule antecedent A in the sample space, given by N = |A*|. This factor indicates the predictive ability of the rule: the bigger N is, the stronger the predictive ability, so when we extract rules, a rule with a bigger value of N is desired.
Confidence. The confidence of a rule refers to its accuracy, denoted by imp. The common confidence measure is x3 = |A* ∩ B*| / |A*|: the bigger x3 is, the more confident the rule, so when we extract rules, a rule with a bigger value of x3 is expected. However, demanding excessively high confidence increases the probability of extracting overly specific rules that cover few samples. On the other hand, a strict confidence requirement cannot accommodate noise when dealing with data that contains a little noise. Therefore, in order to extract rules with a higher summarizing ability, the required confidence may be relaxed appropriately, especially when high confidence is not essential.
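The four factors above can be computed directly from a sample set; a minimal sketch, assuming each sample is a (condition-vector, decision) pair and 0 in an antecedent means "any value":

```python
def matches(antecedent, sample):
    # 0 in the antecedent is a wildcard for that condition attribute.
    return all(a == 0 or a == s for a, s in zip(antecedent, sample))

def measures(antecedent, consequent, samples):
    M = len(samples)                                            # total samples
    A = [s for s, d in samples if matches(antecedent, s)]       # A*
    B = [s for s, d in samples if d == consequent]              # B*
    AB = [s for s, d in samples
          if matches(antecedent, s) and d == consequent]        # A* ∩ B*
    x1 = len(AB) / len(B) if B else 0.0   # completeness |A* ∩ B*| / |B*|
    x2 = len(A) / M                       # coverage     |A*| / M
    N = len(A)                            # scale        |A*|
    x3 = len(AB) / len(A) if A else 0.0   # confidence   |A* ∩ B*| / |A*|
    return x1, x2, N, x3

samples = [((2, 3, 1), 1), ((2, 1, 1), 1), ((1, 3, 1), 0), ((2, 2, 2), 1)]
print(measures((2, 0, 1), 1, samples))  # x1 ≈ 0.667, x2 = 0.5, N = 2, x3 = 1.0
```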
Generally, there are two methods of dealing with the confidence of a rule. The first method uses the primitive confidence as one factor of the rule measure (Khoo & Zhai, 2001; He, 2005). A rule with low confidence can still participate in the competition; although its competitive ability is weakened, the effect is slight, which obviously does not conform to the original intention of rule extraction. The second method limits the confidence of the rule by a threshold Tr: rules whose confidence is lower than Tr are not allowed to compete, i.e. a bi-value function is applied. This is too strict, however, when processing rules near the boundary: if there are two rules r1 and r2 whose confidences lie just below and just above Tr respectively, only r2 passes through, and the rule r1 loses its competitive ability completely.
In fact, the rule r1 is not too bad: inspecting the whole rule space, one can see that near any threshold the primitive confidences of such rules are all extremely close. It is therefore not reasonable to use a bi-value function.
Therefore, when measuring the confidence of rules, it is very important to preserve the significance of rules with high confidence while processing confidence flexibly: rules with slightly low confidence should have their significance decreased, but still retain some competitive ability.
Here we introduce a new flexible confidence function: imp = 1 / (1 + e^(-a(x3 - Tr))), where a is called the gradient factor and Tr is the confidence threshold. The bigger the value of a, the more sharply the value of imp changes near Tr.
By adjusting the value of a, the behaviour of imp can be tuned. The span of confidence values over which imp varies takes the threshold Tr as its center point, and the width of the span increases as a is reduced; this span is called the flexible confidence interval. The interval is mapped onto (0, 1): if the primitive confidence of a rule falls within it, the mapped value in (0, 1) is taken as the new flexible confidence. A rule whose primitive confidence is far above Tr (>> Tr) still has a flexible confidence very close to 1; a rule whose primitive confidence is far below Tr (<< Tr) has a flexible confidence close to 0; and a rule whose primitive confidence is near the threshold has its flexible confidence distributed over the interval (0, 1). Under the precondition of approximately satisfying the threshold, the confidence threshold thus becomes a non-compulsory factor and forms part of the rule significance measure.
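The flexible confidence described here behaves like a logistic curve centered at Tr; a sketch under that assumption (the default Tr and a values are illustrative, not taken from the paper):

```python
import math

# Flexible confidence: ≈1 well above the threshold Tr, ≈0 well below it,
# and graded over the flexible confidence interval near Tr. The gradient
# factor a controls how sharply the value changes near Tr.
def flexible_confidence(x3, Tr=0.8, a=20.0):
    return 1.0 / (1.0 + math.exp(-a * (x3 - Tr)))

print(flexible_confidence(0.99))  # close to 1: confident rule stays strong
print(flexible_confidence(0.50))  # close to 0: weak rule stays weak
print(flexible_confidence(0.80))  # exactly 0.5 at the threshold itself
```

A smaller a widens the interval in which rules slightly below Tr keep some competitive ability, instead of being cut off as a bi-value function would do.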
From the discussion above, combining completeness x1, coverage x2, the scale N, and the flexible confidence imp, we give the formula to measure the significance of the rule as formula (1).

SIMILARITY
Data Science Journal, Volume 6, Supplement, 18 November 2007
Similarity refers to the degree of resemblance between rules. In this paper, similarity is used to determine the degree of similarity of antibodies in the immune memory system: when we extract rules, rules with a large degree of similarity are controlled, and rules covered by others are inhibited. The similarity function is defined as sim = max(s(i, j)), where s(i, j) is the similarity between rule i and rule j (rule j being in the immune memory system). Let n be the number of condition attributes; then s(i, j) = (1/n) Σ f_k (k = 1, ..., n), where f_k is the similarity between rule i and rule j on condition attribute k, and a_k is the number of possible values of condition attribute k. If two rules have the same value on a condition attribute, we consider them completely similar on this attribute (f_k = 1). If their values are different and neither of them is 0, they are completely dissimilar on this attribute (f_k = 0). If the value of rule i is 0 and the value of rule j is not 0, they are similar with probability 1/a_k; because the value of rule i is 0 (i.e. rule i contains rule j on this attribute), the similarity decreases, while in the contrary situation it increases. The similarity of a rule is thus the average of its per-attribute similarities to another rule, and the similarity between a rule and the immune system is defined as the maximum of its similarities to the rules already in the immune system.
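A minimal sketch of this similarity measure, assuming rules are antecedent vectors with 0 as the wildcard and, for simplicity, scoring both wildcard directions as 1/a_k (the paper describes the two directions only qualitatively):

```python
def attribute_similarity(vi, vj, a_k):
    # Per-attribute similarity f_k on an attribute with a_k possible values.
    if vi == vj:
        return 1.0           # identical values: completely similar
    if vi != 0 and vj != 0:
        return 0.0           # different concrete values: completely dissimilar
    return 1.0 / a_k         # one side is the wildcard 0 (simplifying assumption)

def rule_similarity(ri, rj, attr_sizes):
    # s(i, j): average of the per-attribute similarities over n attributes.
    n = len(attr_sizes)
    return sum(attribute_similarity(vi, vj, a)
               for vi, vj, a in zip(ri, rj, attr_sizes)) / n

def similarity_to_memory(rule, memory, attr_sizes):
    # sim: maximum similarity of the rule to rules in the immune memory system.
    return max((rule_similarity(rule, rj, attr_sizes) for rj in memory),
               default=0.0)
```

For example, with three condition attributes taking 3, 3, and 2 values, rule_similarity((2, 0, 1), (2, 3, 1), (3, 3, 2)) averages f = 1, 1/3, 1 to give 7/9.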

EXPERIMENTS
For dataset MONK1, the training data set is MONK1.train. The total number of samples is M = 124, the number of condition attributes is C = 6, and the number of decision attributes is D = 1; there are two decision classes. Let the number of elements in the seed population be P = 20, the number of elements in the memory pool be N = M (supposing the pool is big enough), the number of elements in the propagation pool be Q = 100, and the maximum number of evolution generations be GG = 50. We adopt the flexible confidence function, whose threshold Tr and gradient factor a are adjustable according to the actual situation. In one pass, the correct ratios of the extracted rules are 100% for both the training data and the testing data.
Dataset MONK3 contains 5% noisy data, and the training data set is MONK3.train. The total number of samples is M = 122, the number of condition attributes is C = 6, and the number of decision attributes is D = 1. Let the number of elements in the seed population be P = 20, the number of elements in the memory pool be N = M (supposing the pool is big enough), the number of elements in the propagation pool be Q = 100, and the maximum number of evolution generations be GG = 50. We adopt the flexible confidence function with different values of the threshold Tr and the gradient factor a to extract rules. The contrasting results with the flexible confidence function and the bi-value function are shown in Figure 1. In the figure, the highest position of a line indicates the maximum recognition ratio. The results show that our algorithm does a good job of extracting correct rules for this kind of data.
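The parameter settings used in the MONK experiments can be collected into a single configuration object (the field names and the default Tr and a values are illustrative; the paper states only that Tr and a are adjustable):

```python
from dataclasses import dataclass

@dataclass
class ImmuneConfig:
    P: int = 20      # seed population size
    N: int = 124     # memory pool size, set to M (number of training samples)
    Q: int = 100     # propagation pool size
    GG: int = 50     # maximum number of evolution generations
    Tr: float = 0.8  # confidence threshold (illustrative default)
    a: float = 20.0  # gradient factor (illustrative default)

monk1 = ImmuneConfig(N=124)  # MONK1: M = 124 training samples
monk3 = ImmuneConfig(N=122)  # MONK3: M = 122 training samples, 5% noise
```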

Figure 1. Accuracy of rule extraction for dataset MONK3