This Electronics Engineering Seminar Topic deals with the following:
In this section we present genetic programming, being the fourth member of the evolutionary algorithm family. Besides the particular representation (using trees as chromosomes) it differs from other EA strands in its application area. While the EAs are typically applied to optimization problems, GP could be rather positioned in machine learning. In terms of nature of this deferent problem types, most other EAs are used for finding some input realizing maximum payoff, whereas GP is used to seek models with maximum fit. Clearly, once maximization is introduced, modelling problems can be seen as special cases of optimization. This, in fact, is the basis of using evolution for such tasks: models are treated as individuals, their fitness being the model quality to be maximized.
We will consider a credit scoring problem within a bank that lends money and keeps a track on how its customers are paying back their loan. This information about the clients can be used to develop a model describing good, respectively bad customers. Later on, this model can be used to predict customer’s behavior and thereby assist in evaluating future loan applicants. Technically, the classification model will be developed based on (historical) data holding personal information along with a credibility index (good or bad) of customers. The model will have personal data as input and produce a binary output. For instance, the annual salary, the marriage status, and the number of children can be used as input. In Table 2 a small data set is shown. A possible classification model using these data might be the following:
IF (Number of children = 2) AND (Salary > 80000) THEN good ELSE bad In general, the model will look like this:
IF formula THEN good ELSE bad.
Notice, that formula is the only unknown in this rule, all other elements are fixed. Our goal is thus to find the optimal formula, forming optimal rule that classifies a maximum number of known clients correctly.
At this point we have formulated our problem as a search problem in the space of possible formulas 1 , where the quality of a formula _ can be defined as the percentage of customers correctly classified by the model IF _ THEN good ELSE bad. In evolutionary terms we have defined the phenotypes (formulas) and the fitness (classification accuracy). In accordance with the typical GP approach we use parse trees as genotypes representing formulas. Figure 1 shows the parse tree of the formula above. This representation differs from the ones used in GAs or ES in two important aspects.