UPDATE: This article has now been integrated into the GAF documentation. The documentation can be found here.

## Introduction

A question often asked of me is how to determine the encoding of data variables using a binary string, when they are all different units and different ranges of values.

This question is often followed by a question relating to how to ensure that when decoding a binary string, you only have sensible, valid, values.

The questions above can be answered by using something I refer to as normalisation.

## Encoding and Decoding

The first point about binary based GAs is that you only need to decode the binary elements to the real values. There is no concept of encoding. If you are not used to GAs this seems rather odd. However, the GA creates all of the solutions based on fitness or some other random effect and is therefore responsible for creating the binary strings. The GA has no knowledge of the values or how they are encoded. All that we have to do is tell the GA how long the binary strings should be. We then, at each generation, take the GA generated binary strings and decode them into real domain values. Once we have the real values, we can evaluate the fitness of each solution and leave the GA to do its work until the next generation.

What normalisation does is simplify the decoding process.

Lets say we have three variables, x, y and z, with the following ranges;

```
x: -2.5 + 9.5 (range is 12 with an offset of -2.5)
y: 0 to 1000 (range is 1000 with an offset of 0)
x:-50 to 0 (range is 50 with an offset of -50)
```

Assuming that we are not working with integers we need to determine the resolution i.e. how many values we need between the range. For example, if we are happy with say a million values between the upper and lower values in the ranges, 20 bits per variable will suffice (2^20 = 1048576). Based on this we can determine how many bits each variable will use.

Each variable can have any appropriate number of bits, however, let’s say we use the same bit length of 20 bits for x, y and z. This gives us three values in our chromosome each with a bit length of 20, therefore, we will need a chromosome that is 60 bits long.

The GA will continually generate solutions (chromosomes) 60 bits long. In order to retrieve the real domain values from our chromosome, we use a range constant.

This is calculated as follows;

```
rangeConstant = range /(System.Math.Pow(2, numberOfBits) - 1);
```

Here is an example of the range constant approach for x, y and z.

```
// range constant for x
var rcx =12/(System.Math.Pow(2,20)-1);
// range constant for y
var rcy =1000/(System.Math.Pow(2,20)-1);
// range constant for z
var rcz =50/(System.Math.Pow(2,20)-1);
// when evaluating the fitness simply retrieve the 20 bit values as integers
// from the chromosome e.g.
var x1 =Convert.ToInt32(chromosome.ToBinaryString(0, 20), 2);
var y1 =Convert.ToInt32(chromosome.ToBinaryString(20, 20), 2);
var z1 =Convert.ToInt32(chromosome.ToBinaryString(40, 20), 2);
// multiply by the appropriate range constant and adjust for any offset
// in the range to get the real values
double x = (x1 * rcx) - 2.5;
double y = (y1 * rcy);
double y = (y1 * rcz) - 50;
```

This simple approach ensures that any given binary string will always decode to valid, within range, values for each of our variables. Once we have the real values, we can test for fitness and the GA will do the rest. If you are not used to GAs it can seem a little like magic, to be honest, it probably is.

Enjoy!

Hello. I use Genetic Algorithm Framework (GAF), and I would like to know GAF is supports Uniform Mutation and if so. How can I apply Uniform Mutation to an Integer type genes.

P.S. Uniform Mutation This operator replaces the value of the chosen gene with a uniform random value selected between the user-specified upper and lower bounds for that gene. This mutation operator can only be used for integer and float genes.

Hi Alex, the GAF doesn’t support this ‘out of the box’ however, it would be a simple thing to create a custom operator to undertake this mutation. For an introduction to custom operators have a look at this.

https://gaframework.org/wiki/index.php/Genetic_Operators#Custom_Operators

For a more concrete example check this out.

https://gaframework.org/wiki/index.php/Creating_an_Auto-Mutate_Operator

I hope that helps.

Hi John,

Thanks for your documentation – This is really great.

I have a question regarding integer chromosomes. I want to generate a chromosome of length n with each gene being a whole number between 0 and 10 for example. e.g. {54300135033…n).

I currently have a binary generation done: e.g. {11001010010011110…n}

What is the best way to do this using GAF?

Regards

Darshik

Its impossible to say which is best without understanding the problem and the operators you intend to use. In binary encoding, it is normal for each parameter you are encoding to span several genes, this is how crossover works. But with other operators it may be appropriate to have a single object/integer in a single gene. The Travelling salesman problem is an example of this.

Sorry that this isn’t of much help. Why not share more about what you are doing.

Hi John,

Many thanks for the quick reply. Here is a bit more information. I effectively have projects that I switch on and off with the binary code. Each binary Gene (1 or 0) is assigned to a project to switch it on or off. The combination of projects determines the fitness function. I am getting great results with this.

Now, I want to up the simulation a bit. I want to use the same/similar method to set the start year of a project – this determines the value of the project and allows me to optimize to some criteria for each year. The premise is to optimize over a period of, for example, 10 years, with a value of 0 meaning the project is off.

Hence, before I had a binary code of say {1101110…n} where each gene of 1 or 0 is an on/off switch and n is the number of projects in my group. Now I would like to create a chromosome of say {90312110…n} where each gene is set as the start year and if its 0 it is off, for n projects. I know this is a completely different method of using the solver. Wanted to check if it was possible first before i do the approach of generating n number of chromosomes, converting it to an integer of max 10, then applying it to each project.

Is this possible? I hope my explanation sheds light on my use case.

Please feel free to email me rather than commenting here as you should have my email address from this post.

Regards Darshik