A perceptron is one of the classifiers used in the field of Machine Learning. It is used for the supervised learning of binary classifiers meaning the algorithm will tell us whether something is part of a class or not. The idea of a perceptron can be based on a biological neuron. When enough “inputs” accumulate to change the potential of the neuron cell above a certain threshold, an action potential occurs which fires a signal (output) down the axon.

Before reading on I would just like to say that I am only getting started with Machine Learning, so if you spot any mistakes let me know.

With our perceptron the inputs are weighted, but we do not know the values of these weights. The algorithm “learns” the optimum weight values as it trains on training data, one at a time. Once it has found these optimum weights it can then use them to classify unknown test cases as either being part of the class or not (hence why it is known as a binary classifier).

Whenever a new set of inputs arrive, multiplying each input by their respective weight and summing it up together gives you the weighted sum. This value will then either be higher or lower than a threshold which when higher than the threshold fires the output. This is much like the all-or-none law when it comes to neurons. It either fires or it does not. There are no partial outputs (much like there is no partial action potential). The reason this is a machine learning algorithm is because the weights are constantly adjusted when training so that a linear decision boundary can be made that allows us to then separate unknown test cases into two distinct classes; either it is a part of the class or it is not.

Given a set of inputs (features) and weights, we have the following:

Where t is our threshold.

However, the algorithm should “learn” these weights when using it against training data so that it will be able to successfully classify unknown test cases as the weights are what shift the decision boundary (think of it as the gradient of a 2D plot).

**Handwritten Digits**

A perceptron can tackle any problem given a set of features (each input is represented by a vector of features). All it then does is change it’s weights for particular features in accordance to what it is trying to detect, whether it is pixels or other attributes. In this example we are going to use handwritten digits.

Assume we are given a dataset containing only digits 5 and 9, where each photo of a digit has dimensions 16 by 16. We want our perceptron to be able to detect whether a particular set of pixels (a photo of a digit) is part of the 5 class or not.

**The Data**

We will use 500 different handwritten versions for each of the digits 5 and 9. In order to pack everything into a 2D array we will split each 16 by 16 photo into a 1 by 256 row. That way each row will represent one digit. This gives us an array of 1000 rows by 256 columns.

Each row in the Data array will also have a corresponding row in the Labels array (where it’s true class label is, 1 for 5 and 0 for 9). The label array ensures we know what a particular vector of features should be labelled as.

**Learning The Weights**

Now that we have our data, we need to be able to let the perceptron learn the optimal weight values. The training perceptron algorithm works as follows (I will be using MATLAB, you can download the handwritten digits data here):

```
clearvars
load data
```

The first step is to load the data in. This should give you a data and a labels matrix. The perceptron does not need to make use of the labels matrix, instead we will make our own (where the first 100 rows are set to 1 and the next 100 are set to 0).

```
randomIntegers = datasample(2001:2500,100, 'Replace', false);
trainingDataSet(1:100, :) = data(randomIntegers, :);
trainingLabelSet(1:100) = 1;
data(randomIntegers, :) = [];
randomIntegers = datasample(3901:4400,100, 'Replace', false);
trainingDataSet(101:200, :) = data(randomIntegers, :);
trainingLabelSet(101:200) = 0;
data(randomIntegers, :) = [];
```

We sample 100 random 5’s and 100 random 9’s (remember the data set includes digits 0 to 9, so we have to make sure we only pick from the 5 and 9 range). Then we remove them from the overall data set (so we don’t choose them again during testing).

```
weights = rand(1, 256);
threshold = 0;
learningRate = 0.1;
numberOfIterations = 30;
countMistakes(1:numberOfIterations) = 0;
```

We then randomise the 256 weights (1 for each pixel) to a number between 0 and 1. We set the threshold at 0 (the threshold determines whether the output is fired or not. In this instance we get a negative number when we dot product the weights against the pixels of a class that is not 5, and we get a positive number when it is 5, hence the threshold is okay to be 0). The learning rate should be set between 0 and 1, and we’ve set it at 0.1. If the learning rate is too low the weights won’t reach an optimal level, and if it is too high the weights will constantly over-correct. The number of iterations is the amount of times we repeat the learning of the weights against the training data. After a finite number of iterations the perceptron should no longer make any mistakes in identifying the correct class (as the weights should now be at optimal level). The countMistakes variable will let us keep track of how many mistakes we have made at each iteration.

```
for n = 1:numberOfIterations
for t = 1:200
result = dot(weights, trainingDataSet(t, :));
if result > threshold
output = 1;
else
output = 0;
end;
mistake = trainingLabelSet(t) - output;
countMistakes(n) = countMistakes(n) + abs(mistake);
for w = 1:256
weights(w) = weights(w) + learningRate * mistake * trainingDataSet(t, w);
end;
threshold = threshold + learningRate * mistake * -1;
end;
end;
```

Above is the training algorithm itself. For n iterations and for each training example we calculate the activation/result (dot product between weights and the pixel values of that particular digit). We then compare the result to the threshold. A positive number outputs a 1 (as all of our digit 5’s are labelled as 1, the class we wish to identify). A negative number outputs a 0 because all of our 9’s are labelled as 0, because the digit 9 is not part of our 5 class. We can then calculate whether a mistake was made or not. If the labels match up we get 0, if they do not we get either -1 or 1 (because of trueLabel – output). For instance if we are on the first row of our digit 5 data set and the result we get is a negative number the output will be set to 0. The value of mistake will be 1 – 0 = 1, because the label is set to 1 and the output should have been 1, but it was 0 due to the weights not having the right values yet. What then follows is every weight value is updated accordingly by the following piece of code:

weights(w) = weights(w) + learningrate * mistake * trainingDataSet(t, w);

The new value of that particular weight will be multiplied by the value of mistake. If no mistake is made (mistake = 0,) then the weight values are not updated (because anything multiplied by 0 is still 0). If the value of mistake is either 1 or -1, then the weight is updated by the value of mistake multiplied by the learning rate and whatever the value of the feature vector was at that particular point (so whatever the value of the pixel brightness is in the 0-255 range at that point). This will ultimately give the weights that correspond to pixels that make the shape 5 very large positive values, and the other weights negative values. That means after training a perceptron you can give it a test data input vector (any digit 5 or 9 that was not in the training data) and it will correctly classify it as either a 5 or not a 5 because it’s weights have been altered to do so. We also update the threshold but multiply it by -1 each time rather than the value of the pixel. This will slightly alter the threshold each time the weight values are updated during training (unless no mistake is made).

```
plot(countMistakes, '-bo');
xlim([1, numberOfIterations]);
```

Lastly we can plot the mistakes made at each iteration.

As you can see, after 9 iterations the perceptron no longer makes any mistakes in correctly classifying whether a given vector of features is or is not part of the digit 5 class.

Now that we have trained our perceptron, we should test it against some unknown cases to see how it fares. The following code uses the remaining 800 input vectors as the test data set.

```
testDataSet(1:400, :) = data(2001:2400, :);
testLabelSet(1:400) = 1;
testDataSet(401:800, :) = data(3901:4300, :);
testLabelSet(401:800) = 0;
mistakes = 0;
for t = 1:800
result = dot(weights, testDataSet(t, :));
if result > threshold
output = 1;
else
output = 0;
end;
mistakes = mistakes + abs(testLabelSet(t) - output);
end;
accuracy = 100 - mistakes/800 * 100;
```

Running the above gives me 6 mistakes made in a test data set of 800. That means the perceptron has 99.25% accuracy.

Running all of the above multiple times gives us the following results:

**During Training**

On average it takes 8.1 iterations till the perceptron has gained optimal weight values and no longer makes mistakes.

**During Testing**

During testing there were an average of 10.2 mistakes made out of 800 unknown test cases.

This gives an average accuracy of 98.725%.

Hopefully this gives you an idea of how perceptrons work. I have only just learnt about them so feel free to leave a comment below if you spot any errors 🙂