[Neural Networks I] Neural Network Basics
--
Modern technology is in high demand for intelligence, pressing for advances in Machine Learning to solve bigger and more complex problems. Neural Networks have become popular solutions for new devices, presenting themselves as a good fit for handling large sets of complex data while providing great performance.
Several frameworks are available to facilitate faster development of the models, providing multiple abstraction layers between the Neural Network units and the user. In order to design the most efficient Neural Network model for the target problem, the user must be familiar, and proficient, with the Neural Network structure and each unit’s role.
Neural Network Structure
A Neural Network is a Machine Learning algorithm, inspired by the human brain, that uses interconnected neurons in a layered structure. Each layer is designed for a particular purpose within the model, implementing a specific operation.
Neurons
A Neuron, the basic unit of a Neural Network, drawn in figure 1, implements a mathematical function that computes the weighted sum of its inputs. The result is then processed by an Activation Function, which determines the output of the Neuron.
For a given artificial neuron k, let there be m inputs with signals x_0 through x_(m-1) and corresponding weights w_k0 through w_k(m-1). The output of the kth neuron is:

y_k = φ( Σ_j w_kj · x_j )    (equation 1)

where φ is the Activation Function and the sum runs over all m inputs.
It is common for one extra input, denominated Bias, to be added to each neuron. The Bias is a constant value that shifts the Activation Function by a known amount. The neuron formula is then transformed into:

y_k = φ( Σ_j w_kj · x_j + b_k )    (equation 2)

where b_k is the Bias of neuron k.
The Bias value enables the shifting of the Activation Function, which might be essential for the success of the training routines and, consequently, for the performance of the resulting model.
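As a minimal sketch, equation 2 can be written directly in Python; the function and variable names here are illustrative, and ReLU is used only as an example φ:

```python
def relu(x):
    """Example Activation Function: outputs x if positive, otherwise 0."""
    return max(0.0, x)

def neuron_output(inputs, weights, bias=0.0, activation=relu):
    """Equation 2: activation of the weighted sum of the inputs plus the Bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)

# 3 inputs with illustrative weights and no Bias:
print(neuron_output([1.0, 2.0, 3.0], [0.5, -0.2, 0.1]))  # relu(0.5 - 0.4 + 0.3) ≈ 0.4
```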
An example of the impact of the Bias is presented in figure 2, where an Activation Function is drawn with and without a Bias.
The example’s Activation Function, equation 3, is designed to filter out negative values and output positive values linearly, as can be seen in its graph:

f(x) = max(0, x)    (equation 3)
But for a specific case, the user might find it better to linearly output only values above a certain threshold. This is done by setting the Bias to the desired threshold value.
Taking figure 2 as reference, if the Bias is set to the value 2, the Activation Function is shifted 2 units, filtering all values below the threshold and linearly outputting all values above it. Its formula is, therefore, transformed into equation 4:

f(x) = max(0, x - 2)    (equation 4)
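A small sketch of this shift, assuming the article’s convention of using the Bias as the threshold (equivalent to adding a Bias of -2 to the weighted sum in equation 2):

```python
def relu(x):
    """Equation 3: filters out negative values, outputs positive values linearly."""
    return max(0.0, x)

def shifted_relu(x, threshold=2.0):
    """Equation 4: the same function shifted 2 units by the Bias."""
    return max(0.0, x - threshold)

for x in [-1.0, 0.0, 1.0, 2.0, 3.0, 5.0]:
    print(x, relu(x), shifted_relu(x))
# shifted_relu outputs 0 for every value below the threshold of 2
```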
Hidden Layers
A Hidden Layer is a layer between the input and the output of the Neural Network, composed of a set of neurons that perform transformations on the inputs fed into the network.
A Neural Network can implement multiple Hidden Layers, both intra-connected, where all neurons within a layer are connected to each other, and inter-connected, where the outputs of one layer are the inputs of another, as depicted in figure 2. These Hidden Layers can be designed for different purposes and applications.
Hidden Layers enable the function of the Neural Network to be broken down into specific transformations of the model’s data, with each layer designed to produce a specific output.
For example, Hidden Layers for the detection of human eyes are not, on their own, enough for the detection of faces; but used in combination with subsequent layers, they increase the capability of the model, enabling the detection of faces.
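Below is a minimal sketch, in Python with NumPy, of how inter-connected Hidden Layers compose: the output vector of one layer is fed as the input vector of the next. The layer sizes, random weights, and ReLU activation are arbitrary illustrations, not values taken from this article.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def layer(inputs, weights, biases, activation=relu):
    """One Hidden Layer: every neuron computes equation 2 over the same inputs."""
    return activation(weights @ inputs + biases)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # 4 network inputs
w1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # first Hidden Layer: 3 neurons
w2, b2 = rng.normal(size=(2, 3)), np.zeros(2)   # second Hidden Layer: 2 neurons

hidden1 = layer(x, w1, b1)        # outputs of the first layer...
hidden2 = layer(hidden1, w2, b2)  # ...become the inputs of the second
print(hidden2)
```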
Activation Functions
The Activation Function is a node within a Neural Network that decides the output of a neuron for a given set of inputs. The Activation Function can be Linear or Non-Linear, depending on which best fits the Neural Network model.
Linear
The Linear Activation Function, figure 3, is based on the equation f(x) = x, where the output is equal to the weighted input received. Since no transformation is done to the input, this type of Activation Function is also denominated “No Activation” or “Identity Function”.
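In code, the Identity Function is simply a pass-through:

```python
def linear(x):
    """Identity / "No Activation": the weighted input passes through unchanged."""
    return x

print(linear(-2.5), linear(0.0), linear(4.2))  # -2.5 0.0 4.2
```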
Non-Linear
Non-Linear Activation Functions can be designed to better fit the data used in the Neural Network, achieving better performance.
The ReLU (Rectified Linear Unit) is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise, as seen in figure 4. The ReLU is currently the most used Activation Function in Neural Networks because it offers better performance both during training routines and when deployed to the target system.
The ReLU function has a known issue with its transformation of negative values to 0: neurons that only receive negative inputs always output zero and stop contributing to learning, which decreases the ability of the Neural Network model to fit, or train from, the training dataset properly. Several variations of the ReLU, such as the Leaky ReLU, were developed to provide a better approach to negative values.
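A short sketch contrasting the standard ReLU with one such variation, the Leaky ReLU; the slope of 0.01 is a commonly used default, not a value fixed by this article:

```python
def relu(x):
    """Standard ReLU: negative inputs are zeroed (figure 4)."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: keeps a small output for negative inputs
    instead of zeroing them, so those neurons can still learn."""
    return x if x > 0 else slope * x

print(relu(-3.0), leaky_relu(-3.0))  # 0.0 -0.03
```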
Example of a Neural Network model
The Addition operation can be implemented as a basic form of a Neural Network model, although such an implementation is only advantageous for high-order numbers, where it avoids the multiple Adder units, and other hardware, that would otherwise be required to handle big numbers.
For the Addition problem we can consider the most basic form of a Neural Network, represented in figure 6, where a single neuron is connected to 2 inputs, without any Bias, and to the respective Activation Function.
Weight Configuration
For the Addition Use Case, the weights must be set to the same value, i.e., w0 = w1, since both inputs have the same importance in this operation’s result.
Considering the Neuron’s function, equation 1, w0 and w1 should be set to 1. This way the multiplications are eliminated and the formula is transformed into an Addition of the inputs x0 and x1.
Activation Function
The Neuron’s output is the result of the Addition of the inputs x0 and x1; therefore, the Activation Function must output this value unchanged. As stated in the Activation Functions section above, the Linear function is the best solution for this model, as it simply outputs the Neuron’s value.
Figure 7 presents an example of the behaviour of the Addition Neural Network for two given inputs, using the Weight and Activation Function configurations described above.
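Putting both configurations together, below is a minimal sketch of the Addition model in Python; the input values are arbitrary illustrations, since figure 7’s exact values are not reproduced here.

```python
def linear(x):
    """Linear / Identity Activation Function: outputs the Neuron's value unchanged."""
    return x

def addition_neuron(x0, x1, w0=1.0, w1=1.0, bias=0.0):
    """Equation 1 with w0 = w1 = 1 and no Bias reduces to x0 + x1."""
    return linear(w0 * x0 + w1 * x1 + bias)

print(addition_neuron(3.0, 4.0))   # 7.0
print(addition_neuron(-1.5, 2.5))  # 1.0
```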