 | Neural network: Encyclopedia II - Neural network - Neural Networks and Artificial Intelligence
Neural network - Neural Networks and Artificial Intelligence
Main article: Artificial Neural Network
Neural network - Background
Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y. The epithet network is used because this function is decomposable into a number of simpler, interconnected elements.
A particular type of ANN model corresponds to a class of such functions. What has attracted the most interest in neural networks is the possibility of learning.
Given a specific task to solve, and a class of functions F, learning means using a set of observations to find the function f* ∈ F that solves the task in an optimal sense.
This entails defining a cost function C : F → ℝ such that, for the optimal solution f*, C(f*) ≤ C(f) for all f ∈ F; that is, no solution has a cost lower than that of the optimal solution.
The cost function C is an important concept in learning, as it is a measure of how far away we are from an optimal solution to the problem that we want to solve. Learning algorithms search through the solution space in order to find a function that has the smallest possible cost.
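As a minimal sketch of learning as a search for the lowest-cost function, assume a tiny, finite class F and a cost defined as the mean squared error over a few observations. The data, the three candidate functions and the names `observations`, `candidates`, `cost` and `best` are illustrative assumptions, not part of the article.

```python
# Hypothetical observations (x, y); here y = 2 * x exactly.
observations = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# A small, finite class of functions F, indexed by name.
candidates = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "square": lambda x: x * x,
}


def cost(f):
    """Mean squared error of f over the observations (one possible choice of C)."""
    return sum((f(x) - y) ** 2 for x, y in observations) / len(observations)


# Learning as search: pick the member of F with the smallest cost.
best = min(candidates, key=lambda name: cost(candidates[name]))
```

In practice F is usually an infinite family indexed by real-valued parameters, so exhaustive comparison like this is replaced by an optimisation procedure.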
Neural network - Learning paradigms
There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning. Usually any given type of network architecture can be employed in any of those tasks.
In supervised learning, we are given a set of example pairs (x, y), with x ∈ X and y ∈ Y, and the aim is to find a function f in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data.
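One way to make the supervised setting concrete is a least-squares line fit, sketched below. It assumes the allowed class is the set of straight lines and that mismatch is measured by mean squared error; the data and the helper names `fit_line` and `mse` are hypothetical.

```python
# Hypothetical example pairs (x, y), generated from y = 2x + 1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]


def fit_line(pairs):
    """Return the slope and intercept minimising squared error (ordinary least squares)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(f, pairs):
    """Mismatch between the mapping f and the data."""
    return sum((f(x) - y) ** 2 for x, y in pairs) / len(pairs)


slope, intercept = fit_line(pairs)
f = lambda x: slope * x + intercept  # the inferred mapping
```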
In unsupervised learning, we are given some data x, and the cost function to be minimised can be any function of the data x and the network's output, f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modelling, compression, filtering, blind source separation and clustering.
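A very small illustration of an unsupervised cost, under assumptions chosen purely for simplicity: the model is a constant, f(x) = a, and the cost is the mean squared distance between the data and that constant. The minimiser of this particular cost is the sample mean, which the sketch checks numerically. The data and the names `data`, `cost` and `a_star` are hypothetical.

```python
# Hypothetical unlabelled data.
data = [2.0, 4.0, 6.0, 8.0]


def cost(a, xs):
    """Mean squared distance between the data and the constant model f(x) = a."""
    return sum((x - a) ** 2 for x in xs) / len(xs)


# For this cost, the optimal constant is the sample mean.
a_star = sum(data) / len(data)

# Nearby values of a give a higher cost.
nearby = [cost(a_star + delta, data) for delta in (-0.1, 0.1)]
```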
In reinforcement learning, data x is usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimises some measure of long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning include control problems, games and other sequential decision-making tasks.
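The interaction loop and the idea of estimating a policy's long-term cost can be sketched as follows. Everything here is an assumption made for illustration: a toy one-dimensional environment in which a move succeeds with probability 0.8, a cost of 1 for every step that does not reach the goal, two hand-written policies, and a Monte Carlo estimate of the expected cumulative cost. No neural network is involved; the policies stand in for whatever function the network would represent.

```python
import random

GOAL = 3      # hypothetical goal position
HORIZON = 50  # cap on episode length


def step(position, action, rng):
    """Environment dynamics (unknown to the agent): the move succeeds with probability 0.8."""
    if rng.random() < 0.8:
        position = max(0, position + action)
    cost = 0.0 if position >= GOAL else 1.0
    return position, cost


def always_right(observation, rng):
    return +1


def random_walk(observation, rng):
    return rng.choice((-1, +1))


def cumulative_cost(policy, rng):
    """Run one episode and return the sum of the instantaneous costs."""
    x, total = 0, 0.0
    for _ in range(HORIZON):
        y = policy(x, rng)      # action chosen from the current observation
        x, c = step(x, y, rng)  # new observation and instantaneous cost
        total += c
        if x >= GOAL:
            break
    return total


def estimate(policy, episodes=2000, seed=0):
    """Monte Carlo estimate of the expected cumulative cost of a policy."""
    rng = random.Random(seed)
    return sum(cumulative_cost(policy, rng) for _ in range(episodes)) / episodes
```

Comparing `estimate(always_right)` with `estimate(random_walk)` ranks the two policies by estimated long-term cost without ever inspecting the dynamics inside `step`.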
Neural network - Learning algorithms
There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation.
Most of the algorithms used in training artificial neural networks employ some form of gradient descent. This is done by taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction.
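A minimal sketch of gradient descent, assuming a single-parameter model f(x) = w * x and a mean squared error cost. The data, the learning rate and the number of iterations are arbitrary illustrative choices.

```python
# Hypothetical training pairs, generated from y = 2 * x.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def cost(w):
    """Mean squared error of the model f(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def grad(w):
    """Derivative of the cost with respect to the parameter w."""
    return sum(2.0 * (w * x - y) * x for x, y in pairs) / len(pairs)


w = 0.0               # initial parameter value
learning_rate = 0.05  # step size (assumed; too large a value would diverge)
for _ in range(200):
    w -= learning_rate * grad(w)  # move against the gradient
```

For networks with many parameters the same update is applied to every parameter, with the derivatives typically obtained by backpropagation.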
Evolutionary methods, simulated annealing, expectation maximisation and non-parametric methods are among the other commonly used methods for training neural networks. See also machine learning.
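As an example of a gradient-free alternative, the following is a rough sketch of simulated annealing applied to a single-parameter model f(x) = w * x with a mean squared error cost. The proposal width, starting temperature, cooling rate and step count are all assumed values chosen for illustration only.

```python
import math
import random

# Hypothetical training pairs, generated from y = 2 * x.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def cost(w):
    """Mean squared error of the model f(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def anneal(steps=2000, seed=0):
    rng = random.Random(seed)
    w = 0.0
    best_w = w
    temperature = 1.0
    for _ in range(steps):
        candidate = w + rng.gauss(0.0, 0.1)  # random perturbation of the parameter
        delta = cost(candidate) - cost(w)
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            w = candidate
            if cost(w) < cost(best_w):
                best_w = w
        temperature *= 0.995  # geometric cooling schedule
    return best_w


best_w = anneal()
```

No derivative of the cost is needed, which is why methods of this kind remain usable when the cost is not differentiable.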
Neural network - Theoretical properties
Certain theoretical models of neural networks have been analysed in a way that allows properties such as their maximum storage capacity to be calculated independently of any learning algorithm. Various techniques originally developed for studying disordered magnetic systems (spin glasses) have been successfully applied to simple neural network architectures, such as the perceptron. Influential work by E. Gardner and B. Derrida has revealed many interesting properties about perceptrons with real-valued synaptic weights, while later work by W. Krauth and M. Mezard has extended these principles to binary-valued synapses.
Neural network - Generalisation and statistics
In applications where the goal is to create a system that generalises well to unseen examples, the problem of overtraining has emerged. This arises in overcomplex or overspecified systems. There are two schools of thought for avoiding this problem. The first is to use cross-validation and similar techniques to check for the presence of overtraining and to select hyperparameters so as to minimise the generalisation error. The second is to use some form of regularisation. This is a concept that emerges naturally in a probabilistic (Bayesian) framework, where the regularisation can be performed by putting a larger prior probability over simpler models; but also in statistical learning theory, where the goal is to minimise over two quantities: the 'empirical risk' and the 'structural risk', which roughly correspond to the error over the training set and the predicted error in unseen data due to overfitting.
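Both approaches can be combined, as in the sketch below: k-fold cross-validation is used to choose the strength of a regulariser. The choices here are assumptions made for illustration: a single-weight model f(x) = w * x, a squared penalty on the weight (ridge regularisation, which corresponds to a zero-mean Gaussian prior on w), made-up data, and an arbitrary list of candidate strengths.

```python
def ridge_fit(pairs, lam):
    """Minimise sum((w*x - y)^2) + lam * w^2; closed form for a single weight."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / (sxx + lam)


def mse(w, pairs):
    """Mean squared error of the model f(x) = w * x on the given pairs."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def cross_validate(pairs, lam, k=3):
    """Average held-out error over k folds (fold i holds out every k-th pair)."""
    errors = []
    for i in range(k):
        held_out = pairs[i::k]
        training = [p for j, p in enumerate(pairs) if j % k != i]
        errors.append(mse(ridge_fit(training, lam), held_out))
    return sum(errors) / k


# Hypothetical noisy data, roughly following y = 2 * x.
pairs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 11.9)]
lambdas = [0.0, 0.1, 1.0, 10.0]  # candidate regularisation strengths

# Select the hyperparameter with the lowest estimated generalisation error.
best_lambda = min(lambdas, key=lambda lam: cross_validate(pairs, lam))
```

Larger values of `lam` shrink the fitted weight towards zero, which is the sense in which the penalty favours simpler models.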
Neural network - Types of artificial neural networks
See artificial neural network for a discussion on the various types of neural networks.
 Adapted from the Wikipedia article "Neural Networks and Artificial Intelligence", under the GNU Free Documentation License. Please also see http://en.wikipedia.org/wiki |