Choosing an appropriate learning rule directly impacts the performance of the network. For an SNN, the most common learning rule is the Hebbian rule, which will be explained in the following section.
Finally, after training the SNN successfully, it should be validated in other scenarios and optimized if necessary.
Figure 3. General design framework for learning-inspired SNN-based robot control.
At the very beginning of the construction of an SNN for robot control, an appropriate SNN control model should be decided on. The basic task is to determine the general topological structure of the SNN, as well as the neuron models in each layer of the SNN.
Generally, neuron models can be expressed in the form of ordinary differential equations. In the literature, many different mathematical descriptions of spiking neuron models have been proposed, processing excitatory and inhibitory inputs using internal state variables. To find an appropriate one among the existing diverse neuron models, a trade-off usually has to be balanced between biological plausibility and computational complexity.
A detailed comparison of the neuro-computational properties of spiking and bursting models can be found in Izhikevich. One of the most widely used models is the so-called Leaky Integrate-and-Fire (LIF) model (Stein), which can be easily explained by the principles of electronics. These models are based on the assumption that the timing of spikes, rather than their specific shape, carries neural information (Andrew). The sequences of firing times are called spike trains and can be described, in the standard form, as
$$S(t) = \sum_{f} \delta\bigl(t - t^{(f)}\bigr),$$
where $t^{(f)}$ denotes the firing time of the $f$-th spike and $\delta(\cdot)$ is the Dirac delta function.
Passing a simplified synapse model, the incoming spike train triggers a synaptic electric current into the postsynaptic neuron. This input signal $i(t)$, induced by a presynaptic spike train $S_j(t)$, can, in a simple form, be described by an exponential function (Ponulak and Kasinski), e.g.,
$$i(t) = \sum_{f} \exp\!\Bigl(-\frac{t - t_j^{(f)}}{\tau_s}\Bigr)\,\Theta\bigl(t - t_j^{(f)}\bigr),$$
where $t_j^{(f)}$ are the presynaptic firing times, $\tau_s$ is the synaptic time constant, and $\Theta(\cdot)$ is the Heaviside step function.
This synaptic transmission can be modeled by low-pass filter dynamics. The postsynaptic current then charges the LIF neuron model, increasing the membrane potential $u$ according to (in the standard formulation)
$$\tau_m \frac{du(t)}{dt} = -\bigl(u(t) - u_{\mathrm{rest}}\bigr) + R\, i(t),$$
where $\tau_m$ is the membrane time constant and $R$ the membrane resistance; when $u$ crosses a threshold $\vartheta$, the neuron emits a spike and the potential is reset to $u_{\mathrm{rest}}$.
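As a minimal illustration of these dynamics, the following Euler-integration sketch of a LIF neuron in Python uses arbitrary constants and function names of our own choosing, not values taken from the cited works:

```python
import numpy as np

def simulate_lif(i_syn, dt=1e-3, tau_m=0.02, r_m=10.0, u_rest=0.0,
                 u_thresh=1.0, t_refrac=0.002):
    """Integrate a LIF neuron driven by a synaptic current trace i_syn."""
    u = u_rest
    refrac_left = 0.0
    spikes = []
    for step, i_t in enumerate(i_syn):
        if refrac_left > 0.0:            # neuron is inactive after a spike
            refrac_left -= dt
            u = u_rest
            continue
        # Euler step of: tau_m * du/dt = -(u - u_rest) + R * i(t)
        u += dt / tau_m * (-(u - u_rest) + r_m * i_t)
        if u >= u_thresh:                # threshold crossing emits a spike
            spikes.append(step * dt)
            u = u_rest                   # reset and enter refractory period
            refrac_left = t_refrac
    return spikes

# Example: constant input current of 0.15 for 200 ms
print(simulate_lif(np.full(200, 0.15)))
```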
Usually, this spiking event is followed by a refractory period in which the neuron stays inactive and cannot be charged again. It is worth pointing out that biological studies highlight the presence of another operational unit in the brain, the cell assembly (Braitenberg), defined as a group of neurons with strong mutual excitatory connections that tends to be activated as a whole.
A deeper review of spiking neuron models can be found in Andrew. The term neural encoding refers to representing information from the physical world (such as the direction of a moving stimulus) in the activity of a neuron (such as its firing rate). Information decoding, on the other hand, is the reverse process of interpreting neural activity into electrical signals for actuators such as muscles or motors.
One way to understand how the brain encodes information is to think of two spaces: the physical space and the neural space. The physical space comprises the physical properties of objects, such as color, speed, and temperature. The neural space consists of properties of a neuron, in most cases its firing rate. Due to its simplicity, such a direct mapping was used in early-stage implementations. Besides, binary coding is also used to represent the pixel values of an image (Meschede). Rate coding is inspired by the observation that neurons tend to fire more often for stronger sensory or artificial stimuli.
Scientists usually use a concept from probability theory known as the Poisson process to simulate spike trains with characteristics close to those of real neurons. As the most intuitive and simple coding strategy, rate coding has been adopted by most robotic implementations. Temporal coding, in turn, is motivated by evidence found in neuroscience that spike timing can be remarkably precise and reproducible (Gerstner et al.).
With this encoding strategy, information is represented by the timing at which spikes occur. However, the underlying mechanism is still not well understood. The aforementioned coding solutions are usually defined for a single neuron. Sometimes, however, a population of neurons is used as a whole to encode information. This is strongly supported by the brains of living creatures, where functions are controlled by populations of neurons within a given area.
The goal of neural decoding is to characterize how the electrical activity of neurons elicits activity and responses in the brain. The most commonly used scheme for decoding is rate-based, where stronger neural activity usually means a higher motor speed or force.
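As a minimal, self-contained illustration of rate-based encoding and decoding (all function names, rates, and constants here are our own assumptions, not taken from the surveyed implementations), a stimulus intensity can be turned into an approximately Poisson spike train and the resulting spike count mapped back onto a motor command:

```python
import numpy as np

def encode_poisson(stimulus, max_rate_hz=100.0, duration_s=0.5, dt=1e-3, rng=None):
    """Rate encoding: a stronger stimulus (in [0, 1]) yields a higher firing rate.
    Each time bin contains a spike with probability rate * dt (Poisson approximation)."""
    rng = np.random.default_rng() if rng is None else rng
    rate = max_rate_hz * float(np.clip(stimulus, 0.0, 1.0))
    bins = rng.random(int(duration_s / dt)) < rate * dt
    return np.nonzero(bins)[0] * dt          # spike times in seconds

def decode_rate(spike_times, duration_s=0.5, max_rate_hz=100.0, max_speed=1.0):
    """Rate decoding: map the observed firing rate linearly onto a motor speed."""
    rate = len(spike_times) / duration_s
    return max_speed * min(rate / max_rate_hz, 1.0)

spikes = encode_poisson(0.8)                         # strong stimulus -> dense spike train
print(len(spikes), round(decode_rate(spikes), 2))    # decoded speed is roughly 0.8
```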
An implementation along these lines can be found in Kaiser et al. Once the neuron model is decided on, the synapse model should be carefully chosen to connect the neurons within and among the layers of the SNN. Acting by influencing the membrane potentials of connected neurons, synaptic plasticity was first proposed as a mechanism for learning and memory on the basis of theoretical analysis (Hebb). Up to this day, the synaptic plasticity models used for practical implementations are typically very simple.
Based on the input-output relationship between neuronal activity and synaptic plasticity, these models are roughly classified into two types, rate-based and spike-based, which differ in the type of their input variables. The first and most commonly used definition of a firing rate refers to a spike-count average over time (Andrew). The rate-based model is a popular approach for converting conventional ANNs into spiking neural networks that can still be trained by backpropagation.
It has been successfully used in many areas, especially in experiments on the sensory or motor system (Adrian; Bishop; Kubat; Kandel et al.).
Spike-based learning rules were developed in Gerstner et al. Experiments showed that synaptic plasticity is influenced by the exact timing of individual spikes, in particular by their order (Markram et al.). If a presynaptic spike preceded a postsynaptic spike, a potentiation of the synaptic strength could be observed, while the reversed order caused a depression. In other words, neural inputs that are likely to have contributed to the neuron's excitation are strengthened, while inputs that are less likely to have contributed are weakened.
As for neuro-engineering, STDP has been demonstrated to work as the underlying neural learning mechanism in robots and other autonomous systems, in both simulated and real environments. In the past, different mathematical models of STDP have been proposed. For this work, the weight update rule under STDP, as a function of the time difference $\Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}}$ between pre- and postsynaptic spikes, was defined (in the commonly used exponential form) as
$$\Delta w = \begin{cases} A_{+}\, e^{-\Delta t/\tau_{+}}, & \Delta t > 0,\\ -A_{-}\, e^{\Delta t/\tau_{-}}, & \Delta t < 0, \end{cases}$$
where $A_{+}$ and $A_{-}$ scale potentiation and depression and $\tau_{+}$, $\tau_{-}$ are the corresponding time constants.
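A minimal Python sketch of this pair-based STDP window (the constants are illustrative, not those of any cited work) could look as follows:

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=0.02, tau_minus=0.02):
    """Pair-based STDP window: delta_t = t_post - t_pre (seconds).
    Pre-before-post (delta_t > 0) potentiates, post-before-pre depresses."""
    if delta_t > 0:
        return a_plus * np.exp(-delta_t / tau_plus)
    return -a_minus * np.exp(delta_t / tau_minus)

print(stdp_dw(0.005), stdp_dw(-0.005))   # small positive, small negative weight change
```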
The SNN network model builds on the neuron and synapse models in that it specifies how synaptic interactions among neurons are organized. Typical examples of networks built from these components fall into two general categories:
As the first and simplest type of network topology, information in feed-forward networks always travels from the input nodes, through hidden nodes (if any), to the output nodes, and never goes backwards. In the biological nervous system, abstracted feed-forward networks are mainly found to acquire and transmit external information. Similarly, networks of this type are therefore usually adopted for low-level sensory acquisition in robotic systems, such as vision (Perrinet et al.).
For example, Qiao et al. drew on the structures and principles of the primate visual cortex in their work. Taking the work of Meschede as an example, a two-layer feed-forward SNN was trained for a lane-keeping vehicle.
The control scheme is shown in Figure 4. In this work, a dynamic vision sensor (DVS) was used to detect the lane markers by generating a sequence of events. The learning phase was conducted by repeatedly training and switching the robot between start positions in the inner and outer lanes.
Figure 4. Control architecture of a feed-forward SNN.
Different from feed-forward networks, recurrent neural networks (RNNs) transmit their information within directed cycles and exhibit dynamic temporal behaviors.
It is worth pointing out that recurrent neural networks are recursive neural networks (Wikipedia) with a certain structure, such as a linear chain. Living organisms seem to use this mechanism to process arbitrary sequences of inputs with the internal memory stored inside RNNs. An example is the work of Rueckert et al. In their finite-horizon planning task, the agent's spatial position is controlled by nine state neurons.
The context neurons produce spatiotemporal spike patterns that represent high-level goals and context information. In this case, their average firing rate represents the target spatial position at each time step.
They show that the optimal planning policy can be learned using a reward-modulated update rule in a network where the state neurons follow winner-take-all (WTA) dynamics. Due to these dynamics, in each time step exactly one state neuron is active and encodes the current position of the agent. Their results demonstrated successful trajectory planning using a recurrent SNN.
Figure 5. Control architecture of a recurrent SNN. A recurrent layer of state neurons is used to control the state of the agent and receives signals from the context population, which determines the target position at each time step.
Changes in the strength of synaptic connections between neurons are thought to be the physiological basis of learning (Vasilaki et al.). These changes can be gated either by neuromodulators that encode the presence of reward or by the co-activation of connected neurons and synapses. In the control tasks presented in this section, the network is supposed to learn a function that maps some state input to a control or action output.
When successfully learned, the network is able to perform simple tasks such as wall following, obstacle avoidance, target reaching, lane following, taxi behavior, or food foraging. In most cases, the network input comes directly from the robot's sensors, which range from simple binary sensors to high-dimensional vision sensors.
In other cases, the input can be pre-processed sensor data. Similarly, the output can range from one-dimensional, binary behavior control to multi-dimensional continuous output values. Initially, solving simulated control tasks was done by manually setting network weights. However, this approach is limited to simple behavioral tasks such as wall following (Wang et al.).
Therefore, a variety of training methods for SNNs in control tasks have been researched and published. Instead of focusing on criteria such as field of research, biological plausibility, or the specific task, this section is meant to serve as a classification of published algorithms by their basic underlying training mechanisms from a robotics and machine learning perspective.
In the first part of this section, some implementations of SNN control are introduced that use some form of Hebbian-based learning. In the second part, publications are shown that try to bridge the gap between classical reinforcement learning and spiking neural networks.
Finally, some alternative methods for training and implementing spiking neural networks are discussed. One of the earliest theories in neuroscience explaining the adaptation of synaptic efficacies in the brain during the learning process was introduced by Donald Hebb in his book The Organization of Behavior (Hebb). Hebbian-based learning rules that rely on the precise timing of pre- and postsynaptic spikes play a crucial part in the emergence of highly non-linear functions in SNNs.
Learning based on Hebb's rule has been successfully applied to problems such as input clustering, pattern recognition, source separation, dimensionality reduction, formation of associative memories, and formation of self-organizing maps (Hinton and Sejnowski). Furthermore, different biologically plausible learning rules have been used to apply spiking neural networks to robot control tasks.
However, as the basic underlying mechanism stays the same, training these networks can be achieved in different ways, as follows (see Table 1). In the table, a two-wheel vehicle means a vehicle with two active wheels. Because of the absence of direct goals, correction functions, or a knowledgeable supervisor, this kind of learning is usually categorized as unsupervised learning (Hinton and Sejnowski). Learning based on the STDP rule has been successfully applied to many problems, such as input clustering, pattern recognition, and spatial navigation and mental exploration of the environment.
Wang et al. demonstrated that, compared with other classical NNs, an SNN needs fewer neurons and is relatively simple. Afterwards, the same authors presented follow-up work (Wang et al.). In similar research, Arena et al. presented a controller that allowed the robot to learn high-level sensor features, based on a set of basic reflexes depending on some low-level sensor inputs, by continuously strengthening the association between the unconditioned stimuli (contact and target sensors) and the conditioned stimuli (distance and vision sensors).
In non-spiking neural networks, many successes in recent years can be summarized as finding ways to efficiently learn from labeled data. This type of learning, in which a neural network mimics a known outcome from given data, is called supervised learning (Hastie et al.). A variety of neuroscientific studies has shown that this type of learning can also be found in the human brain (Knudsen). But despite the extensive exploration of these topics, the exact mechanisms of supervised learning in biological neurons remain unknown.
Accordingly, a simple way of training SNNs for robot control tasks is by providing an external training signal that adjusts the synapses in a supervised learning setting. As shown in Figure 6 , when an external signal is induced into the network as a post-synaptic spike-train, the synapses can adjust their weights, for example, using learning rules such as STDP. After an initial training phase, this will cause the network to mimic the training signal with satisfactory precision.
Even though this approach provides a simple, straightforward way of training networks, it depends on an external controller. Especially for control tasks involving high-dimensional network inputs, this may not be feasible.
Figure 6. Supervised Hebbian training of a synapse: the weight of the synapse between the pre- and post-synaptic neurons, N_pre and N_post, is adjusted by the timing of the pre-synaptic spike-train s_syn and the external post-synaptic training signal s_train.
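To make the supervised scheme of Figure 6 concrete, the following sketch is our own illustration with arbitrary constants: a pair-based STDP window is applied between the pre-synaptic spike train and an external teacher spike train that stands in for the post-synaptic activity.

```python
import math

def stdp_window(delta_t, a_plus=0.01, a_minus=0.012, tau=0.02):
    """Simple pair-based STDP window, delta_t = t_post - t_pre (seconds)."""
    return a_plus * math.exp(-delta_t / tau) if delta_t > 0 else -a_minus * math.exp(delta_t / tau)

def supervised_hebbian_step(w, pre_spikes, teacher_spikes, w_min=0.0, w_max=1.0):
    """The external teacher signal plays the role of the post-synaptic
    spike train, so pre/teacher spike pairings drive the STDP update."""
    for t_pre in pre_spikes:
        for t_post in teacher_spikes:
            w += stdp_window(t_post - t_pre)
    return min(max(w, w_min), w_max)

# A teacher firing ~5 ms after each pre-synaptic spike strengthens the synapse
print(supervised_hebbian_step(0.5, [0.10, 0.30], [0.105, 0.305]))
```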
Several models have been proposed on how this might work, either by using activity templates to be reproduced Miall and Wolpert, or error signals to be minimized Kawato and Gomi, ; Montgomery et al. In the nervous system, these teaching signals might be provided by sensory feedback or other supervisory neural structures Carey et al.
One of these models, primarily suitable for single-layer networks, is called supervised Hebbian learning (SHL). Based on the learning rule derived in (8), a teaching signal is used to train the postsynaptic neuron to fire at target times and to remain silent at other times.
It can be expressed as an ordinary Hebbian (or STDP) update in which the postsynaptic spike train is clamped to the teaching signal. Carrillo et al. followed this approach with a spiking cerebellum model, which was trained by repeatedly driving a simulated robotic arm to seven different targets.
In contrast to other STDP learning rules, only long-term depression was externally induced by a training signal, which relied on the motor error, namely the difference between the desired and actual state. In a similar experiment, Bouganis and Shanahan trained a single-layer network to control a robotic arm with 4 degrees of freedom in 3D space.
The training signal was computed using an inverse kinematics model of the arm, adjusting the synaptic weights with a symmetric STDP learning rule. More examples can be found in Table 1, ordered by descending year.
Classical conditioning (Wikipedia) refers to a learning procedure in which a biologically potent stimulus (e.g., food) is paired with a previously neutral stimulus (e.g., a bell). As a result, the neutral stimulus comes to elicit a response (e.g., salivation) similar to the one evoked by the potent stimulus. In the famous experiment on classical conditioning (Pavlov and Anrep), Pavlov's dog learns to associate an unconditioned stimulus (US), in this case food, and a conditioned stimulus (CS), a bell, with each other.
While it is not clear how the high-level stimuli given in his experiment are processed within the brain, the same learning principle can be used for training on a neural level as well.
Figure 7. The conditioned stimulus (CS) firing shortly before its associated US will adjust its weights so that N_post will fire even in the absence of the US.
Due to the Hebbian learning rule, the synaptic weight is unchanged when the other, unrelated stimulus causes N_post to fire.
Following this principle, bio-inspired robots can learn to associate a CS, e.g., a sensory input, with a US. That way, robots can learn to follow a desired behavior based on sensory inputs. Arena et al. demonstrated this in an SNN with two output motor neurons, in which distance and vision sensors function as CS while contact and target sensors work as US, causing an unconditioned response.
By navigating in a pre-designed enclosed environment, the robot successfully learned to associate the CS and the US and to reach the target without hitting obstacles.
In a similar experiment, Cyr and Boukadoum carried out different classical conditioning tasks in a controlled virtual environment using infrared, ultrasound and visual neurons as CS and vibration neurons as US. A single-layer SNN using proximity sensor data as CS input was then trained in tasks such as obstacle avoidance and target reaching.
Similar approaches were followed by Iwadate et al. and by Jimenez-Romero et al., whose robot was able to learn to recognize rewarding and harmful stimuli as well as simple navigation in a simulated environment. Casellato et al. carried out several related tasks; in all of them, the robot learned to adjust the timing and gain of the motor response and successfully reproduced how human biological systems acquire, extinguish, and express knowledge in a noisy world.
In order to successfully learn such behavioral tasks, some unconditioned stimulus has to be given for every relevant conditioned stimulus that the robot should learn. This also means that the robot will not learn to associate stimuli that are delayed in time. Taken together, using classical conditioning for robot control basically means constructing an external controller that provides unconditioned stimuli for every relevant state input, which may not be feasible in many tasks.
While classical conditioning is concerned with passively associating conditioned and unconditioned stimuli with each other, operant conditioning (OC) consists of associating stimuli with responses and actively changing behaviors thereafter. Conceptually, operant conditioning involves changing voluntary behaviors and is closely related to reinforcement learning and its agent-environment interaction cycle.
A behavioral response is followed by either reinforcement or punishment. Reinforcement following a behavior will cause the behavior to increase; if the behavior is followed by punishment, it will decrease. Rather than being developed as a formal mathematical model, operant conditioning has mainly been researched in the biological and psychological domains. Despite advances in the understanding of operant conditioning, it is still not clear how this type of learning is implemented on a neural level.
In this context, Cyr et al. proposed a simple basic architecture; with learning rules such as habituation and STDP, they were able to solve simple OC-related tasks in a simulated environment, such as pushing blocks.
In another publication, Dumesnil et al. used an RGB camera to capture the color information representing the cue or the reward in a maze environment. Eventually, the robot learned the association whenever an action was frequently followed by a reward. The learning rule for reward-based training is shown in Figure 8.
Using one or more chemicals emitted by a given neuron to regulate diverse populations of neurons is known as neuromodulation (Hasselmo). As one of these neuromodulators, dopamine is released by neurons forming the midbrain dopaminergic cell groups, which are crucial for executive functions, motor control, motivation, reinforcement, and reward.
Most types of neurological rewards increase the level of dopamine in the brain, thus stimulating the dopamine neurons (Schultz). Inspired by dopaminergic neurons in the brain, the effects of STDP events are collected in an eligibility trace, and a global reward signal induces the synaptic weight changes. In contrast to the supervised training discussed before, rewards can be attributed to stimuli even if they are delayed in time.
This can be a very useful property for robot control, because it might simplify the requirements on an external training signal and thereby enable more complex tasks. A simple learning rule combining models of STDP with a global reward signal was proposed by Florian and by Izhikevich. The eligibility trace $c$ of a synapse can be defined (in a typical formulation) as
$$\frac{dc(t)}{dt} = -\frac{c(t)}{\tau_c} + C_1\, \mathrm{STDP}(\Delta t)\,\delta\bigl(t - t_{\mathrm{pre/post}}\bigr), \qquad \frac{dw(t)}{dt} = c(t)\, r(t),$$
where $\tau_c$ is the decay time constant of the trace, $\mathrm{STDP}(\Delta t)$ is the weight change prescribed by the STDP window at the spike pairing times marked by $\delta(\cdot)$, and $r(t)$ is the global reward signal.
$C_1$ is a constant coefficient.
Figure 8. Reward-modulated STDP synapse between N_pre and N_post: depending on the post-synaptic output spike-train, a reward r is defined that modulates the weight change of the synapse.
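A discrete-time sketch of this reward-modulated update (with illustrative constants and function names of our own, not taken from the cited implementations) is given below; note that the weight only changes once the reward arrives, possibly long after the spike pairing.

```python
import math

def rstdp_step(w, c, delta_t, reward, dt=1e-3,
               tau_c=1.0, c1=1.0, a_plus=0.01, a_minus=0.012, tau=0.02):
    """One step of reward-modulated STDP: STDP events accumulate in an
    eligibility trace c, and the weight changes only when a reward arrives."""
    stdp = 0.0
    if delta_t is not None:                       # a pre/post spike pairing occurred
        stdp = a_plus * math.exp(-delta_t / tau) if delta_t > 0 \
               else -a_minus * math.exp(delta_t / tau)
    c += dt * (-c / tau_c) + c1 * stdp            # leaky eligibility trace
    w += dt * reward * c                          # reward gates the weight change
    return w, c

w, c = 0.5, 0.0
w, c = rstdp_step(w, c, delta_t=0.005, reward=0.0)   # pairing, but no reward yet
for _ in range(200):                                  # reward arrives 200 ms later
    w, c = rstdp_step(w, c, delta_t=None, reward=1.0)
print(round(w, 5))
```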
In the literature, a variety of algorithms have been published using this basic learning architecture for training. Even though they are all based on the same mechanism, the rewards can be constructed in different ways. Rewarding Specific Events: the most straightforward implementation of reward-based learning, resembling classical reinforcement learning tasks, uses rewards associated with specific events.
Evans trained a simple, single-layer SNN, consisting of 4 input sensor neurons and 4 output motor neurons, in several food foraging tasks. In a separate network, other reward-related sensor neurons stimulated a dopaminergic neuron that in turn modulated the synaptic weight change. With this simulation setup, the robot was able to learn food-attraction behavior and subsequently unlearn this behavior when the environment changed.
This was achieved by a training stage during which the robot was randomly driven around so that it explored the environment effectively. By shifting the dopamine response from the primary to a secondary stimulus, the robot was able to learn even with a large temporal distance between correct behavior and reward. Faghihi et al. modeled a simulated fly that, in a simple task, learned to avoid getting close to an olfactory target emitting electric shocks.
Furthermore, the same behavior can be transferred to a secondary stimulus that is associated to the primary stimulus without emitting electric shocks itself. Control Error Minimization: As opposed to rewarding specific events, dopamine-modulated learning can also be used in an optimization task to minimize an objective function.
This is usually achieved by strengthening or weakening the connections that lead to changes in the objective function based on their eligibility traces.
Clawson et al. trained a network consisting of lateral state variables as inputs, a hidden layer, and an output layer population decoding the lateral control output. Learning is achieved offline by minimizing the error between the decoded actual output and the desired output, which is provided by an external linear controller. Similarly, an indirect approach to training SNNs was shown by Foderaro et al.
This external network was provided with control input as well as feedback signals and trained using a reward-based STDP learning rule. By minimizing the error between control output and optimal control law offline, it was able to learn adaptive control of an aircraft.
Similar ideas were presented by Zhang et al. Metric Minimization: The same principle can also be applied to minimize a global metric that might be easier to construct and calculate than an external controller.
Chadderdon et al. applied this principle to targeted reaching with a simulated arm. The model consisted of excitatory neurons and 64 inhibitory neurons, with proprioceptive input cells and output cells controlling the flexor and extensor muscles. A global reward or punishment signal was given depending on the change of hand-target distance during the learning phase, during which the arm was repeatedly presented with five different targets.
Related work was presented by Neymotin et al. Similarly, Dura-Bernal et al. interfaced a cortical spiking network with a virtual musculoskeletal arm; with proprioceptive sensory input (muscle lengths) and muscle excitation output, the network was trained by minimizing the hand-target distance. Kocaturk et al. developed a prosthetic control system in which extracellularly recorded motor cortical neurons provide the network inputs. By pressing a button, the user can reward desired movements and guide the prosthetic arm toward a target.
Using a miniaturized microprocessor with resistive crossbar memories implemented on a two-wheeled differential drive robot, Sarim et al. realized navigation behaviors involving target reaching and obstacle avoidance. Although, in this case, learning was implemented using if-then rules that relied on distance changes from the target and obstacles, it is conceptually identical to reward-modulated learning.
Reinforcing Associations: Chou et al. used a dopamine-modulated synaptic plasticity rule, as in classical conditioning, to reinforce associations between conditioned and unconditioned stimuli. In the previous subsection, a variety of approaches were presented for training SNNs based on Hebbian learning rules.
This was done either by providing a supervised training signal through an external controller or by using a reward-based learning rule with different ways of constructing the reward. The latter type of learning, however, was shown to successfully train SNNs in simple tasks solely based on delayed rewards.
In general, all of these approaches have been trained in tasks that don't require looking very far ahead, as reinforcement learning theories usually do.
In classical reinforcement learning theory, on the other hand, learning to look multiple steps ahead in a Markov Decision Process (MDP) is one of the main concerns. Therefore, several algorithms have been published combining SNNs with classical reinforcement learning algorithms. The learning rule in which one looks one or more steps forward in time was introduced as temporal difference (TD) learning. Hereby, Potjans et al. presented spiking network implementations of TD learning; both of their algorithms were able to learn to navigate a simple grid-world after some training.
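Conceptually, the table-based learning that such SNN implementations realize corresponds to the classical TD/Q-learning update; the following toy grid-world sketch is our own illustration, not code from the cited works.

```python
import random

def td_q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference (Q-learning) update: the TD error compares reward
    plus discounted future value against the current estimate."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])

# Tiny 1-D grid-world: states 0..3, reward only when reaching state 3
q = [[0.0, 0.0] for _ in range(4)]        # two actions: 0 = left, 1 = right
for _ in range(500):
    s = 0
    while s != 3:
        a = random.choice([0, 1])
        s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
        r = 1.0 if s_next == 3 else 0.0
        td_q_update(q, s, a, r, s_next)
        s = s_next
print([round(max(row), 2) for row in q])  # state values increase toward the goal
```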
With a similar approach, Nichols et al. built a self-organizing, multi-layered network structure in which sensory data coming from distance and orientation sensors was gradually fused into state neurons representing distinct combinations of sensory inputs. On top, each individual state neuron was connected to 3 output motor neurons. By fusing the sensory input into distinct state neurons and connecting them to action neurons, a simplified TD learning rule could be used to set each synaptic weight in the last layer individually while the robot performed trial locomotion.
Performance of this controller was demonstrated in a wall-following task. While these state representations work very well for relatively small state spaces, they are usually bound to fail for larger, high-dimensional state spaces, since the TD method propagates reward information only over several steps. It is therefore less stable and may converge to the wrong solution, especially for high-dimensional state spaces.
In fact, these approaches can conceptually be seen as an SNN implementation of table-based Q-learning. Although, for robot control tasks such as those shown in this paper, model-free reinforcement learning methods seem favorable, two recent publications are at least worth mentioning that presented SNN implementations of model-based reinforcement learning algorithms. One is the planning network of Rueckert et al. discussed above. In addition, Friedrich and Lengyel implemented a biologically realistic network of spiking neurons for decision making.
The network uses local plasticity rules to solve one-step as well as sequential decision-making tasks, and it mimics the neural responses recorded in frontal cortices during the execution of similar tasks.
Their model reproduced behavioral and neuro-physiological data on tasks ranging from simple binary choice to multi-step sequential decision making. They took a two-step maze navigation task as an illustration. In each state, the rat was rewarded with different values according to its actions. The reward was modeled as an external stimulus.
The SNN learned a stable policy within 10 ms. Apart from the two aforementioned major methods, there are also other training methods for SNNs in robot control tasks, as follows (see Table 2). In nature, evolution has produced a multitude of organisms in all kinds of shapes, with survival strategies optimally aligned to environmental conditions.
Based on these ideas, a class of algorithms called evolutionary algorithms has been developed for finding problem solutions by mimicking elementary natural processes (Michalewicz). Generally, evolutionary processes can be understood as some form of gradient-descent optimization. Therefore, a typical problem when using these algorithms is getting stuck in local minima. In robot control applications, evolving SNNs have been shown to work well in mostly static environments.
Due to the trial-and-error training principle, there are usually difficulties in dynamically changing environments. Floreano and Mattiussi showed a vision-based controller that navigated an irregularly textured environment without hitting obstacles. The predefined SNN consisted of 18 sensory-input receptors connected to 10 fully-connected hidden neurons and 2 motor-output neurons.
Using static synaptic weight values, the algorithm searched the space of connectivity by genetically evolving only the signs of the weights (excitatory or inhibitory) while the robot was continuously driving around in the experimental setup.
With a population of 60 individuals, fitness was evaluated by summing the motor speeds at every time step, and new generations were created using one-point crossover, bit mutation, and elitism.
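The following toy sketch illustrates this kind of evolutionary loop; it is our own simplification, with a stand-in fitness function instead of real robot trials, but it uses the same ingredients named above (sign genomes, one-point crossover, bit mutation, elitism).

```python
import random

GENOME_LEN = 20          # one bit per connection: 1 = excitatory, 0 = inhibitory
POP_SIZE = 60

def fitness(genome):
    """Stand-in fitness; in the robot experiment this would be the summed
    motor speeds recorded while the controller drives the robot."""
    return sum(genome)    # toy objective: prefer excitatory connections

def evolve(generations=50, mutation_rate=0.02, elite=2):
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        new_pop = [g[:] for g in pop[:elite]]               # elitism
        while len(new_pop) < POP_SIZE:
            p1, p2 = random.sample(pop[:POP_SIZE // 2], 2)  # select parents from better half
            cut = random.randrange(1, GENOME_LEN)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(GENOME_LEN):                     # bit mutation
                if random.random() < mutation_rate:
                    child[i] ^= 1
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

print(fitness(evolve()))
```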
Hagras et al. followed a similar evolutionary strategy and were able to evolve good SNN controllers in a small number of generations in a wall-following scenario. Howard and Elfes presented a quadrotor neurocontroller that performed a hovering task in challenging wind conditions. With a feed-forward network taking the differences between the current position and the target position as input and pitch, roll, and thrust as output, weights and topology were evolved to minimize the spatial error. Batllori et al. likewise evolved SNN controllers in a target-reaching and obstacle-avoidance task using binocular light sensors and proximity sensors.
Markowska and Koldowski used a feed-forward network architecture of predefined size to control a toy car. Based on speed, localization, and road-border input signals, the network controlled speed regulation and turn direction, and evolved its weights using a genetic algorithm. Alnajjar and Murase formulated a synaptic learning rule that strengthened connections between neurons depending on their activities.
During the learning phase, the robot gradually organized the network and the obstacle avoidance behavior was formed. With this self-organization algorithm, which resembles other Hebbian-based learning methods, they were able to learn obstacle avoidance and simple navigation behavior. As a particular kind of SNN, a liquid state machine (LSM) usually consists of a large assemblage of neurons that receives time-varying input from external sources as well as from other neural units (Yamazaki and Tanaka). The neuron units are randomly generated and recurrently connected, and the time-varying input drives recurrent spatio-temporal activation patterns over these connections.
Hence, the LSM can be regarded as providing a large repertoire of nonlinear functions, from which the output is computed as a linear combination of the liquid's internal states. LSMs seem to be a promising theory for explaining brain operation, mainly because neuron activities are not hard-coded and limited to specific tasks.
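The following rate-based sketch is a simplification of spiking liquid dynamics, with arbitrary sizes and a toy memory task of our own choosing; it illustrates the core LSM idea that only a linear readout on top of a fixed random recurrent pool is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_liquid, n_steps = 3, 100, 200

w_in = rng.normal(0, 0.5, (n_liquid, n_in))                          # fixed random input weights
w_rec = rng.normal(0, 1 / np.sqrt(n_liquid), (n_liquid, n_liquid))   # fixed recurrent weights

def run_liquid(inputs):
    """Drive the fixed random recurrent pool and collect its state trajectory
    (a rate-based stand-in for spiking liquid dynamics)."""
    x = np.zeros(n_liquid)
    states = []
    for u in inputs:
        x = np.tanh(w_in @ u + w_rec @ x)
        states.append(x.copy())
    return np.array(states)

inputs = rng.normal(size=(n_steps, n_in))
target = np.roll(inputs[:, 0], 1)                   # toy task: recall the previous input
states = run_liquid(inputs)
# Only the linear readout is trained (least squares); the liquid itself stays fixed
readout, *_ = np.linalg.lstsq(states, target, rcond=None)
print(np.mean((states @ readout - target) ** 2))    # training error of the readout
```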
Robotic applications of LSMs include the work of Burgsteiner and of Probst et al. With the fast development of neuroscience and the chip industry, large-scale neuromorphic hardware using spiking neural networks has been studied to achieve the same capabilities as animal brains in terms of speed, efficiency, and mechanism.
For example, SpiNNaker (Furber et al.) and TrueNorth (Merolla et al.) are prominent platforms, and other neuromorphic computing systems such as Neurogrid (Benjamin et al.) have also been developed. Meanwhile, a growing number of dynamic simulators has been developed to assist robotic research (Ivaldi et al.).
Those simulators greatly facilitate research that involves mechanical design, virtual sensor simulation, and control architectures. However, although adequate tools exist to simulate either spiking neural networks (Brette et al.) or robots and their environments on their own, only a few platforms combine both.
Some existing platforms are listed in Table 3. One such platform uses a biologically inspired approach to convert the robot's sensory information into spikes that are passed to the neural network simulator, and it decodes the output spikes from the network into motor signals that are sent to control the robot.
A more generic system that permits dealing with simulated robotic platforms is AnimatLab (Cofer et al.). Another is the Neurorobotics Platform (NRP), which for the first time provides scientists with an integrated toolchain to connect pre-defined and customized brain models to detailed simulations of robot bodies and environments in in-silico experiments.
In particular, the NRP consists of six key components, which are essential for constructing neurorobotics experiments from scratch. It can be seen that the NRP provides a complete framework for the coupled simulation of robots and brain models. The Brain Simulator simulates the brain with bio-inspired learning algorithms, such as a spiking neural network, to control the robot in an in-silico neurorobotics experiment. The World Simulator simulates the robots and their interacting environment.
The Closed Loop Engine (CLE) is responsible for the control logic of experiments as well as for the data communication between different components. The Backend receives requests from the Frontend for the neurorobotics experiment and distributes them to the corresponding components, mainly via ROS. The Frontend is a web-based user interface for neurorobotics experiments.
Users are able to design a new experiment or edit existing template experiments. In the previous sections, the state-of-the-art of SNN-based control for various robots has been surveyed in terms of learning methods. Although an increasing amount of work has been done to explore the theoretical foundations and practical implementations of SNNs for robotics control, many related topics need to be investigated, especially in the following areas.
Despite the extensive exploration of the functions and structure of the brain, the exact mechanisms of learning in biological neurons remain unknown. Some of the open questions related to robotics applications are: (1) How is diverse information coded in neural activity beyond the rates and timing of spikes?
As these unsolved mysteries of the brain are progressively addressed, the robots of the future can achieve more advanced intelligence. Another open issue is that there is no general design framework for SNNs that offers the modeling and training functionalities, as well as the substantial tooling, that conventional ANNs enjoy, for instance, TensorFlow (Allaire et al.).
The root of this situation is that training these kinds of networks is notoriously difficult, especially when it comes to deep network architectures. Since the error backpropagation mechanisms commonly used in ANNs cannot be directly transferred to SNNs due to the non-differentiability at spike times, there has been a void of practical learning methods. Moreover, training should be combined more closely with the burgeoning techniques of reinforcement learning, for instance, extending SNNs into deep architectures or generating continuous action spaces (Lillicrap et al.).
In the future, combining the R-STDP with a reward-prediction model could lead to an algorithm that is actually capable of solving sequential decision tasks such as MDPs as well.
Another important general issue that needs extensive research and is not clearly defined is how to integrate SNN-based controllers into neuromorphic devices, since such devices have the potential to offer fundamental improvements in computational capabilities, such as higher speed and lower power consumption (Hang et al.). These are of vital importance for robot applications, especially mobile applications where real-time responses are important and the energy supply is limited.
An overview of how to program SNNs on neuromorphic chips can be found in Walter et al. SNN computation can benefit greatly from parallel computing, substantially more so than conventional ANNs. Unlike a traditional neuron in rate coding, a spiking neuron does not need to receive weight values from each presynaptic neuron at each computation step. Since at each time step only a few neurons are active in an SNN, the classic bottleneck of message passing is removed.
Moreover, computing the updated state of the membrane potential is more complex than computing a weighted sum. Therefore, communication time and computation cost are much better balanced in parallel SNN implementations than in conventional ANNs. Another barrier that needs to be removed comes from a dilemma between the research practices of neuroscience and robotics: roboticists often use a simplified brain model in a virtual robot to achieve real-time simulation, while neuroscientists develop detailed brain models that cannot be embedded into the real world due to their high complexity.
Learning the complex sensorimotor mappings that a robot builds through interaction with dynamic and rich sensory environments is also required (Hwu et al.). An ongoing solution is the Neurorobotics Platform, which offers adequate tools to model virtual robots, high-fidelity environments, and complex neural network models for both neuroscientists and roboticists. By mimicking the underlying mechanisms of the brain much more realistically, spiking neural networks have shown great potential for achieving advanced robotic intelligence in terms of speed, energy efficiency, and computational capability.
Therefore, in this article we seek to offer readers a comprehensive review of the literature on solving robotic control tasks with SNNs, together with the related modeling and training approaches, and meanwhile to offer inspiration to researchers. Specifically, we first revisit the biological evidence for SNNs and the major impetuses for adopting them in robotics. Then, we present the mainstream modeling approaches for designing SNNs in terms of neurons, synapses, and networks.
The learning solutions for SNNs are broadly classified into two types, based on the Hebbian rule and on reinforcement learning, and are illustrated and expounded with extensive robotics-related examples and summary tables.
Finally, some popular interfaces and platforms for simulating SNNs for robotics are preliminarily investigated. As indicated in the open topics, the biggest challenge for control tasks based on SNNs is the lack of a universal training method comparable to what back-propagation is for conventional ANNs. Therefore, more knowledge and interaction between the fields of neuroscience and robotics are needed to explore this area in the future.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Adrian, E. The impulses produced by sensory nerve endings.
Allaire, J.
Allard, J.
Alnajjar, F.
Ambrosano, A.
American Association for the Advancement of Science. Booklet: Brain-inspired intelligent robotics: the intersection of robotics and neuroscience. Science.
Andrew, A. Spiking neuron models: single neurons, populations, plasticity. Kybernetes 32, 7-8.
Arena, E. Motor-skill learning in an insect inspired neuro-computational control system.
Arena, P. Learning anticipation via spiking networks: application to navigation control. IEEE Trans. Neural Netw.
Batllori, R. Evolving spiking neural networks for robot control.
Bekolay, T. Nengo: a python tool for building large-scale functional brain models. Neuroinformatics.
Benjamin, B. Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations. Proc. IEEE.
Bi, G. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type.
Bing, Z. Towards autonomous locomotion: CPG-based control of smooth 3D slithering gait transition of a snake-like robot.
Bishop, C. Neural Networks for Pattern Recognition.
Bohte, S. Unsupervised clustering with spiking neurons by sparse temporal coding and multilayer RBF networks.
Bouganis, A.
Braitenberg, V.
Brette, R. Simulation of networks of spiking neurons: a review of tools and strategies.
Burgsteiner, H.
Burkitt, A. A review of the integrate-and-fire neuron model: I.
Carey, M. Instructive signals for motor learning from visual cortical area MT.
Carrillo, R. A real-time spiking cerebellum model for learning robot control. Biosystems 94, 18-.
Casellato, C. Adaptive robotic control driven by a versatile spiking cerebellar network.
Cassidy, A. BioCAS.
A stochastic method to predict the consequence of arbitrary forms of spike-timing-dependent plasticity. Neural Comput.
Chadderdon, G. Reinforcement learning of targeted movement in a spiking neuronal model of motor cortex.
Chou, T. Learning touch preferences with a tactile robot using dopamine modulated STDP in a model of insular cortex.
Chun, M. A two-stage model for multiple target detection in rapid serial visual presentation.
Clawson, T.
Cofer, D. AnimatLab: a 3D graphics environment for neuromechanical simulations. Methods.
Collobert, R.
Cyr, A. Classical conditioning in different temporal constraints: an STDP learning rule for robots controlled by spiking neural networks.
Cyr, A. Operant conditioning: a minimal components requirement in artificial spiking neurons designed for bio-inspired robot's controller.
Cyr, A. Action selection and operant conditioning: a neurorobotic implementation.
DasGupta, B.
Diehl, P. Unsupervised learning of digit recognition using spike-timing-dependent plasticity.
Dong, Y.
Drubach, D. The Brain Explained.
Dumesnil, E.
Dura-Bernal, S. Cortical spiking network interfaced with virtual musculoskeletal arm and robotic arm.
Eliasmith, C. A large-scale model of the functioning brain. Science.
Evans, R.