Deep Learning is one of the latest iterations in the evolution of Machine Learning. It is an evolution that started with something as simple as Linear Regression and, despite all the difficulties and the AI winters it went through, its advances have generated a tsunami whose waves have reached the most varied fields of knowledge, from Image Recognition to Natural Language Processing (NLP). So what was the path that brought us to this moment?

  • 1943: A Logical Calculus of the Ideas Immanent in Nervous Activity
    In 1943, Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published “A Logical Calculus of the Ideas Immanent in Nervous Activity”. In their paper they presented a simplified model of a neuron, laying the foundations for artificial neural networks.
  • 1957: The Perceptron Algorithm
    Frank Rosenblatt, a psychologist, created the Perceptron based on the work of McCulloch and Pitts. It is a simplified mathematical model of how a neuron works: it takes a set of binary inputs (from nearby neurons), multiplies each input by a weight (the strength of the synapse to that neuron), and thresholds the sum of these weighted inputs, outputting 1 if the sum is big enough and 0 otherwise (in the same way neurons either fire or do not).
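As an illustration, that thresholding rule fits in a few lines of Python; the AND-gate weights and threshold below are an illustrative choice, not something from Rosenblatt's paper:

```python
import numpy as np

def perceptron_output(inputs, weights, threshold):
    """Fire (1) if the weighted sum of the inputs reaches the threshold, else 0."""
    return 1 if np.dot(inputs, weights) >= threshold else 0

# Illustrative weights and threshold that make the unit compute logical AND.
weights = np.array([1.0, 1.0])
threshold = 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron_output(np.array(x), weights, threshold))
```

A single such unit can only draw one line through its input space, which is exactly the limitation the next entry turns on.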
  • 1969: The First AI Winter
    In 1969 Marvin Minsky and Seymour Papert published the book Perceptrons: An Introduction to Computational Geometry. The book dealt with the limitations of the Perceptron, and one of the most discussed was its inability to learn the XOR function, because XOR is not linearly separable. In the book, Minsky concluded that the Perceptron’s approach to AI was a dead end. It is widely believed that this helped start the first AI winter, a period in which there was little funding for research or publications.
  • 1974: Backpropagation
    In his book, Minsky pointed out that multi-layer perceptrons would be needed to solve the XOR problem, but no one had found a way to train multi-layer neural nets until 1974, when Paul Werbos described training neural networks through back-propagation in his Harvard PhD thesis. Multi-layer neural nets have hidden layers that can find features and pass them on to the next layer instead of passing on the original data. With back-propagation, we can use calculus to assign some of the blame for any training-set mistakes in the output layer to each neuron in the previous hidden layer, and then further split up the blame if there is another hidden layer, and so on – we back-propagate the error. Unfortunately, this publication went unnoticed, and would only be “rediscovered” after the end of the AI winter, in the 1980s.
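A minimal NumPy sketch of this idea, assuming a squared-error loss and sigmoid activations (illustrative choices): a network with one hidden layer, trained by back-propagation, can learn XOR, which a single perceptron cannot.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units, one output unit (sizes are illustrative).
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

lr = 1.0
for _ in range(20000):
    # Forward pass through the hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: split the output error among the hidden neurons,
    # then use each layer's share of the blame to update its weights.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(np.round(out).ravel())  # after successful training, rounds to 0 1 1 0
```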
  • 1980: The Neocognitron
    Inspired by the discovery of two types of cells in the primary visual cortex – simple cells and complex cells, found by David Hubel and Torsten Wiesel in 1959 – Kunihiko Fukushima proposed a hierarchical multilayered neural network capable of learning visual pattern recognition. It introduced the concepts behind Convolutional Neural Networks. In the neocognitron there are several layers in which the cells are connected: each cell receives input connections from cells in the preceding layer, and layers of S-cells (feature-extracting cells) and C-cells (processing cells) are arranged alternately in a hierarchical network.
    The input connections of S-cells are variable and are modified through learning. After learning has finished, each S-cell comes to respond selectively to a particular feature presented in its receptive field; the features extracted by the S-cells are determined during the learning process. Generally speaking, local features, such as edges or lines in particular orientations, are extracted in the lower stages, while more global features, such as parts of the learned patterns, are extracted in the higher stages.
    The input connections of C-cells, which come from S-cells of the preceding layer, are fixed and invariable. Each C-cell receives input connections from a group of S-cells that extract the same feature, but from slightly different positions, and it responds if at least one of these S-cells yields an output. C-cells thus perform a blurring operation: the response of a layer of S-cells is spatially blurred in the response of the succeeding layer of C-cells.
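The S-cell/C-cell division can be sketched as follows. This is a loose NumPy analogy, not Fukushima's exact model: the S-layer here is a thresholded cross-correlation with a single hand-picked kernel, and the C-layer takes a local maximum over small windows.

```python
import numpy as np

def s_cell_layer(image, kernel, threshold=0.5):
    """S-cells: each cell fires when its feature (the kernel) appears in its
    receptive field -- a valid-mode 2-D cross-correlation plus a threshold."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return (out >= threshold).astype(float)

def c_cell_layer(s_map, pool=2):
    """C-cells: fire if at least one S-cell in a local window fires, blurring
    the response and giving tolerance to small shifts of the feature."""
    h, w = s_map.shape
    return np.array([[s_map[i:i + pool, j:j + pool].max()
                      for j in range(0, w - pool + 1, pool)]
                     for i in range(0, h - pool + 1, pool)])

# A vertical-edge-detecting kernel (illustrative choice) on a tiny image
# whose left half is dark and right half is bright.
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])
image = np.zeros((6, 6)); image[:, 3:] = 1.0
s = s_cell_layer(image, kernel)   # fires along the dark-to-bright edge
c = c_cell_layer(s)               # blurred, shift-tolerant response
```

The same pair of operations, with learned kernels, is what later CNNs call a convolution layer followed by pooling.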


Fig. 1. The Neocognitron (from https://www.learnartificialneuralnetworks.com)


  • 1989: Backpropagation for Convolutional Neural Networks
    Combining the work of Werbos and Fukushima, Yann LeCun demonstrated in his 1989 publication “Backpropagation Applied to Handwritten Zip Code Recognition” that backpropagation allowed neural nets to be used in the real world.
    LeCun’s work at Bell Labs resulted in a commercial check-reading system in the mid-90s – he later noted that “At some point in the late 1990s, one of these systems was reading 10 to 20% of all the checks in the US.”

CNNs (Convolutional Neural Networks) did not work well with many layers: the back-propagated error either shrank rapidly or grew out of bounds, and the resulting network simply did not train well – the “vanishing or exploding gradient” problem.
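A back-of-the-envelope illustration of the vanishing half of the problem: the error signal passing through n sigmoid layers is scaled by roughly n factors of w * sigmoid'(z), and since sigmoid'(z) never exceeds 0.25, with modest weights the product collapses geometrically with depth (the values of w and z below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 1.0   # weight magnitude at each layer (illustrative)
z = 0.0   # pre-activation at each layer (illustrative)

# Per-layer scaling factor: sigmoid'(z) = s * (1 - s), at most 0.25.
s = sigmoid(z)
factor = w * s * (1 - s)

for depth in [1, 5, 10, 20]:
    grad = factor ** depth
    print(f"{depth:2d} layers: gradient scaled by ~{grad:.2e}")
```

With weights larger than the inverse of that derivative, the same product instead blows up, which is the exploding side of the problem.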
In the wake of this problem, AI research went through a period in which support vector machines were preferred over neural networks because they performed better (the second AI winter).
But thanks to Geoffrey Hinton, Simon Osindero and Yee-Whye Teh, this would be a short-lived winter, as you will see in the next post.

