Course content

Here is the list of topics covered in the course, organized by week (weeks 0 through 10). Each week is associated with explanatory video clips and recommended readings.

To ask questions about the course's content or discuss neural networks in general, visit the course's Google group.

Week Content
0 Introduction and math revision


- Suggested readings -
- Other suggested video material -
1 Feedforward neural network

- Videos and slides -
 • Artificial neuron (7:50) [pdf]
 • Activation function (5:56) [pdf]
 • Capacity of single neuron (8:05) [pdf]
 • Multilayer neural network (13:11) [pdf]
 • Capacity of neural network (8:56) [pdf]
 • Biological inspiration (14:21) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
  • Videos from Andrew Ng's Coursera course, on neural networks: [1] [2] [3] [4] [5] [6] [7]
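As a rough illustration of the forward computation covered in these clips (layer sizes, the sigmoid/softmax choices, and all variable names below are illustrative, not taken from the slides):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One hidden layer: affine -> sigmoid -> affine -> softmax."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # hidden activations
    z = W2 @ h + b2                            # output pre-activations
    e = np.exp(z - z.max())                    # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)                         # one input example
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # hidden layer parameters
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)  # output layer parameters
p = forward(x, W1, b1, W2, b2)                 # class probabilities
```

The output is a probability vector over the classes, which is the setting assumed in the later clips on training.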
2 Training neural networks

- Videos and slides -
 • Empirical risk minimization (10:28) [pdf]
 • Loss function (4:49) [pdf]
 • Output layer gradient (12:03) [pdf]
 • Hidden layer gradient (15:15) [pdf]
 • Activation function derivative (4:37) [pdf]
 • Parameter gradient (6:26) [pdf]
 • Backpropagation (15:07) [pdf]
 • Regularization (13:15) [pdf]
 • Parameter initialization (6:10) [pdf]
 • Model selection (13:48) [pdf]
 • Optimization (23:40) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
  • Videos from Andrew Ng's Coursera course, on training neural networks: [1] [2] [3] [4] [5] [6] [7]
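The output-layer, hidden-layer and parameter-gradient clips above fit together into the backpropagation procedure. A minimal sketch, with a finite-difference check on one weight (the tanh/softmax choices and all names are illustrative assumptions, not the course's notation):

```python
import numpy as np

def loss_and_grads(x, y, W1, b1, W2, b2):
    """Cross-entropy loss of a one-hidden-layer net and its parameter gradients."""
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    p = np.exp(z - z.max()); p /= p.sum()
    loss = -np.log(p[y])
    dz = p.copy(); dz[y] -= 1.0            # output-layer gradient
    dW2, db2 = np.outer(dz, h), dz         # output parameter gradients
    dh = W2.T @ dz                         # hidden-layer gradient
    da = dh * (1.0 - h ** 2)               # times the tanh derivative
    dW1, db1 = np.outer(da, x), da         # hidden parameter gradients
    return loss, (dW1, db1, dW2, db2)

rng = np.random.default_rng(1)
x, y = rng.normal(size=4), 1
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
loss, grads = loss_and_grads(x, y, W1, b1, W2, b2)

# finite-difference check of dL/dW1[0,0]
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
lp, _ = loss_and_grads(x, y, W1p, b1, W2, b2)
num = (lp - loss) / eps
```

Comparing `num` against the analytic `grads[0][0, 0]` is the standard sanity check for a backpropagation implementation.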
3 Conditional random fields

- Videos and slides -
 • Motivation (5:19) [pdf]
 • Linear chain CRF (9:58) [pdf]
 • Context window (12:47) [pdf]
 • Computing the partition function (24:34) [pdf]
 • Computing marginals (9:08) [pdf]
 • Performing classification (18:32) [pdf]
 • Factors, sufficient statistics and linear CRF (11:37) [pdf]
 • Markov network (11:37) [pdf]
 • Factor graph (6:28) [pdf]
 • Belief propagation (24:48) [pdf]

- Companion readings -

- Other suggested readings -
- Other suggested video material -
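The partition-function clip above describes a forward (sum-product) recursion over the chain. A small sketch in log space, checked against brute-force enumeration on a tiny chain (the factor parameterization and names are illustrative assumptions):

```python
from itertools import product

import numpy as np

def log_partition(unary, pairwise):
    """Forward recursion for log Z of a linear-chain CRF.

    unary: (T, K) log unary factors; pairwise: (K, K) log pairwise factors.
    """
    alpha = unary[0]
    for t in range(1, len(unary)):
        # alpha_t(k) = unary[t,k] + logsumexp_j(alpha_{t-1}(j) + pairwise[j,k])
        alpha = unary[t] + np.logaddexp.reduce(alpha[:, None] + pairwise, axis=0)
    return np.logaddexp.reduce(alpha)

rng = np.random.default_rng(2)
T, K = 3, 2
unary = rng.normal(size=(T, K))
pairwise = rng.normal(size=(K, K))
logZ = log_partition(unary, pairwise)

# brute force: sum over all K**T label sequences
brute = np.logaddexp.reduce([
    sum(unary[t][y[t]] for t in range(T)) +
    sum(pairwise[y[t], y[t + 1]] for t in range(T - 1))
    for y in product(range(K), repeat=T)
])
```

The recursion costs O(T K^2) instead of the O(K^T) of enumeration, which is the point of the clip.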
4 Training CRFs

- Videos and slides -
 • Loss function (5:45) [pdf]
 • Unary log-factor gradient (13:29) [pdf]
 • Pairwise log-factor gradient (5:54) [pdf]
 • Discriminative vs. generative learning (6:44) [pdf]
 • Maximum-entropy Markov model (8:46) [pdf]
 • Hidden Markov model (4:17) [pdf]
 • General conditional random field (6:30) [pdf]
 • Pseudolikelihood (5:11) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
5 Restricted Boltzmann machine

- Videos and slides -
 • Definition (12:17) [pdf]
 • Inference (18:33) [pdf]
 • Free energy (12:54) [pdf]
 • Contrastive divergence (13:34) [pdf]
 • Contrastive divergence (parameter update) (11:10) [pdf]
 • Persistent CD (7:36) [pdf]
 • Example (8:15) [pdf]
 • Extensions (9:19) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
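The two contrastive-divergence clips above describe one Gibbs step followed by a parameter update. A sketch of a single CD-1 update for a binary RBM (the sizes, learning rate, and names are illustrative assumptions, not the course's):

```python
import numpy as np

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step for a binary RBM: v0 -> h0 -> v1 -> h1 statistics."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    ph0 = sigmoid(W @ v0 + c)                  # p(h=1 | v0), "positive" phase
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sample hidden units
    pv1 = sigmoid(W.T @ h0 + b)                # p(v=1 | h0)
    v1 = (rng.random(pv1.shape) < pv1) * 1.0   # "negative" sample
    ph1 = sigmoid(W @ v1 + c)                  # hidden probabilities at v1
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c

rng = np.random.default_rng(3)
W = rng.normal(scale=0.1, size=(2, 4))         # 2 hidden, 4 visible units
b, c = np.zeros(4), np.zeros(2)
v0 = np.array([1.0, 0.0, 1.0, 0.0])            # one binary training vector
W, b, c = cd1_update(v0, W, b, c, 0.1, rng)
```

Persistent CD, covered in the next clip, differs only in where the negative chain is initialized.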
6 Autoencoders

- Videos and slides -
 • Definition (6:15) [pdf]
 • Loss function (11:52) [pdf]
 • Example (2:54) [pdf]
 • Linear autoencoder (19:47) [pdf]
 • Undercomplete vs. overcomplete hidden layer (5:36) [pdf]
 • Denoising autoencoder (14:16) [pdf]
 • Contractive autoencoder (12:08) [pdf]

- Companion reading -

- Other suggested readings -
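The denoising-autoencoder clip above combines three ingredients: corrupt the input, encode and decode, then penalize reconstruction of the clean input. A minimal sketch (masking noise, tied weights, and squared loss are common choices I am assuming here, not necessarily the slides' exact setup):

```python
import numpy as np

def dae_loss(x, W, b, c, drop, rng):
    """Denoising autoencoder: corrupt, encode, decode, squared reconstruction loss."""
    x_tilde = x * (rng.random(x.shape) >= drop)  # masking noise on the input
    h = np.tanh(W @ x_tilde + b)                 # encoder
    x_hat = W.T @ h + c                          # tied-weight linear decoder
    return 0.5 * np.sum((x_hat - x) ** 2)        # reconstruct the CLEAN x

rng = np.random.default_rng(4)
x = rng.normal(size=6)
W, b, c = rng.normal(scale=0.1, size=(3, 6)), np.zeros(3), np.zeros(6)
loss = dae_loss(x, W, b, c, drop=0.25, rng=rng)
```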
7 Deep learning

- Videos and slides -
 • Motivation (15:12) [pdf]
 • Difficulty of training (8:24) [pdf]
 • Unsupervised pre-training (12:52) [pdf]
 • Example (12:41) [pdf]
 • Dropout (11:18) [pdf]
 • Deep autoencoder (7:34) [pdf]
 • Deep belief network (13:22) [pdf]
 • Variational bound (14:03) [pdf]
 • DBN pre-training (20:00) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
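The dropout clip above amounts to randomly zeroing hidden units during training. A sketch using the "inverted" rescaling variant (rescaling at train time rather than at test time is my choice here, not necessarily the one presented in the clip):

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Zero each unit with probability p; rescale survivors by 1/(1-p)
    so the expected activation matches test time (inverted dropout)."""
    if not train:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(5)
h = np.ones(100000)
out = dropout(h, 0.5, rng)   # roughly half zeros, survivors scaled to 2.0
```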
8 Sparse coding

- Videos and slides -
 • Definition (12:05) [pdf]
 • Inference (ISTA algorithm) (12:36) [pdf]
 • Dictionary update - projected gradient descent (5:04) [pdf]
 • Dictionary update - block-coordinate descent (13:10) [pdf]
 • Dictionary learning algorithm (5:31) [pdf]
 • Online dictionary learning algorithm (9:05) [pdf]
 • ZCA preprocessing (8:39) [pdf]
 • Feature extraction (10:43) [pdf]
 • Relationship with V1 (5:46) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
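The ISTA inference clip above alternates a gradient step on the reconstruction term with a soft-thresholding step for the L1 penalty. A sketch on a toy problem (dictionary size, penalty weight, and names are illustrative assumptions):

```python
import numpy as np

def ista(x, D, lam, n_iter=500):
    """ISTA: find sparse code h minimizing 0.5*||x - D h||^2 + lam*||h||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    h = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ h - x)                # gradient of the quadratic term
        z = h - g / L                        # gradient step
        h = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return h

rng = np.random.default_rng(6)
D = rng.normal(size=(8, 16))
D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary columns
h_true = np.zeros(16); h_true[[2, 9]] = [1.5, -2.0]
x = D @ h_true                               # signal with a 2-sparse code
h = ista(x, D, lam=0.05)
```

Each iteration provably does not increase the penalized objective, which is what makes ISTA a safe default for the inference step of dictionary learning.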
9 Computer vision

- Videos and slides -
 • Motivation (5:25) [pdf]
 • Local connectivity (4:20) [pdf]
 • Parameter sharing (11:32) [pdf]
 • Discrete convolution (15:27) [pdf]
 • Pooling and subsampling (8:11) [pdf]
 • Convolutional network (13:58) [pdf]
 • Object recognition (8:00) [pdf]
 • Example (14:20) [pdf]
 • Data set expansion (7:32) [pdf]
 • Convolutional RBM (10:46) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
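The discrete-convolution clip above is the core operation of the convolutional networks that follow. A direct (loop-based, "valid"-mode) sketch, kept naive for clarity rather than speed:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' discrete convolution: flip the kernel, slide it, sum the products."""
    kf = kernel[::-1, ::-1]                  # convolution flips the kernel
    kh, kw = kernel.shape
    H = image.shape[0] - kh + 1
    W = image.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kf)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[0.0, 1.0], [2.0, 3.0]])
out = conv2d_valid(image, kernel)            # a 4x4 image and 2x2 kernel give 3x3 output
```

Without the kernel flip this would be cross-correlation, a distinction worth keeping in mind when reading convolutional-network code.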
10 Natural language processing

- Videos and slides -
 • Motivation (2:16) [pdf]
 • Preprocessing (9:46) [pdf]
 • One-hot encoding (7:31) [pdf]
 • Word representations (10:30) [pdf]
 • Language modeling (9:23) [pdf]
 • Neural network language model (16:08) [pdf]
 • Hierarchical output layer (13:51) [pdf]
 • Word tagging (10:48) [pdf]
 • Convolutional network (16:44) [pdf]
 • Multitask learning (16:03) [pdf]
 • Recursive network (5:50) [pdf]
 • Merging representations (3:40) [pdf]
 • Tree inference (16:51) [pdf]
 • Recursive network training (13:29) [pdf]

- Companion reading -

- Other suggested readings -
- Other suggested video material -
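The one-hot encoding clip above is the starting point for the word representations that the rest of the week builds on. A minimal sketch (the tiny vocabulary and function name are illustrative):

```python
import numpy as np

def one_hot(tokens, vocab):
    """Map each token to a one-hot row vector over the vocabulary."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(tokens), len(vocab)))
    for r, w in enumerate(tokens):
        X[r, index[w]] = 1.0
    return X

vocab = ["the", "cat", "sat"]
X = one_hot(["the", "cat", "sat", "the"], vocab)  # one row per token
```

Multiplying such a one-hot row by an embedding matrix is just a table lookup, which is how the neural language models in the later clips obtain dense word representations.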