Opinionated translation of the first lecture of Andrej Karpathy’s course into a step-by-step guide where you’re encouraged to come up with solutions on your own.

Challenge

Build and train a neural network as a binary classifier from scratch.

Questions

  • what is a neural network, from first principles?
  • what is backpropagation?
  • what is gradient descent?
  • how does PyTorch work under the hood?
  • how do you visualize data as graphs?

Prerequisites

  • Basic Python
  • Basic calculus

Key terms

  • GraphViz
  • Jupyter
  • PyTorch
  • Python
  • backpropagation
  • binary classification
  • data science
  • data visualization
  • expression graph
  • gradient descent
  • machine learning
  • neural networks

Milestones

Preparation

You can set up your workspace any way you want, as long as it provides an interactive environment for executing Python code and viewing generated images. Our suggestion is a Jupyter notebook.

Install Python

Create Jupyter notebook

  • install Jupyter: pip install jupyter
  • run Jupyter server: jupyter notebook

Expression graph

Implement a general system for composing symbolic expressions and computing over them. This will become the foundation of your neural network.

Create Value abstraction

  • it represents a float value
  • it supports addition and multiplication with other Values
      x = Value(1.0)
      y = Value(2.0)
      z = Value(3.0)
      (x + y) * z # Value(9.0)
    
  • non-leaf Values store their operation and arguments (a minimal sketch of such a class follows this list)
      x = Value(1.0)
      y = Value(2.0)
      z = x + y # Value(3.0, op=<addition>, args=<x and y>)
    

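A minimal sketch of such a Value class, assuming only addition and multiplication for now; the attribute names op and args are illustrative rather than prescribed by the lecture:

    class Value:
        def __init__(self, data, op='', args=()):
            self.data = data
            self.op = op      # operation that produced this Value ('' for leaf nodes)
            self.args = args  # argument Values of that operation

        def __add__(self, other):
            return Value(self.data + other.data, op='+', args=(self, other))

        def __mul__(self, other):
            return Value(self.data * other.data, op='*', args=(self, other))

        def __repr__(self):
            return f'Value({self.data})'

    x, y, z = Value(1.0), Value(2.0), Value(3.0)
    (x + y) * z  # Value(9.0)
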
Visualize the resulting expression graph

  • install GraphViz: pip install graphviz
  • import relevant class: from graphviz import Digraph
  • draw the expression graph:
    • create graph object

          dot = Digraph()
      
    • add Value nodes to graph

          dot.node(
              name, # unique node identifier
              label, # contents of the node, format depends on the shape
              shape, # visual shape of the node
          )
          # dot.node(name='a', label=f'{{ a | {a:.4f} }}', shape='record')
      
    • connect argument nodes to the output nodes

          dot.edge(
              name_from, # source node name
              name_to, # destination node name
          )
          # dot.edge('a', 'b')
      
    • given the expression (x + y) * z, your graph should contain a node for every Value, with edges going from each argument node to the node holding the result of the operation; the exact layout can vary (a sketch of a drawing helper follows below)
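
One possible drawing helper, assuming the Value class sketched above with its op and args attributes; the function names trace and draw are illustrative:

    from graphviz import Digraph

    def trace(root):
        # walk the expression graph and collect all nodes and edges
        nodes, edges = set(), set()
        def build(v):
            if v not in nodes:
                nodes.add(v)
                for arg in v.args:
                    edges.add((arg, v))
                    build(arg)
        build(root)
        return nodes, edges

    def draw(root):
        dot = Digraph(graph_attr={'rankdir': 'LR'})  # left-to-right layout
        nodes, edges = trace(root)
        for v in nodes:
            # unique name derived from the object id; the label shows op and value
            dot.node(name=str(id(v)), label=f'{{ {v.op} | {v.data:.4f} }}', shape='record')
        for a, b in edges:
            dot.edge(str(id(a)), str(id(b)))
        return dot

In a Jupyter cell, leaving draw((x + y) * z) as the last expression renders the graph inline.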

Implement gradient calculation

  • the gradient of a node is the partial derivative of the final expression (the root of the graph) with respect to that node
      x = Value(1.0)
      y = Value(2.0)
      z = Value(3.0)
      u = x + y
      v = u * z
    

\begin{align} \text{grad}(v) = \frac{dv}{dv} &= 1\\[5pt] \text{grad}(u) = \frac{dv}{du} &= \frac{d(u \cdot z)}{du}=z\\[5pt] \text{grad}(x) = \frac{dv}{dx} &= \frac{dv}{du} \cdot \frac{du}{dx} = z \cdot \frac{d(x + y)}{dx} = z \cdot 1 = z \end{align}

  • implement a backward() method which computes the gradients of the whole graph when called on the root node
  • hints (a sketch follows this list):
    • when creating a new Value as the result of some operation, define a self._backward closure which updates the gradients of the argument nodes
    • consider the case when some Value is used twice: its gradient should accumulate the contributions from both uses
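
A sketch of how these hints fit together, shown for addition only; both functions are meant to live on Value, and it is assumed that every Value starts with grad = 0.0 and a no-op _backward. Multiplication follows the same pattern, with each argument's local derivative being the other argument's data:

    def __add__(self, other):
        out = Value(self.data + other.data, op='+', args=(self, other))
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1; accumulate with +=
            # so a Value used in several places collects every contribution
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then run each node's _backward in reverse order
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for arg in v.args:
                    build(arg)
                topo.append(v)
        build(self)
        self.grad = 1.0  # d(root)/d(root) = 1
        for v in reversed(topo):
            v._backward()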

Implement more operations

  • subtraction: x - y = x + (y * -1)
  • power: x**k where k is a constant (not a Value)
  • division: x/y = x * (y**-1)
  • exp: x.exp()
  • tanh: x.tanh() (a sketch follows this list)
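
For example, tanh can be sketched as below, using the fact that the derivative of tanh(x) is 1 - tanh(x)^2; as before, this is assumed to be a method on Value:

    import math

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, op='tanh', args=(self,))
        def _backward():
            self.grad += (1 - t**2) * out.grad  # local derivative times upstream gradient
        out._backward = _backward
        return out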

Create and test expression graph for a single neuron

    # assumes your Value also accepts an optional label, used only when drawing the graph
    x1 = Value(2.0, label='x1')
    x2 = Value(0.0, label='x2')
    w1 = Value(-3.0, label='w1')
    w2 = Value(1.0, label='w2')
    b = Value(6.8814, label='b')
    x1w1 = x1 * w1; x1w1.label = 'x1*w1'
    x2w2 = x2 * w2; x2w2.label = 'x2*w2'
    x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
    n = x1w1x2w2 + b; n.label = 'n'
    o = n.tanh(); o.label = 'o'
    o.backward()
    # ==== expected gradients ====
    # x1.grad = -1.5
    # w1.grad = 1.0
    # x2.grad = 0.5
    # w2.grad = 0.0

Neural net

Implement a multi-layer neural network and test it on some data. This abstraction is at the core of modern machine learning and will help you understand more advanced techniques.

Create Neuron abstraction

  • it is defined by a list of weights plus a bias
  • it is callable with a list of input values, producing a squashed output: \[ neuron([x_1, \ldots, x_n]) = \tanh\Big(\sum_i w_i x_i + b\Big) \]

Create Layer abstraction

  • it is defined by a list of neurons
  • it is callable with a list of inputs, producing a list of neuron outputs: \[ layer([x_1, \ldots, x_n]) = [n_j([x_1, \ldots, x_n]) \,|\, n_j \in layer] \]

Create MLP (Multi-Layer Perceptron) abstraction

  • it is defined by a list of layers
  • it is callable with a list of inputs, producing a list of outputs of the last layer: \[ mlp([x_1, \ldots, x_n]) = mlp'(l_1([x_1, \ldots, x_n])) = \ldots = [y_1, \ldots, y_m] \]
  • for convenience, if the last layer consists of only one neuron, return its single output instead of a list (a sketch of all three abstractions follows below)
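
A minimal sketch of the three abstractions, assuming the Value class from the previous milestone; the parameter names nin and nouts are illustrative:

    import random

    class Neuron:
        def __init__(self, nin):
            # one weight per input plus a bias, all initialized randomly
            self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
            self.b = Value(random.uniform(-1, 1))

        def __call__(self, x):
            # wrap plain numbers so that w_i * x_i works even for float inputs
            x = [xi if isinstance(xi, Value) else Value(xi) for xi in x]
            act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)
            return act.tanh()

    class Layer:
        def __init__(self, nin, nout):
            self.neurons = [Neuron(nin) for _ in range(nout)]

        def __call__(self, x):
            return [n(x) for n in self.neurons]

    class MLP:
        def __init__(self, nin, nouts):
            sizes = [nin] + nouts
            self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(nouts))]

        def __call__(self, x):
            for layer in self.layers:
                x = layer(x)
            return x[0] if len(x) == 1 else x  # unwrap a single-output last layer

For example, mlp = MLP(3, [4, 4, 1]) builds a network with 3 inputs, two hidden layers of 4 neurons each, and a single output.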

Create a test dataset for binary classification

  • define some sample data, e.g.
      # sets of inputs
      xs = [
          [2.0,3.0,-1.0],
          [3.0,-1.0,0.5],
          [0.5,1.0,1.0],
          [1.0,1.0,-1.0],
      ]
      # ground truth (aka expected) outputs
      ys_gt = [1.0,-1.0,-1.0,1.0]
    
  • run your MLP on it
      # predicted (aka actual) outputs
      ys_pred = [mlp(x) for x in xs]
      # e.g. [Value(-0.79), Value(-0.29), Value(0.65), Value(0.23)], your exact values will differ
    

Compute the loss

  • it measures how far the MLP's predictions are from the ground truth: the smaller the loss, the better
  • there are different loss functions, but we will use Mean Squared Error (MSE); a sketch of the computation follows the formula

\[ loss = \sum_j(y_{pred}^j - y_{gt}^j)^2 \]
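
One way to compute it, assuming the xs, ys_gt, and ys_pred from above together with the subtraction and power operations from the previous milestone; the ground-truth floats are wrapped in Value so that only Value arithmetic is needed:

    # sum of squared differences between predicted and ground-truth outputs
    loss = sum(((yp - Value(ygt)) ** 2 for yp, ygt in zip(ys_pred, ys_gt)), Value(0.0))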

Update MLP parameters

  • add a parameters() method to your MLP which returns the list of all of its weights and biases (a sketch follows this list)
  • compute the gradients starting from the loss
  • update parameters to decrease the loss
    • hint: nudge in the opposite direction to the gradient
            rate = 0.001
            for p in mlp.parameters():
                p.data += rate * -p.grad
      
  • compute the loss once again and see that it gets smaller, which means the predictions are getting closer to the ground truth
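
A sketch of parameters(), assuming the Neuron/Layer/MLP classes sketched earlier; each method below is meant to be added to the corresponding class rather than used standalone:

    # inside Neuron: its parameters are the weights plus the bias
    def parameters(self):
        return self.w + [self.b]

    # inside Layer: concatenate the parameters of all of its neurons
    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]

    # inside MLP: concatenate the parameters of all of its layers
    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]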

Create a cycle: Prediction-Loss-Backprop-Update

  • iterate N times (a sketch of the full loop follows this list):
    • compute the predictions
    • compute the loss
    • backprop gradients from the loss
    • update MLP parameters
  • look at predicted values to see how close they got to the ground truth
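
Putting it together, a sketch of the loop, assuming the mlp, xs, and ys_gt from above; N and rate are illustrative values, and gradients are reset before every backward pass so they do not accumulate across iterations:

    N, rate = 20, 0.05
    for step in range(N):
        # forward pass: predictions and loss
        ys_pred = [mlp(x) for x in xs]
        loss = sum(((yp - Value(ygt)) ** 2 for yp, ygt in zip(ys_pred, ys_gt)), Value(0.0))

        # backward pass: reset gradients, then backprop from the loss
        for p in mlp.parameters():
            p.grad = 0.0
        loss.backward()

        # update: nudge each parameter against its gradient
        for p in mlp.parameters():
            p.data += rate * -p.grad

        print(step, loss.data)

    print([yp.data for yp in [mlp(x) for x in xs]])  # should be approaching ys_gt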

Conclusion

You’ve just created, trained, and used a real neural network. Even though modern machine learning techniques differ in many details, they share the same core ideas, and understanding those will help you advance in this field. That said, even this simple neural network can already be used for a variety of tasks, e.g. predicting housing prices or recognizing hand-written digits.

Project submission

Please reply to this tweet with a link to the project repository (e.g. on GitHub) to mark this project as complete and to build up your portfolio.

Self-assessment

  • What did you learn?
  • How did you like it?
  • Do you want to continue with similar projects?
  • How would you use acquired skills?
  • Do you have an idea for a project which would use these skills?

What’s next?