NN: From Zero to Hero, Lecture 1
Overview
An opinionated translation of the first lecture of Andrej Karpathy’s course into a step-by-step guide that encourages you to come up with solutions on your own.
Challenge⌗
Build and train a neural network as a binary classifier, from scratch.
Questions⌗
- what is a neural network, from first principles?
- what is backpropagation?
- what is gradient descent?
- how does PyTorch work under the hood?
- how to visualize data as graphs?
Prerequisites⌗
- Basic Python
- Basic calculus
Key terms⌗
- GraphViz
- Jupyter
- PyTorch
- Python
- backpropagation
- binary classification
- data science
- data visualization
- expression graph
- gradient descent
- machine learning
- neural networks
Milestones⌗
Preparation⌗
You can set up your workspace any way you want, as long as it provides an interactive environment for executing Python code and viewing generated images. Our suggestion is a Jupyter notebook.
Install Python⌗
- install Miniconda3: https://docs.conda.io/en/latest/miniconda.html
- create a project directory:
mkdir my-project && cd my-project
- create a local Python environment (including the Python package itself):
conda create --prefix ./envs python
- activate it:
conda activate ./envs
Create Jupyter notebook⌗
- install Jupyter:
pip install jupyter
- run Jupyter server:
jupyter notebook
Expression graph⌗
Implement a general system for composing symbolic expressions and computing over them. This will become the foundation of your neural network.
Create Value abstraction⌗
- it represents a float value
- it supports addition and multiplication with other Values
x = Value(1.0)
y = Value(2.0)
z = Value(3.0)
(x + y) * z  # Value(9.0)
- non-leaf Values store their operation and arguments
x = Value(1.0)
y = Value(2.0)
z = x + y  # Value(3.0, op=<addition>, args=<x and y>)
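A minimal sketch of such a class might look like the following (the attribute names _op and _args are illustrative choices for this guide, not prescribed by the lecture):
class Value:
    def __init__(self, data, _op='', _args=()):
        self.data = data    # the underlying float
        self._op = _op      # operation that produced this Value ('' for leaf values)
        self._args = _args  # argument Values of that operation

    def __add__(self, other):
        return Value(self.data + other.data, _op='+', _args=(self, other))

    def __mul__(self, other):
        return Value(self.data * other.data, _op='*', _args=(self, other))

    def __repr__(self):
        return f'Value({self.data})'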
Visualize the resulting expression graph⌗
- install GraphViz:
pip install graphviz
- import relevant class:
from graphviz import Digraph
- draw the expression graph:
  - create a graph object:
    dot = Digraph()
  - add Value nodes to the graph:
    dot.node(
        name,   # unique node identifier
        label,  # contents of the node; the format depends on the shape
        shape,  # visual shape of the node
    )
    # e.g. dot.node(name='a', label=f'{{ a | {a.data:.4f} }}', shape='record')
  - connect argument nodes to the output nodes:
    dot.edge(
        name_from,  # source node name
        name_to,    # destination node name
    )
    # e.g. dot.edge('a', 'b')
- given the expression (x + y) * z, render the graph and check that it reflects the structure of the expression
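Putting the GraphViz calls together, a small helper that walks the expression graph and renders it might look like this (the trace helper and the id-based node naming are assumptions of this sketch):
from graphviz import Digraph

def trace(root):
    # collect all nodes and edges of the expression graph
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for arg in v._args:
                edges.add((arg, v))
                build(arg)
    build(root)
    return nodes, edges

def draw(root):
    dot = Digraph(graph_attr={'rankdir': 'LR'})  # draw left to right
    nodes, edges = trace(root)
    for v in nodes:
        dot.node(name=str(id(v)), label=f'{{ data | {v.data:.4f} }}', shape='record')
        if v._op:
            # an extra node for the operation that produced this Value
            dot.node(name=str(id(v)) + v._op, label=v._op)
            dot.edge(str(id(v)) + v._op, str(id(v)))
    for a, b in edges:
        # connect each argument to the operation node of its output
        dot.edge(str(id(a)), str(id(b)) + b._op)
    return dot
In a Jupyter notebook, the returned Digraph object renders inline when it is the last expression of a cell.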
Implement gradient calculation⌗
- the gradient is the partial derivative of the final expression with respect to the current expression
x = Value(1.0)
y = Value(2.0)
z = Value(3.0)
u = x + y
v = u * z
\begin{align} \text{grad}(v) = \frac{dv}{dv} &= 1\\[5pt] \text{grad}(u) = \frac{dv}{du} &= \frac{d(u \cdot z)}{du}=z\\[5pt] \text{grad}(x) = \frac{dv}{dx} &= \frac{dv}{du} \cdot \frac{du}{dx} = z \cdot \frac{d(x + y)}{dx} = z \cdot 1 = z \end{align}
- implement a backward() method which computes the gradients of the whole graph when called on the root node
- hints:
  - when creating a new Value as the result of some operation, define a self._backward lambda which updates the gradients of the argument nodes
  - consider the case when some Value is used twice
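One possible shape of the whole mechanism, assuming Value.__init__ also sets self.grad = 0.0 and a no-op self._backward = lambda: None (a sketch, not the only way to structure it):
def __add__(self, other):
    out = Value(self.data + other.data, _op='+', _args=(self, other))
    def _backward():
        # += so that gradients accumulate when a Value is used more than once
        self.grad += out.grad
        other.grad += out.grad
    out._backward = _backward
    return out

def backward(self):
    # build a topological order so each node's gradient is complete
    # before it is propagated to its arguments
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for arg in v._args:
                build(arg)
            topo.append(v)
    build(self)

    self.grad = 1.0  # d(root)/d(root) = 1
    for v in reversed(topo):
        v._backward()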
Implement more operations⌗
- subtraction: x - y = x + (y * -1)
- power: x**k where k is a constant (not a Value)
- division: x/y = x * (y**-1)
- exp: x.exp()
- tanh: x.tanh()
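For example, tanh and power could be added along these lines (again a sketch building on the Value fields from the previous snippets); subtraction and division then follow from the identities above:
import math

def tanh(self):
    t = math.tanh(self.data)
    out = Value(t, _op='tanh', _args=(self,))
    def _backward():
        self.grad += (1 - t**2) * out.grad  # d/dx tanh(x) = 1 - tanh(x)^2
    out._backward = _backward
    return out

def __pow__(self, k):
    assert isinstance(k, (int, float)), 'only constant exponents are supported'
    out = Value(self.data**k, _op=f'**{k}', _args=(self,))
    def _backward():
        self.grad += k * self.data**(k - 1) * out.grad  # power rule
    out._backward = _backward
    return out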
Create and test expression graph for a single neuron⌗
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')
b = Value(6.8814, label='b')
x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'
o = n.tanh(); o.label = 'o'
o.backward()
# ==== expected gradients ====
# x1.grad = -1.5
# w1.grad = 1.0
# x2.grad = 0.5
# w2.grad = 0.0
Neural net⌗
Implement a multi-layer neural network and test it on some data. This abstraction is at the core of modern machine learning approaches, and it will help you better understand more advanced techniques.
Create Neuron abstraction⌗
- it is defined by a list of weights + bias
- it is callable with a list of input values, producing a squashed output: \[ \text{neuron}([x_1, \ldots, x_n]) = \tanh\Big(\sum_i w_i x_i + b\Big) \]
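A possible Neuron sketch, assuming the Value class from the previous milestone (wrapping plain numbers in Value is a choice of this sketch):
import random

class Neuron:
    def __init__(self, n_inputs):
        # start from small random weights and bias
        self.w = [Value(random.uniform(-1, 1)) for _ in range(n_inputs)]
        self.b = Value(random.uniform(-1, 1))

    def __call__(self, xs):
        # wrap plain numbers so the whole computation stays in the expression graph
        xs = [x if isinstance(x, Value) else Value(x) for x in xs]
        act = self.b
        for wi, xi in zip(self.w, xs):
            act = act + wi * xi
        return act.tanh()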
Create Layer abstraction⌗
- it is defined by a list of neurons
- it is callable with a list of inputs, producing a list of neuron outputs: \[ \text{layer}([x_1, \ldots, x_n]) = [n_j([x_1, \ldots, x_n]) \,|\, n_j \in \text{layer}] \]
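A matching Layer sketch:
class Layer:
    def __init__(self, n_inputs, n_outputs):
        # one independent neuron per output
        self.neurons = [Neuron(n_inputs) for _ in range(n_outputs)]

    def __call__(self, xs):
        return [n(xs) for n in self.neurons]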
Create MLP (Multi-Layer Perceptron) abstraction⌗
- it is defined by a list of layers
- it is callable with a list of inputs, producing a list of outputs of the last layer: \[ \text{mlp}([x_1, \ldots, x_n]) = \text{mlp}'(l_1([x_1, \ldots, x_n])) = \ldots = [y_1, \ldots, y_m] \] where \(\text{mlp}'\) is the same network without its first layer
- for convenience, if the last layer consists of only one neuron, return its output instead of a list
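And an MLP sketch; here the layer sizes are passed as a list, e.g. MLP(3, [4, 4, 1]) for 3 inputs and three layers (this constructor signature is just one convenient choice):
class MLP:
    def __init__(self, n_inputs, layer_sizes):
        sizes = [n_inputs] + layer_sizes
        self.layers = [Layer(sizes[i], sizes[i + 1]) for i in range(len(layer_sizes))]

    def __call__(self, xs):
        for layer in self.layers:
            xs = layer(xs)
        # convenience: unwrap a single-output last layer
        return xs[0] if len(xs) == 1 else xs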
Create a test dataset for binary classification⌗
- define some sample data, e.g.
# sets of inputs
xs = [
    [2.0, 3.0, -1.0],
    [3.0, -1.0, 0.5],
    [0.5, 1.0, 1.0],
    [1.0, 1.0, -1.0],
]
# ground truth (aka expected) outputs
ys_gt = [1.0, -1.0, -1.0, 1.0]
- run your MLP on it
# predicted (aka actual) outputs
ys_pred = [mlp(x) for x in xs]
# [Value(-0.79), Value(-0.29), Value(0.65), Value(0.23)]
Compute the loss⌗
- it indicates how good the MLP prediction is
- there are different loss functions, but we will use Mean Squared Error (MSE); below we drop the averaging over samples, which doesn't change where the minimum is
\[ \text{loss} = \sum_j \big(y^j_{pred} - y^j_{gt}\big)^2 \]
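With the operations from the previous milestone, the loss can be assembled directly from the predictions (wrapping the ground-truth floats in Value is an assumption of this sketch; alternatively, teach the operators to accept plain numbers):
# sum of squared errors between predictions and ground truth
loss = Value(0.0)
for y_pred, y_gt in zip(ys_pred, ys_gt):
    diff = y_pred - Value(y_gt)  # wrap the plain float as a Value
    loss = loss + diff**2
print(loss.data)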
Update MLP parameters⌗
- add a parameters() method to MLP which returns the list of all weights and biases
- compute the gradients starting from the loss
- update the parameters to decrease the loss
  - hint: nudge each parameter in the direction opposite to its gradient:
    rate = 0.001
    for p in mlp.parameters():
        p.data += rate * -p.grad
- compute the loss once again and see it getting smaller, which means predictions are getting closer to the ground truth
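One way to collect the parameters, mirroring the class structure above:
# on Neuron
def parameters(self):
    return self.w + [self.b]

# on Layer
def parameters(self):
    return [p for n in self.neurons for p in n.parameters()]

# on MLP
def parameters(self):
    return [p for layer in self.layers for p in layer.parameters()]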
Create a cycle: Prediction-Loss-Backprop-Update⌗
- iterate N times:
- compute the predictions
- compute the loss
- backprop gradients from the loss
- update MLP parameters
- look at predicted values to see how close they got to the ground truth
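Put together, the loop might look like the following sketch; note that parameter gradients are reset before each backward pass, because the backward implementation accumulates them with +=:
for step in range(50):
    # forward pass: predictions and loss
    ys_pred = [mlp(x) for x in xs]
    loss = Value(0.0)
    for y_pred, y_gt in zip(ys_pred, ys_gt):
        loss = loss + (y_pred - Value(y_gt))**2

    # backward pass: zero the gradients, then backprop from the loss
    for p in mlp.parameters():
        p.grad = 0.0
    loss.backward()

    # update: nudge parameters in the direction opposite to their gradient
    rate = 0.05  # learning rate; a hyperparameter worth experimenting with
    for p in mlp.parameters():
        p.data += rate * -p.grad

    print(step, loss.data)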
Conclusion⌗
You’ve just created, trained and used a real neural network. Even though modern machine learning techniques differ, they share the same core ideas, and understanding those will help you advance in this field. That said, even this simple neural network can already be used for a variety of tasks, e.g. predicting housing prices or recognizing hand-written digits.
Project submission⌗
Please reply to this tweet with a link to the project repository (e.g. on GitHub) to mark this project as complete and to build up your portfolio.
Self-assessment⌗
- What did you learn?
- How did you like it?
- Do you want to continue with similar projects?
- How would you use the acquired skills?
- Do you have an idea for a project which would use these skills?