Uncle Josh Scrapes His Knuckles on ML, Pt. 3

So I need to figure out how I’m going to make a neural network. I get the examples all over the web, but I struggle with the implementation, which probably means I don’t really get them. Still, I’m happy to blame my lack of understanding of numpy.

I tried to build a simple model with three inputs, four nodes in a single hidden layer, and a single output. This means the first set of synapses will have 12 values and the second set only four, and then I’ll need four weights for the inner layer and a single weight for the output.

To calculate a single layer, multiply the synapses against the inputs, add the weights, or biases, of the nodes, and then run some normalization function across the resulting column vector. I’m using a simple sigmoid function, and yes, I know there are better functions, but I’d rather understand how to build a working system before I generalize too much.

By that, you can infer that I’m probably generalizing too much already.

To calculate a single layer I get a formula like this:

\sigma \left( \begin{bmatrix}a_{11}& \cdots &a_{1n}\\ \vdots & \ddots & \vdots \\a_{m1} & \cdots & a_{mn}\end{bmatrix} \begin{bmatrix}i_1\\ \vdots \\i_n\end{bmatrix} + \begin{bmatrix}w_1\\ \vdots \\w_m\end{bmatrix} \right)

The result becomes the new inputs to the next group of synapses, so this becomes a rinse-and-repeat method.
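Here’s a minimal sketch of that rinse-and-repeat in numpy. The names `sigmoid`, `layer`, and `feed_forward` are my own placeholders, and I’m hard-wiring the sigmoid rather than making the squishing function pluggable:

```python
import numpy as np

def sigmoid(x):
    # the classic squishification: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def layer(synapse, inputs, bias):
    # synapse is (nodes, inputs), inputs is (inputs, 1), bias is (nodes, 1)
    return sigmoid(np.matmul(synapse, inputs) + bias)

def feed_forward(inputs, synapses, biases):
    # each layer's output becomes the next layer's input
    signal = inputs
    for synapse, bias in zip(synapses, biases):
        signal = layer(synapse, signal, bias)
    return signal
```

As long as every synapse matrix is shaped (nodes in this layer, nodes in the previous layer) and every bias is a column vector, the loop doesn’t care how many hidden layers there are.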

And this makes sense to me on paper. It’s the code that’s giving me trouble.

# Basic structure
import numpy as np

input_count = 3
layers = [4]
output_count = 1

np.random.seed(42)  # let's have some consistency here

# build some synapses and weights
synapses = []
weights = []

synapses.append(np.random.randint(-10, 10, (layers[0], input_count)))
weights.append(np.random.randint(-5, 5, (1, layers[0])).T)

synapses.append(np.random.randint(-10, 10, (layers[0], output_count)))
weights.append(np.random.randint(-5, 5, (output_count, 1)))

When I print out the synapses and weights I get:

Synapses
[array([[-4,  9,  4],
        [ 0, -3, -4],
        [ 8,  0,  0],
        [-7, -3, -8]]), 
 array([[  1],
        [ -5],
        [ -9],
        [-10]])]
Weights
[array([[ 0],
        [-1],
        [-4],
        [ 2]]), 
 array([[4]])]

Theoretically I should be able to take any column vector input and run it through a function like:

def think(inputs, synapse, weight):
    # ignore sigmoid squishification for now
    return np.matmul(synapse, inputs) + weight

So if I define inputs as [1 1 1], then I should be able to easily confirm the matrix multiplication going on under the hood. Here’s what I get:

Inputs
 [1 1 1]

Thinking about it
[[  9  -7   8 -18]
 [  8  -8   7 -19]
 [  5 -11   4 -22]
 [ 11  -5  10 -16]]

This is so, so wrong. I should end up with a column vector, or maybe a row vector. I’m also a little suspicious that I multiplied a 4 by 3 matrix by a 1 by 3 array and got a 4 by 4 matrix. If I define my inputs as inputs = np.array([[1, 1, 1]]) I get an error, but inputs = np.array([[1, 1, 1]]).T seems to provide real, useful output.
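A small numpy sketch of what I believe was happening (the array contents here are made up; only the shapes matter): matmul treats a 1-D array as a column and returns a 1-D result, and then adding that (4,) result to a (4, 1) bias broadcasts out to a full 4 by 4 matrix:

```python
import numpy as np

row = np.array([1, 1, 1])            # shape (3,) -- a 1-D array, not a row vector
A = np.arange(12).reshape(4, 3)      # shape (4, 3), stand-in synapse matrix
b = np.arange(4).reshape(4, 1)       # shape (4, 1), stand-in bias column

product = np.matmul(A, row)          # shape (4,) -- matmul treats the 1-D array as a column
result = product + b                 # (4,) + (4, 1) broadcasts to (4, 4)!

assert product.shape == (4,)
assert result.shape == (4, 4)

col = np.array([[1, 1, 1]]).T        # shape (3, 1) -- an explicit column vector
assert (np.matmul(A, col) + b).shape == (4, 1)
```

So the .T version works because an explicit (3, 1) column keeps every step two-dimensional, and nothing broadcasts sideways.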

But the second layer doesn’t work. I’ve added more output lines to help me see what’s going on.

Thinking ...
Synapse shape (4, 3)
Input shape (3, 1)
Weight shape (4, 1)
Result Shape (4, 1)
[[  9]
 [ -8]
 [  4]
 [-16]]

Thinking...
Synapse shape (4, 1)
Input shape (4, 1)
Weight shape (1, 1)
Traceback (most recent call last):
 ...
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, ...

I can see the error in the shapes. I wonder if I set up the last layer of synapses backwards, but even this seems strange. I always get confused regarding operand order in matrix multiplication.
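To convince myself about operand order, here’s a tiny sketch using the shapes from my run above (the array contents are dummies): a (4, 1) second synapse can’t left-multiply the (4, 1) hidden-layer output, because the inner dimensions (1 and 4) don’t line up, but its transpose, shaped (1, 4), lines up fine:

```python
import numpy as np

hidden = np.ones((4, 1))    # stand-in for the hidden layer's output
synapse = np.ones((4, 1))   # the "backwards" second synapse

# (4, 1) x (4, 1): inner dimensions 1 and 4 don't match, so matmul raises
try:
    np.matmul(synapse, hidden)
except ValueError:
    pass  # same ValueError as in my traceback

# transposing puts it in (outputs, nodes) form: (1, 4) x (4, 1) -> (1, 1)
assert np.matmul(synapse.T, hidden).shape == (1, 1)
```

Which suggests the fix is to build the last synapse as (output_count, layers[-1]) in the first place, instead of transposing after the fact.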

But a few more tweaks of the code, even testing against two outputs, let me think the form of the code works, at least as long as I have only one hidden layer, and I think I can generate deeper ones just as well.

Next step: Wrapping this logic up in a class that can save and recall.