{"id":173,"date":"2019-08-17T18:09:52","date_gmt":"2019-08-18T01:09:52","guid":{"rendered":"https:\/\/joshuarenglish.com\/blog\/?p=173"},"modified":"2019-08-19T21:28:30","modified_gmt":"2019-08-20T04:28:30","slug":"uncle-josh-scrapes-his-knuckles-on-ml-pt-3","status":"publish","type":"post","link":"https:\/\/joshuarenglish.com\/blog\/2019\/08\/17\/uncle-josh-scrapes-his-knuckles-on-ml-pt-3\/","title":{"rendered":"Uncle Josh Scrapes His Knuckles on ML, Pt. 3"},"content":{"rendered":"\n<p>So I need to figure out how I&#8217;m going to make a neural network, and while I can follow the examples all over the web, I struggle with the implementation, which probably means I don&#8217;t really get them, but I&#8217;m happy to blame my lack of understanding of numpy. <\/p>\n\n\n\n<p>I tried to build a simple model with three inputs, four nodes in a single hidden layer, and a single output. This means the first set of synapses will have 12 values and the second set only four, and then I&#8217;ll need four weights, or biases, for the inner layer and a single weight for the output.<\/p>\n\n\n\n<p>To calculate a single layer, multiply the synapse matrix against the input, add the weights, or biases, of the nodes, and then apply some normalization function to the resulting column vector.
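As a sanity check on the shapes, that single-layer calculation can be sketched in numpy like this (the variable names here are my own, and I&#8217;m plugging in the synapse and bias values printed further down):

```python
import numpy as np

def sigmoid(z):
    # squash each element into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# a 4-node layer fed by 3 inputs
S = np.array([[-4,  9,  4],
              [ 0, -3, -4],
              [ 8,  0,  0],
              [-7, -3, -8]])          # synapse matrix, one row per node
b = np.array([[0], [-1], [-4], [2]])  # one bias per node, as a column vector
x = np.array([[1, 1, 1]]).T           # column-vector input

layer_out = sigmoid(np.matmul(S, x) + b)
print(layer_out.shape)  # (4, 1): one activation per node
```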
I&#8217;m using a simple sigmoid function, and yes, I know there are better functions, but I&#8217;d rather understand how to build a working system before I generalize too much.<\/p>\n\n\n\n<p class=\"has-text-align-right\">By that, you can infer that I&#8217;m probably generalizing too much already.<\/p>\n\n\n\n<p>To calculate a single layer I get a formula like this:<\/p>\n\n\n\n<p class=\"has-text-align-center\"><span class=\"katex-eq\" data-katex-display=\"false\"> \\sigma \\left( \\begin{bmatrix}a_{11}&amp; \\cdots &amp;a_{1n}\\\\ \\vdots &amp; \\ddots &amp; \\vdots \\\\a_{m1} &amp; \\cdots &amp; a_{mn}\\end{bmatrix}  \\begin{bmatrix}i_1\\\\ \\vdots \\\\i_n\\end{bmatrix} + \\begin{bmatrix}w_1\\\\ \\vdots \\\\w_m\\end{bmatrix} \\right) <\/span><\/p>\n\n\n\n<p>The result becomes the new inputs to the next group of synapses, so this becomes a rinse-and-repeat method.<\/p>\n\n\n\n<p>And this makes sense to me <em>on paper<\/em>. It&#8217;s the code that&#8217;s giving me trouble.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Basic structure\nimport numpy as np\n\ninput_count = 3\nlayers = [4]\noutput_count = 1\n\nnp.random.seed(42) # let's have some consistency here\n\n# build some synapses and weights\nsynapses = []\nweights = []\n\nsynapses.append(np.random.randint(-10, 10, (layers[0], input_count)))\nweights.append(np.random.randint(-5, 5, (1, layers[0])).T)\n\nsynapses.append(np.random.randint(-10, 10, (layers[0], output_count)))\nweights.append(np.random.randint(-5, 5, (output_count, 1)))\n<\/pre>\n\n\n\n<p>When I print out the synapses and weights I get:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Synapses\n[array([[-4,  9,  4],\n        [ 0, -3, -4],\n        [ 8,  0,  0],\n        [-7, -3, -8]]),\n array([[  1],\n        [ -5],\n        [ -9],\n        [-10]])]\nWeights\n[array([[ 0],\n        [-1],\n        [-4],\n        [ 2]]),\n array([[4]])]<\/pre>\n\n\n\n<p>Theoretically I should be able to take any column vector input and run it through a function like:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def think(inputs, synapse, weight):\n    # ignore sigmoid squishification for now\n    return np.matmul(synapse, inputs) + weight<\/pre>\n\n\n\n<p>So if I define inputs as <span class=\"katex-eq\" data-katex-display=\"false\"> [1 1 1]<\/span>, then I should be able to easily confirm the matrix multiplication going on under the hood.  Here&#8217;s what I get:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Inputs\n [1 1 1]\n\nThinking about it\n[[  9  -7   8 -18]\n [  8  -8   7 -19]\n [  5 -11   4 -22]\n [ 11  -5  10 -16]]<\/pre>\n\n\n\n<p>This is so, so wrong. I should end up with a column vector, or maybe a row vector.  I&#8217;m also a little suspicious that I multiplied a 4 by 3 matrix by a 3-element vector and got a 4 by 4 matrix.  If I define my inputs as <code>inputs = np.array([[1, 1, 1]]) <\/code>I get an error, but <code>inputs = np.array([[1, 1, 1]]).T<\/code> seems to provide real, useful output.<\/p>\n\n\n\n<p>But the second layer doesn&#8217;t work. I&#8217;ve added more output lines to help me see what&#8217;s going on.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Thinking...\nSynapse shape (4, 3)\nInput shape (3, 1)\nWeight shape (4, 1)\nResult shape (4, 1)\n[[  9]\n [ -8]\n [  4]\n [-16]]\n\nThinking...\nSynapse shape (4, 1)\nInput shape (4, 1)\nWeight shape (1, 1)\nTraceback (most recent call last):\n ...\nValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, ...<\/pre>\n\n\n\n<p>I can see the error in the shapes.  I wonder if I set up the last layer of synapses backwards, but even this seems strange.
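Writing the shapes down makes the mismatch clearer: the second synapse block is (4, 1), but mapping a (4, 1) activation down to a single output needs a (1, 4) matrix. A quick sketch using the values printed above (my own check, not part of the original code) confirms that transposing it fixes things:

```python
import numpy as np

a = np.array([[9], [-8], [4], [-16]])    # activations out of the 4-node layer
S2 = np.array([[1], [-5], [-9], [-10]])  # second synapse block, shape (4, 1)

# np.matmul(S2, a) fails: (4, 1) @ (4, 1) has no matching inner dimension.
# Transposing S2 gives (1, 4) @ (4, 1) -> (1, 1), a single output as intended.
out = np.matmul(S2.T, a)
print(out.shape)  # (1, 1)
```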
I always get confused regarding operand order in matrix multiplication.<\/p>\n\n\n\n<p>But a few more tweaks of the code, even testing against two outputs, lead me to think the form of the code works, assuming I don&#8217;t have more than one hidden layer, but I think I can generate those just as well.<\/p>\n\n\n\n<p>Next step: Wrapping this logic up in a class that can save and recall.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So I need to figure out how I&#8217;m going to make a neural network, and while I get the examples all over the web, I struggle with the implementation, which probably means I really don&#8217;t get them, but I&#8217;m happy to blame my lack of understanding of numpy. I tried to build a simple model, &hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","_share_on_mastodon":"0"},"categories":[34],"tags":[91,95,92],"class_list":["post-173","post","type-post","status-publish","format-standard","hentry","category-boxes-that-go-bing","tag-machine-learning","tag-numpy","tag-python"],"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/posts\/173","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/comments?post=173"}],"version-history":[{"count":5,"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/posts\/173\/revisions"
}],"predecessor-version":[{"id":189,"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/posts\/173\/revisions\/189"}],"wp:attachment":[{"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/media?parent=173"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/categories?post=173"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/joshuarenglish.com\/blog\/wp-json\/wp\/v2\/tags?post=173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}