Why does this TensorFlow example not have a summation before the activation function?


I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass them to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.



[picture: a single neuron computing a weighted sum of its inputs, followed by an activation function]
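
To spell out what I mean in code, the picture boils down to something like this (my own toy numbers, plain Python, not taken from the example):

# A single neuron written out explicitly: weighted sum first, THEN activation.
inputs = [2.0, 3.0, 1.0]      # made-up input values
weights = [0.5, -1.0, 0.25]   # made-up weights for this one neuron
bias = 0.1

z = sum(w * x for w, x in zip(weights, inputs)) + bias  # the summation step
a = max(0.0, z)                                         # ReLU activation
print(z, a)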



In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.



Here is an example of one of those snippets:



weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}


# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer


In each layer, we first multiply the inputs by the weights. Afterwards, we add the bias term, and then we pass the result to tf.nn.relu. Where does the summation happen? It looks like we've skipped it!



Any help would be really great!










python tensorflow machine-learning






asked Mar 28 at 14:25 by echo, edited Mar 28 at 15:04















  • It's done by softmax as far as I understand, it's the equivalent of softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)

    – EdChum
    Mar 28 at 14:27












  • Okay -- the softmax layer does it. But the other nodes don't do it?

    – echo
    Mar 28 at 14:46











  • No, I don't think so, as this wouldn't make sense: if you sum or perform any kind of aggregation, they stop being a layer, so you can't feed them to another layer

    – EdChum
    Mar 28 at 15:01











  • It does remain a layer. Each individual neuron in a layer takes input and each neuron has to produce a single scalar value.

    – echo
    Mar 28 at 15:05

















2 Answers

Answer by Whynote (answered Mar 28 at 15:50, score 1):

The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).



Take a simple example with a row-vector and a column-vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above):



x = [2, 3, 1]
y = [3,
     1,
     2]



Then the result would be:



tf.matmul(x, y) = 2*3 + 3*1 + 1*2 = 11



There you can see the weighted sum.



P.S.: tf.multiply performs element-wise multiplication, which is not what we want here.
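
To check that numerically, here is a minimal sketch of the same arithmetic (plain numpy stands in for a TensorFlow session here; the matmul/multiply semantics are the same):

import numpy as np

x = np.array([[2, 3, 1]])      # row vector: the inputs
y = np.array([[3], [1], [2]])  # column vector: one neuron's weights

print(x @ y)    # [[11]]     -- matrix product, i.e. the weighted sum (tf.matmul)
print(x * y.T)  # [[6 3 2]]  -- element-wise products only (tf.multiply), no summation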







Answer by Vlad (answered Mar 28 at 15:03, edited Mar 28 at 15:46, score 2):

The last layer of your model, out_layer, outputs probabilities of each class Prob(y=yi|X) and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities y whose size is the number of classes. You then pick the class with the highest probability by applying argmax to the output vector, class = argmax(P(y|x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
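
As a small illustration of that prediction step, here is a plain-numpy sketch with made-up logits (tf.nn.softmax and tf.argmax compute the same things):

import numpy as np

logits = np.array([[2.0, 1.0, 0.1]])  # out_layer for one sample, n_classes = 3
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
pred = probs.argmax(axis=1)           # same idea as tf.argmax(out_layer, 1)
print(probs.round(3), pred)           # [[0.659 0.242 0.099]] [0]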



Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension] and you multiply it by a weight matrix W of shape [x_dimension, model_output]. The summation that you're talking about is the dot product between a row of matrix X and a column of matrix W. The output then has shape [n_samples, model_output]. To this output you apply an activation function (if it is the final layer, you probably want softmax). Perhaps the picture that you've shown is a bit misleading.



Mathematically, the layer without bias can be described as Y = XW. Suppose that the first row of the matrix X (a single input data point) is

x = (x_1, x_2, ..., x_d), where d is x_dimension,

and the first column of W is

w = (w_1, w_2, ..., w_d)^T.

The result of this dot product is given by

x · w = x_1*w_1 + x_2*w_2 + ... + x_d*w_d,

which is your summation. You repeat this for each column in matrix W, and the result is a vector of size model_output (which corresponds to the number of columns in W). To this vector you add the bias (if needed) and then apply the activation.
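
Here is a short plain-numpy sketch of that computation, with the matmul result on one side and the explicit per-neuron summation on the other (the shapes are made up):

import numpy as np

X = np.random.randn(4, 3)  # [n_samples, x_dimension]
W = np.random.randn(3, 2)  # [x_dimension, model_output]
b = np.zeros(2)

out = X @ W + b  # matmul does every row-by-column summation at once

# the same computation as an explicit sum, one sample and one output neuron at a time
manual = np.zeros((4, 2))
for i in range(4):
    for j in range(2):
        manual[i, j] = sum(X[i, k] * W[k, j] for k in range(3)) + b[j]

print(np.allclose(out, manual))  # True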






    • I updated the question to use the relu activation function at the end of the network. I don't think it should matter what the activation function is.

      – echo
      Mar 28 at 15:04












    • @echo I've updated my answer

      – Vlad
      Mar 28 at 15:13












