Why does this TensorFlow example not have a summation before the activation function?
I'm trying to understand a TensorFlow code snippet. What I've been taught is that we sum all the incoming inputs and then pass them to an activation function. Shown in the picture below is a single neuron. Notice that we compute a weighted sum of the inputs and THEN compute the activation.
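To make that concrete, this is roughly the computation I have in mind for a single neuron (a minimal numpy sketch; the values are made up for illustration):

import numpy as np

inputs  = np.array([2.0, 3.0, 1.0])   # incoming values x_i
weights = np.array([3.0, 1.0, 2.0])   # one weight w_i per input
bias    = 0.5

z = np.sum(inputs * weights) + bias   # the explicit summation step
a = np.maximum(z, 0.0)                # then the activation (ReLU here)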
In most examples of the multi-layer perceptron, they don't include the summation step. I find this very confusing.
Here is an example of one of those snippets:
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.nn.relu(tf.matmul(layer_2, weights['out']) + biases['out'])
    return out_layer
In each layer, we first multiply the inputs by the weights. Afterwards, we add the bias term. Then we pass the result to tf.nn.relu. Where does the summation happen? It looks like we've skipped it!
Any help would be really great!
python tensorflow machine-learning
It's done by softmax as far as I understand; it's the equivalent of softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)
– EdChum
Mar 28 at 14:27
Okay -- the softmax layer does it. But the other nodes don't do it?
– echo
Mar 28 at 14:46
No, I don't think so, as that wouldn't make sense; if you sum or perform any kind of aggregation, it stops being a layer, so you can't feed it into another layer
– EdChum
Mar 28 at 15:01
It does remain a layer. Each individual neuron in a layer takes input and each neuron has to produce a single scalar value.
– echo
Mar 28 at 15:05
2 Answers
The tf.matmul operator performs a matrix multiplication, which means that each element in the resulting matrix is a sum of products (which corresponds exactly to what you describe).
Take a simple example with a row-vector and a column-vector, as would be the case if you had exactly one neuron and an input vector (as per the graphic you shared above):
x = [2,3,1]
y = [3,
1,
2]
Then the result would be:
tf.matmul(x, y) = 2*3 + 3*1 + 1*2 = 11
There you can see the weighted sum.
P.S.: tf.multiply performs element-wise multiplication, which is not what we want here.
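For instance, you can verify this directly (a minimal sketch assuming TensorFlow 2.x with eager execution; tf.matmul expects 2-D tensors, so the vectors are written as a 1x3 and a 3x1 matrix):

import tensorflow as tf

x = tf.constant([[2., 3., 1.]])      # row vector, shape (1, 3)
y = tf.constant([[3.], [1.], [2.]])  # column vector, shape (3, 1)

print(tf.matmul(x, y))               # [[11.]] = 2*3 + 3*1 + 1*2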
The last layer of your model, out_layer, outputs the probability of each class, Prob(y=yi|X), and has shape [batch_size, n_classes]. To calculate these probabilities, the softmax function is applied. For each single input data point x that your model receives, it outputs a vector of probabilities y whose size is the number of classes. You then pick the class with the highest probability by applying argmax to the output vector, class = argmax(P(y|x)), which can be written in TensorFlow as y_pred = tf.argmax(out_layer, 1).
Consider a network with a single layer. You have an input matrix X of shape [n_samples, x_dimension] and you multiply it by some matrix W of shape [x_dimension, model_output]. The summation you're talking about is the dot product between a row of the matrix X and a column of the matrix W. The output will then have shape [n_samples, model_output]. To this output you apply the activation function (if it is the final layer, you probably want softmax). Perhaps the picture you've shown is a bit misleading.
Mathematically, the layer without the bias can be described as $XW$. Suppose that the first row of the matrix $X$ (the first row is a single input data point) is $x = (x_1, x_2, \dots, x_d)$ and the first column of $W$ is $w = (w_1, w_2, \dots, w_d)^T$. The result of this dot product is given by $x \cdot w = x_1 w_1 + x_2 w_2 + \dots + x_d w_d$, which is your summation. You repeat this for each column of the matrix $W$, and the result is a vector of size model_output (which corresponds to the number of columns of $W$). To this vector you add the bias (if needed) and then apply the activation.
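A small numpy sketch of the same idea (the sizes are arbitrary, chosen only to show the shapes):

import numpy as np

n_samples, x_dimension, model_output = 4, 3, 2
X = np.random.randn(n_samples, x_dimension)
W = np.random.randn(x_dimension, model_output)
b = np.random.randn(model_output)

out = X @ W + b                        # shape (n_samples, model_output)

# Each output entry is the dot product of a row of X with a column of W,
# plus the corresponding bias:
manual = np.dot(X[0], W[:, 0]) + b[0]
assert np.isclose(out[0, 0], manual)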
I updated the question to use the relu activation function at the end of the network. I don't think it should matter what the activation function is.
– echo
Mar 28 at 15:04
@echo I've updated my answer
– Vlad
Mar 28 at 15:13