Binary Crossentropy to penalize all components of one-hot vector
I understand that binary cross-entropy is the same as categorical cross-entropy in the case of two classes.
Further, it is clear to me what softmax is.
Therefore, I see that categorical cross-entropy penalizes only the one component (probability) that should be 1.
But why can't, or shouldn't, I use binary cross-entropy on a one-hot vector?
Normal case for single-label multiclass classification with mutually exclusive classes:
################
pred = [0.1 0.3 0.2 0.4]
label (one hot) = [0 1 0 0]
cost function: categorical cross-entropy
= sum(label * -log(pred)) // only the 1-label contributes
= -log(0.3)
= 0.523 (log base 10; 1.204 with the natural log)
Why not this instead?
################
pred = [0.1 0.3 0.2 0.4]
label (one hot) = [0 1 0 0]
cost function: binary cross-entropy
= sum(- label * log(pred) - (1 - label) * log(1 - pred))
= -log(0.3) - log(1-0.1) - log(1-0.2) - log(1-0.4)
= 0.887 (log base 10; 2.043 with the natural log)
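To make the two calculations above easy to reproduce, here is a minimal sketch in plain Python. The function names are my own and not from any library, and the sum is used exactly as in the formulas above (library implementations may average over the components instead).

```python
import math

pred = [0.1, 0.3, 0.2, 0.4]
label = [0, 1, 0, 0]

def categorical_crossentropy(label, pred, log=math.log):
    # Only components whose label is 1 contribute to the sum.
    return sum(-y * log(p) for y, p in zip(label, pred))

def binary_crossentropy_sum(label, pred, log=math.log):
    # Every component contributes: -log(p) where the label is 1,
    # -log(1 - p) where the label is 0.
    return sum(-y * log(p) - (1 - y) * log(1 - p) for y, p in zip(label, pred))

# Base-10 logs reproduce the figures quoted above.
print(round(categorical_crossentropy(label, pred, math.log10), 3))  # 0.523
print(round(binary_crossentropy_sum(label, pred, math.log10), 3))   # 0.887

# Natural logs, the more common convention, only rescale both values.
print(round(categorical_crossentropy(label, pred), 3))  # 1.204
print(round(binary_crossentropy_sum(label, pred), 3))   # 2.043
```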
I see that in binary cross-entropy the zero is itself a target class, which corresponds to the following one-hot encoding:
target class zero 0 -> [1 0]
target class one 1 -> [0 1]
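The correspondence above can be checked numerically. This is only a sketch with helper names I made up: binary cross-entropy on a scalar probability p should match categorical cross-entropy on the two-class vector [1 - p, p] with the one-hot target shown above.

```python
import math

def binary_crossentropy(y, p):
    # y is the scalar target (0 or 1), p the predicted probability of class 1.
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def categorical_crossentropy(one_hot, probs):
    # Only the component whose target is 1 contributes.
    return sum(-t * math.log(q) for t, q in zip(one_hot, probs))

def to_one_hot(y):
    # 0 -> [1, 0], 1 -> [0, 1]
    return [1 - y, y]

p = 0.3  # arbitrary example probability for class 1
for y in (0, 1):
    bce = binary_crossentropy(y, p)
    cce = categorical_crossentropy(to_one_hot(y), [1 - p, p])
    print(y, to_one_hot(y), math.isclose(bce, cce))
# 0 [1, 0] True
# 1 [0, 1] True
```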
In summary: why do we only sum the negative log-likelihood for the target class? Why don't we also penalize the other classes, whose probabilities should be zero?
If one applies binary cross-entropy to a one-hot vector, the probabilities assigned to the expected-zero labels are penalized too.
machine-learning classification multilabel-classification one-hot-encoding cross-entropy
edited Nov 13 '17 at 15:47
Maxim
asked May 23 '17 at 14:55
hallo02
1 Answer
See my answer on a similar question. In short, the binary cross-entropy formula doesn't make sense for a one-hot vector. You can either apply softmax cross-entropy for two or more classes, or use a vector of (independent) probabilities as the label, depending on the task.
But why can't, or shouldn't, I use binary cross-entropy on a one-hot vector?
What you compute is the binary cross-entropy of 4 independent features:
pred = [0.1 0.3 0.2 0.4]
label = [0 1 0 0]
The model predicts that the first feature is on with 10% probability, the second feature is on with 30% probability, and so on. The target label is interpreted this way: all features are off except the second one. Note that [1, 1, 1, 1] is a perfectly valid label as well, even though it is not a one-hot vector, and pred=[0.5, 0.8, 0.7, 0.1] is a valid prediction, i.e. the sum doesn't have to equal one.
In other words, your computation is valid, but for a completely different problem: multi-label, non-exclusive binary classification.
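A small sketch of that interpretation, in plain Python with a helper name of my own choosing: each feature gets its own binary cross-entropy term, so nothing requires the label to be one-hot or the predictions to sum to one. Natural logs are assumed here.

```python
import math

def per_feature_bce(label, pred):
    # One independent binary cross-entropy term per feature.
    return [-y * math.log(p) - (1 - y) * math.log(1 - p)
            for y, p in zip(label, pred)]

# The example from the question, read as 4 independent on/off features.
terms = per_feature_bce([0, 1, 0, 0], [0.1, 0.3, 0.2, 0.4])
print([round(t, 3) for t in terms])  # [0.105, 1.204, 0.223, 0.511]

# A label that is not one-hot and a prediction that does not sum to one
# are both fine for this loss.
multi = per_feature_bce([1, 1, 1, 1], [0.5, 0.8, 0.7, 0.1])
print([round(t, 3) for t in multi])  # [0.693, 0.223, 0.357, 2.303]
```

Each term depends only on its own feature's label and prediction, which is why this loss fits tasks where several labels can be on at once rather than mutually exclusive classes.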
See also the difference between softmax and sigmoid cross-entropy loss functions in TensorFlow.
answered Nov 13 '17 at 13:53
Maxim