Binary Crossentropy to penalize all components of one-hot vector

I understand that binary cross-entropy is the same as categorical cross-entropy in the case of two classes.

Further, it is clear to me what softmax is.

Therefore, I see that categorical cross-entropy only penalizes the one component (probability) that should be 1.

But why can't (or shouldn't) I use binary cross-entropy on a one-hot vector?

Normal case for single-label, multi-class, mutually exclusive classification:
################
pred = [0.1 0.3 0.2 0.4]
label (one-hot) = [0 1 0 0]
cost function: categorical cross-entropy
 = sum(-label * log(pred))   // only the component with label 1 contributes
 = -log(0.3)
 = 0.523                     // base-10 log; the natural log gives 1.204
Why not this instead?
################
pred = [0.1 0.3 0.2 0.4]
label (one-hot) = [0 1 0 0]
cost function: binary cross-entropy
 = sum(-label * log(pred) - (1 - label) * log(1 - pred))
 = -log(0.3) - log(1 - 0.1) - log(1 - 0.2) - log(1 - 0.4)
 = 0.887                     // base-10 log; the natural log gives 2.043
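A minimal plain-Python sketch of the two calculations, with the logarithm passed in as a parameter: base-10 logs reproduce the numbers quoted above, while the natural log (the convention in Keras and TensorFlow) gives larger values. The helper names categorical_ce and binary_ce are illustrative, not library functions.

```python
import math

def categorical_ce(pred, label, log=math.log):
    # -sum(y * log(p)): only the component whose label is 1 contributes
    return -sum(y * log(p) for p, y in zip(pred, label))

def binary_ce(pred, label, log=math.log):
    # -sum(y*log(p) + (1-y)*log(1-p)): every component contributes
    return -sum(y * log(p) + (1 - y) * log(1 - p) for p, y in zip(pred, label))

pred = [0.1, 0.3, 0.2, 0.4]
label = [0, 1, 0, 0]

# Base-10 logs reproduce the numbers in the question
print(round(categorical_ce(pred, label, math.log10), 3))  # 0.523
print(round(binary_ce(pred, label, math.log10), 3))       # 0.887

# Natural logs, as used by deep-learning frameworks
print(round(categorical_ce(pred, label), 3))  # 1.204
print(round(binary_ce(pred, label), 3))       # 2.043
```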

I see that in binary cross-entropy, zero is also a target class, corresponding to the following one-hot encoding:

target class zero 0 -> [1 0]
target class one 1 -> [0 1]
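The two-class equivalence can be checked numerically: binary cross-entropy on a single probability p of class one gives the same value as categorical cross-entropy on the two-component vector [1 - p, p] with the one-hot targets listed above. This is a sketch; bce_scalar and cce are illustrative names.

```python
import math

def bce_scalar(p, y):
    # Binary cross-entropy for p = P(class one) and target y in {0, 1}
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cce(probs, one_hot):
    # Categorical cross-entropy: only the true class's log-probability counts
    return -sum(t * math.log(q) for q, t in zip(probs, one_hot))

p = 0.3  # predicted probability of class one
for y, one_hot in [(0, [1, 0]), (1, [0, 1])]:
    two_class = [1 - p, p]  # [P(class zero), P(class one)]
    # Both columns should agree for each target
    print(y, bce_scalar(p, y), cce(two_class, one_hot))
```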

In summary: why do we only sum the negative log-likelihood of the true class? Why don't we penalize the other SHOULD-BE-ZERO / NOT-THAT-CLASS classes?

If one applied binary cross-entropy to a one-hot vector, the probabilities predicted for the expected-zero labels would be penalized too.
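Assuming pred is the output of a softmax over logits z, the categorical loss -log p_k still acts on the should-be-zero components indirectly: its gradient with respect to z is p - y, so every wrong class's logit gets a positive gradient and is pushed down in proportion to its probability. The sketch below checks this against central finite differences; the logits are hypothetical.

```python
import math

def softmax(z):
    # Subtract the max logit for numerical stability
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cce_from_logits(z, k):
    # Categorical cross-entropy for true class k: only -log p_k appears explicitly
    return -math.log(softmax(z)[k])

z = [0.0, 1.0, 0.5, 1.2]  # hypothetical logits
k = 1                     # true class, matching the label [0 1 0 0]
y = [1.0 if i == k else 0.0 for i in range(len(z))]

p = softmax(z)
analytic = [pi - yi for pi, yi in zip(p, y)]  # d(loss)/dz = p - y

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
numeric = []
for i in range(len(z)):
    up = list(z)
    up[i] += eps
    down = list(z)
    down[i] -= eps
    numeric.append((cce_from_logits(up, k) - cce_from_logits(down, k)) / (2 * eps))

# Wrong classes get positive gradients, so gradient descent lowers their logits
for i, (a, n) in enumerate(zip(analytic, numeric)):
    print(i, round(a, 4), round(n, 4))
```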

edited Nov 13 '17 at 15:47 by Maxim (32.9k)

asked May 23 '17 at 14:55 by hallo02 (928)

machine-learning classification multilabel-classification one-hot-encoding cross-entropy
1 Answer

See my answer to a similar question. In short, the binary cross-entropy formula doesn't make sense for a one-hot vector. Depending on the task, you can either apply softmax cross-entropy for two or more classes, or use a vector of (independent) probabilities as the label.

But why can't (or shouldn't) I use binary cross-entropy on a one-hot vector?

What you compute is the binary cross-entropy of 4 independent features:

pred = [0.1 0.3 0.2 0.4]
label = [0 1 0 0]

The model predicted that the first feature is on with 10% probability, the second feature is on with 30% probability, and so on. The target label is interpreted this way: all features are off except for the second one. Note that [1, 1, 1, 1] is a perfectly valid label as well (i.e. the label doesn't have to be a one-hot vector), and pred = [0.5, 0.8, 0.7, 0.1] is a valid prediction (i.e. the probabilities don't have to sum to one).
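Under this multi-label reading each position is its own yes/no variable, so binary cross-entropy is simply a sum of independent per-feature terms, and neither the label nor the prediction has to sum to one. A minimal sketch using the example vectors above (per_feature_bce is an illustrative name):

```python
import math

def per_feature_bce(pred, label):
    # One independent binary cross-entropy term per feature
    return [-(y * math.log(p) + (1 - y) * math.log(1 - p)) for p, y in zip(pred, label)]

label = [1, 1, 1, 1]         # several features on at once: not one-hot
pred = [0.5, 0.8, 0.7, 0.1]  # sums to 2.1, which is fine for independent probabilities

terms = per_feature_bce(pred, label)
print([round(t, 3) for t in terms])  # [0.693, 0.223, 0.357, 2.303]
print(round(sum(terms), 3))          # 3.576
```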

In other words, your computation is valid, but for a completely different problem: multi-label non-exclusive binary classification.

See also the difference between softmax and sigmoid cross-entropy loss functions in TensorFlow.
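For comparison, here is a plain-Python sketch of the two kinds of loss on raw logits. The sigmoid version uses the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) that the TensorFlow documentation gives for tf.nn.sigmoid_cross_entropy_with_logits; the softmax version uses log-sum-exp. These are illustrative reimplementations, not the library code, and the logits are hypothetical.

```python
import math

def sigmoid_ce_with_logits(logits, labels):
    # Element-wise and independent per class: max(x, 0) - x*z + log(1 + exp(-|x|))
    return [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
            for x, z in zip(logits, labels)]

def softmax_ce_with_logits(logits, labels):
    # One value per example: -sum(z * log_softmax(x)), computed via log-sum-exp
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(z * (x - log_sum_exp) for x, z in zip(logits, labels))

logits = [-1.0, 0.5, 0.2, 1.0]  # hypothetical raw scores
labels = [0.0, 1.0, 0.0, 0.0]

print(sigmoid_ce_with_logits(logits, labels))  # four independent losses
print(softmax_ce_with_logits(logits, labels))  # one loss; classes compete via normalization
```

The sigmoid variant treats each class as a separate binary problem (the multi-label case), while the softmax variant couples the classes through the shared normalizer (the mutually exclusive case).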

answered Nov 13 '17 at 13:53 by Maxim (32.9k)
