Limited range for TensorFlow Universal Sentence Encoder Lite embeddings?
Starting from the universal-sentence-encoder in TensorFlow.js, I noticed that the range of the numbers in the embeddings wasn't what I expected. I was expecting some distribution over [0, 1] or [-1, 1], but I don't see either of these.
For the sentence "cats are great!" here's a visualization, where each dimension is projected onto a scale from [-0.5, 0.5]:

Here's the same kind of visualization for "i wonder what this sentence's embedding will be" (the pattern is similar for the first ~10 sentences I tried):

To debug, I looked at whether the same kind of thing comes up in the demo Colab notebook, and it seems like it does. Here's what I see for the range of the embeddings for two sentences:
    # NEW: added this, with different messages
    messages = ["cats are great!", "sometimes models are confusing"]
    values, indices, dense_shape = process_to_IDs_in_sparse_format(sp, messages)

    with tf.Session() as session:
        session.run([tf.global_variables_initializer(), tf.tables_initializer()])
        message_embeddings = session.run(
            encodings,
            feed_dict={input_placeholder.values: values,
                       input_placeholder.indices: indices,
                       input_placeholder.dense_shape: dense_shape})

        for i, message_embedding in enumerate(np.array(message_embeddings).tolist()):
            print("Message: {}".format(messages[i]))
            print("Embedding size: {}".format(len(message_embedding)))
            message_embedding_snippet = ", ".join(
                (str(x) for x in message_embedding[:3]))
            print("Embedding: [{}, ...]\n".format(message_embedding_snippet))
            # NEW: added this, to show the range of the embedding output
            print("Embedding range: [{}, {}]".format(min(message_embedding), max(message_embedding)))
And the output shows:
Message: cats are great!
Embedding range: [-0.05904272198677063, 0.05903803929686546]
Message: sometimes models are confusing
Embedding range: [-0.060731519013643265, 0.06075377017259598]
So this again isn't what I'm expecting: the range is narrower than I'd expect. I thought this might be a TF convention that I missed, but I couldn't find it in the TFHub page, the guide to text embeddings, or the paper, so I'm not sure where else to look without digging into the training code.
The colab notebook example code has an example sentence that says:
Universal Sentence Encoder embeddings also support short paragraphs.
There is no hard limit on how long the paragraph is. Roughly, the
longer the more 'diluted' the embedding will be.
But the range of the embedding is roughly the same for all the other examples in the Colab, even one-word examples.
I'm assuming this range is not just arbitrary, and it does make sense to me that the range is centered on zero and small, but I'm trying to understand how this scale came to be.
tensorflow word-embedding tensorflow.js tensorflowjs-converter
asked Mar 28 at 2:48, edited Mar 28 at 11:05
Kevin Robinson
1 Answer
The output of the Universal Sentence Encoder is a vector of length 512 with an L2 norm of (approximately) 1.0. You can check this by calculating the inner product of an embedding with itself:
    ip = 0
    for i in range(512):
        ip += message_embeddings[0][i] * message_embeddings[0][i]
    print(ip)
    > 1.0000000807544893
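The same check can be written in one call with NumPy. This is a minimal sketch that uses a random stand-in vector rather than a real embedding; the name `fake_embedding` and the seed are illustrative assumptions, not part of the model's output.

```python
import numpy as np

# Stand-in for one row of message_embeddings: a random 512-dimensional
# vector rescaled to unit L2 norm (hypothetical data, not real model output).
rng = np.random.default_rng(0)
fake_embedding = rng.normal(size=512)
fake_embedding /= np.linalg.norm(fake_embedding)

# Squared L2 norm via a dot product, equivalent to the explicit loop.
squared_norm = float(np.dot(fake_embedding, fake_embedding))
# L2 norm computed directly.
l2_norm = float(np.linalg.norm(fake_embedding))

print(squared_norm, l2_norm)
```

With a real embedding, replacing `fake_embedding` by `np.array(message_embeddings[0])` should give a value close to 1, up to floating-point error.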
The implications are that:
- Most values are likely to be in a narrow range centered around zero
- The largest possible single value in the vector is 1.0 - and this would only happen if all other values are exactly 0.
- Similarly the smallest possible value is -1.
- If we take a random vector of length 512, with values distributed uniformly, and then normalize it to unit magnitude, we expect to see values in a range similar to what you see.
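    import numpy as np
    import matplotlib.pyplot as plt
    # Uniform random vector of length 512, scaled to unit L2 norm
    rand_uniform = np.random.uniform(-1, 1, 512)
    l2 = np.linalg.norm(rand_uniform)
    plt.plot(rand_uniform / l2, 'b.')
    axes = plt.gca()
    axes.set_ylim([-0.5, 0.5])
    plt.show()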

Judging visually, the distribution of excitations in your embeddings does not look uniform, but rather is biased toward the extremes.
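A rough check on the scale: for any unit-norm vector with d components, the squared components sum to 1, so the root-mean-square component is exactly 1/sqrt(d), which is about 0.044 for d = 512. That is the same order of magnitude as the observed range of roughly ±0.06. The sketch below computes this and compares it with a normalized uniform random vector; the seed and variable names are assumptions for illustration only.

```python
import numpy as np

d = 512  # embedding length stated in the answer

# For a unit-norm vector, sum(x_i ** 2) == 1, so the RMS component is 1/sqrt(d).
rms_component = 1.0 / np.sqrt(d)

# Same construction as the random example: uniform values, then L2-normalize.
rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
rand_unit = rng.uniform(-1, 1, d)
rand_unit /= np.linalg.norm(rand_unit)

# RMS of the normalized vector; matches the theoretical value by construction.
empirical_rms = np.sqrt(np.mean(rand_unit ** 2))

print("theoretical RMS component:", rms_component)
print("empirical RMS component:  ", empirical_rms)
print("largest absolute component:", np.abs(rand_unit).max())
```

The RMS value depends only on the norm and the dimension, not on how the values are distributed, which is why one-word inputs and short paragraphs land on a similar scale.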
This is amazingly helpful, thank you! :) I'm still wondering about why this is like this. I can see general things about ways to norm vectors, and some things like L2 is computationally simpler, but am interested in learning more specifically why this matters for text embeddings. My understanding is that it's because there are additional operations or properties that are only provably correct in spaces that are normed in a particular way. In particular, it seems like this may be related to guarantees to the output of cosine similarity in an L2 normed space?
– Kevin Robinson
Mar 29 at 13:41
I'm not sure there is a short and closed answer to the question why the algorithm designers selected L2 normalization. Some possible perspectives that helped guide their choice probably include:
1. Having an output with a constant L2 norm allows for constraints on the inner product between any two embeddings, so that it's possible to say something like: "these two inputs are similar because their embeddings are close in a dot product sense"
2. It's easy to calculate and enforce
– BlessedKey
Mar 29 at 14:30
Got it, thanks for your help in learning more! A related question for other folks who find this: stackoverflow.com/questions/32276391/…
– Kevin Robinson
Mar 29 at 14:35
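The point raised in the comments about inner products can be illustrated with a small sketch. When both vectors have unit L2 norm, the dot product equals the cosine similarity, it is bounded in [-1, 1], and the squared Euclidean distance is 2 - 2 * dot. The vectors below are random stand-ins rather than real embeddings, and `unit` is a hypothetical helper written for this example.

```python
import numpy as np

def unit(v):
    """Rescale v to unit L2 norm (hypothetical helper for this sketch)."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)  # arbitrary seed
a = unit(rng.normal(size=512))  # stand-ins for two sentence embeddings
b = unit(rng.normal(size=512))

dot = float(np.dot(a, b))
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
sq_dist = float(np.sum((a - b) ** 2))

# With unit-norm inputs these agree up to floating-point error.
print(np.isclose(dot, cosine))
print(np.isclose(sq_dist, 2.0 - 2.0 * dot))
print(-1.0 <= dot <= 1.0)
```

One practical consequence is that ranking by dot product, by cosine similarity, or by Euclidean distance gives the same ordering for unit-norm embeddings.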
answered Mar 28 at 12:20
BlessedKey