Limited range for TensorFlow Universal Sentence Encoder Lite embeddings?

Starting from the universal-sentence-encoder in TensorFlow.js, I noticed that the range of the numbers in the embeddings wasn't what I expected. I was expecting some distribution over [0, 1] or [-1, 1], but I don't see either of these.



For the sentence "cats are great!" here's a visualization, where each dimension is projected onto a scale from [-0.5, 0.5]:



[image: per-dimension visualization of the embedding for "cats are great!"]



Here's the same kind of visualization for "i wonder what this sentence's embedding will be" (the pattern is similar for the first ~10 sentences I tried):



[image: the same visualization for "i wonder what this sentence's embedding will be"]



To debug, I looked at whether the same kind of thing comes up in the demo Colab notebook, and it seems like it does. Here's what I see for the range of the embeddings of the two sentences in the code below:



# Note: `sp`, `encodings`, `input_placeholder` and
# `process_to_IDs_in_sparse_format` are defined earlier in the Colab notebook.
# NEW: added this, with different messages
messages = ["cats are great!", "sometimes models are confusing"]
values, indices, dense_shape = process_to_IDs_in_sparse_format(sp, messages)

with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    message_embeddings = session.run(
        encodings,
        feed_dict={input_placeholder.values: values,
                   input_placeholder.indices: indices,
                   input_placeholder.dense_shape: dense_shape})

for i, message_embedding in enumerate(np.array(message_embeddings).tolist()):
    print("Message: {}".format(messages[i]))
    print("Embedding size: {}".format(len(message_embedding)))
    message_embedding_snippet = ", ".join(
        (str(x) for x in message_embedding[:3]))
    print("Embedding: [{}, ...]\n".format(message_embedding_snippet))
    # NEW: added this, to show the range of the embedding output
    print("Embedding range: [{}, {}]".format(min(message_embedding), max(message_embedding)))


And the output shows:



Message: cats are great!
Embedding range: [-0.05904272198677063, 0.05903803929686546]

Message: sometimes models are confusing
Embedding range: [-0.060731519013643265, 0.06075377017259598]


So this again isn't what I'm expecting: the range is narrower than I'd expect. I thought this might be a TF convention that I missed, but I couldn't find it in the TFHub page, the guide to text embeddings, or the paper, so I'm not sure where else to look without digging into the training code.



The colab notebook example code has an example sentence that says:




Universal Sentence Encoder embeddings also support short paragraphs.
There is no hard limit on how long the paragraph is. Roughly, the
longer the more 'diluted' the embedding will be.




But the range of the embedding is roughly the same for all the other examples in the Colab, even the one-word examples.



I'm assuming this range is not arbitrary, and it makes sense to me that the range is small and centered on zero, but I'm trying to understand how this scale came to be.










      tensorflow word-embedding tensorflow.js tensorflowjs-converter






      asked Mar 28 at 2:48 (edited Mar 28 at 11:05)
      Kevin Robinson

























          1 Answer
          The output of the Universal Sentence Encoder is a 512-dimensional vector with an L2 norm of (approximately) 1.0. You can check this by calculating the inner product of an embedding with itself:



          ip = 0
          for i in range(512):
              ip += message_embeddings[0][i] * message_embeddings[0][i]

          print(ip)

          > 1.0000000807544893
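
The same check can be done for every sentence in a batch at once with NumPy. This is a minimal sketch, assuming the model output is a 2-D array like `message_embeddings` in the question; the `fake_embeddings` array below is a made-up stand-in so the snippet runs on its own:

```python
import numpy as np

# Stand-in for the real model output (an assumption for illustration):
# 2 random vectors of 512 dimensions, scaled to unit L2 norm.
rng = np.random.default_rng(0)
raw = rng.normal(size=(2, 512))
fake_embeddings = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# L2 norm of each row. For real encoder output, substitute
# np.array(message_embeddings) and expect values very close to 1.0.
norms = np.linalg.norm(fake_embeddings, axis=1)
print(norms)
```

If the norms of the real embeddings come out close to 1.0, the narrow per-dimension range is a consequence of the normalization rather than a bug.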


          The implications are that:



          • Most values are likely to be in a narrow range centered around zero

          • The largest possible single value in the vector is 1.0 - and this would only happen if all other values are exactly 0.

          • Similarly the smallest possible value is -1.

          • If we take a random 512-dimensional vector with uniformly distributed values and then normalize it to unit magnitude, we expect to see values in a range similar to what you see:

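          import numpy as np
          import matplotlib.pyplot as plt

          # Random vector with uniform components, scaled to unit L2 norm
          rand_uniform = np.random.uniform(-1, 1, 512)
          l2 = np.linalg.norm(rand_uniform)
          plt.plot(rand_uniform / l2, 'b.')
          axes = plt.gca()
          axes.set_ylim([-0.5, 0.5])
          plt.show()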


          [image: plot of the normalized random vector]



          Judging visually from your plots, the distribution of values in the actual embeddings does not look uniform, but rather is biased toward the extremes.
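
The scale itself follows from the dimension. If the squared components of a unit vector in 512 dimensions sum to 1, their mean square is 1/512, so a typical component has magnitude around 1/sqrt(512), which is the same order as the ±0.06 extremes in the question. A small sketch of that arithmetic (the random unit vector is only an illustration, not real encoder output):

```python
import math
import numpy as np

dim = 512
# Root-mean-square component of any unit-length vector in `dim` dimensions.
typical = 1 / math.sqrt(dim)
print(round(typical, 4))  # 0.0442

# Illustration with a made-up vector: uniform components, then normalized.
rng = np.random.default_rng(0)
v = rng.uniform(-1, 1, dim)
v /= np.linalg.norm(v)

rms = float(np.sqrt(np.mean(v ** 2)))
largest = float(np.abs(v).max())
print(abs(rms - typical) < 1e-12)  # RMS matches 1/sqrt(dim)
print(largest)                     # largest single component, well below 1.0
```

With a different embedding size the same reasoning would give a different scale, for example about 0.1 for 100 dimensions.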






          • This is amazingly helpful, thank you! :) I'm still wondering about why this is like this. I can see general things about ways to norm vectors, and some things like L2 is computationally simpler, but am interested in learning more specifically why this matters for text embeddings. My understanding is that it's because there are additional operations or properties that are only provably correct in spaces that are normed in a particular way. In particular, it seems like this may be related to guarantees to the output of cosine similarity in an L2 normed space?

            – Kevin Robinson
            Mar 29 at 13:41











          • I'm not sure there is a short, closed answer to the question of why the algorithm designers selected L2 normalization. Some perspectives that probably helped guide their choice: 1. Having an output with a constant L2 norm allows for constraints on the inner product between any two embeddings, so that it's possible to say something like: "these two inputs are similar because their embeddings are close in a dot product sense." 2. It's easy to calculate and enforce.

            – BlessedKey
            Mar 29 at 14:30












          • Got it, thanks for your help in learning more! A related question for other folks who find this: stackoverflow.com/questions/32276391/…

            – Kevin Robinson
            Mar 29 at 14:35
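
To illustrate the point made in the comments about a constant L2 norm: for unit-length vectors the dot product is exactly the cosine similarity, so it is bounded in [-1, 1], and the squared Euclidean distance equals 2 - 2 * dot. A minimal sketch with made-up vectors (not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=512)
b = rng.normal(size=512)

# Normalize both vectors to unit L2 norm, as the encoder output appears to be.
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

dot = float(a @ b)
cosine = dot / float(np.linalg.norm(a) * np.linalg.norm(b))
sq_dist = float(np.sum((a - b) ** 2))

print(np.isclose(dot, cosine))           # dot product equals cosine similarity
print(np.isclose(sq_dist, 2 - 2 * dot))  # distance is a function of the dot product
print(-1.0 <= dot <= 1.0)                # bounded, so scores are comparable
```

This is one reason a plain dot product is enough to rank unit-norm embeddings by similarity.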











          answered Mar 28 at 12:20
          BlessedKey















          • This is amazingly helpful, thank you! :) I'm still wondering about why this is like this. I can see general things about ways to norm vectors, and some things like L2 is computationally simpler, but am interested in learning more specifically why this matters for text embeddings. My understanding is that it's because there are additional operations or properties that are only provably correct in spaces that are normed in a particular way. In particular, it seems like this may be related to guarantees to the output of cosine similarity in an L2 normed space?

            – Kevin Robinson
            Mar 29 at 13:41











          • I'm not sure there is a short and closed answer to the question why the algorithm designers selected L2 normalization. Some possible perspectives that helped guide their choice probably include: 1. Having an output with a constant L2 norm allows for constraints on the inner product between any two embeddings, so that it's possible to say something like : "these two inputs are similar because their embeddings are close in a dot product sense" 2. It's easy to calculate and enforce

            – BlessedKey
            Mar 29 at 14:30












          • Got it, thanks for your help in learning more! A related question for other folks who find this: stackoverflow.com/questions/32276391/…

            – Kevin Robinson
            Mar 29 at 14:35


















          • This is amazingly helpful, thank you! :) I'm still wondering about why this is like this. I can see general things about ways to norm vectors, and some things like L2 is computationally simpler, but am interested in learning more specifically why this matters for text embeddings. My understanding is that it's because there are additional operations or properties that are only provably correct in spaces that are normed in a particular way. In particular, it seems like this may be related to guarantees to the output of cosine similarity in an L2 normed space?

            – Kevin Robinson
            Mar 29 at 13:41











          • I'm not sure there is a short and closed answer to the question why the algorithm designers selected L2 normalization. Some possible perspectives that helped guide their choice probably include: 1. Having an output with a constant L2 norm allows for constraints on the inner product between any two embeddings, so that it's possible to say something like : "these two inputs are similar because their embeddings are close in a dot product sense" 2. It's easy to calculate and enforce

            – BlessedKey
            Mar 29 at 14:30












          • Got it, thanks for your help in learning more! A related question for other folks who find this: stackoverflow.com/questions/32276391/…

            – Kevin Robinson
            Mar 29 at 14:35































































































