
Using BERT for next sentence prediction


Google's BERT is pretrained on next sentence prediction tasks, but I'm wondering if it's possible to call the next sentence prediction function on new data.



The idea is: given sentence A and given sentence B, I want a probabilistic label for whether or not sentence B follows sentence A. BERT is pretrained on a huge set of data, so I was hoping to use this next sentence prediction on new sentence data. I can't seem to figure out if this next sentence prediction function can be called and if so, how. Thanks for your help!










Tags: tensorflow, deep-learning, nlp, reproducible-research, natural-language-processing






asked Mar 11 at 22:29 by Paul






















          1 Answer
Hugging Face has already implemented this for you: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L854



class BertForNextSentencePrediction(BertPreTrainedModel):
    """BERT model with next sentence prediction head.
    This module comprises the BERT model followed by the next sentence classification head.

    Params:
        config: a BertConfig class instance with the configuration to build a new model.

    Inputs:
        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
            with the word token indices in the vocabulary (see the tokens preprocessing logic in the scripts
            `extract_features.py`, `run_classifier.py` and `run_squad.py`)
        `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
            types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
            a `sentence B` token (see BERT paper for more details).
        `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
            selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
            input sequence length in the current batch. It's the mask that we typically use for attention when
            a batch has varying length sentences.
        `next_sentence_label`: next sentence classification loss: torch.LongTensor of shape [batch_size]
            with indices selected in [0, 1].
            0 => next sentence is the continuation, 1 => next sentence is a random sentence.

    Outputs:
        if `next_sentence_label` is not `None`:
            Outputs the total_loss which is the sum of the masked language modeling loss and the next
            sentence classification loss.
        if `next_sentence_label` is `None`:
            Outputs the next sentence classification logits of shape [batch_size, 2].

    Example usage:
    ```python
    # Already been converted into WordPiece token ids
    input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
    input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
    token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])

    config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
        num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)

    model = BertForNextSentencePrediction(config)
    seq_relationship_logits = model(input_ids, token_type_ids, input_mask)
    ```
    """
    def __init__(self, config):
        super(BertForNextSentencePrediction, self).__init__(config)
        self.bert = BertModel(config)
        self.cls = BertOnlyNSPHead(config)
        self.apply(self.init_bert_weights)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, next_sentence_label=None):
        _, pooled_output = self.bert(input_ids, token_type_ids, attention_mask,
                                     output_all_encoded_layers=False)
        seq_relationship_score = self.cls(pooled_output)

        if next_sentence_label is not None:
            loss_fct = CrossEntropyLoss(ignore_index=-1)
            next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
            return next_sentence_loss
        else:
            return seq_relationship_score
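To get the probabilistic label the question asks for, the class above can be loaded with pretrained weights and its two-way logits turned into probabilities with a softmax. The following is only a minimal sketch, assuming the pytorch_pretrained_bert package from the linked repository is installed and the public bert-base-uncased checkpoint is acceptable; the two example sentences are placeholders.

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForNextSentencePrediction

# Load the pretrained tokenizer and the model with its NSP head
# (the 'bert-base-uncased' checkpoint is downloaded on first use).
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()

sentence_a = "The cat sat on the mat."           # hypothetical sentence A
sentence_b = "It soon fell asleep in the sun."   # hypothetical candidate continuation

# Build the [CLS] A [SEP] B [SEP] input that BERT expects.
tokens_a = tokenizer.tokenize(sentence_a)
tokens_b = tokenizer.tokenize(sentence_b)
tokens = ['[CLS]'] + tokens_a + ['[SEP]'] + tokens_b + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers sentence B and the final [SEP].
token_type_ids = torch.tensor([[0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)])

with torch.no_grad():
    # No next_sentence_label is passed, so forward() returns the [batch_size, 2] logits.
    logits = model(input_ids, token_type_ids)

# Per the docstring: index 0 = "B is the actual continuation", index 1 = "B is a random sentence".
probs = torch.nn.functional.softmax(logits, dim=-1)
print("P(sentence B follows sentence A) = %.4f" % probs[0, 0].item())

Passing a next_sentence_label tensor instead would make forward() return the classification loss, which is how you would fine-tune the head on sentence pairs of your own.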





answered Mar 23 at 6:03 by Aerin




























