Deserializing Spacy results


I need to run an algorithm on a lot of text files. To pre-process them, I use spaCy, which has pre-trained models for different languages. Since the pre-processed results are used in different parts of the algorithm, it is better to save them to disk once and load them many times. However, spaCy's deserialization method raises an error.
I wrote a simple example to show the error:

import spacy
from spacy.tokens import Doc
from spacy.vocab import Vocab

# text_file_content holds the raw text of one input file
de_nlp = spacy.load("de_core_news_sm", disable=['ner', 'parser'])
de_nlp.add_pipe(de_nlp.create_pipe('sentencizer'))
doc = de_nlp(text_file_content)

for ix, sent in enumerate(doc.sents, 1):
    print("--Sentence number {}: {}".format(ix, sent))
    lemma = [w.lemma_ for w in sent]
    print(f"Lemma ==> {lemma}")

# Serialization and deserialization
doc.to_disk("/tmp/test_result.bin")
new_doc = Doc(Vocab()).from_disk("/tmp/test_result.bin")

# Iterating over new_doc.sents is where the error is raised
for ix, sent in enumerate(new_doc.sents, 1):
    print("--Sentence number {}: {}".format(ix, sent))
    lemma = [w.lemma_ for w in sent]
    print(f"Lemma ==> {lemma}")

However, the example code above raises the following error:

Traceback (most recent call last):
  File "/tmp/test_result.bin", line 14, in <module>
    for ix, sent in enumerate(new_doc.sents, 1):
  File "doc.pyx", line 535, in __get__
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.

The version of Spacy I use is 2.0.18.

Any information about this topic is appreciated.
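
If the later stages only need the sentence texts and their lemmas, one possible workaround is to skip Doc deserialization and cache just those values as JSON lines. This is a different technique from `Doc.to_disk` / `Doc.from_disk`, not a fix for it. The sketch below uses only the standard library; the helper names `save_lemmas` and `load_lemmas`, the cache file name, and the sample lemmas are hypothetical placeholders, not real spaCy output.

```python
import json
import os
import tempfile


def save_lemmas(sentences, path):
    """Write one JSON object per line: a sentence's text and its lemmas."""
    with open(path, "w", encoding="utf-8") as fh:
        for text, lemmas in sentences:
            record = {"text": text, "lemmas": list(lemmas)}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_lemmas(path):
    """Read the cache back as a list of (text, lemmas) tuples, in file order."""
    with open(path, encoding="utf-8") as fh:
        return [(rec["text"], rec["lemmas"]) for rec in map(json.loads, fh)]


# Hand-written placeholder data. With spaCy, the list could presumably be built as
#   [(sent.text, [w.lemma_ for w in sent]) for sent in doc.sents]
sample = [
    ("Das ist ein Satz.", ["der", "sein", "ein", "Satz", "."]),
    ("Hier ist noch einer.", ["hier", "sein", "noch", "einer", "."]),
]

cache_path = os.path.join(tempfile.gettempdir(), "lemma_cache.jsonl")
save_lemmas(sample, cache_path)

# Any later stage can reload the cache without spaCy or a Vocab.
restored = load_lemmas(cache_path)
for ix, (text, lemmas) in enumerate(restored, 1):
    print(f"--Sentence number {ix}: {text}")
    print(f"Lemma ==> {lemmas}")
```

The trade-off is that only what is written out explicitly survives; token attributes other than the lemma would need their own fields in each record.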

asked Mar 22 at 14:43 by SahelSoft
edited Mar 22 at 15:38 by TrebledJ


python-3.x nlp text-mining spacy natural-language-processing

