Scrapy xml pipeline

















I need to write a spider that outputs a separate XML file for each article.



The pipeline.py:



from scrapy.exporters import XmlItemExporter
from datetime import datetime

class CommonPipeline(object):
    def process_item(self, item, spider):
        return item

class XmlExportPipeline(object):
    def __init__(self):
        self.files = {}

    def process_item(self, item, spider):
        # One timestamped file per item
        file = open((spider.name + datetime.now().strftime("_%H%M%S%f.xml")), 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()
        self.exporter.export_item(item)
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()
        return item


The output:



<?xml version="1.0" encoding="utf-8"?>
<items>
    <item>
        <text_img> Nelson Argaña. Foto: Gustavo Velázquez 970AM. Hace 1 hora </text_img>
        <title>Nelson Argaña lamentó que Mario Abdo esté rodeado de corruptos </title>
        <url>https://www.lanacion.com.py/politica/2019/03/23/nelson-argana-lamento-que-mario-abdo-este-rodeado-de-corruptos/</url>
        <content> Nelson Argaña, hijo de Luis María Arg ...</content>
        <sum_content>4805</sum_content>
        <time>14:30:06</time>
        <date>20190323</date>
    </item>
</items>


But I need output like this:



<?xml version="1.0" encoding="iso-8859-1"?>
<article>
    <text_img> Nelson Argaña. Foto: Gustavo Velázquez 970AM. Hace 1 hora </text_img>
    <title>Nelson Argaña lamentó que Mario Abdo esté rodeado de corruptos </title>
    <url>https://www.lanacion.com.py/politica/2019/03/23/nelson-argana-lamento-que-mario-abdo-este-rodeado-de-corruptos/</url>
    <content> Nelson Argaña, hijo de Luis María Arg ...</content>
    <sum_content>4805</sum_content>
    <time>14:30:06</time>
    <date>20190323</date>
</article>


The settings.py:



ITEM_PIPELINES = {
    'common.pipelines.XmlExportPipeline': 300,
}

FEED_EXPORTERS_BASE = {
    'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}




I tried adding this to settings.py:



FEED_EXPORT_ENCODING = 'iso-8859-1'
FEED_EXPORT_FIELDS = ["article"]


But it doesn't work.



I am using Scrapy 1.4.0.


































  • Try (inside process_item) - self.exporter = XmlItemExporter(file, item_element="article", root_element="articles"). See docs.scrapy.org/en/latest/topics/exporters.html#xmlitemexporter

    – balderman
    Mar 24 at 8:06












  • Thanks for your comment. I tried that option, but I need only the <article> tag, not <articles><article>. That is a mandatory requirement, and so is the encoding.

    – Juan Manuel
    Mar 24 at 15:38











  • So use only the 'item_element'

    – balderman
    Mar 24 at 15:42











  • It does not work. The root_element appears by default as <items>. I tried root_element=False, root_element=None, root_element='' but it does not work. The same happens in reverse.

    – Juan Manuel
    Mar 24 at 18:16
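
Since the comments above suggest the root element of XmlItemExporter cannot simply be switched off, one possible workaround is to skip the exporter and build each document in the pipeline with the standard library. The following is only a minimal sketch, assuming Python 3 and that every item can be converted with dict(item); the names article_to_xml and ArticleXmlPipeline are made up for illustration and are not part of Scrapy. Note that ElementTree writes the declaration with single quotes, which is equivalent XML. The FEED_EXPORT_* settings are not used here, because the file is written by the pipeline rather than by a feed export.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime


def article_to_xml(item, encoding="iso-8859-1"):
    """Serialize one item (any mapping) as a standalone <article> document."""
    root = ET.Element("article")
    for name, value in dict(item).items():
        child = ET.SubElement(root, name)
        child.text = "" if value is None else str(value)
    buf = io.BytesIO()
    # xml_declaration=True emits the <?xml ...?> header with the chosen encoding;
    # characters outside that encoding are written as numeric character references.
    ET.ElementTree(root).write(buf, encoding=encoding, xml_declaration=True)
    return buf.getvalue()


class ArticleXmlPipeline(object):
    """Hypothetical pipeline: one file per item, <article> as the only root."""

    def process_item(self, item, spider):
        filename = spider.name + datetime.now().strftime("_%H%M%S%f.xml")
        with open(filename, "wb") as f:
            f.write(article_to_xml(item))
        return item
```

To try it, ITEM_PIPELINES would have to point at this class instead of XmlExportPipeline (for example 'common.pipelines.ArticleXmlPipeline', assuming it lives in the same module).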


















python xml scrapy pipeline






edited Mar 23 at 17:39







Juan Manuel

















asked Mar 23 at 17:29









Juan Manuel
























