Scrapy xml pipeline

I need to write a spider that outputs one XML file per article.



My pipelines.py:



from scrapy.exporters import XmlItemExporter
from datetime import datetime

class CommonPipeline(object):
    def process_item(self, item, spider):
        return item

class XmlExportPipeline(object):
    def __init__(self):
        self.files = {}

    def process_item(self, item, spider):
        # A new file for every item, named <spider name>_<HHMMSS + microseconds>.xml
        file = open(spider.name + datetime.now().strftime("_%H%M%S%f.xml"), 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()   # XML declaration + opening <items>
        self.exporter.export_item(item)   # <item>...</item>
        self.exporter.finish_exporting()  # closing </items>
        file = self.files.pop(spider)
        file.close()
        return item


The output I get:



<?xml version="1.0" encoding="utf-8"?>
<items>
  <item>
    <text_img> Nelson Argaña. Foto: Gustavo Velázquez 970AM. Hace 1 hora </text_img>
    <title>Nelson Argaña lamentó que Mario Abdo esté rodeado de corruptos </title>
    <url>https://www.lanacion.com.py/politica/2019/03/23/nelson-argana-lamento-que-mario-abdo-este-rodeado-de-corruptos/</url>
    <content> Nelson Argaña, hijo de Luis María Arg ...</content>
    <sum_content>4805</sum_content>
    <time>14:30:06</time>
    <date>20190323</date>
  </item>
</items>
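The `<items>`/`<item>` nesting follows from how XmlItemExporter splits its work: `start_exporting()` writes the XML declaration and opens `root_element` (default `items`), each `export_item()` call wraps one item in `item_element` (default `item`), and `finish_exporting()` closes the root. The standard-library sketch below reproduces that layout with `xml.sax.saxutils.XMLGenerator`, which Scrapy 1.x's exporter is built on. It is a simplified approximation (no indentation, every value passed through `str()`), not Scrapy's actual code, and `export_items` is a hypothetical name.

```python
from io import BytesIO
from xml.sax.saxutils import XMLGenerator


def export_items(items, root_element="items", item_element="item", encoding="utf-8"):
    """Simplified imitation of XmlItemExporter's element layout (not Scrapy code)."""
    out = BytesIO()
    xg = XMLGenerator(out, encoding=encoding)

    # start_exporting(): XML declaration, then the opening root element
    xg.startDocument()
    xg.startElement(root_element, {})

    # export_item(): every item gets its own <item_element> wrapper
    for item in items:
        xg.startElement(item_element, {})
        for name, value in item.items():
            xg.startElement(name, {})
            xg.characters(str(value))
            xg.endElement(name)
        xg.endElement(item_element)

    # finish_exporting(): close the root element
    xg.endElement(root_element)
    xg.endDocument()
    return out.getvalue()


# Even a single item ends up wrapped twice: <items><item>...</item></items>
print(export_items([{"title": "Nelson Argaña", "date": "20190323"}]).decode("utf-8"))
```

Because the root element is opened before any item is written, a file holding a single item still gets both wrappers, and passing `item_element="article"` only renames the inner element.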


But I need the output to look like this:



<?xml version="1.0" encoding="iso-8859-1"?>
<article>
  <text_img> Nelson Argaña. Foto: Gustavo Velázquez 970AM. Hace 1 hora </text_img>
  <title>Nelson Argaña lamentó que Mario Abdo esté rodeado de corruptos </title>
  <url>https://www.lanacion.com.py/politica/2019/03/23/nelson-argana-lamento-que-mario-abdo-este-rodeado-de-corruptos/</url>
  <content> Nelson Argaña, hijo de Luis María Arg ...</content>
  <sum_content>4805</sum_content>
  <time>14:30:06</time>
  <date>20190323</date>
</article>


The settings.py:



ITEM_PIPELINES = {
    'common.pipelines.XmlExportPipeline': 300,
}

FEED_EXPORTERS_BASE = {
    'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}




I also tried adding this to settings.py:



FEED_EXPORT_ENCODING = 'iso-8859-1'
FEED_EXPORT_FIELDS = ["article"]


But it doesn't work.



I use Scrapy 1.4.0

  • Try (inside process_item) - self.exporter = XmlItemExporter(file, item_element="article", root_element="articles"). See docs.scrapy.org/en/latest/topics/exporters.html#xmlitemexporter

    – balderman
    Mar 24 at 8:06












  • Thanks for your comment. I tried that option but I need only the tag <article> not <articles><article>. This is a mandatory request and the encoding too.

    – Juan Manuel
    Mar 24 at 15:38











  • So use only the 'item_element'

    – balderman
    Mar 24 at 15:42











  • It does not work. The root_element appears by default as <items>. I tried root_element=False, root_element=None, root_element='' but it does not work. The same happens in reverse.

    – Juan Manuel
    Mar 24 at 18:16
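
Since XmlItemExporter always opens `root_element` in `start_exporting()`, one workaround is to bypass the exporter for this format and drive `XMLGenerator` directly, using the item element itself as the document root and passing the encoding explicitly. (The `FEED_EXPORT_ENCODING` and `FEED_EXPORT_FIELDS` settings are read by Scrapy's feed exports, not by an exporter you construct yourself inside a pipeline; `FEED_EXPORT_FIELDS` also lists item field names to export, not element names.) The following is a minimal sketch, assuming every item field holds a scalar value; `SingleArticleXmlPipeline` and its `output_dir` parameter are hypothetical names, and the class would be listed in `ITEM_PIPELINES` in place of `XmlExportPipeline`.

```python
import os
from datetime import datetime
from xml.sax.saxutils import XMLGenerator


class SingleArticleXmlPipeline(object):
    """Hypothetical pipeline: one XML file per item, with <article> as the root."""

    encoding = "iso-8859-1"
    root_element = "article"

    def __init__(self, output_dir="."):
        self.output_dir = output_dir

    def process_item(self, item, spider):
        # Same naming scheme as the original pipeline: <spider name>_<HHMMSSffffff>.xml
        filename = spider.name + datetime.now().strftime("_%H%M%S%f.xml")
        with open(os.path.join(self.output_dir, filename), "wb") as f:
            xg = XMLGenerator(f, encoding=self.encoding)
            # Writes <?xml version="1.0" encoding="iso-8859-1"?>
            xg.startDocument()
            # The item element itself is the document root: no <items> wrapper
            xg.startElement(self.root_element, {})
            for name, value in dict(item).items():
                xg.startElement(name, {})
                # Escapes &, < and >; characters outside Latin-1 become &#NNN; references
                xg.characters(str(value))
                xg.endElement(name)
            xg.endElement(self.root_element)
            xg.endDocument()
        return item
```

An alternative is to subclass XmlItemExporter and override `start_exporting()`/`finish_exporting()` so they skip the root element, but that depends on the exporter's internal XML generator attribute, which is not public API and may change between Scrapy versions.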

python xml scrapy pipeline

edited Mar 23 at 17:39 by Juan Manuel
asked Mar 23 at 17:29 by Juan Manuel (133 reputation)