Python Scrapy | How to pass the response to the main function from the spider

If I understand correctly, you want the spider to return the response so that you can parse it in the main script?

main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.signalmanager import dispatcher
from scrapy import signals


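def spider_output(spider):
    """Run the spider and return every item it yields as a list."""
    output = []

    def get_output(item):
        # Called once for each item the spider yields.
        output.append(item)

    # Collect the items through the item_scraped signal.
    dispatcher.connect(get_output, signal=signals.item_scraped)

    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    # start() blocks until the crawl has finished.
    process.start()

    return output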


if __name__ == "__main__":
    spider = "exampleSpider"
    response = spider_output(spider)
    response = response[0]['response']
    title = response.xpath('//h3//text()').get()
    price = response.xpath('//div[@class="card-body"]/h4/text()').get()

    print(f"Title: {title}")
    print(f"Price: {price}")

We start the spider and append each yielded item to output. Since output contains only one item, we don't need to loop over it and can just take the first value, response[0]. Then we read the value stored under the key 'response', so response = response[0]['response'].
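Indexing with [0] assumes the crawl produced exactly one item. If the request fails, the list is empty and the lookup raises an IndexError. As a minimal sketch, assuming the items are the dicts collected by spider_output, a small helper could make that case explicit and also cope with more than one item. The name responses_from_items is hypothetical and not part of Scrapy, and the strings in the demo only stand in for real response objects:

```python
def responses_from_items(items):
    """Return the object stored under 'response' for each collected item.

    `items` is assumed to be the list returned by spider_output().
    """
    if not items:
        # Nothing was scraped, e.g. the request failed or was filtered out.
        raise RuntimeError("the spider yielded no items")
    # Skip any item that does not carry a 'response' key.
    return [item["response"] for item in items if "response" in item]


if __name__ == "__main__":
    # Plain strings stand in for real response objects in this demo.
    collected = [{"response": "first"}, {"response": "second"}]
    for response in responses_from_items(collected):
        print(response)
```

In main.py this would replace the direct index, for example by looping over responses_from_items(spider_output(spider)).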

spider.py:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        yield {'response': response}

Here we yield an item that contains the response.

The steps are: main -> spider_output -> spider -> return the response item to spider_output -> append the item to the output list -> return output to main -> get the response from output -> parse the response.

Output:

Title: Long-sleeved Jersey Top
Price: $12.99
