How does response.url know which URL we’re requesting? (Scrapy)

Notice that when you define the class you are creating a child class of scrapy.Spider, therefore inheriting the methods and attributes of the parent class.

class PostsSpider(scrapy.Spider):

This parent class has a method called start_requests (source code) that uses the URLs defined in the class variable start_urls to create requests. Each Request object carries a callback function: the function that is called when the Scrapy engine receives a response for that request.

The default callback function is named parse, which is why you are expected to implement a method named parse in your spider to process the response.
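To make the flow above concrete, here is a minimal sketch in plain Python. It is a simplified model, not Scrapy's actual implementation: the Request, Response, and Spider classes and the fake_engine function are hypothetical stand-ins written only for illustration, and the example.com URLs are placeholders.

```python
# Simplified, hypothetical model of the request/callback flow.
# These classes are NOT Scrapy's real ones; they only mirror the idea.
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Request:
    url: str
    callback: Callable  # function to call once the response arrives


@dataclass
class Response:
    url: str   # copied from the request that produced this response
    body: str


class Spider:
    start_urls: List[str] = []

    def start_requests(self) -> Iterator[Request]:
        # One request per start URL, with parse as the default callback.
        for url in self.start_urls:
            yield Request(url=url, callback=self.parse)

    def parse(self, response: Response):
        raise NotImplementedError


class PostsSpider(Spider):
    start_urls = ["https://example.com/page/1", "https://example.com/page/2"]

    def parse(self, response: Response):
        return f"parsed {response.url}"


def fake_engine(spider: Spider) -> List[str]:
    """Stand-in for the engine: 'download' each request, then run its callback."""
    results = []
    for request in spider.start_requests():
        response = Response(url=request.url, body="<html></html>")
        results.append(request.callback(response))
    return results
```

Calling fake_engine(PostsSpider()) runs parse once per entry in start_urls. The point of the sketch is that the URL travels with the request, and the response handed to the callback is built from that request.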

When called, this callback function receives an argument named response. It is simply an object containing all the information about the response to the request, including the URL the request was made to (the response.url attribute).
