python爬虫scrapy框架的梨视频案例解析

1.爬虫文件

这里我们可以仿照爬虫文件中的parse方法，写一个新的parse方法，可以将新的url的响应对象传给这个新的parse方法
如果需要在不同的parse方法中使用同一个item对象，可以使用meta参数字典，将item传给callback回调函数
爬虫文件中的parse需要yield的Request请求，而item则在新的parse方法中使用yield item传给下一个parse方法或管道文件

import scrapy# 从items.py文件中导入BossprojectItem类from bossProject.items import BossprojectItemclass BossSpider(scrapy.Spider): name = 'boss' # allowed_domains = ['www.xxx.com'] start_urls = ['https://www.pearvideo.com/category_5'] # 回调函数接受响应对象，并且接受传递过来的meata参数 def content_parse(self,response): # meta参数包含在response响应对象中，调用meta，然后根据键值取出对应的值:item item = response.meta['item'] # 解析视频链接中的对视频的描述 des = response.xpath('//div[@class="summary"]/text()').extract() des = "".join(des) item['des'] = des yield item  # 解析首页视频的标题以及视频的链接 def parse(self, response): li_list = response.xpath('//div[@id="listvideoList"]/ul/li') for li in li_list:  href = li.xpath('./div/a/@href').extract()  href = "https://www.pearvideo.com/" + "".join(href)  title = li.xpath('./div[1]/a/div[2]/text()').extract()  title = "".join(title)  item = BossprojectItem()  item["title"] = title  #手动发送请求，并将响应对象传给回调函数  #请求传参:meta={}，可以将meta字典传递给请求对应的回调函数  yield scrapy.Request(href,callback=self.content_parse,meta={'item':item})

2.items.py

要将BossprojectItem类导入爬虫文件中才能够创建item对象

import scrapyclass BossprojectItem(scrapy.Item): # define the fields for your item here like: # name = scrapy.Field() # 定义了item属性 title = scrapy.Field() des = scrapy.Field()

3.pipelines.py

open_spider(self,spider)和close_spider(self,spider)重写这两个父类方法，且这两个方法都只执行一次在process_item方法中最好保留return item，因为如果存在多个管道类，return item会自动将item对象传给优先级低于自己的管道类

from itemadapter import ItemAdapterclass BossprojectPipeline: def __init__(self): self.fp = None # 重写父类方法，只调用一次 def open_spider(self,spider): print("爬虫开始") self.fp = open('./lishipin.txt','w') # 接受爬虫文件中yield传递来的item对象，将item中的内容持久化存储 def process_item(self, item, spider): self.fp.write(item['title'] + '/n/t' + item['des'] + '/n') # 如果有多个管道类，会将item传递给下一个管道类 # 管道类的优先级取决于settings.py中的ITEM_PIPELINES属性中对应的值  ## ITEM_PIPELINES = {'bossProject.pipelines.BossprojectPipeline': 300,} 键值中的值越小优先级越高 return item # 重写父类方法，只调用一次 def close_spider(self,spider):  self.fp.close() print("爬虫结束")

4.进行持久化存储

在这里插入图片描述

到此这篇关于python爬虫scrapy框架的梨视频案例解析的文章就介绍到这了,更多相关python爬虫scrapy框架内容请搜索以前的文章或继续浏览下面的相关文章希望大家以后多多支持！

最后更新于 2021-11-05 08:44:36 并被添加「」标签，已有位童鞋阅读过。

本站使用「署名 4.0 国际」创作共享协议，可自由转载、引用，但需署名作者且注明文章出处

老多知识库

python爬虫scrapy框架的梨视频案例解析

目录

1.爬虫文件

2.items.py

3.pipelines.py

4.进行持久化存储

相关文章