

news 2025/12/8 10:45:20

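1. Use the standard-library module urllib to scrape the images on the Pingdingshan University news page "http://news.pdsu.edu.cn/info/1005/31269.htm". Requirements: save them to the pic directory on drive F, and name each file "your own name" + "_image number"; for example, the first image for a student named 张三 is saved as "张三_1.jpg".

    from re import findall
    from urllib.request import urlopen

    url = 'http://news.pdsu.edu.cn/info/1005/31269.htm'
    with urlopen(url) as fp:
        content = fp.read().decode('utf-8')

    # Find every image link: the capture group keeps only the src value
    pattern = '<img width="500" src="(.+?)"'
    result = findall(pattern, content)

    # Read each image and write it to a local file (f:/pic/ must already exist)
    path = 'f:/pic/'
    name = '烟雨'
    for index, item in enumerate(result):
        picture = 'http://news.pdsu.edu.cn/' + item
        with urlopen(str(picture)) as fp:
            # enumerate counts from 0, so use index + 1 to start numbering at 1.
            # The task's example uses .jpg; change the extension if needed.
            with open(path + name + '_' + str(index + 1) + '.png', 'wb') as fp1:
                fp1.write(fp.read())

2. Use the Scrapy framework to scrape content from the Pingdingshan University news site (http://news.pdsu.edu.cn/). Specific requirement: scrape the news section names and write the results to lm.txt.

Open cmd and keep the window open for all of the following commands.

    scrapy startproject wsq
    cd wsq
    scrapy genspider lm news.pdsu.edu.cn

Here wsq is the project name, lm is the spider name, and pdsu.edu.cn is where the crawl starts.

Analysis: the section names sit in markup such as <h2 class="fl">媒体平院</h2>. This is filtered with a BeautifulSoup query rather than a regular expression: soup.find_all('h2', class_='fl').

Both third-party libraries need to be installed:

    pip install beautifulsoup4
    pip install scrapy

Find lm.py (the spider created above), open it for editing, and paste in the code below.

    # -*- coding: utf-8 -*-
    import scrapy
    from bs4 import BeautifulSoup


    class LmmSpider(scrapy.Spider):
        name = 'lm'  # must match the name used in "scrapy crawl lm"
        allowed_domains = ['pdsu.edu.cn']
        start_urls = ['http://news.pdsu.edu.cn/']

        def parse(self, response):
            html_doc = response.text
            soup = BeautifulSoup(html_doc, 'html.parser')
            # Every <h2 class="fl"> holds one section name
            results = soup.find_all('h2', class_='fl')
            content = ''
            for lm in results:
                print(lm.text)
                content += lm.text + '\n'
            # The save path can be changed; 'a' appends on every run
            with open('f:\\lm.txt', 'a', encoding='utf-8') as fp:
                fp.write(content)

Then run the spider (lm is the spider name):

    scrapy crawl lm

3. Use the requests module to scrape every chapter title of the course "Python语言及应用" on the Pingdingshan University online teaching platform (http://mooc1.chaoxing.com/course/206046270.html).

Open cmd and keep the window open for all of the following commands.

    scrapy startproject yy
    cd yy
    scrapy genspider beyond news.mooc1.chaoxing.com/course/206046270.html

Here yy is the project name, beyond is the spider name, and mooc1.chaoxing.com/course/206046270.html is where the crawl starts.

Analysis: the chapter titles sit in markup such as <div class="f16 chapterText">第一章 python概述</div>. Again this is filtered with a BeautifulSoup query rather than a regular expression: soup.findAll('div', class_='f16 chapterText').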
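Find beyond.py (the spider file created above), open it for editing, and paste in the code below. The script only needs requests and bs4, so it can also be run directly as a standalone Python file.

    # -*- coding: utf-8 -*-
    import requests
    import bs4

    # Send a browser-like User-Agent with the request
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36'
    }

    url = 'http://mooc1.chaoxing.com/course/206046270.html'
    response = requests.get(url, headers=headers).text
    soup = bs4.BeautifulSoup(response, 'html.parser')
    # find_all is the current name for findAll; each matching div holds one chapter title
    t = soup.find_all('div', class_='f16 chapterText')
    for ml in t:
        print(ml.text)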
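Returning to task 1: the link extraction and the file-naming rule can be checked offline before anything is downloaded. The following is only a minimal sketch under stated assumptions. SAMPLE_HTML, the image paths in it, and the helper plan_downloads are hypothetical and are not taken from the real page. It also swaps the plain string concatenation for urllib.parse.urljoin, which resolves both root-relative and page-relative src values, and it keeps each link's own extension instead of a fixed one.

```python
import os
from re import findall
from urllib.parse import urljoin, urlparse

# Hypothetical markup standing in for the downloaded page (not the real content).
SAMPLE_HTML = '''
<p><img width="500" src="/__local/A/1.jpg" alt=""></p>
<p><img width="500" src="images/b.png"></p>
<p><img width="120" src="/skip/me.gif"></p>
'''

PAGE_URL = "http://news.pdsu.edu.cn/info/1005/31269.htm"


def plan_downloads(html, page_url, name, folder="f:/pic/"):
    """Return (absolute image URL, target file path) pairs without downloading."""
    # Same idea as in task 1: capture the src of every 500-pixel-wide image.
    links = findall(r'<img width="500" src="(.+?)"', html)
    plan = []
    # start=1 numbers the files from 1, as the naming rule requires.
    for number, link in enumerate(links, start=1):
        absolute = urljoin(page_url, link)
        # Keep the extension carried by the link; fall back to .jpg if it has none.
        extension = os.path.splitext(urlparse(absolute).path)[1] or ".jpg"
        plan.append((absolute, f"{folder}{name}_{number}{extension}"))
    return plan


for image_url, target in plan_downloads(SAMPLE_HTML, PAGE_URL, "张三"):
    print(image_url, "->", target)
```

Once the printed pairs look right, each one can be passed to urlopen and open(..., 'wb') exactly as in the task 1 script.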
