Data Parsing, Part 2

1. Combining XPath with paginated crawling

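We continue with the second-hand housing example from last time, where we used XPath to scrape each listing's title, house details, total price, unit price, and approximate location. This time we combine that with pagination so we can collect more data.

The listing site opens on page 1 by default, and the request URL for that page is https://cs.lianjia.com/ershoufang/pg1/. Clicking the page-2 button at the bottom of the page changes the request URL to https://cs.lianjia.com/ershoufang/pg2/.

So the request URL follows the pattern https://cs.lianjia.com/ershoufang/pg{page number}/.

That makes things easy: we can solve this with the pagination technique we have already learned!

Pagination has come up many times before. If you are not familiar with it, take a look at my previous few posts.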

from lxml import etree
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
    # Cookies expire; paste the value copied from your own browser session.
    'cookie': '<your cookie string>',
}
count = 0
for page in range(1, 6):
    url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
    res = requests.get(url, headers=headers)
    tree = etree.HTML(res.text)
    lis = tree.xpath('//ul[@class="sellListContent"]/li')  # the 30 listing blocks on this page
    # print(lis)
    # print(len(lis))
    for li in lis:
        # On the first iteration, li is the element holding the first listing.
        # li.xpath matches only inside that li when the expression starts with ".",
        # which stands for the current element.
        title = li.xpath('.//div[@class="title"]/a/text()')[0]
        price = li.xpath('.//div[@class="totalPrice totalPrice2"]//text()')
        # [' ', '220', '万'] -> '220万'
        price = ''.join(price).strip()
        # Unit price
        unitPrice = li.xpath('.//div[@class="unitPrice"]/span/text()')[0]
        # Location
        info_ls = li.xpath('.//div[@class="positionInfo"]//text()')
        info_str = ''.join(info_ls)
        info_str = info_str.replace(' ', '')
        # House details (layout and so on)
        houseInfo = li.xpath('.//div[@class="houseInfo"]/text()')[0]
        count += 1
        print(count, title, price, unitPrice, info_str, houseInfo)
    print(f'Page {page}: all data fetched successfully')
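
The role of the leading dot in the per-listing XPath expressions can be seen on a tiny hand-written document. This is only a minimal sketch, assuming lxml is installed; the HTML string and the names SAMPLE_HTML, relative_titles and absolute_titles are made up for illustration and are not taken from the real page.

```python
from lxml import etree

# Hypothetical two-listing document, much simpler than the real page.
SAMPLE_HTML = """
<ul class="sellListContent">
  <li><div class="title"><a>House A</a></div></li>
  <li><div class="title"><a>House B</a></div></li>
</ul>
"""

tree = etree.HTML(SAMPLE_HTML)
lis = tree.xpath('//ul[@class="sellListContent"]/li')

# ".//" searches only inside the current li, so each li yields its own title.
relative_titles = [li.xpath('.//div[@class="title"]/a/text()')[0] for li in lis]

# "//" always searches the whole document, even when called on an li,
# so every li picks up the first title in the document.
absolute_titles = [li.xpath('//div[@class="title"]/a/text()')[0] for li in lis]

print(relative_titles)
print(absolute_titles)
```

Forgetting the dot is an easy mistake to make, and the symptom is the same title printed once per listing.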

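Here we paginate with a plain for loop over a fixed range of pages, rather than the open-ended strategy of looping until no more data comes back. If you want to explore the while True approach, where you check whether a page still has data to decide whether to keep crawling, I'll leave that to you. The idea is the same as in the earlier crawler posts.

Note: the cookie can change at any time. If the crawler returns incomplete data or runs into other problems, replace the cookie. You can find it in the request itself.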
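
The open-ended strategy mentioned above (keep requesting until a page comes back empty) might look roughly like the sketch below. It is not the code from the earlier posts: crawl_until_empty, fetch_page, max_pages and FAKE_SITE are hypothetical names, and a dict stands in for the real site so the example runs without any network access.

```python
from typing import Callable, Iterator, List


def crawl_until_empty(fetch_page: Callable[[int], List[str]],
                      max_pages: int = 100) -> Iterator[str]:
    """Yield items page by page until a page comes back empty.

    fetch_page is a hypothetical callable: given a page number it returns
    the list of items parsed from that page (for example the li elements).
    max_pages is a safety cap so a site that never returns an empty page
    cannot keep the loop running forever.
    """
    page = 1
    while True:
        items = fetch_page(page)
        if not items or page > max_pages:
            break
        yield from items
        page += 1


# Stand-in for real requests: pretend the site has three pages of data.
FAKE_SITE = {1: ['a', 'b'], 2: ['c'], 3: ['d', 'e']}
collected = list(crawl_until_empty(lambda page: FAKE_SITE.get(page, [])))
print(collected)
```

In a real crawler, fetch_page would wrap the requests.get call and the XPath query from the loop body. The cap matters because some sites keep serving the last page for any larger page number instead of returning nothing.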

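Finding the cookie:

Open the headers of the request for the corresponding URL, scroll down to the cookie entry, then copy the cookie and its value into the headers dict in the code.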
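
As an alternative to pasting the whole string under the 'cookie' header, the copied value can be split into a dict and passed through the cookies= argument that requests.get accepts. The helper below is a small sketch of that idea; cookie_header_to_dict is a hypothetical name, and the example cookie string is made up.

```python
def cookie_header_to_dict(cookie_header: str) -> dict:
    """Split a copied 'name=value; name2=value2' cookie string into a dict."""
    cookies = {}
    for pair in cookie_header.split(';'):
        # partition splits on the first "=" only, so values that
        # themselves contain "=" stay intact.
        name, sep, value = pair.strip().partition('=')
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Made-up cookie string for illustration; paste your own from the browser.
example = 'select_city=430100; token=abc==; '
parsed = cookie_header_to_dict(example)
print(parsed)
```

The result can then be used as requests.get(url, headers=headers, cookies=parsed), which keeps the headers dict short and makes it easier to swap in a fresh cookie.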

2. Finding housing by city with XPath

The URL is https://www.lianjia.com/city/. We first send a request to that URL and define a dict to hold the city data. For pagination we again use a simple for loop, just as before.

import requests
from lxml import etree

city_name_input = input('Enter the city whose listings you want to search: ')  # e.g. 长沙
# Fetch the page that lists all cities
city_url = 'https://www.lianjia.com/city/'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
    # Cookies expire; paste the value copied from your own browser session.
    'cookie': '<your cookie string>',
}
city_res = requests.get(city_url, headers=headers)
# Parse each city's name and URL
# print(city_res.text)
tree = etree.HTML(city_res.text)
# XPath written according to the page hierarchy
lis = tree.xpath('//div[@class="city_province"]/ul/li')
# print(len(lis))
# Empty dict that will hold the city data
city_dict = {}

Extract each city's name and URL, and record them in the dict.

for li in lis:
    # City name
    city_name = li.xpath('./a/text()')[0]
    # City URL
    city_url_2 = li.xpath('./a/@href')[0]
    # if city_name_input == city_name:
    #     send the request; dict = {city name: city URL}
    # print(city_name, city_url_2)
    # Adding an entry to a dict: dict_name[key] = value
    city_dict[city_name] = city_url_2

Check whether the city name that was entered is in the dict. (The listings are again fetched with the pagination approach.)

# Check whether the city name that was entered is in the dict
# '长沙' in city_dict tests whether the key exists
count = 0
if city_name_input in city_dict:
    # Look up the city's URL by the name that was entered (value by key)
    city_url = city_dict[city_name_input]
    print(city_url)
    # Send the requests
    # https://cs.lianjia.com/
    # {city_url}ershoufang/pg{page}/
    for page in range(1, 6):  # pagination
        city_res = requests.get(f'{city_url}ershoufang/pg{page}/', headers=headers)
        # Parse the data
        tree = etree.HTML(city_res.text)
        lis = tree.xpath('//ul[@class="sellListContent"]/li')  # the 30 listing blocks on this page
        # print(lis)
        # print(len(lis))
        for li in lis:
            # On the first iteration, li is the element holding the first listing.
            # The leading "." makes each XPath relative to the current li.
            title = li.xpath('.//div[@class="title"]/a/text()')[0]
            price = li.xpath('.//div[@class="totalPrice totalPrice2"]//text()')
            # [' ', '220', '万'] -> '220万'
            price = ''.join(price).strip()
            # Unit price
            unitPrice = li.xpath('.//div[@class="unitPrice"]/span/text()')[0]
            # Location
            info_ls = li.xpath('.//div[@class="positionInfo"]//text()')
            info_str = ''.join(info_ls)
            info_str = info_str.replace(' ', '')
            # House details (layout and so on)
            houseInfo = li.xpath('.//div[@class="houseInfo"]/text()')[0]
            count += 1
            print(count, "\nTitle:", title, "\nTotal price:", price, "\nUnit price:", unitPrice,
                  "\nApproximate location:", info_str, "\nHouse details:", houseInfo)
        print(f'Page {page}: all data fetched successfully')
else:
    print('No housing data for this city')
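
The f-string that builds the listing URL relies on every city URL ending with a slash, as in https://cs.lianjia.com/. A slightly more defensive way to build it, sketched here with the standard library's urljoin, tolerates both forms. build_listing_url is a hypothetical helper name, not part of the original code.

```python
from urllib.parse import urljoin


def build_listing_url(city_url: str, page: int) -> str:
    """Join a city base URL and the paginated listing path."""
    # Make sure the base ends with "/" so urljoin appends to it
    # rather than replacing its last path segment.
    base = city_url if city_url.endswith('/') else city_url + '/'
    return urljoin(base, f'ershoufang/pg{page}/')


print(build_listing_url('https://cs.lianjia.com/', 2))
print(build_listing_url('https://cs.lianjia.com', 2))
```

Both calls print the same URL, so a missing trailing slash in the scraped href no longer produces a malformed request.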

3. A second way to parse data: BeautifulSoup

Before we can use BeautifulSoup we need to install the third-party library. Open cmd and run:
pip install beautifulsoup4
Once it is installed, import the module in the code:
# Import the BeautifulSoup class
from bs4 import BeautifulSoup
Now we can create and use a BeautifulSoup object.

Load the HTML response (we again use the HTML file saved last time):
with open('链家.html', 'r', encoding='utf-8') as f:
    html_code = f.read()
bs = BeautifulSoup(html_code, 'lxml')
Getting tag objects (bs4)

bs.tag_name returns a tag object
print(bs.title)
# Only the first matching tag is returned
print(bs.div)
Note: this approach only ever returns the first matching tag.

bs.find(tag_name) returns a tag object
print(bs.find('title'))
bs.find(tag_name, attribute=value), for example class="title". Because class is a reserved word in Python, filter on it with class_.
print(bs.find('div', class_='title'))
Add the condition href="https://cs.lianjia.com/ershoufang/104113837527.html" to filter tags by href.
print(bs.find('a', href="https://cs.lianjia.com/ershoufang/104113837527.html"))
bs.find_all(tag_name) returns a list-like object; just treat it as a list. (findAll is the older spelling of the same method.)
print(bs.find_all('title'))
print(bs.find_all('a', href="https://cs.lianjia.com/ershoufang/104113837527.html"))
Get every div with class="title" on the page:
print(bs.find_all('div', class_='title'))
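
The difference between find and find_all is easiest to see when nothing matches. The sketch below runs on a made-up HTML snippet rather than the saved page, and uses Python's built-in html.parser instead of lxml so it has one dependency fewer; SAMPLE_HTML and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the saved page.
SAMPLE_HTML = """
<div class="title"><a href="/house/1.html">House A</a></div>
<div class="title"><a href="/house/2.html">House B</a></div>
"""

# html.parser ships with Python; the article itself uses the lxml parser.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

first_div = soup.find('div', class_='title')         # first match only
all_divs = soup.find_all('div', class_='title')      # every match
missing_one = soup.find('span', class_='title')      # no match: None
missing_all = soup.find_all('span', class_='title')  # no match: empty list

print(first_div.a.text)
print(len(all_divs))
print(missing_one, list(missing_all))
```

Because find returns None when there is no match, chaining an attribute access onto its result fails on missing tags, whereas looping over an empty find_all result simply does nothing.
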
Selecting by hierarchy

bs.select(css_selector) returns a list-like object; just treat it as a list.
print(bs.select('.totalPrice>span'))
Common usage:
'''
id selector
<div id="box"></div>
#id value  #box
bs.select("#box")

class selector
<div id="box" class="abc"></div>
.class value  .abc
bs.select(".abc")

tag name selector
<div id="box" class="abc"></div>
tag name  div
bs.select("div")

child selector
<div class="abc"><div><div></div></div></div>
.abc>div>div
bs.select(".abc>div>div")

descendant selector
<div class="abc"><div><div><div><div></div></div></div></div></div>
.abc div
bs.select(".abc div")

bs.tag_name
bs.find(tag_name, attribute=value)
bs.find_all(tag_name, attribute=value)
bs.select(css_selector)
'''
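Getting a tag attribute: tag[attribute_name]
# Get a tag attribute: tag[attribute_name]
# .title>a means an a tag directly inside a tag whose class is title, e.g. <div class="title"><a href="www.baidu.com">link</a></div>
a = bs.select('.title>a')
# print(a)
for i in a:
    # href is an attribute
    print(i['href'])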
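
Indexing a tag with square brackets assumes the attribute is present. The sketch below, on a made-up snippet parsed with the built-in html.parser, contrasts it with the tag's get method, which returns None instead of raising when the attribute is missing. The names SAMPLE_HTML, hrefs and raised are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second link has no href attribute.
SAMPLE_HTML = '<div class="title"><a href="/house/1.html">A</a><a>B</a></div>'
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

links = soup.select('.title>a')

# .get() returns None for a missing attribute.
hrefs = [a.get('href') for a in links]

# Square-bracket access raises KeyError for a missing attribute.
try:
    links[1]['href']
    raised = False
except KeyError:
    raised = True

print(hrefs, raised)
```

On pages where some links lack an href, get keeps the loop running and leaves the decision about missing values to the caller.
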

4. Analyzing the second-hand housing site with BeautifulSoup

Get the li tags for all listings.

from bs4 import BeautifulSoup

with open('链家.html', 'r', encoding='utf-8') as f:
    html_code = f.read()
bs = BeautifulSoup(html_code, 'lxml')
# Get the li tags for all listings
lis = bs.select('.sellListContent>li')

Then pull the data we want out of each li tag in turn.

count = 1
for li in lis:
    # First iteration: the first li tag
    # Title
    title = li.select('.title>a')[0].text
    # Total price ('万' is the unit, 10,000 yuan)
    price = li.select('.totalPrice>span')[0].text + '万'
    # Approximate location
    info = li.select('.positionInfo')[0].text.replace(' ', '')
    # House details
    houseInfo = li.select('.houseInfo')[0].text.replace(' ', '')
    # Unit price
    unitPrice = li.select('.unitPrice>span')[0].text.replace(',', '')
    print(count, title, price, info, houseInfo, unitPrice)
    count += 1

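The approach is much the same as with XPath last time: first grab the large containers, then extract the fields you want from inside each one. This keeps fields from different listings from getting mismatched.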
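
Why working container by container avoids mismatched fields can be shown on a made-up page where one listing has no price. This is only a sketch: SAMPLE_HTML is hypothetical, html.parser is used instead of lxml, and select_one (which returns the first match or None) replaces the select(...)[0] pattern so the missing price can be detected.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the second listing has no price element.
SAMPLE_HTML = """
<ul class="sellListContent">
  <li><div class="title"><a>House A</a></div>
      <div class="totalPrice"><span>220</span></div></li>
  <li><div class="title"><a>House B</a></div></li>
  <li><div class="title"><a>House C</a></div>
      <div class="totalPrice"><span>95</span></div></li>
</ul>
"""
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Page-wide extraction: two independent lists of different lengths,
# so zipping them pairs House B with House C's price.
titles = [a.text for a in soup.select('.title>a')]
prices = [s.text for s in soup.select('.totalPrice>span')]
page_wide = list(zip(titles, prices))

# Per-listing extraction: each price is looked up inside its own li.
per_listing = []
for li in soup.select('.sellListContent>li'):
    title = li.select_one('.title>a').text
    price_tag = li.select_one('.totalPrice>span')
    per_listing.append((title, price_tag.text if price_tag else None))

print(page_wide)
print(per_listing)
```

The page-wide version silently produces a wrong pairing and drops a listing, while the per-listing version keeps all three rows and marks the missing price explicitly.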

Full code:

from bs4 import BeautifulSoup

with open('链家.html', 'r', encoding='utf-8') as f:
    html_code = f.read()
bs = BeautifulSoup(html_code, 'lxml')
# Get the li tags for all listings
lis = bs.select('.sellListContent>li')
count = 1
for li in lis:
    # First iteration: the first li tag
    title = li.select('.title>a')[0].text
    price = li.select('.totalPrice>span')[0].text + '万'
    info = li.select('.positionInfo')[0].text.replace(' ', '')
    houseInfo = li.select('.houseInfo')[0].text.replace(' ', '')
    unitPrice = li.select('.unitPrice>span')[0].text.replace(',', '')
    print(count, title, price, info, houseInfo, unitPrice)
    count += 1

BeautifulSoup and XPath can achieve the same result. In practice, pick whichever one you are more comfortable with to scrape the data you want.

5. Hands-on practice

Scrape the horoscope information: sign name, date range, overall fortune, love fortune, career and study fortune, wealth fortune, and health fortune. Then save the data to a JSON file.

Given: the zodiac image URLs and the request URL.

Zodiac image URLs

img_url = ["http://43.143.122.8/img/白羊座.png", "http://43.143.122.8/img/金牛座.png",
           "http://43.143.122.8/img/双子座.png", "http://43.143.122.8/img/巨蟹座.png",
           "http://43.143.122.8/img/狮子座.png", "http://43.143.122.8/img/处女座.png",
           "http://43.143.122.8/img/天秤座.png", "http://43.143.122.8/img/天蝎座.png",
           "http://43.143.122.8/img/射手座.png", "http://43.143.122.8/img/摩羯座.png",
           "http://43.143.122.8/img/水瓶座.png", "http://43.143.122.8/img/双鱼座.png"]  # zodiac image URLs

Request URL

url = "https://www.1212.com/luck/"

Full code:

import requests
from bs4 import BeautifulSoup
import urllib3

urllib3.disable_warnings()  # silence the warnings caused by verify=False

headers = {"User-Agent": "Mozilla/5.0(Windows NT 10.0;"
                         "Win64;x64)AppleWebKit/537.36"
                         "(KHTML,like Gecko)Chrome/71.0.3578.9"
                         "8 Safari/537.76",
           "Connection": "close"}
url = 'https://www.1212.com/luck/'  # index page
res = requests.get(url, headers=headers, timeout=40, stream=True, verify=False)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, 'lxml')
url_list = []
file = open("C:/Users/Administrator/Desktop/zodiac_signs.json", "w", encoding='utf-8')


def write_json_file(L1, L2, L3, L4, L5, L6, L7, L8):
    """Write the scraped content to the JSON file.

    :param L1: name_list
    :param L2: date_time_list
    :param L3: img_url
    :param L4: comprehensiveFortunes_list
    :param L5: loveFortunes_list
    :param L6: careersAndStudies_list
    :param L7: wealthFortune_list
    :param L8: healthFortunes_list
    """
    for i in range(len(url_list)):
        if i == 0:
            file.write('{\n\t"code":200,\n')
            file.write('\t"msg":"success",\n')
            file.write('\t"total":' + str(len(url_list)) + ',\n')
            file.write('\t"data":[\n')
        file.write('\t\t{\n\t\t\t"id":' + str(i + 1) + ",\n")
        file.write('\t\t\t"name":' + '"' + str(L1[i]) + '"' + ",\n")
        file.write('\t\t\t"dateTime":' + '"' + str(L2[i]) + '"' + ",\n")
        file.write('\t\t\t"imgUrl":' + '"' + str(L3[i]) + '"' + ",\n")
        file.write('\t\t\t"comprehensiveFortunes":' + '"' + str(L4[i]) + '"' + ",\n")
        file.write('\t\t\t"loveFortunes":' + '"' + str(L5[i]) + '"' + ",\n")
        file.write('\t\t\t"careersAndStudies":' + '"' + str(L6[i]) + '"' + ",\n")
        file.write('\t\t\t"wealthFortune":' + '"' + str(L7[i]) + '"' + ",\n")
        file.write('\t\t\t"healthFortunes":' + '"' + str(L8[i]) + '"' + "\n")
        if i != len(url_list) - 1:
            file.write("\t\t},\n")
        else:
            file.write("\t\t}\n")
            file.write("\t]\n")
            file.write("}")


# The li elements on the index page, one per zodiac sign
sign_items = soup.find_all('div', class_="daily-luck-body")[0].find_all_next('ul')[0].find_all_next('li')
for i in range(12):
    url_list.append("https://www.1212.com" + sign_items[i].find_all_next('a')[0].get("href"))
res.close()

name_list = []  # sign names
date_time_list = []  # sign date ranges
img_url = ["http://43.143.122.8/img/白羊座.png", "http://43.143.122.8/img/金牛座.png",
           "http://43.143.122.8/img/双子座.png", "http://43.143.122.8/img/巨蟹座.png",
           "http://43.143.122.8/img/狮子座.png", "http://43.143.122.8/img/处女座.png",
           "http://43.143.122.8/img/天秤座.png", "http://43.143.122.8/img/天蝎座.png",
           "http://43.143.122.8/img/射手座.png", "http://43.143.122.8/img/摩羯座.png",
           "http://43.143.122.8/img/水瓶座.png", "http://43.143.122.8/img/双鱼座.png"]  # zodiac image URLs
comprehensiveFortunes_list = []  # overall fortune
loveFortunes_list = []  # love fortune
careersAndStudies_list = []  # career and study
wealthFortune_list = []  # wealth fortune
healthFortunes_list = []  # health fortune

for i in range(len(url_list)):
    res2 = requests.get(url_list[i], headers=headers, timeout=40, verify=False)
    res2.encoding = "utf-8"
    soup2 = BeautifulSoup(res2.text, 'lxml')
    res2.close()
    name_list.append(soup2.find_all('div', class_="xzxzmenu")[0].find_all_next('p')[0].find_all_next('em')[0].text)
    # The date ranges are on the index page, which is already parsed in soup,
    # so it does not need to be requested again for every sign.
    date_time_list.append(
        sign_items[i].find_all_next('div', class_="con")[0].find_all_next('span', class_="time")[0]
        .text.replace("(", "").replace(")", ""))
    # The five dl blocks hold the five fortune texts, in this order
    dls = soup2.find_all('div', class_="infro-list")[0].find_all_next('dl')
    comprehensiveFortunes_list.append(dls[0].find_all_next('div', class_="jzbox")[0].text)
    loveFortunes_list.append(dls[1].find_all_next('div', class_="jzbox")[0].text)
    careersAndStudies_list.append(dls[2].find_all_next('div', class_="jzbox")[0].text)
    wealthFortune_list.append(dls[3].find_all_next('div', class_="jzbox")[0].text)
    healthFortunes_list.append(dls[4].find_all_next('div', class_="jzbox")[0].text)

print(name_list)
print(date_time_list)
print(comprehensiveFortunes_list)
print(loveFortunes_list)
print(careersAndStudies_list)
print(wealthFortune_list)
print(healthFortunes_list)
write_json_file(name_list, date_time_list, img_url, comprehensiveFortunes_list, loveFortunes_list,
                careersAndStudies_list, wealthFortune_list, healthFortunes_list)
file.close()
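
Writing the JSON by string concatenation works as long as the scraped text contains no double quotes, backslashes, or line breaks. As an alternative, the same structure could be built as Python objects and serialized with the standard json module, which handles the escaping. The sketch below is not the author's code: build_payload is a hypothetical helper, and the sample record (including its date range and fortune text) is made up.

```python
import json


def build_payload(names, date_times, img_urls, fortunes):
    """Assemble the same top-level structure the hand-written writer emits.

    fortunes is a list of dicts, one per sign, keyed by the field names
    used in the article's JSON (comprehensiveFortunes, loveFortunes, ...).
    """
    data = []
    for i, name in enumerate(names):
        record = {'id': i + 1, 'name': name,
                  'dateTime': date_times[i], 'imgUrl': img_urls[i]}
        record.update(fortunes[i])
        data.append(record)
    return {'code': 200, 'msg': 'success', 'total': len(data), 'data': data}


# Made-up record; the quotes inside the text would break hand-built JSON.
payload = build_payload(
    ['白羊座'], ['3.21-4.19'], ['http://43.143.122.8/img/白羊座.png'],
    [{'comprehensiveFortunes': 'A "lucky" day'}],
)
# ensure_ascii=False keeps the Chinese text readable in the output.
text = json.dumps(payload, ensure_ascii=False, indent=4)
print(text)
```

To write straight to a file, json.dump(payload, f, ensure_ascii=False, indent=4) does the same thing with a file object opened with encoding='utf-8'.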

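That is everything for Data Parsing, Part 2. If anything is unclear, post your question in the comments. Discussion is welcome!!!
If I got something wrong, please point it out, or get in touch with me. Let's keep working hard and keep improving together.
Learning is a long process: we have to keep studying and digesting each point. When something is unclear or a concept is fuzzy, sort it out right away, otherwise the problems only pile up and the gaps only get wider.
Life is a long road, with the egret (白鹭) always by your side!!!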
