捡肥皂网

Scraping Sina Finance news content with Python

2022-08-04

Easy mistakes in this exercise

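1. Garbled text (mojibake). Get used to the following line, which makes requests detect the encoding from the HTML body itself instead of trusting a missing or wrong charset in the response headers: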

html.encoding=html.apparent_encoding
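To see why this line matters, here is a minimal offline sketch that uses only the standard library, with no network access. It assumes the page bytes are UTF-8 and that the client decodes them as ISO-8859-1, which is the default requests falls back to when the Content-Type header is a text type without a charset. The sample string is made up for illustration.

```python
# Bytes as a server might send them: Chinese text encoded as UTF-8.
raw = "新浪财经".encode("utf-8")

# Decoding with the wrong codec does not raise, it silently yields mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were written in recovers the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```

Setting `html.encoding` to `html.apparent_encoding` asks requests to guess the codec from the bytes, so `html.text` is decoded the second way rather than the first.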

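2. Nested for loops. Watch where the print sits relative to the second (inner) for loop: it belongs after that loop, at the same indentation as the for, so the text is printed once per article rather than once per paragraph.

        for article in articles:
            res+=article.text
        # printed once, after every paragraph has been appended
        print('Article content:',res)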
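The difference is easy to check with a toy example. The sketch below stands in for the scraper: `paragraphs` is a made-up list rather than real `soup.select` results, and output is collected in lists instead of printed so the two placements can be compared.

```python
paragraphs = ['First paragraph. ', 'Second paragraph. ', 'Third paragraph.']

# Variant A: output inside the inner loop, one partial result per paragraph.
inside = []
res = ''
for p in paragraphs:
    res += p
    inside.append(res)

# Variant B: output after the inner loop, one complete result.
outside = []
res = ''
for p in paragraphs:
    res += p
outside.append(res)

print(len(inside), len(outside))
```

Variant A emits the growing text three times, while variant B emits the finished text once. For plain concatenation like this, `''.join(paragraphs)` gives the same final string without a loop.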

Finally, here is the full code, which doubles as a review of the basics.

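from bs4 import BeautifulSoup
import requests

url = "https://finance.sina.com.cn/"
html = requests.get(url, timeout=10)
# Detect the encoding from the body so Chinese text is not garbled
html.encoding = html.apparent_encoding
soup = BeautifulSoup(html.text, 'lxml')
# Headline links in the front-page news block
lis = soup.select('.m-p1-m-blk2 .m-p1-mb2-list.m-list-container ul li a')
for li in lis:
    title = li.text
    InnerUrl = li.get('href', '')  # some <a> tags carry no href
    # Keep only article pages (.shtml) that have a real headline
    if InnerUrl.endswith('shtml') and len(title) > 3:
        print(title, InnerUrl)
        page = requests.get(InnerUrl, timeout=10)
        page.encoding = page.apparent_encoding
        page_soup = BeautifulSoup(page.text, 'lxml')
        articles = page_soup.select('.article p')
        res = ''
        for article in articles:
            res += article.text
        # Print once, after the inner loop has collected every paragraph
        print('Article content:', res)
        # Append each article's text to the output file, one per line
        with open('xinlang.txt', 'a', encoding='utf8') as f:
            f.write(res + '\n')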
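One thing the script does not cover is links that come without a scheme (for example `//host/path`), which `requests.get` would reject. Below is a rough sketch of how the link filter could be pulled into a helper and tested offline. The function name `pick_article_links` and the sample URLs are hypothetical, invented for illustration; the filter itself (ends with `shtml`, title longer than 3 characters) is the one used in the script.

```python
from urllib.parse import urljoin

BASE_URL = "https://finance.sina.com.cn/"


def pick_article_links(candidates, base_url=BASE_URL):
    """Filter (title, href) pairs down to article links with absolute URLs.

    Hypothetical helper: candidates is any iterable of (title, href) pairs,
    where href may be None for anchors without an href attribute.
    """
    picked = []
    for title, href in candidates:
        title = (title or '').strip()
        if not href:
            continue
        # Resolve relative and protocol-relative links against the base URL.
        full_url = urljoin(base_url, href)
        # Same filter as the script: article pages with a real headline.
        if full_url.endswith('shtml') and len(title) > 3:
            picked.append((title, full_url))
    return picked


# Made-up sample data, not real Sina Finance links.
sample = [
    ("Example headline one", "https://finance.sina.com.cn/a/doc-1.shtml"),
    ("Example headline two", "//finance.sina.com.cn/b/doc-2.shtml"),
    ("Nav", "https://finance.sina.com.cn/c/doc-3.shtml"),              # title too short
    ("Example headline four", "https://finance.sina.com.cn/stock/"),  # not an article page
    ("Example headline five", None),                                   # anchor without href
]

links = pick_article_links(sample)
for title, link in links:
    print(title, link)
```

With real data, the pairs could be built from the selected anchors, for example `[(a.text, a.get('href')) for a in lis]`.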

渡己
