How to save a web page as a text file [Python]

Date: 2023-11-08


Problem description

I would like to save a web page (all of its content) as a text file, as if I had right-clicked the page and chosen "Save Page As" -> "Save as text file" instead of saving it as an HTML file.

I have tried using the following code:

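from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

url = ''  # URL of the page to save
page = urlopen(url)
page_content = page.read()  # raw bytes of the HTML document

# Saves the raw HTML, so tags and entities such as &eacute; stay in the file
with open('file_text.txt', 'wb') as f:
    f.write(page_content)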

My goal is to save the whole text without the HTML code (for example, I would like to read "é" instead of "&eacute;").
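The entity decoding described here is something Python's standard library can already do. As a point of comparison, here is a minimal sketch that strips tags with `html.parser.HTMLParser`; with `convert_charrefs=True` the parser turns `&eacute;` into `é` before passing text on. `TextExtractor`, `html_to_text`, and the small set of tags treated as line breaks are hypothetical choices for illustration, so the output will not match a browser's "Save as text file" exactly (CSS visibility, tables, and lists are ignored).

```python
from html.parser import HTMLParser

# Assumption: a small, non-exhaustive set of tags that should start a new line
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # their contents are never shown as page text


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document, dropping tags, scripts and styles."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn "&eacute;" into "é" before handle_data
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Collapse whitespace inside each line and drop lines left empty by removed tags
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html_content):
    parser = TextExtractor()
    parser.feed(html_content)
    parser.close()
    return parser.text()


sample = "<p>Caf&eacute; <b>cr&egrave;me</b></p><p>Second</p><script>var x = 1;</script>"
print(html_to_text(sample))  # prints "Café crème" and "Second" on two lines
```

To save the result, write the returned string to a file opened with an explicit encoding, for example `open('file_text.txt', 'w', encoding='utf-8')`, so characters such as "é" are stored correctly.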

Recommended answer

Take a look at html2text, as suggested elsewhere:

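from urllib.request import urlopen

import html2text  # third-party package: pip install html2text

url = ''  # URL of the page to save
page = urlopen(url)
# In Python 3, read() returns bytes; decode with the charset the server declares
charset = page.headers.get_content_charset() or 'utf-8'
html_content = page.read().decode(charset, errors='replace')

converter = html2text.HTML2Text()
converter.unicode_snob = True  # output non-ASCII characters such as "é" as Unicode
rendered_content = converter.handle(html_content)  # HTML -> Markdown-style plain text

with open('file_text.txt', 'w', encoding='utf-8') as f:
    f.write(rendered_content)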
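`urllib2` exists only in Python 2. In Python 3 the equivalent, `urllib.request.urlopen()`, returns a response whose `read()` gives `bytes`, while html2text works on text (`str`), so the body has to be decoded first, ideally with the charset the server declares in its `Content-Type` header. Below is a minimal standard-library sketch of that step; `decode_html` is a hypothetical helper name, and falling back to UTF-8 is an assumption, not something every server guarantees.

```python
from email.message import Message
from typing import Optional


def decode_html(raw: bytes, content_type: Optional[str]) -> str:
    """Decode an HTTP response body using the charset from its Content-Type header.

    Falls back to UTF-8 when no charset is declared or the declared name is unknown;
    undecodable bytes are replaced with U+FFFD instead of raising an error.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None
    charset = msg.get_content_charset() or "utf-8"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server named a charset Python does not know about
        return raw.decode("utf-8", errors="replace")
```

With `urllib.request`, the arguments would be `page.read()` and `page.headers.get("Content-Type")`. Pages that declare their encoding only in a `<meta charset>` tag are not covered by this sketch.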
