Parsing XML with namespaces in Python using ElementTree

Date: 2023-01-26

Question

I have an XML file; a small part of it looks like this:

<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>

When I parse it with ElementTree and save it to a file, I see the following:

<ns0:insert xmlns:ns0="urn:com:xml:insert" xmlns:ns1="urn:com:xml:data">
  <ns1:data>
    <ns1:image imageId="1"></ns1:image>
    <ns1:content>Content</ns1:content>
  </ns1:data>
</ns0:insert>
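A minimal sketch that reproduces this round trip, assuming Python 3 and embedding the sample in a string (the name SAMPLE is only illustrative) instead of reading test.xml. It suggests why the prefixes change: after parsing, ElementTree keeps only the namespace URI in each tag, written as {uri}localname, so the original prefix i and the default namespace declaration are not remembered. On output it generates ns0, ns1, ... for every namespace URI that has no registered prefix.

```python
import xml.etree.ElementTree as ET

# Sample from the question; the XML declaration must be the very first characters.
SAMPLE = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

root = ET.fromstring(SAMPLE)

# Prefixes are gone after parsing: each tag is "{namespace-uri}localname".
print(root.tag)                       # {urn:com:xml:insert}insert
print([child.tag for child in root])  # ['{urn:com:xml:data}data']

# Serializing invents prefixes (ns0, ns1, ...) for unregistered namespace URIs.
out = ET.tostring(root, encoding="unicode")
print(out)
```

This is also why searching by the prefix seen in the saved file (ns1) cannot work on its own: the prefix exists only in the serialized output, not in the parsed tree.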

Why does it change the prefixes and put them everywhere? With minidom I don't have this problem. Can this be configured? The ElementTree documentation is very poor.
The problem is that after such parsing I can't find any node, for example image: I can't find it with or without the namespace, whether I write {namespace}image or just image. Why is that? Any suggestions are much appreciated.
What I have tried:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
# Raises SyntaxError: the prefix 'ns1' is unknown without a namespace map
for a in root.findall('ns1:image'):
    print(a.attrib)

This raises an error, and the following returns nothing:
for a in root.findall('{urn:com:xml:data}image'):
    print(a.attrib)

I also tried defining a namespace map like this and using it:
namespaces = {'ns1': 'urn:com:xml:data'}
for a in root.findall('ns1:image', namespaces):
    print(a.attrib)

It returns nothing. What am I doing wrong?
Recommended answer
This snippet from your question,
for a in root.findall('{urn:com:xml:data}image'):
    print(a.attrib)

does not output anything because it only looks for direct {urn:com:xml:data}image children of the root of the tree.
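To see the "direct children only" rule in isolation, here is a small sketch, assuming Python 3 and the question's sample inlined as a string (SAMPLE and DATA are illustrative names). The image element is a grandchild of the root, so a bare path on the root finds nothing, while the same path called on the data element finds it.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

root = ET.fromstring(SAMPLE)
DATA = "{urn:com:xml:data}"  # Clark-notation prefix for the default namespace

# A bare tag path only matches direct children of the element it is called on.
direct = root.findall(DATA + "image")
print(direct)  # []

# <data> is the root's direct child, and <image> is a direct child of <data>.
data = root.find(DATA + "data")
nested = data.findall(DATA + "image")
print([e.attrib for e in nested])  # [{'imageId': '1'}]
```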
This slightly modified code,
for a in root.findall('.//{urn:com:xml:data}image'):
    print(a.attrib)

will print {'imageId': '1'} because it uses .//, which selects matching subelements on all levels.
Reference: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax.
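The same fix applies to the prefix-map attempt from the question. A sketch, assuming Python 3 and the sample inlined as a string (SAMPLE is an illustrative name), comparing three ways to reach image at any depth: Clark notation with .//, the question's namespaces dict with .//, and Element.iter(), which walks the whole subtree.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

root = ET.fromstring(SAMPLE)

# 1. Clark notation plus ".//": match descendants at any depth.
by_uri = root.findall(".//{urn:com:xml:data}image")

# 2. The question's prefix map works once ".//" is added. "ns1" is only a
#    local alias for the URI; it does not have to match any prefix in the file.
namespaces = {"ns1": "urn:com:xml:data"}
by_prefix = root.findall(".//ns1:image", namespaces)

# 3. iter() visits every element in the subtree and matches the full tag name.
by_iter = list(root.iter("{urn:com:xml:data}image"))

for found in (by_uri, by_prefix, by_iter):
    print([e.attrib for e in found])  # [{'imageId': '1'}]
```

Which form to use is mostly taste; the prefix map keeps long URIs out of the search paths when many lookups share the same namespace.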
It is a bit annoying that ElementTree does not just retain the original namespace prefixes by default, but keep in mind that it is not the prefixes that matter anyway. The register_namespace() function can be used to set the wanted prefix when serializing the XML. The function does not have any effect on parsing or searching.
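A sketch of using register_namespace() to get the original prefixes back on output, assuming Python 3 and the sample inlined as a string (SAMPLE is an illustrative name). Registering the empty prefix makes urn:com:xml:data the default namespace again, so its elements are written without a prefix. Note that the registration is stored in a module-wide table, so it affects every later serialization in the same process.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

# Map prefixes to URIs before serializing; "" means "default namespace".
ET.register_namespace("i", "urn:com:xml:insert")
ET.register_namespace("", "urn:com:xml:data")

root = ET.fromstring(SAMPLE)
out = ET.tostring(root, encoding="unicode")
print(out)  # root written as <i:insert ...>, children without a prefix
```

The tags in the parsed tree are unchanged by this call, so searches still use {urn:com:xml:data}image or a prefix map as before.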
