Python iteration order on a set

Date: 2023-10-19

Problem description

I am parsing two big files (on the order of gigabytes each), each containing keys and corresponding values. Some keys are shared between the two files, but with different corresponding values. For each of the files, I want to write to a new file the keys* and their corresponding values, where keys* denotes the keys present in both file 1 and file 2. I don't care about the key order in the output, but it absolutely must be the same in the two output files.

File 1:

key1
value1-1
key2
value1-2
key3
value1-3

File 2:

key1
value2-1
key5
value2-5
key2
value2-2

A valid output would be:

Parsed file 1:

key1
value1-1
key2
value1-2

Parsed file 2:

key1
value2-1
key2
value2-2

Another valid output:

Parsed file 1:

key2
value1-2
key1
value1-1

Parsed file 2:

key2
value2-2
key1
value2-1

An invalid output (keys in a different order in the two output files):

Parsed file 1:

key2
value1-2
key1
value1-1

Parsed file 2:

key1
value2-1
key2
value2-2

One last detail: the values are far larger than the keys.

What I want to do is:

For each input file, parse it and return a dict (let's call it file_index) whose keys are the keys in the file and whose values are the offsets at which each key was found in the input file.
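The indexing step could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `build_index` is hypothetical, and the file is assumed to alternate one key line and one value line, as in the examples above. The file is opened in binary mode so that `tell()` returns exact byte offsets.

```python
def build_index(path):
    """Map each key (as bytes) to the byte offset of its key line.

    Assumes the layout shown in the question: a key line followed by
    exactly one value line, repeated until the end of the file.
    """
    index = {}
    with open(path, "rb") as f:  # binary mode keeps offsets exact
        while True:
            offset = f.tell()          # where this key line starts
            key_line = f.readline()
            if not key_line:
                break                  # end of file
            f.readline()               # skip the value line; it is not stored
            index[key_line.rstrip(b"\r\n")] = offset
    return index
```

Only keys and integer offsets are kept in memory, which fits the situation where values are much larger than keys.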

Compute the intersection:

good_keys = file1_index.viewkeys() & file2_index.viewkeys()
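Note that `viewkeys()` exists only in Python 2.7. In Python 3, `dict.keys()` already returns a set-like view, so the same intersection can be written as below. The index contents here are made-up sample offsets, used only to make the snippet self-contained.

```python
# Illustrative indexes: key -> offset (the offsets are invented sample data).
file1_index = {"key1": 0, "key2": 14, "key3": 28}
file2_index = {"key1": 0, "key5": 14, "key2": 28}

# In Python 3, keys views support set operations; the result is a plain set.
good_keys = file1_index.keys() & file2_index.keys()
```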

Then do something like (pseudo-code):

for each file:
    for good_key in good_keys:
        offset = file_index[good_key]
        go to offset in input_file
        get corresponding value
        write (key, value) to output file
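A runnable version of the pseudo-code might look like the sketch below. The helper name `write_shared` and its parameters are hypothetical, and it assumes that each offset in `file_index` points at the start of a key line that is immediately followed by a single value line.

```python
def write_shared(input_path, output_path, file_index, ordered_keys):
    """Copy the (key, value) pairs for ordered_keys, in exactly that order."""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        for key in ordered_keys:
            src.seek(file_index[key])      # jump to the recorded key offset
            key_line = src.readline()
            value_line = src.readline()
            if not value_line.endswith(b"\n"):
                value_line += b"\n"        # last line of the file may lack a newline
            dst.write(key_line)
            dst.write(value_line)
```

Passing the same `ordered_keys` sequence to both calls (one per input file) is what makes the two outputs share a key order.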

Does iterating over the same set guarantee exactly the same order each time (provided it is the same set, which I won't modify between the two iterations), or should I convert the set to a list first and iterate over the list?
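As a hedged illustration rather than an authoritative answer: in CPython, iterating twice over the same set object, with no modification in between and within the same process, yields the same order in practice. Since the keys are small, materializing the order once in a list costs little and removes any reliance on that behavior. The key names below are invented sample data.

```python
good_keys = {"key1", "key2", "key7", "key9"}  # stand-in for the real intersection

# Two passes over the same, unmodified set object in one process.
first_pass = list(good_keys)
second_pass = list(good_keys)
print(first_pass == second_pass)

# Freezing the order once makes the requirement explicit in the code.
ordered_keys = list(good_keys)

# Sorting additionally makes the order reproducible across separate runs.
sorted_keys = sorted(good_keys)
```

The order of a set of strings can differ from one run of the program to the next, because string hashing is randomized per process by default in Python 3. If the two output files are written in separate runs, sorting the keys (or storing the ordered list) is the safer choice.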
