Python 3: Does Pool keep the original order of the data passed to map?

Date: 2023-03-14

Problem description

I have written a little script to distribute a workload across 4 worker processes and to test whether the results stay ordered (with respect to the order of the input):

from multiprocessing import Pool
import numpy as np
import time
import random


rows = 16
columns = 1000000

vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

def worker(arr):
    # Sleep for a random amount of time so that the worker
    # processes finish at different times.
    time.sleep(random.random())
    for idx in np.ndindex(arr.shape):
        arr[idx] += 1
    return arr

if __name__ == "__main__":
    # Create a pool of 4 worker processes (not threads).
    with Pool(4) as p:
        # Schedule one worker call for each row of the original data.
        q = p.map(worker, list(vals))

    for idx, row in enumerate(q):
        print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))

For me this always results in:

[00]:        1 -  1000000
[01]:  1000001 -  2000000
[02]:  2000001 -  3000000
[03]:  3000001 -  4000000
[04]:  4000001 -  5000000
[05]:  5000001 -  6000000
[06]:  6000001 -  7000000
[07]:  7000001 -  8000000
[08]:  8000001 -  9000000
[09]:  9000001 - 10000000
[10]: 10000001 - 11000000
[11]: 11000001 - 12000000
[12]: 12000001 - 13000000
[13]: 13000001 - 14000000
[14]: 14000001 - 15000000
[15]: 15000001 - 16000000

Question: So, does Pool really keep the original input's order when storing the results of each map function in q?

Sidenote: I am asking because I need an easy way to parallelize work across several workers. In some cases the ordering is irrelevant. However, in other cases the results (like those in q) have to be returned in the original order, because I'm using an additional reduce function that relies on ordered data.

Performance: On my machine this operation is about 4 times faster (as expected, since I have 4 cores) than normal execution in a single process. Additionally, all 4 cores are at 100% usage during the run.

Recommended answer

Pool.map results are ordered. If you need order, great; if you don't, Pool.imap_unordered may be a useful optimization.

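To make the difference concrete, here is a minimal sketch that runs the same input through both Pool.map and Pool.imap_unordered. The names square and run_both are hypothetical and exist only for this illustration; the worker is a toy stand-in for real work.

```python
from multiprocessing import Pool


def square(x):
    """Toy worker (hypothetical): stands in for any picklable function."""
    return x * x


def run_both(data, processes=4):
    """Return (ordered, unordered) results for the same input."""
    with Pool(processes) as pool:
        # map returns a list whose order matches the input order.
        ordered = pool.map(square, data)
        # imap_unordered yields results as they complete, so the
        # iterator has to be consumed while the pool is still open.
        unordered = list(pool.imap_unordered(square, data))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = run_both(range(10))
    print(ordered)    # same order as the input
    print(unordered)  # same values, but the order may differ
```

If the results only feed an order-insensitive step (a sum, a set, a counter), imap_unordered lets you start consuming them as soon as any worker finishes. For a reduce step that depends on order, map is the safer choice.
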
Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.
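
One way to observe this is to have each task report when it finished. The sketch below is illustrative only: slow_identity and run are hypothetical names, and the delays are chosen so that earlier items tend to finish last. The finish order depends on scheduling and is not guaranteed, whereas the order of the list returned by Pool.map is.

```python
import time
from multiprocessing import Pool


def slow_identity(item):
    """Sleep for the given delay, then return the index and finish time."""
    index, delay = item
    time.sleep(delay)
    return index, time.time()


def run(n=4):
    # Earlier items get longer delays, so they tend to finish last.
    items = [(i, 0.1 * (n - i)) for i in range(n)]
    with Pool(n) as pool:
        # chunksize=1 hands out one item per task.
        results = pool.map(slow_identity, items, chunksize=1)
    # Order in which map returned the results (always the input order).
    received = [index for index, _ in results]
    # Order in which the workers actually finished (timing dependent).
    finished = [index for index, _ in sorted(results, key=lambda r: r[1])]
    return received, finished


if __name__ == "__main__":
    received, finished = run()
    print("received:", received)
    print("finished:", finished)
```

The received list follows the input order on every run; the finished list reflects whatever order the operating system happened to schedule the workers in.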
