What does the delayed() function do (when used with joblib in Python)?

Date: 2023-03-14

Question

I've read through the documentation, but I don't understand what is meant by: "The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax."

I'm using it to iterate over the list I want to operate on (allImages) as follows:

from joblib import Parallel, delayed

def joblib_loop():
    return Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages)

This returns my HOG features, as I want (and with the speed gain from using all 8 of my cores), but I'm not sure what it is actually doing.

My Python knowledge is alright at best, and it's very possible that I'm missing something basic. Any pointers in the right direction would be most appreciated.

Recommended answer

Perhaps things become clearer if we look at what would happen if we instead simply wrote

Parallel(n_jobs=8)(getHog(i) for i in allImages)

which, in this context, can be spelled out more naturally as the following sequence:

  1. Create a Parallel instance with n_jobs=8
  2. Create the list [getHog(i) for i in allImages]
  3. Pass that list to the Parallel instance

What's the problem? By the time the list gets passed to the Parallel object, all getHog(i) calls have already returned, so there is nothing left to execute in parallel! All the work was already done in the main thread, sequentially.
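To see the eager evaluation concretely, here is a small sketch in plain Python. The names `slow_square` and `fake_parallel` are hypothetical stand-ins for `getHog` and the `Parallel` instance; joblib itself is not involved. The point is only that every call has finished before the consumer receives anything.

```python
calls = []  # records events in the order they happen

def slow_square(x):
    # Hypothetical stand-in for getHog: logs that it ran, then returns a value
    calls.append(f"slow_square({x})")
    return x * x

def fake_parallel(items):
    # Hypothetical stand-in for a Parallel instance: it only ever sees results
    calls.append("fake_parallel received its input")
    return list(items)

# The list comprehension runs every call before fake_parallel is entered
results = fake_parallel([slow_square(i) for i in range(3)])

for event in calls:
    print(event)
```

The log shows all three `slow_square` calls ahead of the "received" entry. With a generator expression instead of a list, the calls would run lazily while the consumer iterates, but still one at a time in the calling code, so nothing would be handed to a worker either.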

What we actually want is to tell Python which functions we want to call with which arguments, without actually calling them. In other words, we want to delay the execution.

This is what delayed conveniently allows us to do, with clear syntax. If we want to tell Python that we'd like to call foo(2, g=3) sometime later, we can simply write delayed(foo)(2, g=3). What is returned is the tuple (foo, (2,), {'g': 3}), containing:

  • a reference to the function we want to call, e.g. foo
  • all arguments without a keyword ("args" for short), e.g. 2
  • all keyword arguments ("kwargs" for short), e.g. g=3
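The trick described above can be sketched in a few lines. This is a minimal illustration of the idea, not joblib's actual source; the inner name `capture` and the body of `foo` are assumptions made for the example.

```python
def delayed(function):
    # Sketch only: return a wrapper that records the call instead of making it
    def capture(*args, **kwargs):
        # function is NOT called here; we just bundle it with its arguments
        return function, args, kwargs
    return capture

def foo(x, g=0):
    # Hypothetical function used only for illustration
    return x + g

task = delayed(foo)(2, g=3)
func, args, kwargs = task

print(task[1:])                # ((2,), {'g': 3})
print(func(*args, **kwargs))   # 5
```

Nothing is computed until the last line, where the stored function is finally called with the stored arguments.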

So, by writing Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages), instead of the above sequence, the following now happens:

  1. A Parallel instance with n_jobs=8 is created.
  2. The list [delayed(getHog)(i) for i in allImages] is created, evaluating to [(getHog, (img1,), {}), (getHog, (img2,), {}), ...].
  3. That list is passed to the Parallel instance.
  4. The Parallel instance creates 8 workers (threads or processes, depending on the joblib backend) and distributes the tuples from the list to them.
  5. Finally, each of those workers starts executing tuples: it calls the first element with the second and third elements unpacked as arguments, tup[0](*tup[1], **tup[2]), turning each tuple back into the call we actually intended to make, e.g. getHog(img2).
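The consuming side can be sketched in the same spirit. The helper `run_tasks` below is hypothetical and uses the standard library's `ThreadPoolExecutor`; it is not how joblib implements `Parallel`, only an illustration of workers unpacking the (function, args, kwargs) tuples. `get_length` and `all_items` are stand-ins for `getHog` and `allImages`.

```python
from concurrent.futures import ThreadPoolExecutor

def delayed(function):
    # Same sketch as the idea described above: capture the call, do not run it
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture

def run_tasks(tasks, n_jobs):
    # Hypothetical helper, NOT joblib's implementation
    def execute(tup):
        # Turn the tuple back into the intended call
        return tup[0](*tup[1], **tup[2])
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map returns results in the same order as the input tasks
        return list(pool.map(execute, tasks))

def get_length(text):
    # Hypothetical stand-in for getHog
    return len(text)

all_items = ["a", "bb", "ccc"]
results = run_tasks((delayed(get_length)(s) for s in all_items), n_jobs=2)
print(results)  # [1, 2, 3]
```

Threads are used here only to keep the sketch dependency-free; for CPU-bound work such as feature extraction, process-based workers are usually what gives the speed-up.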
