我创建了两个重复两个不同值的长列表.在第一个列表中,值交替出现,在第二个列表中,一个值出现在另一个值之前:
I create two long lists repeating two different values. In the first list the values alternate, in the second list one value's occurrences come before the other's:
a1 = [object(), object()] * 10**6
a2 = a1[::2] + a1[1::2]
然后我迭代它们,对它们什么都不做:
Then I iterate them, doing nothing with them:
for _ in a1: pass
for _ in a2: pass
两者中哪一个迭代得更快?取决于我如何测量!我用每种计时方法进行了 50 场比赛:
Which of the two gets iterated faster? Depends on how I measure! I did 50 races with each timing method:
timing method: timeit.timeit
list a1 won 50 times
list a2 won 0 times
timing method: timeit.default_timer
list a1 won 0 times
list a2 won 50 times
为什么这两种计时方法给我的结果完全相反?为什么这两个列表之间存在速度差异?我希望两次更像 25-25,而不是 50-0 和 0-50.
Why do the two timing methods give me completely opposite results? And why are there speed differences between the two lists at all? I'd expect something more like 25-25 twice, instead of 50-0 and 0-50.
以前类似问题的原因以及我认为他们在这里不负责任的原因:
Reasons from previous similar questions and why I don't think they're responsible here:
我使用的是 Python 3.10,在弱 Windows 笔记本电脑和强 Linux Google Compute Engine 实例上的结果相同.
I'm using Python 3.10, same results on a weak Windows laptop and a strong Linux Google Compute Engine instance.
完整代码:
from timeit import timeit, default_timer
a1 = [object(), object()] * 10**6
a2 = a1[::2] + a1[1::2]
for method in 'timeit', 'default_timer':
wins1 = wins2 = 0
for _ in range(50):
time1 = time2 = float('inf')
for _ in range(5):
if method == 'timeit':
t1 = timeit('for _ in a1: pass', 'from __main__ import a1', number=1)
t2 = timeit('for _ in a2: pass', 'from __main__ import a2', number=1)
else:
start = default_timer();
for _ in a1: pass
end = default_timer()
t1 = end - start
start = default_timer();
for _ in a2: pass
end = default_timer()
t2 = end - start
time1 = min(time1, t1)
time2 = min(time2, t2)
wins1 += time1 < time2
wins2 += time2 < time1
print(f'timing method: timeit.{method}')
print(f' list a1 won {wins1} times')
print(f' list a2 won {wins2} times')
print()
不是答案",而是线索:添加
Not "an answer", but a clue: add
def f(a):
for _ in a: pass
然后在default_timer
分支中替换
for _ in a1: pass
使用 f(a1)
,对于 a2
的迭代也类似.
with f(a1)
, and similarly for the iteration over a2
.
那时我看到了两件事:
timing method: timeit.timeit
list a1 won 50 times
list a2 won 0 times
timing method: timeit.default_timer
list a1 won 50 times
list a2 won 0 times
timeit
跑得比 default_timer
快的情况,timeit
runs faster than default_timer
,#2 非常明显,可能很难注意到原因 ;-) 在原来的情况下,for 循环目标 _
是一个全局模块,因此在每次迭代时更新它需要一个 dict 查找和存储.但是 timeit
会根据传递给它的内容构建一个函数,因此在 timeit
运行的代码中,_
是一个局部变量,而一个更快的 STORE_FAST
索引存储就足够了.
#2 is so obvious it may be hard to notice why ;-) In the original, the for-loop target _
is a module global, so updating it on each iteration requires a dict lookup and store. But timeit
builds a function out of the stuff passed to it, so in the code timeit
runs, _
is a local variable instead, and a faster STORE_FAST
indexed store suffices.
函数 f()
被引入以便举重"在这两种情况下都在局部变量上完成.
Function f()
was introduced so that the "heavy lifting" is done in both cases on local variables.
这里计时的代码做的很少,以至于 dict 查找和 index-a-vector-with-an-int 操作之间的差异可能非常显着
The code being timed here does so very little that the difference between a dict lookup and an index-a-vector-with-an-int operation can be highly significant
这篇关于迭代列表的奇怪速度差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!