I have some code that processes several million data rows in my own R-like C# DataFrame class. It makes a number of Parallel.ForEach calls to iterate over the data rows in parallel. This code has been running for over a year using VS2013 and .NET 4.5 without issues.
I have two dev machines (A and B) and recently upgraded machine A to VS2015. I started noticing a strange intermittent freeze in my code about half the time. After letting it run for a long time, it turns out that the code does eventually finish. It just takes 15-120 minutes instead of 1-2 minutes.
Attempts to Break All using the VS2015 debugger keep failing for some reason, so I inserted a bunch of log statements. It turns out that the freeze occurs when there is a Gen2 collection during a Parallel.ForEach loop (found by comparing the collection count before and after each Parallel.ForEach loop). The entire extra 13-118 minutes is spent inside whichever Parallel.ForEach call happens to overlap with a Gen2 collection (if any). If there are no Gen2 collections during any of the Parallel.ForEach loops (about 50% of the time when I run it), then everything finishes fine in 1-2 minutes.
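The before/after comparison of collection counts can be sketched as a small helper. This is only an illustration of the logging approach described above, not the code from my DataFrame class; `GcProbe`, `Result`, and `Measure` are hypothetical names. Note that `GC.CollectionCount` is process-wide, so collections triggered by other threads during the call are counted too.

```csharp
using System;
using System.Diagnostics;

public static class GcProbe
{
    // Collection counts and wall-clock time observed while one action ran.
    public sealed class Result
    {
        public int Gen0;
        public int Gen1;
        public int Gen2;
        public TimeSpan Elapsed;
    }

    // Runs 'body' and reports how many collections of each generation
    // happened in the process while it was running.
    public static Result Measure(Action body)
    {
        int gen0Before = GC.CollectionCount(0);
        int gen1Before = GC.CollectionCount(1);
        int gen2Before = GC.CollectionCount(2);
        Stopwatch stw = Stopwatch.StartNew();

        body();

        stw.Stop();
        return new Result
        {
            Gen0 = GC.CollectionCount(0) - gen0Before,
            Gen1 = GC.CollectionCount(1) - gen1Before,
            Gen2 = GC.CollectionCount(2) - gen2Before,
            Elapsed = stw.Elapsed
        };
    }
}
```

Wrapping each loop as `GcProbe.Measure(() => Parallel.ForEach(rows, ProcessRow))` (with `rows` and `ProcessRow` standing in for the real data and delegate) and logging `Gen2` next to `Elapsed` is enough to see whether the slow calls are the ones that overlapped a Gen2 collection.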
When I run the same code in VS2013 on machine A, I get the same freezes. When I run the code in VS2013 on machine B (which was never upgraded), it works perfectly. It ran dozens of times overnight with no freezing.
Some things I've noticed / tried:
I'm not changing the default GC settings at all. According to GCSettings, all runs are happening with LatencyMode set to Interactive and IsServerGC set to false.
I could just switch to LowLatency before every call to Parallel.ForEach, but I'd really prefer to understand what's going on.
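For reference, the two checks above could look roughly like this. It is a minimal sketch, assuming workstation GC (LowLatency is documented as not available under server GC); `GcLatency`, `Describe`, and `RunWithLowLatency` are hypothetical names introduced here. The try/finally restores the previous mode even if the loop body throws, since LowLatency is meant to be held only for short periods.

```csharp
using System;
using System.Runtime;

public static class GcLatency
{
    // Reports the GC settings the process is currently running with.
    public static string Describe()
    {
        return string.Format("LatencyMode={0}, IsServerGC={1}",
            GCSettings.LatencyMode, GCSettings.IsServerGC);
    }

    // Runs 'body' with the GC in LowLatency mode, then restores
    // whatever mode was active before the call.
    public static void RunWithLowLatency(Action body)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.LowLatency;
            body();
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }
    }
}
```

A call such as `GcLatency.RunWithLowLatency(() => Parallel.ForEach(rows, ProcessRow))` would apply the workaround to a single loop (`rows` and `ProcessRow` are placeholders). It only hides the symptom by avoiding Gen2 collections during the loop; it does not explain why they are slow.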
Has anyone else seen strange freezes in Parallel.ForEach after the VS2015 upgrade? Any ideas on what a good next step would be?
UPDATE 1: Adding some sample code to the vague explanation above...
Here is some sample code that I hope will demonstrate this issue. This code runs in 10-12 seconds on machine B, consistently. It encounters a number of Gen2 collections, but they take almost no time at all. If I uncomment the two GC settings lines, I can force it to have no Gen2 collections. It is then somewhat slower, at 30-50 seconds.
Now on machine A, the code takes a random amount of time, seemingly between 5 and 30 minutes. And it seems to get worse the more Gen2 collections it encounters. If I uncomment the two GC settings lines, it takes 30-50 seconds on machine A as well (same as machine B).
It might take some tweaking of the number of rows and the array size for this to show up on another machine.
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Linq;
using System.Runtime;

public class MyDataRow
{
    public int Id { get; set; }
    public double Value { get; set; }
    public double DerivedValuesSum { get; set; }
    public double[] DerivedValues { get; set; }
}

class Program
{
    static void Example()
    {
        const int numRows = 2000000;
        const int tempArraySize = 250;

        var r = new Random();
        var dataFrame = new List<MyDataRow>(numRows);

        for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { Id = i, Value = r.NextDouble() });

        Stopwatch stw = Stopwatch.StartNew();

        int gcs0Initial = GC.CollectionCount(0);
        int gcs1Initial = GC.CollectionCount(1);
        int gcs2Initial = GC.CollectionCount(2);

        //GCSettings.LatencyMode = GCLatencyMode.LowLatency;

        Parallel.ForEach(dataFrame, dr =>
        {
            double[] tempArray = new double[tempArraySize];
            for (int j = 0; j < tempArraySize; j++) tempArray[j] = Math.Pow(dr.Value, j);
            dr.DerivedValuesSum = tempArray.Sum();
            dr.DerivedValues = tempArray.ToArray();
        });

        int gcs0Final = GC.CollectionCount(0);
        int gcs1Final = GC.CollectionCount(1);
        int gcs2Final = GC.CollectionCount(2);

        stw.Stop();

        //GCSettings.LatencyMode = GCLatencyMode.Interactive;

        Console.Out.WriteLine("ElapsedTime = {0} Seconds ({1} Minutes)", stw.Elapsed.TotalSeconds, stw.Elapsed.TotalMinutes);

        Console.Out.WriteLine("Gcs0 = {0} = {1} - {2}", gcs0Final - gcs0Initial, gcs0Final, gcs0Initial);
        Console.Out.WriteLine("Gcs1 = {0} = {1} - {2}", gcs1Final - gcs1Initial, gcs1Final, gcs1Initial);
        Console.Out.WriteLine("Gcs2 = {0} = {1} - {2}", gcs2Final - gcs2Initial, gcs2Final, gcs2Initial);

        Console.Out.WriteLine("Press Any Key To Exit...");
        Console.In.ReadLine();
    }

    static void Main(string[] args)
    {
        Example();
    }
}
UPDATE 2: Just to move things out of the comments for future readers...
This hotfix: https://support.microsoft.com/en-us/kb/3088957 totally fixes the issue. I'm not seeing any slowness at all after applying it.
I believe it turned out not to have anything to do with Parallel.ForEach, based on this: http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx, though the hotfix does mention Parallel.ForEach for some reason.
It looks like the problem has been addressed now; see http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx