pandas time series: resampling and interpolating together

I have timestamped sensor data. Because of technical details, I get data from the sensors at approximately one minute intervals. The data may look like this:

   tstamp               val
0  2016-09-01 00:00:00  57
1  2016-09-01 00:01:00  57
2  2016-09-01 00:02:23  57
3  2016-09-01 00:03:04  57
4  2016-09-01 00:03:58  58
5  2016-09-01 00:05:00  60

Now, essentially, I would be extremely happy if I got all data at the exact minute, but I don't. The only way to conserve the distribution and have data at each minute is to interpolate. For example, between row indexes 1 and 2 there are 83 seconds, and the natural way to get a value at the exact minute is to interpolate between the two rows of data (in this case, it is 57, but that is not the case everywhere).


Right now, my approach is to do the following:

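import numpy as np
import pandas as pd
import scipy.interpolate

# Midnight of the day being processed
date = pd.to_datetime(df['measurement_tstamp'].iloc[0].date())
# Seconds since midnight for each measurement
ts_d = (df['measurement_tstamp'].dt.hour * 60 * 60 +
        df['measurement_tstamp'].dt.minute * 60 +
        df['measurement_tstamp'].dt.second)
# Target grid: every full minute of the day, in seconds
ts_r = np.arange(0, 24*60*60, 60)
data = scipy.interpolate.interp1d(x=ts_d, y=df['speed'].values)(ts_r)
req = pd.Series(data, index=pd.to_timedelta(ts_r, unit='s'))
req.index = date + req.index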

But this feels rather drawn out and long to me. There are excellent pandas methods that do resampling, rounding, etc. I have been reading them all day, but it turns out that nothing does interpolation just the way I want it. resample works like a groupby and averages time points that fall together. fillna does interpolation, but not after resample has already altered the data by averaging.
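To make the averaging behavior concrete, here is a minimal sketch that runs resample on the six sample rows shown earlier. The variable names (s, binned) are only for illustration.

```python
import pandas as pd

# The six sample readings from the table above
s = pd.Series(
    [57, 57, 57, 57, 58, 60],
    index=pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
    ]),
)

# resample groups readings into one-minute bins and aggregates each bin
binned = s.resample("1min").mean()
print(binned)
```

The 00:03 bin holds the mean of the readings at 00:03:04 and 00:03:58, and the 00:04 bin is empty (NaN), so neither is the value the sensor would have shown at the exact minute.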


Am I missing something, or is my approach the best there is?

For simplicity, assume that I group the data by day, and by sensor, so only a 24 hour period from one sensor is interpolated at a time.


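d = df.set_index('tstamp')  # index the readings by timestamp
t = d.index
# regular one-minute grid starting at midnight of the first day
r = pd.date_range(t.min().date(), periods=24*60, freq='min')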


Note, periods=24*60 works on daily data, not on the sample provided in the question. For that sample, periods=6 will work.
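One way to use d, t and r together, sketched here as an assumption since the snippet above stops after building the grid: reindex onto the union of the original timestamps and the regular grid, interpolate weighted by time, then keep only the grid points. The sample frame is rebuilt from the question's table, and periods=6 is used so the grid stays inside the sample.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "tstamp": pd.to_datetime([
        "2016-09-01 00:00:00", "2016-09-01 00:01:00", "2016-09-01 00:02:23",
        "2016-09-01 00:03:04", "2016-09-01 00:03:58", "2016-09-01 00:05:00",
    ]),
    "val": [57, 57, 57, 57, 58, 60],
})

d = df.set_index("tstamp")
t = d.index
r = pd.date_range(t.min().date(), periods=6, freq="min")

# Union of both indexes leaves NaN at the new grid points;
# method="time" fills them linearly according to the time gaps.
out = d.reindex(t.union(r)).interpolate(method="time").loc[r]
print(out)
```

Grid points that fall before the first reading cannot be interpolated and stay NaN, so a full-day grid may need extra handling at the edges.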

