我有一个时间序列数据集,其中包含一整年的数据(日期是索引).每 15 分钟(全年)测量一次数据,每天测量 96 个时间步长.数据已经标准化.变量是相关的.除 VAR 以外的所有变量都是天气测量值.
I have a time series dataset containing data from a whole year (date is the index). The data was measured every 15 min (during whole year) which results in 96 timesteps a day. The data is already normalized. The variables are correlated. All the variables except the VAR are weather measures.
VAR 在一天和一周内是季节性的(因为它在周末看起来有点不同,但每个周末都不太一样).VAR 值是固定的.我想预测接下来两天(提前 192 步)和接下来 7 天(提前 672 步)的 VAR 值.
VAR is seasonal in a day period and in a week period (as it looks a bit different on weekend, but more less the same every weekend). VAR values are stationary. I would like to predict values of VAR for next two days (192 steps ahead) and for next seven days (672 steps ahead).
这是数据集的示例:
DateIdx VAR dewpt hum press temp
2017-04-17 00:00:00 0.369397 0.155039 0.386792 0.196721 0.238889
2017-04-17 00:15:00 0.363214 0.147287 0.429245 0.196721 0.233333
2017-04-17 00:30:00 0.357032 0.139535 0.471698 0.196721 0.227778
2017-04-17 00:45:00 0.323029 0.127907 0.429245 0.204918 0.219444
2017-04-17 01:00:00 0.347759 0.116279 0.386792 0.213115 0.211111
2017-04-17 01:15:00 0.346213 0.127907 0.476415 0.204918 0.169444
2017-04-17 01:30:00 0.259660 0.139535 0.566038 0.196721 0.127778
2017-04-17 01:45:00 0.205564 0.073643 0.523585 0.172131 0.091667
2017-04-17 02:00:00 0.157650 0.007752 0.481132 0.147541 0.055556
2017-04-17 02:15:00 0.122101 0.003876 0.476415 0.122951 0.091667
输入数据集图
我决定在 Keras 中使用 LSTM.有了全年的数据,我使用过去 329 天的数据作为训练数据,其余的数据在训练期间进行验证.train_X -> 包含包括 329 天的 VAR 在内的全部测量值train_Y -> 仅包含 329 天的 VAR.该值向前移动了一步.其余时间步长为 test_X 和 test_Y.
I have decided to use LSTM in Keras. Having data from the whole year, I have used data from past 329 days as a training data and the rest for a validation during the training. train_X -> contains whole measures including VAR from 329 days train_Y -> contains only VAR from 329 days. The value is shifted one step ahead. The rest timesteps goes to test_X and test_Y.
这是我准备train_X和train_Y的代码:
Here is the code I prepare train_X and train_Y:
#X -> is the whole dataframe
#Y -> is a vector of VAR from whole dataframe, already shifted 1 step ahead
#329 * 96 = 31584
train_X = X[:31584]
train_X = train_X.reshape(train_X.shape[0],1,5)
train_Y = Y[:31584]
train_Y = train_Y.reshape(train_Y.shape[0],1)
为了预测下一个 VAR 值,我想使用过去 672 个时间步长(整周度量).为此,我设置了 batch_size=672
,因此fit"命令如下所示:
To predict next VAR value I would like to use past 672 timesteps (whole week measures). For this reason I have set batch_size=672
, so that the ‘fit’ command look like this:
history = model.fit(train_X, train_Y, epochs=50, batch_size=672, validation_data=(test_X, test_Y), shuffle=False)
这是我的网络架构:
model = models.Sequential()
model.add(layers.LSTM(672, input_shape=(None, 672), return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(336, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(168, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(84, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(21, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mae', optimizer='adam')
model.summary()
从下图中我们可以看到,网络在 50 个 epoch 之后学习了一些东西":
From the plot below we can see that the network has learn ‘something’ after 50 epochs:
学习过程图
出于预测目的,我准备了一组数据,其中包含最后 672 个步骤,其中包含所有值和 96 个没有 VAR 值——应该预测.我还使用了自回归,所以我在每次预测后更新了 VAR 并将其用于下一次预测.
For the prediction purpose I have prepared a set of data containing last 672 steps with all values and 96 without VAR value – which should be predicted. I also used autoregression, so I updated VAR after each prediction and used it for next prediction.
predX 数据集(用于预测)如下所示:
The predX dataset (used for prediction) looks like this:
print(predX['VAR'][668:677])
DateIdx VAR
2017-04-23 23:00:00 0.307573
2017-04-23 23:15:00 0.278207
2017-04-23 23:30:00 0.284390
2017-04-23 23:45:00 0.309118
2017-04-24 00:00:00 NaN
2017-04-24 00:15:00 NaN
2017-04-24 00:30:00 NaN
2017-04-24 00:45:00 NaN
2017-04-24 01:00:00 NaN
Name: VAR, dtype: float64
这是我用来预测接下来 96 个步骤的代码(自回归):
Here is the code (autoregression) I have used to predict next 96 steps:
stepsAhead = 96
historySteps = 672
for i in range(0,stepsAhead):
j = i + historySteps
ypred = model.predict(predX.values[i:j].reshape(1,historySteps,5))
predX['VAR'][j] = ypred
不幸的是,结果很差,与预期相差甚远:
Unfortunately the results are very poor and very far from the expectations:
带有预测数据的绘图
结合前一天的结果:
结合前一天的预测数据
除了我做错了什么"这个问题,我想问几个问题:
Except from the ‘What have I done wrong‘ question, I would like to ask a few questions:
Q1. 在模型查找期间,我刚刚将整个历史记录在 672 大小的批次中.这是正确的吗?我应该如何组织模型拟合的数据集?我有什么选择?我应该使用滑动窗口"方法吗(如此处的链接:https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/)?
Q1. During model fifing, I have just put the whole history in batches of 672 size. Is it correct? How should I organize the dataset for the model fitting? What options do I have? Should I use the "sliding window" approach (like in the link here: https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/ )?
Q2. 50 个 epoch 够吗?这里的常见做法是什么?也许网络拟合不足导致预测不佳?到目前为止,我尝试了 200 个 epoch,结果相同.
Q2. Is the 50 epochs enough? What is the common practice here? Maybe the network is underfitted resulting in poor prediction? So far I tried 200 epoch with the same result.
Q3.我应该尝试不同的架构吗?提议的网络足够大"来处理这样的数据吗?也许有状态"网络是正确的方法?
Q3. Should I try a different architecture? Is the proposed network ‘big enough’ to handle such a data? Maybe a "stateful" network is the right approach here?
Q4.我是否正确实施了自回归?是否有任何其他方法可以预测前面的许多步骤,例如192 还是 672 就像我的情况一样?
Q4. Did I implement the autoregression correctly? Is there any other approach to make a prediction for many steps ahead e.g. 192 or 672 like in my case?
看起来对于如何组织数据来训练 RNN 存在混淆.那么让我们来讨论一下这些问题:
It looks like there is a confusion on how to organise the data to train a RNN. So let's cover the questions:
(total_samples, 5)
,你就可以使用 TimeseriesGenerator 创建一个滑动窗口,它将为您生成 (batch_size, past_timesteps, 5)
.在这种情况下,您将使用 .fit_generator
来训练网络.n
个预测数.(total_samples, 5)
you can use the TimeseriesGenerator to create a sliding window what will generate (batch_size, past_timesteps, 5)
for you. In this case, you will use .fit_generator
to train the network.n
number of predictions after training.单点预测模型可能如下所示:
The single point prediction model could look like:
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5))
model.add(LSTM(64))
model.add(Dense(1))
这篇关于Keras LSTM:时间序列多步多特征预测 - 结果不佳的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!