在 Pandas 数据框列中填充缺失的日期值

时间:2023-01-23
本文介绍了在 Pandas 数据框列中填充缺失的日期值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

我正在使用 Pandas 使用 Data Frames 存储股票价格数据.数据集中有 2940 行.数据集快照如下所示:

I'm using Pandas to store stock prices data using Data Frames. There are 2940 rows in the dataset. The Dataset snapshot is displayed below:

时间序列数据不包含周六和周日的值.因此,必须填充缺失值.
这是我写的代码,但没有解决问题:

The time series data does not contain the values for Saturday and Sunday. Hence missing values have to be filled.
Here is the code I've written but it is not solving the problem:

import pandas as pd
import numpy as np
import os
os.chdir('C:/Users/Admin/Analytics/stock-prices')

data  = pd.read_csv('stock-data.csv')

# PriceDate Column - Does not contain Saturday and Sunday stock entries
data['PriceDate'] =  pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
data = data.sort_index(by=['PriceDate'], ascending=[True])


# Starting date is Aug 25 2004
idx = pd.date_range('08-25-2004',periods=2940,freq='D')


data = data.set_index(idx)
data['newdate']=data.index
newdate=data['newdate'].values   # Create a time series column   


data = pd.merge(newdate, data, on='PriceDate', how='outer')

如何填补周六周日的缺失值?

推荐答案

我觉得你可以使用 resampleffillbfill,但在 set_index 来自 PriceDate 列:

I think you can use resample with ffill or bfill, but before set_index from column PriceDate:

print (data)
   ID  PriceDate  OpenPrice  HighPrice
0   1  6/24/2016          1          2
1   2  6/23/2016          3          4
2   2  6/22/2016          5          6
3   2  6/21/2016          7          8
4   2  6/20/2016          9         10
5   2  6/17/2016         11         12
6   2  6/16/2016         13         14

data['PriceDate'] =  pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
data = data.sort_values(by=['PriceDate'], ascending=[True])
data.set_index('PriceDate', inplace=True)
print (data)
            ID  OpenPrice  HighPrice
PriceDate                           
2016-06-16   2         13         14
2016-06-17   2         11         12
2016-06-20   2          9         10
2016-06-21   2          7          8
2016-06-22   2          5          6
2016-06-23   2          3          4
2016-06-24   1          1          2

data = data.resample('D').ffill().reset_index()
print (data)
   PriceDate  ID  OpenPrice  HighPrice
0 2016-06-16   2         13         14
1 2016-06-17   2         11         12
2 2016-06-18   2         11         12
3 2016-06-19   2         11         12
4 2016-06-20   2          9         10
5 2016-06-21   2          7          8
6 2016-06-22   2          5          6
7 2016-06-23   2          3          4
8 2016-06-24   1          1          2

<小时>

data = data.resample('D').bfill().reset_index()
print (data)
   PriceDate  ID  OpenPrice  HighPrice
0 2016-06-16   2         13         14
1 2016-06-17   2         11         12
2 2016-06-18   2          9         10
3 2016-06-19   2          9         10
4 2016-06-20   2          9         10
5 2016-06-21   2          7          8
6 2016-06-22   2          5          6
7 2016-06-23   2          3          4
8 2016-06-24   1          1          2

这篇关于在 Pandas 数据框列中填充缺失的日期值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

上一篇:如何在 Pandas DataFrame 上计算滚动累积乘积 下一篇:选择由 DatetimeIndex 索引的 Pandas DataFrame 的子集和时间戳列表

相关文章

最新文章