Inserting rows into a pandas DataFrame while maintaining column data types

Date: 2023-09-27

Question

What's the best way to insert new rows into an existing pandas DataFrame while maintaining column data types and, at the same time, supplying user-defined fill values for columns that aren't specified? Here's an example:

import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})

Assume that I want to add a new record passing just name and age. To maintain data types, I can copy a row from df, modify its values, and then append df to the copy, e.g.

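columns = ['name', 'age']
# Copy the first row, keeping only the columns being supplied
copy_df = df.loc[0:0, columns].copy()
copy_df.loc[0, columns] = 'Cindy', 42
# Note: DataFrame.append was removed in pandas 2.0
new_df = copy_df.append(df, sort=False).reset_index(drop=True)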

But that converts the bool column to object.
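The upcast can be reproduced in isolation. The following is a minimal sketch, assuming pd.concat is used in place of DataFrame.append; the names partial and combined are illustrative and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})

# A new record that supplies only two of the four columns
# (partial is a hypothetical name used for illustration)
partial = pd.DataFrame({'name': ['Cindy'], 'age': [42]})

# The cells missing from the new record are filled with NaN,
# which a bool column cannot hold
combined = pd.concat([partial, df], sort=False, ignore_index=True)

print(combined.dtypes)
```

Inspecting combined.dtypes shows which columns survived: age is present in both frames and keeps its integer dtype, weight can hold NaN as a float, and has_children is the column that falls back to object.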

Here's a really hacky solution that doesn't feel like the "right way" to do this:

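columns = ['name', 'age']
copy_df = df.loc[0:0].copy()

# Fill value to use for each dtype when a column is not supplied
missing_remap = {
    'int64': 0,
    'float64': 0.0,
    'bool': False,
    'object': ''
}
# Overwrite every unspecified column with the fill value for its dtype
for c in set(copy_df.columns).difference(columns):
    copy_df.loc[:, c] = missing_remap[str(copy_df[c].dtype)]

# Note: DataFrame.append was removed in pandas 2.0
new_df = copy_df.append(df, sort=False).reset_index(drop=True)
new_df.loc[0, columns] = 'Cindy', 42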

I know I must be missing something.

Recommended answer

As you found, since NaN is a float, adding NaN to a series may cause it to be either upcast to float or converted to object. You are right that this is not a desirable outcome.

There is no straightforward approach. My suggestion is to store your input row data in a dictionary and combine it with a dictionary of defaults before appending. Note that this works because pd.DataFrame.append accepts a dict argument (the method was deprecated in pandas 1.4 and removed in 2.0).

In Python 3.5 and later, you can use the syntax {**d1, **d2} to combine two dictionaries, with values from the second taking precedence.
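As a quick illustration of the merge on its own, here is a small sketch using plain dictionaries; the name merged is introduced only for this example.

```python
# Fill values for every column
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}

# The values actually supplied for the new record
row = {'name': 'Cindy', 'age': 42}

# Keys present in both dictionaries take their value from the second one;
# neither input dictionary is modified
merged = {**default, **row}

print(merged)
# {'name': 'Cindy', 'age': 42, 'weight': 0.0, 'has_children': False}
```

Because every column ends up with a value of the right type, no NaN is introduced when the merged record is added to the DataFrame.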

# Fill values for any column the new row does not specify
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}

row = {'name': 'Cindy', 'age': 42}

# Values in row override the defaults
df = df.append({**default, **row}, ignore_index=True)

print(df)

   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   42         False  Cindy     0.0

print(df.dtypes)

age               int64
has_children       bool
name             object
weight          float64
dtype: object
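On pandas 2.0 and later, where DataFrame.append is no longer available, the same idea can be sketched with pd.concat. This is one possible adaptation rather than part of the original answer: the names new_row and result are illustrative, and the explicit astype call is an added safeguard that casts the one-row frame to the existing column dtypes before concatenating.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})

default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}

# Build a one-row frame from the merged record and cast each column
# to the dtype of the matching column in df (assumption: the defaults
# are castable to those dtypes)
new_row = pd.DataFrame([{**default, **row}]).astype(df.dtypes.to_dict())

# Concatenate and renumber the index, as ignore_index=True did for append
result = pd.concat([df, new_row], ignore_index=True)

print(result)
print(result.dtypes)
```

Since both frames carry the same dtype for every column, the concatenation has nothing to upcast and has_children stays bool.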
