How to group by a cumulative sum that resets on a condition

Date: 2023-08-30


Problem description

I have a pandas DataFrame with a word count for each article. I want to add a column, MERGED, that assigns consecutive articles to the same group until the group's cumulative word count reaches a minimum of min_words, at which point the next article starts a new group.

import pandas as pd

df = pd.DataFrame([[  0,   6],
       [  1,  10],
       [  3,   5],
       [  4,   7],
       [  5,  26],
       [  6,   7],
       [  9,   4],
       [ 10, 133],
       [ 11,  42],
       [ 12,   1]], columns=['ARTICLE', 'WORD_COUNT'])

df
Out[15]: 
   ARTICLE  WORD_COUNT
0        0           6
1        1          10
2        3           5
3        4           7
4        5          26
5        6           7
6        9           4
7       10         133
8       11          42
9       12           1

With min_words = 20, this is the desired output:

df
Out[17]: 
   ARTICLE  WORD_COUNT  MERGED
0        0           6       0
1        1          10       0
2        3           5       0
3        4           7       1
4        5          26       1
5        6           7       2
6        9           4       2
7       10         133       2
8       11          42       3
9       12           1       4

As seen above, the final article(s) may not reach min_words, and that is fine.
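To make the grouping rule concrete, here is a minimal plain-Python sketch that traces the running sum over the sample word counts and records where it reaches min_words and resets. The names word_counts, running, and trace are illustrative only and do not come from the question.

```python
# Sample word counts from the question, in article order.
word_counts = [6, 10, 5, 7, 26, 7, 4, 133, 42, 1]
min_words = 20

running = 0
trace = []  # one (word_count, running_sum, closes_group) tuple per article
for count in word_counts:
    running += count
    closes = running >= min_words
    trace.append((count, running, closes))
    if closes:
        running = 0  # reset: the next article starts a new group

for row in trace:
    print(row)
```

The rows flagged True (positions 2, 4, 7, and 8) are the last articles of groups 0 to 3; the final article is left in an unfinished group 4.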

Recommended answer

Because the running sum has to restart every time it reaches the limit, this needs a custom function:

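import numpy as np

def dymcumsum(v, limit):
    # Return the positions at which the running sum reaches `limit`;
    # the running sum restarts from zero after each such position.
    idx = []
    sums = 0
    for i, value in enumerate(v):
        sums += value
        if sums >= limit:
            idx.append(i)
            sums = 0
    return idx

# Mark the row that closes each group (positions equal labels on the default RangeIndex).
df['New'] = np.nan
df.loc[dymcumsum(df.WORD_COUNT, 20), 'New'] = 1
# Count the markers from the bottom up, flip back, then number the groups from 1.
# Drop the "+ 1" to get zero-based labels like MERGED in the question.
df['New'] = df['New'].iloc[::-1].eq(1).cumsum()[::-1].factorize()[0] + 1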
 
df
   ARTICLE  WORD_COUNT  New
0        0           6    1
1        1          10    1
2        3           5    1
3        4           7    2
4        5          26    2
5        6           7    3
6        9           4    3
7       10         133    3
8       11          42    4
9       12           1    5
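As a possible alternative to the marker-and-factorize steps above, the group labels can be built directly in one pass. The sketch below is not part of the original answer: the function name reset_cumsum_groups is hypothetical, and it produces zero-based labels to match the MERGED column in the question. Because the labels are assigned as a plain list, it does not rely on the DataFrame having a default RangeIndex.

```python
import pandas as pd


def reset_cumsum_groups(values, limit):
    """Label each value with a group id; a group closes once its sum reaches limit."""
    labels = []
    group = 0
    running = 0
    for value in values:
        labels.append(group)  # the current value always belongs to the open group
        running += value
        if running >= limit:
            group += 1   # the next value starts a new group
            running = 0  # reset the running sum
    return labels


df = pd.DataFrame({
    'ARTICLE': [0, 1, 3, 4, 5, 6, 9, 10, 11, 12],
    'WORD_COUNT': [6, 10, 5, 7, 26, 7, 4, 133, 42, 1],
})
df['MERGED'] = reset_cumsum_groups(df['WORD_COUNT'], 20)
print(df)

# Total words per merged group; only the last group may fall below the limit.
merged_totals = df.groupby('MERGED')['WORD_COUNT'].sum()
print(merged_totals)
```

The loop is still plain Python, so for very large frames it may be worth profiling before relying on it.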
