Python pandas: grouping based on multiple column values

Date: 2024-08-21
Question

I have sequential campaign activity data in a pandas DataFrame:

```python
import pandas as pd

# sample data
user_id = [9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,4705,4705,4705,4705,4705,223,223,223,223,223,223,223,223]
transaction_Value = [50,125,0,100,0,1000,473,0,47,110,0,44,93,0,49,92,0,242,0,75,0,47,122,0,50,100,200,0,35,85,0,50]
Campaign = ['M1','M1','Used','M1','Used','W1','Used','Used','W2','W2','Used','W2','W2','Used','W2','W2','Used','O1','Used','W3','Used','W2','S1','Lost','M1','M1','M1','Used','W2','S2','Lost','S2']
transaction_c = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,1,2,3,4,5,1,2,3,4,5,6,7,8]

df = pd.DataFrame(list(zip(user_id, transaction_Value, Campaign, transaction_c)),
                  columns=['user_id', 'transaction_Value', 'Campaign', 'transaction_c'])
```

So far, I have grouped the data with the following code:

```python
df2 = (df.set_index(['user_id', df.groupby('user_id').cumcount()])['transaction_Value']
         .unstack(fill_value='')
         .reset_index())
```

This spreads each user's transaction values across columns numbered by transaction position:

| user_id | 0  | 1   | 2   | 3   | 4  | 5    | 6   | 7  | 8  | 9   | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17  | 18 |
|---------|----|-----|-----|-----|----|------|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|
| 9       | 50 | 125 | 0   | 100 | 0  | 1000 | 473 | 0  | 47 | 110 | 0  | 44 | 93 | 0  | 49 | 92 | 0  | 242 | 0  |
| 223     | 50 | 100 | 200 | 0   | 35 | 85   | 0   | 50 |    |     |    |    |    |    |    |    |    |     |    |
| 4705    | 75 | 0   | 47  | 122 | 0  |      |     |    |    |     |    |    |    |    |    |    |    |     |    |
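A minimal sketch of what the code above does, run on a hypothetical five-row subset of the sample data: `groupby('user_id').cumcount()` numbers each user's rows 0, 1, 2, ..., and using that number as a second index level lets `unstack` turn it into the column headers.

```python
import pandas as pd

# Hypothetical five-row subset of the question's sample data
df = pd.DataFrame({
    "user_id": [9, 9, 9, 223, 223],
    "transaction_Value": [50, 125, 0, 50, 100],
})

# Position of each row within its user: 0, 1, 2, ... restarting for every user
pos = df.groupby("user_id").cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1]

# (user_id, position) becomes the index; unstack moves the position level into columns
wide = df.set_index(["user_id", pos])["transaction_Value"].unstack(fill_value="")
print(wide)
```

Because the position only restarts when `user_id` changes, every user ends up on a single row, which is why a second grouping key is needed to break rows at `Used`/`Lost`.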
                  

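How can I change the code so that a new row starts after every `Used` or `Lost` value?

I can do the same for the `Campaign` values and then stack the two DataFrames together.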
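The stacking plan described above (widen each column separately, then stack the two frames) could be sketched with `pd.concat` and its `keys`/`names` arguments. This assumes a small hypothetical subset of the data and keeps the per-user position from the code above as the column index, so it only shows the stacking half; it does not yet start a new row after `Used`/`Lost`. The helper `widen` is a hypothetical name.

```python
import pandas as pd

# Hypothetical two-user subset of the question's sample data
df = pd.DataFrame({
    "user_id": [9, 9, 9, 223, 223],
    "transaction_Value": [50, 125, 0, 50, 100],
    "Campaign": ["M1", "M1", "Used", "M1", "M1"],
})

# Per-user position, as in the question's code
pos = df.groupby("user_id").cumcount()

def widen(col):
    """Hypothetical helper: spread one column into columns 0, 1, 2, ... per user."""
    return df.set_index(["user_id", pos])[col].unstack(fill_value="")

# keys adds an outer index level (named 'Type') that records which column each row came from
stacked = (pd.concat([widen("Campaign"), widen("transaction_Value")],
                     keys=["Campaign", "transaction_Value"], names=["Type"])
             .reset_index())
print(stacked)
```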

Desired output

| user_id | Type        | 1    | 2    | 3    | 4    |
|---------|-------------|------|------|------|------|
| 9       | Campaign    | M1   | M1   | Used |      |
| 9       | Campaign    | M1   | Used |      |      |
| 9       | Campaign    | W1   | Used |      |      |
| 9       | Campaign    | Used |      |      |      |
| 9       | Campaign    | W2   | W2   | Used |      |
| 9       | Campaign    | W2   | W2   | Used |      |
| 9       | Campaign    | W2   | W2   | Used |      |
| 9       | Campaign    | O1   | Used |      |      |
| 223     | Campaign    | M1   | M1   | M1   | Used |
| 223     | Campaign    | W2   | S2   | Lost |      |
| 223     | Campaign    | S2   |      |      |      |
| 9       | Transaction | 50   | 125  | 0    |      |
| 9       | Transaction | 100  | 0    |      |      |
| 9       | Transaction | 1000 | 473  |      |      |
| 9       | Transaction | 0    |      |      |      |
| 9       | Transaction | 47   | 110  | 0    |      |
| 9       | Transaction | 44   | 93   | 0    |      |
| 9       | Transaction | 49   | 92   | 0    |      |
| 9       | Transaction | 242  | 0    |      |      |
| 223     | Transaction | 50   | 100  | 200  | 0    |
| 223     | Transaction | 35   | 85   | 0    |      |
| 223     | Transaction | 50   |      |      |      |

Thanks for any help with this. :)

Recommended answer

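Build a group label by testing `Campaign` with `Series.isin`, reversing the order with `iloc[::-1]`, taking `Series.cumsum` and reversing back, so that every row up to and including a `Used` or `Lost` value gets the same label; `pd.factorize` then renumbers the labels 0, 1, 2, and so on. Add the label to both `set_index` and `groupby`, `unstack` the position within each group into columns, then use `DataFrame.stack` to move the two value columns into rows and sort by the third level. Finally, drop the second (group) level and convert the `MultiIndex` back to columns: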

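```python
# Group label: rows up to and including each 'Used'/'Lost' share one value
g = df['Campaign'].isin(['Used','Lost']).iloc[::-1].cumsum().iloc[::-1]
# Renumber the labels 0, 1, 2, ... in order of appearance
g = pd.factorize(g)[0]

df2 = (df.set_index(['user_id', g, df.groupby(['user_id', g]).cumcount()])[['Campaign','transaction_Value']]
         .unstack(fill_value='')           # position within each group -> columns 0, 1, 2, ...
         .stack(0)                         # 'Campaign' / 'transaction_Value' -> rows
         .sort_index(level=[2])            # sort by that new level first, then user_id and group
         .rename_axis(['user_id','Campaign','Type'])
         .reset_index(level=1, drop=True)  # drop the helper group level
         .reset_index())

print(df2)
```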
```
    user_id               Type     0     1     2     3
0         9           Campaign    M1    M1  Used
1         9           Campaign    M1  Used
2         9           Campaign    W1  Used
3         9           Campaign  Used
4         9           Campaign    W2    W2  Used
5         9           Campaign    W2    W2  Used
6         9           Campaign    W2    W2  Used
7         9           Campaign    O1  Used
8       223           Campaign    M1    M1    M1  Used
9       223           Campaign    W2    S2  Lost
10      223           Campaign    S2
11     4705           Campaign    W3  Used
12     4705           Campaign    W2    S1  Lost
13        9  transaction_Value    50   125     0
14        9  transaction_Value   100     0
15        9  transaction_Value  1000   473
16        9  transaction_Value     0
17        9  transaction_Value    47   110     0
18        9  transaction_Value    44    93     0
19        9  transaction_Value    49    92     0
20        9  transaction_Value   242     0
21      223  transaction_Value    50   100   200     0
22      223  transaction_Value    35    85     0
23      223  transaction_Value    50
24     4705  transaction_Value    75     0
25     4705  transaction_Value    47   122     0
```
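To see why the first two lines of the answer produce one label per block that ends in `Used`/`Lost`, here is a minimal sketch of just the labeling step, run on user 223's `Campaign` values taken from the question's sample data:

```python
import pandas as pd

# Campaign values of user 223 from the question's sample data
campaign = pd.Series(["M1", "M1", "M1", "Used", "W2", "S2", "Lost", "S2"])

# True where a block ends
is_end = campaign.isin(["Used", "Lost"])

# Reverse, cumulative sum, reverse back: each row gets the number of
# Used/Lost markers at or after it, so every row of a block that ends
# in a marker shares the marker's value
g = is_end.iloc[::-1].cumsum().iloc[::-1]
print(g.tolist())  # [2, 2, 2, 2, 1, 1, 1, 0]

# factorize renumbers the labels in order of first appearance
codes, _ = pd.factorize(g)
print(codes.tolist())  # [0, 0, 0, 0, 1, 1, 1, 2]
```

The trailing `S2`, which has no marker after it, gets a label of its own, which is why it appears as a separate one-value row in the output above.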
                  
