        Passing parameters to groupby aggregation functions

        Date: 2024-08-22

                  This article describes how to pass parameters to aggregation functions used with groupby in pandas, and may be a useful reference for anyone facing the same problem.

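                  Problem description

                  I have a DataFrame, referred to as df in the code, and I apply aggregation functions to several columns within each group. I also apply the user-defined lambda functions f4, f5, f6, and f7. Some of these functions are very similar, for example f4, f6, and f7, and differ only in the threshold value. Can I pass these thresholds as arguments from the dictionary d, so that I only have to write one function instead of several?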

                  f4 = lambda x: len(x[x>10]) # count the frequency of bearing greater than threshold value
                  f4.__name__ = 'Frequency'
                  
                  f5 = lambda x: len(x[x<3.4]) # count the stop points with velocity less than threshold value 3.4
                  f5.__name__ = 'stop_frequency'
                  
                  f6 = lambda x: len(x[x>0.2]) # count the points with velocity greater than threshold value 0.2
                  f6.__name__ = 'frequency'
                  
                  f7 = lambda x: len(x[x>0.25]) # count the points with acceleration greater than threshold value 0.25
                  f7.__name__ = 'frequency'
                  
                  d = {'acceleration':['mean', 'median', 'min'], 
                   'velocity':[f5, 'sum' ,'count', 'median', 'min'], 
                   'velocity_rate':f6,
                   'acc_rate':f7,
                   'bearing':['sum', f4], 
                   'bearing_rate':'sum',     
                   'Vincenty_distance':'sum'}
                  
                  df1 = df.groupby(['userid','trip_id','Transportation_Mode','segmentid'], sort=False).agg(d)
                  
                  #flattening MultiIndex in columns
                  df1.columns = df1.columns.map('_'.join)
                  #MultiIndex in index to columns
                  df1 = df1.reset_index(level=2, drop=False).reset_index()
                  

                  I would like to write a function like this (pseudocode, not valid Python):

                  f4(p) = lambda x: len(x[x>p]) 
                  f4.__name__ = 'Frequency'
                  
                  d = {'acceleration':['mean', 'median', 'min'], 
                   'velocity':[f5, 'sum' ,'count', 'median', 'min'], 
                   'velocity_rate':f4(0.2),
                   'acc_rate':f4(0.25),
                   'bearing':['sum', f4(10)], 
                   'bearing_rate':'sum',     
                   'Vincenty_distance':'sum'}
                  

                  A CSV file of the DataFrame df is available at the following link to make the data clearer: https://drive.google.com/open?id=1R_BBL00G_Dlo-6yrovYJp5zEYLwlMPi9

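                  Recommended answer

                  This is possible, but not straightforward; the solution follows the approach by neilaronson.

                  The solution is also simplified by summing the True values of a boolean mask.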

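                  def f4(p):
                      # p is captured by the inner function, so each call f4(p) builds its own aggregator
                      def ipf(x):
                          # summing the boolean mask counts the values below p;
                          # equivalent to the approach in the question: len(x[x < p])
                          # note: the question's f4, f6 and f7 count values above their thresholds
                          return (x < p).sum()
                      ipf.__name__ = 'Frequency'
                      return ipf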
                  
                  d = {'acceleration':['mean', 'median', 'min'], 
                   'velocity':[f4(3.4), 'sum' ,'count', 'median', 'min'], 
                   'velocity_rate':f4(0.2),
                   'acc_rate':f4(.25),
                   'bearing':['sum', f4(10)], 
                   'bearing_rate':'sum',     
                   'Vincenty_distance':'sum'}
                  
                  df1 = df.groupby(['userid','trip_id','Transportation_Mode','segmentid'], sort=False).agg(d)
                  
                  #flattening MultiIndex in columns
                  df1.columns = df1.columns.map('_'.join)
                  #MultiIndex in index to columns
                  df1 = df1.reset_index(level=2, drop=False).reset_index()
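
The closure pattern above can be tried on a small example. The following is a minimal sketch, assuming a hypothetical toy DataFrame (not the CSV from the question) and a hypothetical helper name count_below. It also checks that summing a boolean mask gives the same count as len() on the filtered Series.

```python
import pandas as pd


def count_below(p):
    """Return an aggregator that counts values strictly below p (hypothetical helper)."""
    def inner(x):
        # x is the Series holding one column of one group
        return (x < p).sum()
    # pandas uses __name__ as the second level of the output column label
    inner.__name__ = 'Frequency'
    return inner


# Hypothetical toy data, not the CSV from the question
toy = pd.DataFrame({
    'segmentid': [1, 1, 1, 2, 2],
    'velocity': [1.0, 5.0, 2.0, 4.0, 3.0],
})

s = toy['velocity']
# Summing a boolean mask counts its True values, same as len() of the filtered Series
assert (s < 3.4).sum() == len(s[s < 3.4])

result = toy.groupby('segmentid').agg({'velocity': [count_below(3.4), 'sum']})
# Flatten the MultiIndex columns in the same way as the answer does
result.columns = result.columns.map('_'.join)
print(result)
```

Each call to count_below builds a separate function object with its own threshold, which is why the same factory can be reused for several columns in the dictionary.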
                  

                  Edit: it is also possible to pass a parameter that selects greater or less:

                  def f4(p, op):
                      def ipf(x):
                          if op == 'greater':
                              return (x > p).sum()
                          elif op == 'less':
                              return (x < p).sum()  
                          else:
                              raise ValueError("second argument has to be greater or less only")
                      ipf.__name__ = 'Frequency'
                      return ipf 
                  
                  
                  
                  d = {'acceleration':['mean', 'median', 'min'], 
                   'velocity':[f4(3.4, 'less'), 'sum' ,'count', 'median', 'min'], 
                   'velocity_rate':f4(0.2, 'greater'),
                   'acc_rate':f4(.25, 'greater'),
                   'bearing':['sum', f4(10, 'greater')], 
                   'bearing_rate':'sum',     
                   'Vincenty_distance':'sum'}
                  
                  df1 = df.groupby(['userid','trip_id','Transportation_Mode','segmentid'], sort=False).agg(d)
                  
                  #flattening MultiIndex in columns
                  df1.columns = df1.columns.map('_'.join)
                  #MultiIndex in index to columns
                  df1 = df1.reset_index(level=2, drop=False).reset_index()
                  

                  print(df1.head())
                     userid  trip_id  segmentid Transportation_Mode  acceleration_mean  
                  0     141      1.0          1                walk           0.061083   
                  1     141      2.0          1                walk           0.109148   
                  2     141      3.0          1                walk           0.106771   
                  3     141      4.0          1                walk           0.141180   
                  4     141      5.0          1                walk           1.147157   
                  
                     acceleration_median  acceleration_min  velocity_Frequency  velocity_sum  
                  0        -1.168583e-02         -2.994428              1000.0   1506.679506   
                  1         1.665535e-09         -3.234188               464.0    712.429005   
                  2        -3.055414e-08         -3.131293               996.0   1394.746071   
                  3         9.241707e-09         -3.307262               340.0    513.461259   
                  4        -2.609489e-02         -3.190424               493.0    729.702854   
                  
                     velocity_count  velocity_median  velocity_min  velocity_rate_Frequency  
                  0            1028         1.294657      0.284747                    288.0   
                  1             486         1.189650      0.284725                    134.0   
                  2            1020         1.241419      0.284733                    301.0   
                  3             352         1.326324      0.339590                     93.0   
                  4             504         1.247868      0.284740                    168.0   
                  
                     acc_rate_Frequency   bearing_sum  bearing_Frequency  bearing_rate_sum  
                  0               169.0  81604.187066              884.0       -371.276356   
                  1                89.0  25559.589869              313.0       -357.869944   
                  2               203.0 -71540.141199               57.0        946.382581   
                  3                78.0   9548.920765              167.0       -943.184805   
                  4                93.0 -24021.555784               67.0        535.333624   
                  
                     Vincenty_distance_sum  
                  0            1506.679506  
                  1             712.429005  
                  2            1395.328768  
                  3             513.461259  
                  4             731.823664  
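
As a possible variation on the answer, the factory can look up the comparison in the standard operator module and be combined with named aggregation, which sets the output column names directly and so avoids flattening a MultiIndex afterwards. This is only a sketch under assumptions: named aggregation needs pandas 0.25 or later, and the names count_where, _COMPARATORS, and the toy DataFrame are hypothetical, not part of the original answer.

```python
import operator

import pandas as pd

# Hypothetical mapping from a direction keyword to a comparison function
_COMPARATORS = {'greater': operator.gt, 'less': operator.lt}


def count_where(p, op):
    """Return an aggregator counting values that are greater or less than p."""
    if op not in _COMPARATORS:
        # Fail when the aggregator is built, before any groupby runs
        raise ValueError("second argument has to be 'greater' or 'less'")
    compare = _COMPARATORS[op]

    def inner(x):
        # compare(x, p) is a boolean Series; its sum is the number of True values
        return compare(x, p).sum()

    # Informational only: with named aggregation the output names come from the keywords
    inner.__name__ = f'count_{op}_{p}'
    return inner


# Hypothetical toy data, not the CSV from the question
toy = pd.DataFrame({
    'segmentid': [1, 1, 1, 2, 2],
    'velocity': [1.0, 5.0, 2.0, 4.0, 3.0],
    'bearing': [12.0, 8.0, 15.0, 3.0, 20.0],
})

# Named aggregation: output_column=(input_column, function)
result = toy.groupby('segmentid', sort=False).agg(
    velocity_stop_frequency=('velocity', count_where(3.4, 'less')),
    velocity_sum=('velocity', 'sum'),
    bearing_frequency=('bearing', count_where(10, 'greater')),
)
print(result)
```

Validating op inside count_where rather than inside the inner function means a misspelled direction is reported where the dictionary or keyword list is written, not partway through the aggregation.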
                  
