                How do I count concurrent events in a DataFrame in one line?

                Date: 2023-08-29

                Problem description

                I have a dataset of phone calls. For each record, I want to count how many calls are active when that call starts. I found this question, but I'd like to avoid loops and functions.

                Each call has a date, a start time and an end time.

                The DataFrame:

                      start       end        date
                0  09:17:12  09:18:20  2016-08-10
                1  09:15:58  09:17:42  2016-08-11
                2  09:16:40  09:17:49  2016-08-11
                3  09:17:05  09:18:03  2016-08-11
                4  09:18:22  09:18:30  2016-08-11
                

                What I want:

                      start       end        date  activecalls
                0  09:17:12  09:18:20  2016-08-10            1
                1  09:15:58  09:17:42  2016-08-11            1
                2  09:16:40  09:17:49  2016-08-11            2
                3  09:17:05  09:18:03  2016-08-11            3
                4  09:18:22  09:18:30  2016-08-11            1
                

                My code:

                import pandas as pd
                
                df = pd.read_clipboard(sep=r'\s\s+')
                
                df['activecalls'] = df[(df['start'] <= df.loc[df.index]['start']) & 
                                       (df['end'] > df.loc[df.index]['start']) & 
                                       (df['date'] == df.loc[df.index]['date'])].count()
                
                print(df)
                

                What I get:

                      start       end        date  activecalls
                0  09:17:12  09:18:20  2016-08-10          NaN
                1  09:15:58  09:17:42  2016-08-11          NaN
                2  09:16:40  09:17:49  2016-08-11          NaN
                3  09:17:05  09:18:03  2016-08-11          NaN
                4  09:18:22  09:18:30  2016-08-11          NaN
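
For context, here is a minimal sketch of why the attempt above fills the column with NaN. It rebuilds the sample data by hand (an assumption standing in for the clipboard read) and inspects the intermediate values. The key points are that `df.loc[df.index]['start']` is just `df['start']` again, and that `DataFrame.count()` returns one value per column rather than one per row.

```python
import pandas as pd

# Sample data typed in by hand instead of read from the clipboard.
df = pd.DataFrame({
    "start": ["09:17:12", "09:15:58", "09:16:40", "09:17:05", "09:18:22"],
    "end":   ["09:18:20", "09:17:42", "09:17:49", "09:18:03", "09:18:30"],
    "date":  ["2016-08-10", "2016-08-11", "2016-08-11", "2016-08-11", "2016-08-11"],
})

# df.loc[df.index]['start'] selects every row, so each comparison is a row
# against itself and the mask is True everywhere.
mask = ((df["start"] <= df.loc[df.index]["start"])
        & (df["end"] > df.loc[df.index]["start"])
        & (df["date"] == df.loc[df.index]["date"]))
print(mask.all())

# count() on a DataFrame yields one number per column, labelled by column name.
counts = df[mask].count()
print(counts.index.tolist())

# Assignment aligns on index labels. The row labels 0..4 never match the
# labels 'start', 'end', 'date', so every row ends up as NaN.
df["activecalls"] = counts
print(df["activecalls"].isna().all())
```

In other words, the expression never compares one call against the others, which is why some form of pairwise comparison is needed.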
                

                Recommended answer

                You can first combine the date and time columns into datetimes:

                # combine the date and time strings into full datetimes (date first)
                df['date_start'] = pd.to_datetime(df.date + ' ' + df.start)
                df['date_end'] = pd.to_datetime(df.date + ' ' + df.end)
                # drop the original string columns
                df = df.drop(['start','end','date'], axis=1)
                

                Solution with a loop:

                active_events= []
                for i in df.index:
                    active_events.append(len(df[(df["date_start"]<=df.loc[i,"date_start"]) & 
                                                (df["date_end"]> df.loc[i,"date_start"])]))
                df['activecalls'] = pd.Series(active_events)
                print (df)
                           date_start            date_end  activecalls
                0 2016-08-10 09:17:12 2016-08-10 09:18:20            1
                1 2016-08-11 09:15:58 2016-08-11 09:17:42            1
                2 2016-08-11 09:16:40 2016-08-11 09:17:49            2
                3 2016-08-11 09:17:05 2016-08-11 09:18:03            3
                4 2016-08-11 09:18:22 2016-08-11 09:18:30            1
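
The loop above compares each call against every other call. The same pairwise comparison can be sketched without an explicit loop using NumPy broadcasting. This is not part of the original answer; it is an alternative shown under the assumption that the frame is small enough for an n-by-n boolean matrix to fit in memory. The sample frame is rebuilt by hand so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data, already combined into full datetimes.
df = pd.DataFrame({
    "date_start": pd.to_datetime([
        "2016-08-10 09:17:12", "2016-08-11 09:15:58", "2016-08-11 09:16:40",
        "2016-08-11 09:17:05", "2016-08-11 09:18:22"]),
    "date_end": pd.to_datetime([
        "2016-08-10 09:18:20", "2016-08-11 09:17:42", "2016-08-11 09:17:49",
        "2016-08-11 09:18:03", "2016-08-11 09:18:30"]),
})

starts = df["date_start"].to_numpy()
ends = df["date_end"].to_numpy()

# overlap[i, j] is True when call i has started by the time call j starts
# and has not yet ended at that moment.
overlap = (starts[:, None] <= starts[None, :]) & (ends[:, None] > starts[None, :])

# Summing down each column counts the calls active when call j starts.
df["activecalls"] = overlap.sum(axis=0)
print(df)
```

For the sample data this gives the counts 1, 1, 2, 3, 1, matching the loop. Memory use grows quadratically with the number of rows.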
                

                Solution with merge:

                #cross join
                df['tmp'] = 1
                df1 = pd.merge(df,df.reset_index(),on=['tmp'])
                df = df.drop('tmp', axis=1)
                #print (df1)
                
                #filtering by conditions
                df1 = df1[(df1["date_start_x"]<=df1["date_start_y"]) &
                          (df1["date_end_x"]> df1["date_start_y"])]
                print (df1)
                          date_start_x          date_end_x  activecalls_x  tmp  index  
                0  2016-08-10 09:17:12 2016-08-10 09:18:20              1    1      0   
                6  2016-08-11 09:15:58 2016-08-11 09:17:42              1    1      1   
                7  2016-08-11 09:15:58 2016-08-11 09:17:42              1    1      2   
                8  2016-08-11 09:15:58 2016-08-11 09:17:42              1    1      3   
                12 2016-08-11 09:16:40 2016-08-11 09:17:49              2    1      2   
                13 2016-08-11 09:16:40 2016-08-11 09:17:49              2    1      3   
                18 2016-08-11 09:17:05 2016-08-11 09:18:03              3    1      3   
                24 2016-08-11 09:18:22 2016-08-11 09:18:30              1    1      4   
                
                          date_start_y          date_end_y  activecalls_y  
                0  2016-08-10 09:17:12 2016-08-10 09:18:20              1  
                6  2016-08-11 09:15:58 2016-08-11 09:17:42              1  
                7  2016-08-11 09:16:40 2016-08-11 09:17:49              2  
                8  2016-08-11 09:17:05 2016-08-11 09:18:03              3  
                12 2016-08-11 09:16:40 2016-08-11 09:17:49              2  
                13 2016-08-11 09:17:05 2016-08-11 09:18:03              3  
                18 2016-08-11 09:17:05 2016-08-11 09:18:03              3  
                24 2016-08-11 09:18:22 2016-08-11 09:18:30              1  
                

                #get size - active calls
                print (df1.groupby(['index'], sort=False).size())
                index
                0    1
                1    1
                2    2
                3    3
                4    1
                dtype: int64
                
                df['activecalls'] = df1.groupby('index').size()
                print (df)
                           date_start            date_end  activecalls
                0 2016-08-10 09:17:12 2016-08-10 09:18:20            1
                1 2016-08-11 09:15:58 2016-08-11 09:17:42            1
                2 2016-08-11 09:16:40 2016-08-11 09:17:49            2
                3 2016-08-11 09:17:05 2016-08-11 09:18:03            3
                4 2016-08-11 09:18:22 2016-08-11 09:18:30            1
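
On pandas 1.2 or later, the helper `tmp` column is not needed, because `merge` accepts `how="cross"` directly. The following is a sketch of the same cross-join idea under that version assumption. The names `pairs` and `active` are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data, already combined into full datetimes.
df = pd.DataFrame({
    "date_start": pd.to_datetime([
        "2016-08-10 09:17:12", "2016-08-11 09:15:58", "2016-08-11 09:16:40",
        "2016-08-11 09:17:05", "2016-08-11 09:18:22"]),
    "date_end": pd.to_datetime([
        "2016-08-10 09:18:20", "2016-08-11 09:17:42", "2016-08-11 09:17:49",
        "2016-08-11 09:18:03", "2016-08-11 09:18:30"]),
})

# how="cross" pairs every row with every row; reset_index() keeps the
# original row label of the right-hand call in an 'index' column.
pairs = pd.merge(df, df.reset_index(), how="cross", suffixes=("_x", "_y"))

# Keep the pairs where call x is active at the moment call y starts.
active = pairs[(pairs["date_start_x"] <= pairs["date_start_y"])
               & (pairs["date_end_x"] > pairs["date_start_y"])]

# Group by the right-hand call's label; the result aligns with df's index.
df["activecalls"] = active.groupby("index").size()
print(df)
```

Like the `tmp`-key version, this builds all n × n row pairs before filtering, so it is best suited to modest row counts.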
                

                Timings:

                def a(df):
                    active_events= []
                    for i in df.index:
                        active_events.append(len(df[(df["date_start"]<=df.loc[i,"date_start"]) & (df["date_end"]> df.loc[i,"date_start"])]))
                    df['activecalls'] = pd.Series(active_events)
                    return (df)
                
                def b(df):
                    df['tmp'] = 1
                    df1 = pd.merge(df,df.reset_index(),on=['tmp'])
                    df = df.drop('tmp', axis=1)
                    df1 = df1[(df1["date_start_x"]<=df1["date_start_y"])  & (df1["date_end_x"]> df1["date_start_y"])]
                    df['activecalls'] = df1.groupby('index').size()
                    return (df)
                
                print (a(df))
                print (b(df))
                
                In [160]: %timeit (a(df))
                100 loops, best of 3: 6.76 ms per loop
                
                In [161]: %timeit (b(df))
                The slowest run took 4.42 times longer than the fastest. This could mean that an intermediate result is being cached.
                100 loops, best of 3: 4.61 ms per loop
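
Both timed versions compare every call with every other call, so their cost grows quadratically with the number of rows. For larger frames, one possible alternative (not from the original answer, and not timed here) is to count with sorted arrays and `np.searchsorted`. The sketch assumes no call ends before it starts. Under that assumption, the calls active at a given moment are those that have started by then minus those that have already ended.

```python
import numpy as np
import pandas as pd

# Sample data, already combined into full datetimes.
df = pd.DataFrame({
    "date_start": pd.to_datetime([
        "2016-08-10 09:17:12", "2016-08-11 09:15:58", "2016-08-11 09:16:40",
        "2016-08-11 09:17:05", "2016-08-11 09:18:22"]),
    "date_end": pd.to_datetime([
        "2016-08-10 09:18:20", "2016-08-11 09:17:42", "2016-08-11 09:17:49",
        "2016-08-11 09:18:03", "2016-08-11 09:18:30"]),
})

starts = df["date_start"].to_numpy()
sorted_starts = np.sort(starts)
sorted_ends = np.sort(df["date_end"].to_numpy())

# side="right" returns how many sorted values are <= the probe value.
started = np.searchsorted(sorted_starts, starts, side="right")  # start_i <= start_j
ended = np.searchsorted(sorted_ends, starts, side="right")      # end_i <= start_j

# Calls that have started but not yet ended when call j starts.
df["activecalls"] = started - ended
print(df)
```

On the sample data this reproduces the counts 1, 1, 2, 3, 1 while only sorting and binary-searching, instead of materialising every pair.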
                
