
<small id='nsnRC'></small><noframes id='nsnRC'>


        pandas: Joining a DataFrame with Conditions

        Date: 2023-08-29


            <small id='G9uxN'></small><noframes id='G9uxN'>


                  This article describes one way to join a pandas DataFrame with itself under a set of conditions. It may be a useful reference for anyone facing the same problem.

                  Problem description

                  I have the DataFrame shown below, and I am trying to join it with itself by copying it into another df. The join conditions are:

                  1. PERSONID and Badge_ID are the same
                  2. SITE_ID1 is different
                  3. The time difference between the two rows is less than 48 hours.

                  Expected output

                  PERSONID    Badge_ID    Reader_ID1_x    SITE_ID1_x  EVENT_TS1_x         Reader_ID1_y    SITE_ID1_y  EVENT_TS1_y
                  2553-AMAGID 4229        141                 99          2/1/2016 3:26   145                 97          2/1/2016 3:29
                  2553-AMAGID 4229        248                 99          2/1/2016 3:26   145                 97          2/1/2016 3:29
                  2553-AMAGID 4229        145                 97          2/1/2016 3:29   251                 99          2/1/2016 3:29
                  2553-AMAGID 4229        145                 97          2/1/2016 3:29   291                 99          2/1/2016 3:29
                  

                  Here is what I tried: make a copy of df, filter each df with a condition like the one below, and then join them back together. But the condition below doesn't work. I also tried these filters in SQL before reading the data into a df, but that is too slow for 600k+ rows, even with indexes.

                  df1 = df1[(df1['Badge_ID']==df2['Badge_ID']) and (df1['SITE_ID1']!=df2['SITE_ID1']) and ((df1['EVENT_TS1']-df2['EVENT_TS1'])<=datetime.timedelta(hours=event_time_diff))]
                  
                  PERSONID    Badge_ID    Reader_ID1  SITE_ID1              EVENT_TS1
                  2553-AMAGID     4229    141             99          2/1/2016 3:26:10 AM
                  2553-AMAGID     4229    248             99          2/1/2016 3:26:10 AM
                  2553-AMAGID     4229    145             97          2/1/2016 3:29:56 AM
                  2553-AMAGID     4229    251             99          2/1/2016 3:29:56 AM
                  2553-AMAGID     4229    291             99          2/1/2016 3:29:56 AM
                  2557-AMAGID     4219    144             99          2/1/2016 2:36:30 AM
                  2557-AMAGID     4219    144             99          2/1/2016 2:40:00 AM
                  2557-AMAGID     4219    250             99          2/1/2016 2:40:00 AM
                  2557-AMAGID     4219    290             99          2/1/2016 2:40:00 AM
                  2557-AMAGID     4219    144             97          2/1/2016 4:02:06 AM
                  2557-AMAGID     4219    250             99          2/1/2016 4:02:06 AM
                  2557-AMAGID     4219    290             99          2/1/2016 4:02:06 AM
                  2557-AMAGID     4219    250             97          2/2/2016 1:36:30 AM
                  2557-AMAGID     4219    290             99          2/3/2016 2:38:30 AM
                  2559-AMAGID     4227    141             99          2/1/2016 4:33:24 AM
                  2559-AMAGID     4227    248             99          2/1/2016 4:33:24 AM
                  2560-AMAGID     4226    141             99          2/1/2016 4:10:56 AM
                  2560-AMAGID     4226    248             99          2/1/2016 4:10:56 AM
                  2560-AMAGID     4226    145             99          2/1/2016 4:33:52 AM
                  2560-AMAGID     4226    251             99          2/1/2016 4:33:52 AM
                  2560-AMAGID     4226    291             99          2/1/2016 4:33:52 AM
                  2570-AMAGID     4261    141             99          2/1/2016 4:27:02 AM
                  2570-AMAGID     4261    248             99          2/1/2016 4:27:02 AM
                  2986-AMAGID     4658    145             99          2/1/2016 3:14:54 AM
                  2986-AMAGID     4658    251             99          2/1/2016 3:14:54 AM
                  2986-AMAGID     4658    291             99          2/1/2016 3:14:54 AM
                  2986-AMAGID     4658    144             99          2/1/2016 3:26:30 AM
                  2986-AMAGID     4658    250             99          2/1/2016 3:26:30 AM
                  2986-AMAGID     4658    290             99          2/1/2016 3:26:30 AM
                  4133-AMAGID     6263    142             99          2/1/2016 2:44:08 AM
                  4133-AMAGID     6263    249             99          2/1/2016 2:44:08 AM
                  4133-AMAGID     6263    141             34          2/1/2016 2:44:20 AM
                  4133-AMAGID     6263    248             34          2/1/2016 2:44:20 AM
                  4414-AMAGID     6684    145             99          2/1/2016 3:08:06 AM
                  4414-AMAGID     6684    251             99          2/1/2016 3:08:06 AM
                  4414-AMAGID     6684    291             99          2/1/2016 3:08:06 AM
                  4414-AMAGID     6684    145             22          2/1/2016 3:19:12 AM
                  4414-AMAGID     6684    251             22          2/1/2016 3:19:12 AM
                  4414-AMAGID     6684    291             22          2/1/2016 3:19:12 AM
                  4414-AMAGID     6684    145             99          2/1/2016 4:14:28 AM
                  4414-AMAGID     6684    251             99          2/1/2016 4:14:28 AM
                  4414-AMAGID     6684    291             99          2/1/2016 4:14:28 AM
                  4484-AMAGID     6837    142             99          2/1/2016 2:51:14 AM
                  4484-AMAGID     6837    249             99          2/1/2016 2:51:14 AM
                  4484-AMAGID     6837    141             99          2/1/2016 2:51:26 AM
                  4484-AMAGID     6837    248             99          2/1/2016 2:51:26 AM
                  4484-AMAGID     6837    141             99          2/1/2016 3:05:12 AM
                  4484-AMAGID     6837    248             99          2/1/2016 3:05:12 AM
                  4484-AMAGID     6837    141             99          2/1/2016 3:08:58 AM
                  4484-AMAGID     6837    248             99          2/1/2016 3:08:58 AM
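
The filter in the question fails for two separate reasons. Python's `and` asks for a single truth value of an entire Series, which pandas refuses with a `ValueError`; conditions on Series have to be combined with `&`. In addition, comparing `df1['Badge_ID'] == df2['Badge_ID']` on two copies of the same frame only compares each row with itself, so it can never pair two different rows; that takes a merge. A minimal sketch of the first point, using a small made-up frame (the values and variable names here are illustrative, not the asker's data):

```python
import pandas as pd

# Tiny illustrative frame; the values are assumptions for this sketch.
df = pd.DataFrame({
    "Badge_ID": [4229, 4229, 4229],
    "SITE_ID1": [99, 97, 99],
})

# `and` needs one True/False for a whole Series, so pandas raises ValueError.
try:
    mask = (df["Badge_ID"] == 4229) and (df["SITE_ID1"] != 99)
    and_failed = False
except ValueError:
    and_failed = True

# `&` combines the two boolean Series element by element.
# The parentheses matter because `&` binds tighter than `==` and `!=`.
mask = (df["Badge_ID"] == 4229) & (df["SITE_ID1"] != 99)
filtered = df[mask]
```

Here `and_failed` ends up `True`, and `filtered` keeps only the row whose site is 97.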
                  

                  Recommended answer

                  Try the following:

                  import datetime
                  import pandas as pd

                  # Maximum gap in hours, taken from the join conditions in the question
                  event_time_diff = 48

                  # Load the data into the first dataframe (`data` holds the records shown in the question)
                  df1 = pd.DataFrame(data)

                  # Keep the same data in a second dataframe
                  df2 = pd.DataFrame(data)

                  # Rename the non-key columns of the second dataframe so they do not clash after the merge
                  df2.rename(columns={'Reader_ID1': 'Reader_ID1_x', 'SITE_ID1': 'SITE_ID1_x', 'EVENT_TS1': 'EVENT_TS1_x'}, inplace=True)

                  # Merge the dataframes into another dataframe based on PERSONID and Badge_ID
                  df3 = pd.merge(df1, df2, how='outer', on=['PERSONID', 'Badge_ID'])

                  # Absolute gap between the two events, so their order does not matter
                  time_gap = (pd.to_datetime(df3['EVENT_TS1']) - pd.to_datetime(df3['EVENT_TS1_x'])).abs()

                  # Use df.loc[] to fetch the rows you want
                  df3.loc[(df3.Reader_ID1 < df3.Reader_ID1_x) & (df3.SITE_ID1 != df3.SITE_ID1_x) & (time_gap <= datetime.timedelta(hours=event_time_diff))]
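
For a version that can be run as-is, here is a sketch of the same self-join idea on a few rows copied from the question's table. It is not the answerer's exact code: it pairs rows with `suffixes=("_x", "_y")` so the column names match the expected output, and instead of comparing reader IDs it keeps only pairs where the second row comes later in the table. That ordering rule, the helper column `row`, and the names `events`, `pairs` and `max_gap` are assumptions made for this sketch; the rule was picked because it reproduces the four expected rows for badge 4229.

```python
import pandas as pd

# Rows copied from the question: badge 4229 (two sites) and badge 4227 (one site).
events = pd.DataFrame({
    "PERSONID": ["2553-AMAGID"] * 5 + ["2559-AMAGID"] * 2,
    "Badge_ID": [4229] * 5 + [4227] * 2,
    "Reader_ID1": [141, 248, 145, 251, 291, 141, 248],
    "SITE_ID1": [99, 99, 97, 99, 99, 99, 99],
    "EVENT_TS1": pd.to_datetime(
        [
            "2/1/2016 3:26:10 AM", "2/1/2016 3:26:10 AM", "2/1/2016 3:29:56 AM",
            "2/1/2016 3:29:56 AM", "2/1/2016 3:29:56 AM",
            "2/1/2016 4:33:24 AM", "2/1/2016 4:33:24 AM",
        ],
        format="%m/%d/%Y %I:%M:%S %p",
    ),
})

# Hypothetical helper column: position of each row in the (time-sorted) table.
events = events.assign(row=range(len(events)))

max_gap = pd.Timedelta(hours=48)

# Self-join: every pair of rows that share PERSONID and Badge_ID.
pairs = events.merge(events, on=["PERSONID", "Badge_ID"], suffixes=("_x", "_y"))

different_site = pairs["SITE_ID1_x"] != pairs["SITE_ID1_y"]
later_row = pairs["row_x"] < pairs["row_y"]  # keep each pair once, earlier row first
within_window = (pairs["EVENT_TS1_y"] - pairs["EVENT_TS1_x"]).abs() < max_gap

result = (
    pairs[different_site & later_row & within_window]
    .sort_values(["row_x", "row_y"])
    .drop(columns=["row_x", "row_y"])
    .reset_index(drop=True)
)
```

On this sample, `result` contains the reader pairs 141→145, 248→145, 145→251 and 145→291, and nothing for badge 4227, whose events all happen at one site. Note that the merge builds every pair of rows per person before filtering, so on the full 600k-row table memory use may grow quickly for people with many events.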
                  

                • <small id='5O4J4'></small><noframes id='5O4J4'>
