• <bdo id='z61og'></bdo><ul id='z61og'></ul>

  • <legend id='z61og'><style id='z61og'><dir id='z61og'><q id='z61og'></q></dir></style></legend>

      <small id='z61og'></small><noframes id='z61og'>

      <tfoot id='z61og'></tfoot>

        <i id='z61og'><tr id='z61og'><dt id='z61og'><q id='z61og'><span id='z61og'><b id='z61og'><form id='z61og'><ins id='z61og'></ins><ul id='z61og'></ul><sub id='z61og'></sub></form><legend id='z61og'></legend><bdo id='z61og'><pre id='z61og'><center id='z61og'></center></pre></bdo></b><th id='z61og'></th></span></q></dt></tr></i><div id='z61og'><tfoot id='z61og'></tfoot><dl id='z61og'><fieldset id='z61og'></fieldset></dl></div>
      1. 使用 SqlAlchemy 和 cx_Oracle 将 Pandas DataFrame 写入 Oracle 数据库时加

        时间:2023-09-19
        <legend id='JrP4E'><style id='JrP4E'><dir id='JrP4E'><q id='JrP4E'></q></dir></style></legend><tfoot id='JrP4E'></tfoot>
        • <i id='JrP4E'><tr id='JrP4E'><dt id='JrP4E'><q id='JrP4E'><span id='JrP4E'><b id='JrP4E'><form id='JrP4E'><ins id='JrP4E'></ins><ul id='JrP4E'></ul><sub id='JrP4E'></sub></form><legend id='JrP4E'></legend><bdo id='JrP4E'><pre id='JrP4E'><center id='JrP4E'></center></pre></bdo></b><th id='JrP4E'></th></span></q></dt></tr></i><div id='JrP4E'><tfoot id='JrP4E'></tfoot><dl id='JrP4E'><fieldset id='JrP4E'></fieldset></dl></div>

            <tbody id='JrP4E'></tbody>

          <small id='JrP4E'></small><noframes id='JrP4E'>

                <bdo id='JrP4E'></bdo><ul id='JrP4E'></ul>
                1. 本文介绍了使用 SqlAlchemy 和 cx_Oracle 将 Pandas DataFrame 写入 Oracle 数据库时加快 to_sql()的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

                  问题描述

                  使用pandas dataframe 的to_sql 方法,我可以很容易地将少量行写入oracle 数据库中的表:

                  Using pandas dataframe's to_sql method, I can write a small number of rows to a table in oracle database pretty easily:

                  from sqlalchemy import create_engine
                  import cx_Oracle
                  dsn_tns = "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<host>)(PORT=1521))
                         (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=<servicename>)))"
                  pwd = input('Please type in password:')
                  engine = create_engine('oracle+cx_oracle://myusername:' + pwd + '@%s' % dsn_tns)
                  df.to_sql('test_table', engine.connect(), if_exists='replace')
                  

                  但是对于任何常规大小的数据帧(我的有 60k 行,不是那么大),代码变得无法使用,因为它在我愿意等待的时间内从未完成(绝对超过 10 分钟).我已经用谷歌搜索并搜索了很多次,最接近的解决方案是 ansonw 在 这个问题.但那是关于mysql,而不是oracle.正如 Ziggy Eunicien 指出的那样,它不适用于 oracle.有什么想法吗?

                  But with any regular-sized dataframes (mine has 60k rows, not so big), the code became unusable as it never finished in the time I was willing to wait (definitely more than 10 min). I've googled and searched quite a few times and the closest solution was the answer given by ansonw in this question. But that one was about mysql, not oracle. As Ziggy Eunicien pointed out, it did not work for oracle. Any ideas?

                  编辑

                  这是数据框中的行示例:

                  Here's a sample of rows in the dataframe:

                  id          name            premium     created_date    init_p  term_number uprate  value   score   group   action_reason
                  160442353   LDP: Review     1295.619617 2014-01-20  1130.75     1           7       -42 236.328243  6       pass
                  164623435   TRU: Referral   453.224880  2014-05-20  0.00        11          NaN     -55 38.783290   1       suppress
                  

                  这里是 df 的数据类型:

                  and here is the data types for the df:

                  id               int64
                  name             object
                  premium          float64
                  created_date     object
                  init_p           float64
                  term_number      float64
                  uprate           float64
                  value            float64
                  score            float64
                  group            int64
                  action_reason    object
                  

                  推荐答案

                  Pandas + SQLAlchemy 默认在 Oracle DB 中将所有 object(字符串)列保存为 CLOB,这使插入极其缓慢.

                  Pandas + SQLAlchemy per default save all object (string) columns as CLOB in Oracle DB, which makes insertion extremely slow.

                  以下是一些测试:

                  import pandas as pd
                  import cx_Oracle
                  from sqlalchemy import types, create_engine
                  
                  #######################################################
                  ### DB connection strings config
                  #######################################################
                  tns = """
                    (DESCRIPTION =
                      (ADDRESS = (PROTOCOL = TCP)(HOST = my-db-scan)(PORT = 1521))
                      (CONNECT_DATA =
                        (SERVER = DEDICATED)
                        (SERVICE_NAME = my_service_name)
                      )
                    )
                  """
                  
                  usr = "test"
                  pwd = "my_oracle_password"
                  
                  engine = create_engine('oracle+cx_oracle://%s:%s@%s' % (usr, pwd, tns))
                  
                  # sample DF [shape: `(2000, 11)`]
                  # i took your 2 rows DF and replicated it: `df = pd.concat([df]* 10**3, ignore_index=True)`
                  df = pd.read_csv('/path/to/file.csv')
                  

                  DF 信息:

                  In [61]: df.shape
                  Out[61]: (2000, 11)
                  
                  In [62]: df.info()
                  <class 'pandas.core.frame.DataFrame'>
                  RangeIndex: 2000 entries, 0 to 1999
                  Data columns (total 11 columns):
                  id               2000 non-null int64
                  name             2000 non-null object
                  premium          2000 non-null float64
                  created_date     2000 non-null datetime64[ns]
                  init_p           2000 non-null float64
                  term_number      2000 non-null int64
                  uprate           1000 non-null float64
                  value            2000 non-null int64
                  score            2000 non-null float64
                  group            2000 non-null int64
                  action_reason    2000 non-null object
                  dtypes: datetime64[ns](1), float64(4), int64(4), object(2)
                  memory usage: 172.0+ KB
                  

                  让我们看看将它存储到 Oracle DB 需要多长时间:

                  Let's check how long will it take to store it to Oracle DB:

                  In [57]: df.shape
                  Out[57]: (2000, 11)
                  
                  In [58]: %timeit -n 1 -r 1 df.to_sql('test_table', engine, index=False, if_exists='replace')
                  1 loop, best of 1: 16 s per loop
                  

                  在 Oracle DB 中(注意 CLOB):

                  In Oracle DB (pay attention at CLOB's):

                  AAA> desc test.test_table
                   Name                            Null?    Type
                   ------------------------------- -------- ------------------
                   ID                                       NUMBER(19)
                   NAME                                     CLOB        #  !!!
                   PREMIUM                                  FLOAT(126)
                   CREATED_DATE                             DATE
                   INIT_P                                   FLOAT(126)
                   TERM_NUMBER                              NUMBER(19)
                   UPRATE                                   FLOAT(126)
                   VALUE                                    NUMBER(19)
                   SCORE                                    FLOAT(126)
                   group                                    NUMBER(19)
                   ACTION_REASON                            CLOB        #  !!!
                  

                  现在让我们指示 pandas 将所有 object 列保存为 VARCHAR 数据类型:

                  Now let's instruct pandas to save all object columns as VARCHAR data types:

                  In [59]: dtyp = {c:types.VARCHAR(df[c].str.len().max())
                      ...:         for c in df.columns[df.dtypes == 'object'].tolist()}
                      ...:
                  
                  In [60]: %timeit -n 1 -r 1 df.to_sql('test_table', engine, index=False, if_exists='replace', dtype=dtyp)
                  1 loop, best of 1: 335 ms per loop
                  

                  这次大约是.快 48 倍

                  This time it was approx. 48 times faster

                  签入 Oracle 数据库:

                  Check in Oracle DB:

                   AAA> desc test.test_table
                   Name                          Null?    Type
                   ----------------------------- -------- ---------------------
                   ID                                     NUMBER(19)
                   NAME                                   VARCHAR2(13 CHAR)        #  !!!
                   PREMIUM                                FLOAT(126)
                   CREATED_DATE                           DATE
                   INIT_P                                 FLOAT(126)
                   TERM_NUMBER                            NUMBER(19)
                   UPRATE                                 FLOAT(126)
                   VALUE                                  NUMBER(19)
                   SCORE                                  FLOAT(126)
                   group                                  NUMBER(19)
                   ACTION_REASON                          VARCHAR2(8 CHAR)        #  !!!
                  

                  让我们用 200.000 行 DF 测试它:

                  Let's test it with 200.000 rows DF:

                  In [69]: df.shape
                  Out[69]: (200000, 11)
                  
                  In [70]: %timeit -n 1 -r 1 df.to_sql('test_table', engine, index=False, if_exists='replace', dtype=dtyp, chunksize=10**4)
                  1 loop, best of 1: 4.68 s per loop
                  

                  在我的测试(不是最快的)环境中,20 万行 DF 花费了大约 5 秒.

                  It took ~5 seconds for 200K rows DF in my test (not the fastest) environment.

                  结论:使用以下技巧,以便在将 DataFrame 保存到 Oracle DB 时为 object dtype 的所有 DF 列显式指定 dtype.否则会保存为CLOB数据类型,需要特殊处理,速度很慢

                  Conclusion: use the following trick in order to explicitly specify dtype for all DF columns of object dtype when saving DataFrames to Oracle DB. Otherwise it'll be saved as CLOB data type, which requires special treatment and makes it very slow

                  dtyp = {c:types.VARCHAR(df[c].str.len().max())
                          for c in df.columns[df.dtypes == 'object'].tolist()}
                  
                  df.to_sql(..., dtype=dtyp)
                  

                  这篇关于使用 SqlAlchemy 和 cx_Oracle 将 Pandas DataFrame 写入 Oracle 数据库时加快 to_sql()的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

                  上一篇:Oracle - 如何使用 FAST REFRESH 和 JOINS 创建物化视图 下一篇:检查当前日期是否在两个日期之间 Oracle SQL

                  相关文章

                2. <small id='atgz3'></small><noframes id='atgz3'>

                  • <bdo id='atgz3'></bdo><ul id='atgz3'></ul>
                3. <tfoot id='atgz3'></tfoot>

                  1. <i id='atgz3'><tr id='atgz3'><dt id='atgz3'><q id='atgz3'><span id='atgz3'><b id='atgz3'><form id='atgz3'><ins id='atgz3'></ins><ul id='atgz3'></ul><sub id='atgz3'></sub></form><legend id='atgz3'></legend><bdo id='atgz3'><pre id='atgz3'><center id='atgz3'></center></pre></bdo></b><th id='atgz3'></th></span></q></dt></tr></i><div id='atgz3'><tfoot id='atgz3'></tfoot><dl id='atgz3'><fieldset id='atgz3'></fieldset></dl></div>

                      <legend id='atgz3'><style id='atgz3'><dir id='atgz3'><q id='atgz3'></q></dir></style></legend>