    <noframes id='UAFDL'>
      Append an existing Excel sheet with a new DataFrame using Python pandas

        Date: 2023-09-28
            <noframes id='Hsbm0'>
                  Problem description

                  I currently have this code. It works perfectly.

                  It loops through the Excel files in a folder, removes the first 2 rows, then saves each one as an individual Excel file, and it also saves the files from the loop as one appended file.

                  Currently the appended file overwrites the existing file each time I run the code.

                  I need to append the new data to the bottom of the already existing Excel sheet ('master_data.xlsx').

                  import os
                  import pandas as pd
                  
                  dfList = []
                  # Raw strings keep the Windows backslashes literal
                  path = r'C:\Test\TestRawFile'
                  newpath = r'C:\Path\To\New\Folder'
                  
                  for fn in os.listdir(path):
                    # Absolute file path
                    file = os.path.join(path, fn)
                    if os.path.isfile(file):
                      # Import the Excel file and call it xlsx_file
                      xlsx_file = pd.ExcelFile(file)
                      # View the Excel file's sheet names
                      print(xlsx_file.sheet_names)
                      # Load the xlsx file's data sheet as a DataFrame
                      df = xlsx_file.parse('Sheet1', header=None)
                      # Drop the first 2 rows
                      df_NoHeader = df[2:]
                      data = df_NoHeader
                      # Save individual DataFrame
                      data.to_excel(os.path.join(newpath, fn))
                  
                      dfList.append(data)
                  
                  # Concatenate all files; this overwrites master_data.xlsx on every run
                  appended_data = pd.concat(dfList)
                  appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))
                  

                  I thought this would be a simple task, but I guess not. I think I need to read the master_data.xlsx file in as a DataFrame, match the index up with the new appended data, and save it back out. Or maybe there is an easier way. Any help is appreciated.
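The approach described in the question (read 'master_data.xlsx' back in, stack the new rows underneath it, and write everything out again) works with any pandas version, and `ignore_index=True` removes the need to line the indexes up by hand. Below is a minimal sketch, assuming the master file is written without an index column or header row so that it round-trips cleanly with `header=None` (matching how the raw files are parsed; a master file produced by the code above still contains its index and header, so it would need to be regenerated once). `append_to_master` is a hypothetical helper name, and an Excel engine such as openpyxl must be installed:

```python
import os

import pandas as pd


def append_to_master(master_path, new_frames):
    """Hypothetical helper: read the existing master workbook (if any),
    stack the new frames underneath it and write everything back out."""
    frames = list(new_frames)
    if os.path.isfile(master_path):
        # header=None matches how the raw files are parsed in the question
        existing = pd.read_excel(master_path, header=None)
        frames.insert(0, existing)
    # ignore_index=True renumbers the rows, so no manual index matching is needed
    combined = pd.concat(frames, ignore_index=True)
    # No index column and no header row, so the next read with header=None round-trips
    combined.to_excel(master_path, index=False, header=False)
    return combined
```

In the question's script this would replace the final `to_excel` call, e.g. `append_to_master(os.path.join(newpath, 'master_data.xlsx'), dfList)`. The trade-off is that the whole file is rewritten on every run, which is simple but gets slower as the master file grows.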

                  Solution


                  UPDATE [2022-01-08]: it seems starting from version 1.4.0 Pandas will support appending to existing Excel sheet "out of the box"!

                  Good job Pandas Team!

                  According to the docstring in the pandas-dev GitHub repository, ExcelWriter supports the parameter if_sheet_exists='overlay':

                  if_sheet_exists : {'error', 'new', 'replace', 'overlay'}, default 'error'
                      How to behave when trying to write to a sheet that already
                      exists (append mode only).
                      * error: raise a ValueError.
                      * new: Create a new sheet, with a name determined by the engine.
                      * replace: Delete the contents of the sheet before writing to it.
                      * overlay: Write contents to the existing sheet without removing the old
                        contents.
                      .. versionadded:: 1.3.0
                      .. versionchanged:: 1.4.0
                         Added ``overlay`` option
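With pandas >= 1.4.0 the 'overlay' option can be combined with `startrow` to write below the data that is already in the sheet. Below is a minimal sketch, assuming the openpyxl engine and a sheet whose first row is a header; `append_df_overlay` is a hypothetical helper name, and the next free row is taken from openpyxl's `max_row`:

```python
from pathlib import Path

import pandas as pd


def append_df_overlay(path, df, sheet_name="Sheet1"):
    """Hypothetical helper: append df below the last used row of sheet_name
    (requires pandas >= 1.4 for if_sheet_exists="overlay")."""
    path = Path(path)
    if not path.is_file():
        # Nothing to append to yet: create the workbook with a header row
        df.to_excel(path, sheet_name=sheet_name, index=False)
        return
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        if sheet_name in writer.book.sheetnames:
            # openpyxl's max_row is 1-based and pandas' startrow is 0-based,
            # so max_row is exactly the first empty row from pandas' point of view
            startrow = writer.book[sheet_name].max_row
            header = False  # don't repeat the column names
        else:
            startrow, header = 0, True
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=header)
```

Because 'overlay' writes into the existing sheet without clearing it, other sheets and the old rows stay untouched.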
                  


                  For Pandas versions < 1.4.0 please find below a helper function for appending a Pandas DataFrame to an existing Excel file.

                  If an Excel file doesn't exist then it will be created.


                  UPDATE [2021-09-12]: fixed for Pandas 1.3.0+

                  The following functions have been tested with:

                  • Pandas 1.3.2
                  • OpenPyxl 3.0.7

                  from pathlib import Path
                  from copy import copy
                  from typing import Union, Optional
                  import numpy as np
                  import pandas as pd
                  import openpyxl
                  from openpyxl import load_workbook
                  from openpyxl.utils import get_column_letter
                  
                  
                  def copy_excel_cell_range(
                          src_ws: openpyxl.worksheet.worksheet.Worksheet,
                          min_row: int = None,
                          max_row: int = None,
                          min_col: int = None,
                          max_col: int = None,
                          tgt_ws: openpyxl.worksheet.worksheet.Worksheet = None,
                          tgt_min_row: int = 1,
                          tgt_min_col: int = 1,
                          with_style: bool = True
                  ) -> openpyxl.worksheet.worksheet.Worksheet:
                      """
                      copies all cells from the source worksheet [src_ws] starting from [min_row] row
                      and [min_col] column up to [max_row] row and [max_col] column
                      to target worksheet [tgt_ws] starting from [tgt_min_row] row
                      and [tgt_min_col] column.
                  
                      @param src_ws:  source worksheet
                      @param min_row: smallest row index in the source worksheet (1-based index)
                      @param max_row: largest row index in the source worksheet (1-based index)
                      @param min_col: smallest column index in the source worksheet (1-based index)
                      @param max_col: largest column index in the source worksheet (1-based index)
                      @param tgt_ws:  target worksheet.
                                      If None, then the copy will be done to the same (source) worksheet.
                      @param tgt_min_row: target row index (1-based index)
                      @param tgt_min_col: target column index (1-based index)
                      @param with_style:  whether to copy cell style. Default: True
                  
                      @return: target worksheet object
                      """
                      if tgt_ws is None:
                          tgt_ws = src_ws
                  
                      # https://stackoverflow.com/a/34838233/5741205
                      for row in src_ws.iter_rows(min_row=min_row, max_row=max_row,
                                                  min_col=min_col, max_col=max_col):
                          for cell in row:
                              tgt_cell = tgt_ws.cell(
                                  row=cell.row + tgt_min_row - 1,
                                  column=cell.col_idx + tgt_min_col - 1,
                                  value=cell.value
                              )
                              if with_style and cell.has_style:
                                  # tgt_cell._style = copy(cell._style)
                                  tgt_cell.font = copy(cell.font)
                                  tgt_cell.border = copy(cell.border)
                                  tgt_cell.fill = copy(cell.fill)
                                  tgt_cell.number_format = copy(cell.number_format)
                                  tgt_cell.protection = copy(cell.protection)
                                  tgt_cell.alignment = copy(cell.alignment)
                      return tgt_ws
                  
                  
                  def append_df_to_excel(
                          filename: Union[str, Path],
                          df: pd.DataFrame,
                          sheet_name: str = 'Sheet1',
                          startrow: Optional[int] = None,
                          max_col_width: int = 30,
                          autofilter: bool = False,
                          fmt_int: str = "#,##0",
                          fmt_float: str = "#,##0.00",
                          fmt_date: str = "yyyy-mm-dd",
                          fmt_datetime: str = "yyyy-mm-dd hh:mm",
                          truncate_sheet: bool = False,
                          storage_options: Optional[dict] = None,
                          **to_excel_kwargs
                  ) -> None:
                      """
                      Append a DataFrame [df] to existing Excel file [filename]
                      into [sheet_name] Sheet.
                      If [filename] doesn't exist, then this function will create it.
                  
                      @param filename: File path
                                       (Example: '/path/to/file.xlsx')
                      @param df: DataFrame to save to workbook
                      @param sheet_name: Name of sheet which will contain DataFrame.
                                         (default: 'Sheet1')
                      @param startrow: upper left cell row to dump data frame.
                                       Per default (startrow=None) calculate the last row
                                       in the existing DF and write to the next row...
                      @param max_col_width: maximum column width in Excel. Default: 30
                      @param autofilter: boolean - whether add Excel autofilter or not. Default: False
                      @param fmt_int: Excel format for integer numbers
                      @param fmt_float: Excel format for float numbers
                      @param fmt_date: Excel format for dates
                      @param fmt_datetime: Excel format for datetime's
                      @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                                             before writing DataFrame to Excel file
                      @param storage_options: dict, optional
                          Extra options that make sense for a particular storage connection, e.g. host, port,
                          username, password, etc., if using a URL that will be parsed by fsspec, e.g.,
                          starting "s3://", "gcs://".
                      @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                                              [can be a dictionary]
                      @return: None
                  
                      Usage examples:
                  
                      >>> append_df_to_excel('/tmp/test.xlsx', df, autofilter=True,
                                             freeze_panes=(1,0))
                  
                      >>> append_df_to_excel('/tmp/test.xlsx', df, header=None, index=False)
                  
                      >>> append_df_to_excel('/tmp/test.xlsx', df, sheet_name='Sheet2',
                                             index=False)
                  
                      >>> append_df_to_excel('/tmp/test.xlsx', df, sheet_name='Sheet2',
                                             index=False, startrow=25)
                  
                      >>> append_df_to_excel('/tmp/test.xlsx', df, index=False,
                                             fmt_datetime="dd.mm.yyyy hh:mm")
                  
                      (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
                      """
                      def set_column_format(ws, column_letter, fmt):
                          for cell in ws[column_letter]:
                              cell.number_format = fmt
                      filename = Path(filename)
                      file_exists = filename.is_file()
                      # process parameters
                      # calculate first column number
                      # if the DF will be written using `index=True`, then `first_col = 2`, else `first_col = 1`
                      first_col = int(to_excel_kwargs.get("index", True)) + 1
                      # ignore [engine] parameter if it was passed
                      if 'engine' in to_excel_kwargs:
                          to_excel_kwargs.pop('engine')
                      # save content of existing sheets
                      if file_exists:
                          wb = load_workbook(filename)
                          sheet_names = wb.sheetnames
                          sheet_exists = sheet_name in sheet_names
                          sheets = {ws.title: ws for ws in wb.worksheets}
                  
                      with pd.ExcelWriter(
                          filename.with_suffix(".xlsx"),
                          engine="openpyxl",
                          mode="a" if file_exists else "w",
                          if_sheet_exists="new" if file_exists else None,
                          date_format=fmt_date,
                          datetime_format=fmt_datetime,
                          storage_options=storage_options
                      ) as writer:
                          if file_exists:
                              # try to open an existing workbook
                              writer.book = wb
                              # get the last row in the existing Excel sheet
                              # if it was not specified explicitly
                              if startrow is None and sheet_name in writer.book.sheetnames:
                                  startrow = writer.book[sheet_name].max_row
                              # truncate sheet
                              if truncate_sheet and sheet_name in writer.book.sheetnames:
                                  # index of [sheet_name] sheet
                                  idx = writer.book.sheetnames.index(sheet_name)
                                  # remove [sheet_name]
                                  writer.book.remove(writer.book.worksheets[idx])
                                  # create an empty sheet [sheet_name] using old index
                                  writer.book.create_sheet(sheet_name, idx)
                              # copy existing sheets
                              writer.sheets = sheets
                          else:
                              # file doesn't exist, we are creating a new one
                              startrow = 0
                  
                          # write out the DataFrame to an ExcelWriter
                          df.to_excel(writer, sheet_name=sheet_name, **to_excel_kwargs)
                          worksheet = writer.sheets[sheet_name]
                  
                          if autofilter:
                              worksheet.auto_filter.ref = worksheet.dimensions
                  
                          for xl_col_no, dtyp in enumerate(df.dtypes, first_col):
                              col_no = xl_col_no - first_col
                              # str() so non-string column labels (e.g. integer labels from header=None) don't break len()
                              width = max(df.iloc[:, col_no].astype(str).str.len().max(),
                                          len(str(df.columns[col_no])) + 6)
                              width = min(max_col_width, width)
                              column_letter = get_column_letter(xl_col_no)
                              worksheet.column_dimensions[column_letter].width = width
                              if np.issubdtype(dtyp, np.integer):
                                  set_column_format(worksheet, column_letter, fmt_int)
                              if np.issubdtype(dtyp, np.floating):
                                  set_column_format(worksheet, column_letter, fmt_float)
                  
                      if file_exists and sheet_exists:
                          # move (append) rows from new worksheet to the `sheet_name` worksheet
                          wb = load_workbook(filename)
                          # retrieve generated worksheet name
                          new_sheet_name = set(wb.sheetnames) - set(sheet_names)
                          if new_sheet_name:
                              new_sheet_name = list(new_sheet_name)[0]
                          # copy rows written by `df.to_excel(...)` to
                          copy_excel_cell_range(
                              src_ws=wb[new_sheet_name],
                              tgt_ws=wb[sheet_name],
                              tgt_min_row=startrow + 1,
                              with_style=True
                          )
                          # remove new (generated by Pandas) worksheet
                          del wb[new_sheet_name]
                          wb.save(filename)
                          wb.close()
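The last step of the helper above (let pandas write into a new sheet created by if_sheet_exists="new", copy its cells below the existing data with a row offset, then delete the temporary sheet) can be illustrated with openpyxl alone. This is a stripped-down sketch that ignores cell styles; `move_rows_below` is a hypothetical name and the sheet names in the usage are made up:

```python
from openpyxl import Workbook


def move_rows_below(wb, src_name, tgt_name):
    """Hypothetical helper: copy every cell of src_name to the first empty
    row of tgt_name, then delete src_name (cell styles are not copied)."""
    src, tgt = wb[src_name], wb[tgt_name]
    # Same arithmetic as copy_excel_cell_range with tgt_min_row = startrow + 1
    # and startrow = max_row: target row = source row + max_row
    offset = tgt.max_row
    for row in src.iter_rows():
        for cell in row:
            tgt.cell(row=cell.row + offset, column=cell.column, value=cell.value)
    # Remove the temporary sheet, as the helper does with the pandas-generated one
    del wb[src_name]
    return tgt


wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["a", "b"])
ws.append([1, 2])
tmp = wb.create_sheet("Sheet11")  # stands in for the sheet pandas generates
tmp.append([3, 4])
tmp.append([5, 6])
move_rows_below(wb, "Sheet11", "Sheet1")
```

The offset is computed once, before the loop, because writing into the target sheet increases its `max_row`.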
                  


                  Old version (tested with Pandas 1.2.3 and Openpyxl 3.0.5):

                  import os
                  import pandas as pd
                  from openpyxl import load_workbook
                  
                  
                  def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                                         truncate_sheet=False, 
                                         **to_excel_kwargs):
                      """
                      Append a DataFrame [df] to existing Excel file [filename]
                      into [sheet_name] Sheet.
                      If [filename] doesn't exist, then this function will create it.
                  
                      @param filename: File path
                                       (Example: '/path/to/file.xlsx')
                      @param df: DataFrame to save to workbook
                      @param sheet_name: Name of sheet which will contain DataFrame.
                                         (default: 'Sheet1')
                      @param startrow: upper left cell row to dump data frame.
                                       Per default (startrow=None) calculate the last row
                                       in the existing DF and write to the next row...
                      @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                                             before writing DataFrame to Excel file
                      @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                                              [can be a dictionary]
                      @return: None
                  
                      Usage examples:
                  
                      >>> append_df_to_excel('d:/temp/test.xlsx', df)
                  
                      >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)
                  
                      >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                                             index=False)
                  
                      >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                                             index=False, startrow=25)
                  
                      (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
                      """
                      # Excel file doesn't exist - saving and exiting
                      if not os.path.isfile(filename):
                          df.to_excel(
                              filename,
                              sheet_name=sheet_name, 
                              startrow=startrow if startrow is not None else 0, 
                              **to_excel_kwargs)
                          return
                      
                      # ignore [engine] parameter if it was passed
                      if 'engine' in to_excel_kwargs:
                          to_excel_kwargs.pop('engine')
                  
                      writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
                  
                      # try to open an existing workbook
                      writer.book = load_workbook(filename)
                      
                      # get the last row in the existing Excel sheet
                      # if it was not specified explicitly
                      if startrow is None and sheet_name in writer.book.sheetnames:
                          startrow = writer.book[sheet_name].max_row
                  
                      # truncate sheet
                      if truncate_sheet and sheet_name in writer.book.sheetnames:
                          # index of [sheet_name] sheet
                          idx = writer.book.sheetnames.index(sheet_name)
                          # remove [sheet_name]
                          writer.book.remove(writer.book.worksheets[idx])
                          # create an empty sheet [sheet_name] using old index
                          writer.book.create_sheet(sheet_name, idx)
                      
                      # copy existing sheets
                      writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
                  
                      if startrow is None:
                          startrow = 0
                  
                      # write out the new sheet
                      df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
                  
                      # save the workbook
                      writer.save()


                  Usage examples:

                  filename = r'C:OCC.xlsx'
                  
                  append_df_to_excel(filename, df)
                  
                  append_df_to_excel(filename, df, header=None, index=False)
                  
                  append_df_to_excel(filename, df, sheet_name='Sheet2', index=False)
                  
                  append_df_to_excel(filename, df, sheet_name='Sheet2', index=False, startrow=25)
                  


                  (Screenshot of the resulting c:/temp/test.xlsx omitted.)

                  P.S. You may also want to specify header=None if you don't want to duplicate the column names.

                  UPDATE: you may also want to check this old solution

                  <noframes id='uBqRZ'>