尝试比较两个 csv 文件并将差异写入输出

时间：2023-11-07

本文介绍了尝试比较两个 csv 文件并将差异写入输出的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着跟版网的小编来一起学习吧！

问题描述

我正在开发一个脚本，它获取 2 个 csv 文件之间的差异，并制作一个新的 csv 文件作为输出，但只有当两个输入文件之间的相同 2 行(指行号)包含不同的数据时，例如第 3 行在文件 1 中有mike"、basketball player"，在文件 2 中第 3 行有mike"、baseball player".输出 csv 会抓取这些打印它们并将它们写入 csv.它有效，但存在一些问题(我知道这个问题之前也被问过好几次，但其他人对我的做法不同，而且由于我对编程还很陌生，所以我不太了解他们的代码).

I'm developing a script which takes the difference between 2 csv files and makes a new csv file as output with the differences BUT only if the same 2 rows (refers to row number) between the two input files contain different data e.g. row 3 has "mike", "basketball player" in file 1 and row 3 in file 2 has "mike", "baseball player". The output csv would grab these print them and write them to a csv. It works but there are some issues (I know that this question has also been asked several times before but others have done it differently to me and since I'm fairly new to programming I don't quite understand their codes).

新 csv 文件中的输出在每个单元格中都有输出的每个字母(见下图)，我相信这与第 37 行的分隔符/引号字符/引号有关.我希望它们在自己的单元格中没有任何句号、多个空格、逗号或|".

The output in the new csv file has each letter of the output in each cell (see image below) and I believe its something to do with the delimiter/quotechar/quoting line 37. I want them in their own cells without any fullstops, multiple spaces, commas or "|".

另一个问题是它需要很长时间才能运行.我正在处理多达 50,000 行的数据集，运行可能需要一个多小时.为什么会这样?有什么建议可以加快速度?也许在 for 循环之外放一些东西?我之前确实尝试过 difflib 方法，但我只能打印整个input_file1"，但不能将该文件与另一个文件进行比较.

Another issue is that it takes a long time to run. I'm working with datasets of up to 50,000 rows and it can take over an hour to run. Why is this and what advice would be useful to speed it up? Put something outside of the for loop maybe? I did try the difflib method earlier on but I was only able to print the entire "input_file1" but not compare that file with another.

# aim of script is to compare csv files and output difference as a new csv

# import necessary libraries
import csv

# File1 = open(raw_input("path:"),"r") #filename, mode
# File2 = open(raw_input("path:"),"r") #filename, mode

# selects the 2 input files to be compared
input_file1 = "G:/savestuffhereqwerty/electorate_meshblocks/teststuff/Book1.csv"
input_file2 = "G:/savestuffhereqwerty/electorate_meshblocks/teststuff/Book2.csv"
# creates the blank output csv file
output_path = "G:/savestuffhereqwerty/electorate_meshblocks/outputs/output2.csv"
a = open(input_file1, "r")
output_file = open(output_path,"w")
output_file.close()
count = 0

with open(input_file1) as fp1:


    for row_number1, row_value1 in enumerate(fp1):
        if row_number1 == count:
            print "got to 1st point"
            value1 = row_value1

            with open(input_file2) as fp2:
                for row_number2, row_value2 in enumerate(fp2):
                    if row_number2 == count:
                        print "got to 2nd point"
                        value2 = row_value2

                        if value1 == value2:
                            print value1, value2
                        else:
                            print value1, value2
                            with open(output_path, 'wb') as f:
                                writer = csv.writer(f, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
                                # testing to see if the code writes text to the csv
                                writer.writerow(["test1"])
                                writer.writerow(["test2", "test3", "test4"])
                                writer.writerows([value1, value2])
                                print "code reached writing stage"
        count += 1
        print count
print "done"
# replace(",",".")

尝试比较两个 csv 文件并将差异写入输出

问题描述

推荐答案

相关文章