How do I compare one row in a CSV with all rows in another CSV file?

Date: 2023-04-03

Problem description

I have two CSV files:

Identity(no,name,Age), which has 10 rows
Location(Address,no,City), which has 100 rows

I need to extract rows and check the no column of the Identity file against the Location CSV file.

Get a single row from the Identity CSV file and check Identity.no against Location.no across the 100 rows in the Location CSV file.
If they match, then in Identity, Location
Note: I need to take the 1st row from Identity and compare it with all 100 rows in the Location CSV file, then take the 2nd row and compare it with the 100 rows, and so on for all 10 rows in the Identity CSV file.
The overall results should then be converted to JSON and moved into SQL Server.
Is this possible in Apache NiFi?
Any help is appreciated.
Recommended answer
You can do this in NiFi by using the DistributedMapCache feature, which implements a key/value store for lookups. The setup requires a distributed map cache plus two flows: one to populate the cache with your Location (address) records, and one to look up the address by the no field.
The DistributedMapCache is defined by two controller services, a DistributedMapCacheServer and a DistributedMapCacheClientService. If your data set is small, you can just use "localhost" as the server.
Populating the cache requires reading the Location (address) file, splitting it into records, extracting the no key from each record, and putting the key/value pairs into the cache. An approximate flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> PutDistributedMapCache.
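The populate step can be sketched outside NiFi to show what ends up in the cache. This is a minimal Python illustration, not NiFi code: a plain dict stands in for the distributed map cache, and the sample rows and the build_cache helper are hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for the Location CSV (Address,no,City).
LOCATION_CSV = """Address,no,City
12 Main St,1,Springfield
34 Oak Ave,2,Shelbyville
"""


def build_cache(csv_text):
    """Return a dict mapping each `no` value to its full Location row.

    The dict is only a stand-in for the distributed map cache.
    """
    cache = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The `no` column is the key; a later row with the same key
        # overwrites an earlier one.
        cache[row["no"]] = row
    return cache


location_cache = build_cache(LOCATION_CSV)
```

Keying on no means each Identity row later needs one lookup rather than a scan over all 100 Location rows.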
Looking up your Identity records is fairly similar to the flow above: it requires reading the Identity file, splitting it into records, extracting the no key, and then fetching the matching address record. The processor flow might include GetFile -> SplitText -> ExtractText -> UpdateAttribute -> FetchDistributedMapCache.
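The lookup step can be sketched the same way. Again this is plain Python rather than NiFi: the dict stands in for the populated cache, and the sample data and the lookup_identities helper are hypothetical. Skipping rows that have no match is an assumption about the desired behavior.

```python
import csv
import io

# Hypothetical sample standing in for the Identity CSV (no,name,Age).
IDENTITY_CSV = """no,name,Age
1,Alice,30
3,Carol,41
"""

# Stand-in for the already populated cache, keyed by `no`.
location_cache = {
    "1": {"Address": "12 Main St", "no": "1", "City": "Springfield"},
    "2": {"Address": "34 Oak Ave", "no": "2", "City": "Shelbyville"},
}


def lookup_identities(identity_text, cache):
    """Merge each Identity row with the cached Location row sharing its `no`."""
    matched = []
    for identity in csv.DictReader(io.StringIO(identity_text)):
        location = cache.get(identity["no"])
        if location is None:
            continue  # assumption: rows without a match are dropped
        matched.append({**identity, **location})
    return matched


matched = lookup_identities(IDENTITY_CSV, location_cache)
```

Here only the row with no equal to 1 has a cached Location entry, so matched holds a single merged record.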
You can convert all or part of each record from CSV to JSON with AttributesToJSON, or possibly with ExecuteScript.
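As a rough picture of the target output, the merged records might be serialized as below. This is plain Python with a hypothetical record and helper name; it only illustrates the JSON shape and is not what AttributesToJSON or an ExecuteScript body would contain.

```python
import json

# Hypothetical merged record, as produced by joining Identity and Location on `no`.
matched = [
    {"no": "1", "name": "Alice", "Age": "30",
     "Address": "12 Main St", "City": "Springfield"},
]


def to_json(records):
    """Serialize the merged records as a JSON array of flat objects."""
    return json.dumps(records, indent=2)


payload = to_json(matched)
```

All values stay strings here because they come straight from CSV; casting Age to an integer before loading into SQL Server would be a separate step.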
