本文介绍了删除重复的行只留下最旧的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!
问题描述
我有一个数据表,并且有很多用户提交的重复条目.
I have a table of data and there are many duplicate entries from user submissions.
我想根据字段 subscriberEmail
删除所有重复的行,只留下原始提交.
I want to delete all duplicates rows based on the field subscriberEmail
, leaving only the original submission.
换句话说,我想搜索所有重复的电子邮件,并删除那些行,只留下原始的.
In other words, I want to search for all duplicate emails, and delete those rows, leaving only the original.
如何在不交换表的情况下执行此操作?
我的表格包含每一行的唯一 ID.
How can I do this without swapping tables?
My table contains unique IDs for each row.
推荐答案
由于您使用 id 列作为指示哪个记录是原始"的:
Since you're using the id column as an indicator of which record is 'original':
delete x
from myTable x
join myTable z on x.subscriberEmail = z.subscriberEmail
where x.id > z.id
这将为每个电子邮件地址留下一条记录.
This will leave one record per email address.
编辑添加:
为了解释上面的查询...
To explain the query above...
这里的想法是加入表格反对自己.假设您有该表的两个副本,每个副本的名称都不同.然后您可以将它们相互比较,并找到最低的 id 或每个电子邮件地址.然后您会看到稍后创建的重复记录并可以删除它们.(我在考虑这个时正在想象 Excel.)
The idea here is to join the table against itself. Pretend that you have two copies of the table, each named something different. Then you could compare them to each other, and find the lowest id or for each email address. You'd then see the duplicate records that were created later on and could delete them. (I was visualizing Excel when thinking about this.)
为了在表上执行该操作,将其与自身进行比较并能够识别每一侧,您可以使用表别名.x
是表别名.它在 from
子句中被赋值,如下所示:from <别名>.x
现在可以在同一查询中的其他地方使用,以作为快捷方式引用该表.
In order to do that operation on a table, compare it to itself and be able to identify each side, you use table aliases. x
is a table alias. It is assigned in the from
clause like so: from <table> <alias>
. x
can now be used elsewhere in the same query to refer to that table as a shortcut.
delete x
以我们的动作和目标开始查询.我们将执行查询以从多个表中选择记录,并且我们希望删除出现在 x
中的记录.
delete x
starts the query off with our action and target. We're going to perform a query to select records from multiple tables, and we want to delete records that appear in x
.
别名用于指代表的两个实例".from myTable x join myTable z on x.subscriberEmail = z.subscriberEmail
在电子邮件匹配的地方将表格撞到自己身上.如果没有后面的 where 子句,每条记录都会被选中,因为它可以连接到自身.
Aliases are used to refer to both 'instances' of the table. from myTable x join myTable z on x.subscriberEmail = z.subscriberEmail
bumps the table up against itself where the emails match. Without the where clause that follows, every record would be selected as it could be joined up against itself.
where
子句限制了选择的记录. where x.id >z.id
允许别名为 x
的实例"仅包含与电子邮件匹配但具有更高 id
值的记录.表中您真正想要的数据,唯一的电子邮件地址(具有最低 id)不会成为 x
的一部分,也不会被删除.x
中的唯一记录将是重复记录(电子邮件地址),其 id
高于该电子邮件地址的原始记录.
The where
clause limits the records that are selected. where x.id > z.id
allows the 'instance' aliased x
to contain only the records that match emails but have a higher id
value. The data that you really want in the table, unique email addresses (with the lowest id) will not be part of x
and will not be deleted. The only records in x
will be duplicate records (email addresses) that have a higher id
than the original record for that email address.
在这种情况下可以组合连接和 where 子句:
The join and where clauses could be combined in this case:
delete x
from myTable x
join myTable z
on x.subscriberEmail = z.subscriberEmail
and x.id > z.id
为防止重复,请考虑将subscriberEmail 列设为UNIQUE 索引列.
For preventing duplicates, consider making the subscriberEmail column a UNIQUE indexed column.
这篇关于删除重复的行只留下最旧的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!