Duplicated Data
Assume that a translation table has a column messageId
and a column createdAt
. Which accidentally has more than one records:
messageId createdAt translation 123 1600000000000 I am James 123 1700000000000 I am James 123 1800000000000 I am James 321 1750000000000 Hi James 321 1850000000000 Hi James, nice to meet you
We want to reduce these groups of records into simply one per group, let's say only the latest one:
messageId createdAt translation 123 1800000000000 I am James 321 1850000000000 Hi James, nice to meet you
Here is how we do it easily:
SQL Script to Remove Duplicated Records
delete from "MessageTranslation" where id not in ( -- all translation that is latest among dulicated message select distinct on ("messageId") "MessageTranslation".id from "MessageTranslation" order by "MessageTranslation"."messageId", "MessageTranslation"."createdAt" desc )
-
The highlighted lines represent the
id
's of translation record that is latest among the set of messages having the samemessageId
.Reason.
-
We first order the results by
messageId
-
next we further order them by
createdAt desc
-
We
select distinct on ("messageId")
to select the first occurence of results "grouped by" theorder by
clause.Note that if we need to order by () columns, we would need to
distinct on
the first columns.
-
-
Finally we have selected desired results, we delete those that are not desired.