Duplicates in Snowflake

In my use case, a scheduled job reads a CSV file and writes it to Snowflake.

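When I schedule the job to read from the CSV and write to Snowflake every hour, I see multiple duplicates in Snowflake, even though my ID column is a PRIMARY KEY (ALTER TABLE table_name ADD PRIMARY KEY (column1)).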

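I understand that Snowflake lets you define and maintain constraints but does not enforce them, except for NOT NULL constraints, which are always enforced. I need help resolving this issue.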

To elaborate, consider the following scenario:

Step 1: At 9AM, insert the data from the CSV into Snowflake:

  ID    Customer name   Price
  1111  John Mathew     10
  1112  David Becham    20

Step 2: At 10PM I get one additional row, so my CSV is now:

  ID    Customer name   Price
  1111  John Mathew     10
  1112  David Becham    20
  1113  Hello World     40

Expected result in Snowflake:

  ID    Customer name   Price
  1111  John Mathew     10
  1112  David Becham    20
  1113  Hello World     40

What I get instead is duplicates:

  ID    Customer name   Price
  1111  John Mathew     10
  1112  David Becham    20
  1113  Hello World     40
  1111  John Mathew     10
  1112  David Becham    20
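The scenario above can be reproduced with a small in-memory model. This is not Snowflake code: the `copy_into` function and the list-of-tuples "table" are hypothetical names chosen for illustration. Because the declared primary key is not enforced, a plain load appends every row it reads, so loading the full CSV again at the second run duplicates IDs 1111 and 1112.

```python
from typing import List, Tuple

# Hypothetical model: a row is (ID, customer name, price).
Row = Tuple[int, str, int]

def copy_into(table: List[Row], csv_rows: List[Row]) -> None:
    """Append every CSV row. The PRIMARY KEY on ID is declared but not
    enforced, so nothing here checks whether an ID already exists."""
    table.extend(csv_rows)

target: List[Row] = []

# Step 1 (9AM): the original CSV
csv_step1 = [(1111, "John Mathew", 10), (1112, "David Becham", 20)]
copy_into(target, csv_step1)

# Step 2 (10PM): the same file with one extra row, loaded again in full
csv_step2 = csv_step1 + [(1113, "Hello World", 40)]
copy_into(target, csv_step2)

ids = [row[0] for row in target]
print(len(target))  # 5
print(ids)          # [1111, 1112, 1111, 1112, 1113]
```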

Comments
  • 水立方

    It would help if you provided your code. It looks like you are updating your CSV file in place, which means Snowflake sees it as a new file and loads the entire file again. If you are only running a COPY INTO command with no downstream logic, that is exactly what will happen.
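    The point about updated files can be sketched with a simplified model of COPY INTO load metadata. It assumes the load history is keyed on the file name plus a content checksum; Snowflake's real metadata records more than this, and `Loader`, `copy_into` and `customers.csv` are hypothetical names. An unchanged file is skipped on the next run, while the same file with one appended row counts as new and is loaded in full.

```python
import hashlib
from typing import Dict, List, Set, Tuple

def checksum(content: str) -> str:
    """Content fingerprint standing in for the file metadata COPY INTO tracks."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()

class Loader:
    """Simplified, hypothetical model of COPY INTO with load metadata."""

    def __init__(self) -> None:
        self.history: Set[Tuple[str, str]] = set()  # (file name, checksum)
        self.table: List[str] = []                   # loaded CSV lines

    def copy_into(self, stage: Dict[str, str]) -> List[str]:
        """Load every staged file not already in the history; return their names."""
        loaded = []
        for name, content in stage.items():
            key = (name, checksum(content))
            if key in self.history:
                continue  # same name, same content: skipped
            self.table.extend(content.strip().splitlines())  # whole file appended
            self.history.add(key)
            loaded.append(name)
        return loaded

stage = {"customers.csv": "1111,John Mathew,10\n1112,David Becham,20\n"}
loader = Loader()
run1 = loader.copy_into(stage)  # ['customers.csv']
run2 = loader.copy_into(stage)  # [] (unchanged file is skipped)

# The file is updated in place with one new row...
stage["customers.csv"] += "1113,Hello World,40\n"
run3 = loader.copy_into(stage)  # ['customers.csv'] (treated as a new file)

print(len(loader.table))  # 5: rows 1111 and 1112 are now duplicated
```

    In this model, staging the new row as a separate file instead (option 1 in the list that follows) would load only that one row, because the original file's history entry still matches.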

    Two options:

    1) Don't update the CSV file; instead, create a new file containing only the new data. The COPY INTO command will then work as expected.

    2) If you also receive updates to previously loaded records, run COPY INTO into a temporary table and then MERGE that data into your final table on the primary key.
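    Option 2 amounts to an upsert keyed on ID. In Snowflake SQL that would be a MERGE INTO the final table USING the temporary table, with WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT clauses. The Python sketch below only models that logic in memory; `merge_on_key` and the dict-based table are hypothetical, not a Snowflake API, and the changed price for 1112 is invented to show an update. Each run stages the full CSV and merges it, so rows 1111 and 1112 are updated rather than duplicated.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (ID, customer name, price)

def merge_on_key(target: Dict[int, Row], staging: List[Row]) -> Tuple[int, int]:
    """Model of MERGE ... ON target.ID = staging.ID:
    WHEN MATCHED -> UPDATE, WHEN NOT MATCHED -> INSERT.
    Returns (rows updated, rows inserted)."""
    updated = inserted = 0
    for row in staging:
        if row[0] in target:
            updated += 1
        else:
            inserted += 1
        target[row[0]] = row  # update or insert, keyed on ID
    return updated, inserted

final: Dict[int, Row] = {}

# 9AM run: the staging data holds the full CSV
staging_9am = [(1111, "John Mathew", 10), (1112, "David Becham", 20)]
print(merge_on_key(final, staging_9am))   # (0, 2)

# 10PM run: full CSV again, plus a new row and a hypothetical changed price for 1112
staging_10pm = [(1111, "John Mathew", 10), (1112, "David Becham", 25),
                (1113, "Hello World", 40)]
print(merge_on_key(final, staging_10pm))  # (2, 1)

print(sorted(final))  # [1111, 1112, 1113]: no duplicates
print(final[1112])    # (1112, 'David Becham', 25)
```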