How to Compare 2 Hive Partitions?

Nixon Data How to Compare 2 Hive Partitions?

To compare two Hive partitions, you can use a similar approach as when comparing two Hive tables. Here are a few options:

  • Use the EXCEPT operator: You can use the EXCEPT operator to compare the rows in two partitions and return only the rows that are present in the first partition but not in the second. For example:

SELECT * FROM table1
WHERE partition_column = ‘partition1’
EXCEPT
SELECT * FROM table1
WHERE partition_column = ‘partition2’;

  • Use the MINUS operator: You can use the MINUS operator, which is equivalent to the EXCEPT operator, to compare the rows in two partitions and return only the rows that are present in the first partition but not in the second. For example:

SELECT * FROM table1
WHERE partition_column = ‘partition1’
MINUS
SELECT * FROM table1
WHERE partition_column = ‘partition2’;

  • Use the NOT IN operator: You can use the NOT IN operator to compare the rows in two partitions and return only the rows that are present in the first partition but not in the second. For example:

SELECT * FROM table1
WHERE partition_column = ‘partition1’
AND column1 NOT IN (SELECT column1 FROM table1 WHERE partition_column = ‘partition2’);

  • Use the LEFT JOIN operator: You can use the LEFT JOIN operator to compare the rows in two partitions and return all the rows from the left partition (partition1), along with any matching rows from the right partition (partition2). If there is no match, the right side will be filled with NULL values. For example:

SELECT * FROM table1 partition1
LEFT JOIN table1 partition2
ON partition1.column1 = partition2.column1
AND partition1.partition_column = ‘partition1’
AND partition2.partition_column = ‘partition2’;

  • You can then use the WHERE clause to filter out the rows where the values are the same in both partitions. For example:

SELECT * FROM table1 partition1
LEFT JOIN table1 partition2
ON partition1.column1 = partition2.column1
AND partition1.partition_column = ‘partition1’
AND partition2.partition_column = ‘partition2’
WHERE partition1.column2 <> partition2.column2 OR partition2.column2 IS NULL;

This will return all the rows from partition1 that have different values in column2 compared to partition2, or where there is no matching row in partition2.