There is no universally defined acceptable value for table skew, but the rule of thumb is to keep it below 4.
To understand this, let’s see an example.
You have a table with 150 rows and the cluster has 3 nodes.
- Node 1 – 100 rows
- Node 2 – 48 rows
- Node 3 – 2 rows
How is the skew calculated?
It is the ratio between the largest and the smallest number of rows held by any single node.
100/2 = 50. So the skew here is 50.
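The calculation above can be sketched in a few lines of Python. This is only an illustration of the ratio: the function name `skew_ratio` and the plain list of per-node row counts are assumptions made for this example, not part of any database API.

```python
def skew_ratio(rows_per_node):
    """Return the max rows on a node divided by the min rows on a node."""
    smallest = min(rows_per_node)
    if smallest == 0:
        # A node with no rows at all makes the ratio unbounded.
        return float("inf")
    return max(rows_per_node) / smallest


# Row counts from the 150-row example: Node 1, Node 2, Node 3.
small_table = [100, 48, 2]
print(skew_ratio(small_table))  # 50.0
```

Treating an empty node as infinite skew is a design choice for this sketch; a real system may report that case differently.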
But this is a very small table, so even though the skew is high, it has no real impact. Now think about a large table.
- Node 1 – 50000000 rows
- Node 2 – 30000000 rows
- Node 3 – 40000000 rows
50000000/30000000 = 1.67. So the skew here is 1.67.
Here the skew is very small, but the impact on scans is much larger: the busiest node holds 20 million more rows than the least-loaded one.
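To compare the two tables side by side, the sketch below prints the skew ratio next to the absolute row gap between the busiest and the least-loaded node. The helper `row_gap` is a hypothetical name introduced for this example, and using the gap as a rough signal of scan impact is an assumption, not a formal metric.

```python
def row_gap(rows_per_node):
    """Return how many more rows the busiest node holds than the least-loaded one."""
    return max(rows_per_node) - min(rows_per_node)


# Per-node row counts from the two examples in the text.
small = [100, 48, 2]
large = [50_000_000, 30_000_000, 40_000_000]

for name, rows in (("small", small), ("large", large)):
    ratio = max(rows) / min(rows)
    print(f"{name}: skew={ratio:.2f}, gap={row_gap(rows):,} rows")

# small: skew=50.00, gap=98 rows
# large: skew=1.67, gap=20,000,000 rows
```

The small table has the worse ratio but a gap of only 98 rows, while the large table looks healthy by ratio yet leaves one node with 20 million extra rows to scan.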
So it is up to you to decide, for each table, whether its skew is acceptable or needs to be optimized.