Compaction is the process in which HBase combines small files (HStoreFiles) into bigger ones.
It is of two types:
Minor: takes a FEW adjacent files and merges them into one.
Major: takes all the files of a store in a region and merges them into one.
This post covers minor compaction.
To read about major compaction, see the other post, How HBase major compaction works. I suggest reading about minor compaction first.
Let's see what decides the term FEW in minor compaction.
The following properties affect minor compaction:
hbase.hstore.compaction.min | Minimum number of StoreFiles per Store to be selected for a compaction to occur (default 2). |
hbase.hstore.compaction.max | Maximum number of StoreFiles to compact per minor compaction (default 10). |
hbase.hstore.compaction.min.size | Any StoreFile smaller than this setting will automatically be a candidate for compaction. |
hbase.hstore.compaction.max.size | Any StoreFile larger than this setting will automatically be excluded from compaction. |
hbase.hstore.compaction.ratio | Ratio used in the compaction file selection algorithm. |
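The files used for a minor compaction are chosen with the following logic.
Note the size of each file.
HBase selects a file for compaction when file size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio. Here smaller_files_size means the sizes of the files newer than the one being checked, which are usually the smaller ones.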
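The rule above can be written as a one-line check. This is only a minimal sketch of the comparison as stated in this post, not HBase's actual code; the function name passes_ratio_check and its parameters are made up for illustration.

```python
def passes_ratio_check(file_size, newer_file_sizes, ratio):
    """Return True when file_size <= sum(newer_file_sizes) * ratio.

    file_size        -- size of the file being checked, in bytes
    newer_file_sizes -- sizes of the files newer than it, in bytes
    ratio            -- value of hbase.hstore.compaction.ratio
    (All names here are hypothetical, chosen for this sketch.)
    """
    return file_size <= sum(newer_file_sizes) * ratio


# A 23-byte file followed by two 12-byte files, with ratio 1.0:
print(passes_ratio_check(23, [12, 12], 1.0))  # True, since 23 <= 24
```

A larger ratio makes the check easier to pass, so bigger files get pulled into the compaction; a smaller ratio does the opposite.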
Quoting the example from the official book:
Consider the following configuration settings:
hbase.hstore.compaction.ratio = 1.0f
hbase.hstore.compaction.min = 3 (files)
hbase.hstore.compaction.max = 5 (files)
hbase.hstore.compaction.min.size = 10 (bytes)
hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.
Why?
Remember the logic:
HBase selects a file for compaction when file size <= sum(smaller_files_size) * hbase.hstore.compaction.ratio.
100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97, and 100 > 97.
50 --> No, because sum(23, 12, 12) * 1.0 = 47, and 50 > 47.
23 --> Yes, because sum(12, 12) * 1.0 = 24, and 23 <= 24.
12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
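The walk-through above can be reproduced with a few lines of Python. This is a simplified sketch under my own assumptions, not the real HBase implementation: the function select_minor_compaction and its parameter names are hypothetical, and the way min.size and max.size are applied simply follows the property descriptions given earlier in this post.

```python
def select_minor_compaction(sizes, ratio=1.0, min_files=3, max_files=5,
                            min_size=10, max_size=1000):
    """Pick files for a minor compaction (simplified sketch, not HBase code).

    sizes is a list of StoreFile sizes in bytes, ordered oldest to newest.
    Returns the sizes of the selected files, or [] when fewer than
    min_files would be compacted.
    """
    start = 0
    while start < len(sizes):
        size = sizes[start]
        # Sum of the newer files that could join this one in a compaction.
        newer_total = sum(sizes[start + 1:start + max_files])
        small_enough = size < min_size           # automatic candidate
        within_ratio = size <= newer_total * ratio
        if size <= max_size and (small_enough or within_ratio):
            break                                # first file that qualifies
        start += 1                               # skip this file, try the next
    # Once one file qualifies, the newer ones follow, up to max_files.
    selected = sizes[start:start + max_files]
    return selected if len(selected) >= min_files else []


# StoreFiles from the example, oldest to newest:
print(select_minor_compaction([100, 50, 23, 12, 12]))  # [23, 12, 12]
```

The defaults in the signature are the settings from the example, so the call picks 23, 12 and 12, the same files as the walk-through.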
Hope this helps in understanding HBase minor compaction.
Hadoop Hadooping :)