
Caution

You're viewing documentation for a previous version (ScyllaDB Open Source 5.2).

Compaction

This document describes the compaction strategy options available when creating a table. For more information about creating a table in Scylla, refer to the CQL Reference.

By default, Scylla starts a compaction task whenever a new SSTable is written. Compaction merges several SSTables into a new SSTable, which contains only the live data from the input SSTables. Merging several sorted files to get a sorted result is an efficient process, and this is the main reason why SSTables are kept sorted.

The following compaction strategies are supported by Scylla:

  • Size-tiered Compaction Strategy (STCS)

  • Leveled Compaction Strategy (LCS)

  • Incremental Compaction Strategy (ICS)

  • Time-window Compaction Strategy (TWCS)

  • Date-tiered Compaction Strategy (DTCS) - use TWCS instead

This page concentrates on the parameters to use when creating a table with a compaction strategy. If you are unsure which strategy to use or want general information on the compaction strategies available in Scylla, refer to Compaction Strategies.

Common options

The following options are available for all compaction strategies.

compaction = {
  'class' : 'compaction_strategy_name',
  'enabled' : (true | false),
  'tombstone_threshold' : ratio,
  'tombstone_compaction_interval' : sec}
class (default: SizeTieredCompactionStrategy)

Selects the compaction strategy.

It can be one of the following. If you are unsure which one to choose, refer to Compaction Strategies:

  • SizeTieredCompactionStrategy

  • TimeWindowCompactionStrategy

  • LeveledCompactionStrategy

  • IncrementalCompactionStrategy (Scylla Enterprise only)
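The following minimal sketch shows where the class option goes. It creates a table that uses LCS and then reads back the stored compaction map. The keyspace ks and the table users are hypothetical placeholders. The query assumes the Cassandra-compatible system_schema.tables table.

```cql
-- Hypothetical keyspace/table (ks.users); replace with your own names.
CREATE TABLE ks.users (
    user_id uuid PRIMARY KEY,
    name    text
) WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- Read back the compaction options stored in the schema.
SELECT compaction
FROM system_schema.tables
WHERE keyspace_name = 'ks' AND table_name = 'users';
```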


enabled (default: true)

Controls whether background compaction (known as “minor compaction”) runs. It can be one of the following:

  • true - runs minor compaction

  • false - disables minor compaction
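Here is a sketch of toggling minor compaction on an existing table. It assumes a hypothetical table ks.events that uses STCS. It also assumes that the new map replaces the table's whole compaction configuration. For that reason, 'class' is repeated in each statement.

```cql
-- Hypothetical table ks.events: stop background (minor) compaction.
ALTER TABLE ks.events
WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};

-- Turn minor compaction back on later.
ALTER TABLE ks.events
WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'true'};
```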


tombstone_threshold (default: 0.2)

The ratio (expressed as a decimal) of garbage-collectable tombstones to data in an SSTable. When an SSTable exceeds this threshold, a single-SSTable compaction of it begins. Acceptable values are numbers between 0 and 1.


tombstone_compaction_interval (default: 86400s (1 day))

An SSTable that qualifies for single-SSTable compaction according to tombstone_threshold will not be compacted if it is newer than tombstone_compaction_interval.
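The two tombstone options are set together in the same compaction map. The sketch below assumes a hypothetical table ks.events that uses STCS. The values 0.3 and 3600 are illustrative choices, not recommendations.

```cql
-- Hypothetical table ks.events.
-- Compact an SSTable on its own once its ratio of garbage-collectable tombstones
-- to data exceeds 0.3, but only if the SSTable is older than one hour (3600 s).
ALTER TABLE ks.events
WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.3',
    'tombstone_compaction_interval': '3600'
};
```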


Size Tiered Compaction Strategy (STCS)

When using STCS, SSTables are put in different buckets depending on their size. When an SSTable is bucketed, its size is compared with the average size of the SSTables already in a bucket, scaled by the bucket_low and bucket_high thresholds.

The database compares the size of each SSTable with the average SSTable size of each existing bucket. It calculates bucket_low * avg_bucket_size and bucket_high * avg_bucket_size and checks whether the SSTable's size falls between these two values. If it does, the SSTable is added to that bucket. The conditions set for bucket_high and bucket_low therefore dictate whether successive SSTables are added to the same bucket. When compaction begins, it merges SSTables whose sizes are within [avg_bucket_size * bucket_low] and [avg_bucket_size * bucket_high].

Once the number of SSTables in a bucket reaches min_threshold, minor compaction begins (see the note on compaction_enforce_min_threshold under min_threshold).
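The bucketing rule described above can be written as a single condition. Here s is the size of the SSTable being placed and \bar{s}_B is the current average size of bucket B. Both symbols are introduced for illustration only. The second line plugs in the default factors for a 10MB bucket.

```latex
% s: size of the SSTable being placed; \bar{s}_B: current average size of bucket B
\text{bucket\_low}\cdot\bar{s}_B \;<\; s \;<\; \text{bucket\_high}\cdot\bar{s}_B
\quad\Longrightarrow\quad s \text{ joins } B \text{ and } \bar{s}_B \text{ is recomputed}

% Defaults (bucket_low = 0.5, bucket_high = 1.5) with \bar{s}_B = 10\,\text{MB}:
0.5 \cdot 10\,\text{MB} = 5\,\text{MB} \;<\; s \;<\; 15\,\text{MB} = 1.5 \cdot 10\,\text{MB}
```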

STCS options

The following options only apply to SizeTieredCompactionStrategy:

compaction = {
  'class' : 'SizeTieredCompactionStrategy',
  'bucket_high' : factor,
  'bucket_low' : factor,
  'min_sstable_size' : int,
  'min_threshold' : num_sstables,
  'max_threshold' : num_sstables}
bucket_high (default: 1.5)

A new SSTable is added to the bucket if the SSTable size is less than bucket_high * the average size of that bucket (and if the bucket_low condition also holds).

For example, if ‘bucket_high = 1.5’ and the SSTable size is 14MB, does it belong to a bucket with an average size of 10MB?

Yes, because 14MB is less than ‘bucket_high’ * average bucket size (1.5 * 10MB = 15MB).

So, the SSTable will be added to the bucket, and the bucket’s average size will be recalculated.


bucket_low (default: 0.5)

A new SSTable is added to the bucket if the SSTable size is more than bucket_low * the average size of that bucket (and if the bucket_high condition also holds).

For example, if ‘bucket_low = 0.5’ and the SSTable size is 10MB, does it belong to a bucket with an average size of 15MB?

Yes, because 10MB is more than ‘bucket_low’ * average bucket size (0.5 * 15MB = 7.5MB).

So, the SSTable will be added to the bucket, and the bucket’s average size will be recalculated.


min_sstable_size (default: 50)

All SSTables smaller than this number of bytes are put into the same bucket.


min_threshold (default: 4)

Minimum number of SSTables that need to belong to the same size bucket before compaction is triggered on that bucket. If your SSTables are small, use min_sstable_size to define a size threshold (in bytes) below which all SSTables belong to one unique bucket.

Note

Enforcement of min_threshold is controlled by the compaction_enforce_min_threshold configuration option in the scylla.yaml configuration settings. By default, compaction_enforce_min_threshold: false, meaning the Size-Tiered Compaction Strategy will compact any bucket containing 2 or more SSTables. Otherwise, if compaction_enforce_min_threshold: true, the value of min_threshold is considered and only those buckets that contain at least min_threshold SSTables will be compacted.


max_threshold (default: 32)

Maximum number of SSTables that will be compacted together in one compaction step.
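Putting the STCS options together, a sketch for a hypothetical table ks.events might look like the following. All values are the defaults listed above, except min_threshold, which is raised to 6 purely as an illustration. min_sstable_size is omitted and keeps its default. As noted above, min_threshold is only enforced when compaction_enforce_min_threshold is true in scylla.yaml.

```cql
-- Hypothetical table ks.events; min_threshold = 6 is an illustrative value,
-- the other numbers are the documented defaults.
ALTER TABLE ks.events
WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'bucket_high': '1.5',
    'bucket_low': '0.5',
    'min_threshold': '6',
    'max_threshold': '32'
};
```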

Leveled Compaction Strategy (LCS)

The compaction class LeveledCompactionStrategy (LCS) creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is ten times as large as the previous level.
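The "ten times as large" relationship compounds from level to level. The block below writes |L_i| for the total size of level i. This notation is introduced here for illustration and is stated only for levels L1 and above.

```latex
% |L_i|: total size of level i, for i >= 1
|L_{i+1}| = 10\,|L_i| \quad\Longrightarrow\quad |L_i| = 10^{\,i-1}\,|L_1|
% e.g. L3 is 10^2 = 100 times the size of L1
```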

LCS options

compaction = {
  'class' : 'LeveledCompactionStrategy',
  'sstable_size_in_mb' : int}
sstable_size_in_mb (default: 160)

This is the target size, in megabytes, used as the goal for an SSTable's size after compression. Although SSTables should be less than or equal to sstable_size_in_mb, compaction can produce a larger SSTable. This occurs when the data for a given partition key is exceptionally large.
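The following sketch switches a hypothetical table ks.users to LCS with an explicit target SSTable size. The value shown is simply the default of 160.

```cql
-- Hypothetical table ks.users: use LCS with a 160 MB target SSTable size.
ALTER TABLE ks.users
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': '160'
};
```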


Incremental Compaction Strategy (ICS)

Added in version 2019.1.4: Scylla Enterprise

Note

ICS is only available for Scylla Enterprise customers

When using ICS, SSTable runs are put in different buckets depending on their size. When an SSTable run is bucketed, its size is compared with the average size of the runs already in the bucket, scaled by the bucket_low and bucket_high thresholds.

The database compares the size of each SSTable run with the average run size of each existing bucket. It calculates bucket_low * avg_bucket_size and bucket_high * avg_bucket_size and checks whether the run's size falls between these two values. If it does, the run is added to that bucket. The conditions set for bucket_high and bucket_low therefore dictate whether successive runs are added to the same bucket. When compaction begins, it merges SSTable runs whose sizes are within [avg_bucket_size * bucket_low] and [avg_bucket_size * bucket_high].

Once there are multiple runs in a bucket, minor compaction begins. By default, 2 SSTable runs are enough to trigger minor compaction. If the compaction_enforce_min_threshold option is set to true in the scylla.yaml configuration file, the minimum is min_threshold instead.

ICS options

The following options only apply to IncrementalCompactionStrategy:

compaction = {
  'class' : 'IncrementalCompactionStrategy',
  'bucket_high' : factor,
  'bucket_low' : factor,
  'min_sstable_size' : int,
  'min_threshold' : num_sstables,
  'max_threshold' : num_sstables,
  'sstable_size_in_mb' : int,
  'space_amplification_goal' : double}

bucket_high (default: 1.5)

A new SSTable is added to the bucket if the SSTable size is less than bucket_high * the average size of that bucket (and if the bucket_low condition also holds). For example, if ‘bucket_high = 1.5’ and the SSTable size is 14MB, does the SSTable belong to a bucket with an average size of 10MB? Yes, because 14MB is less than ‘bucket_high’ * average bucket size (1.5 * 10MB = 15MB). So, the SSTable will be added to the bucket, and the bucket’s average size will be recalculated.


bucket_low (default: 0.5)

A new SSTable is added to the bucket if the SSTable size is more than bucket_low * the average size of that bucket (and if the bucket_high condition also holds). For example, if ‘bucket_low = 0.5’ and the SSTable size is 10MB, does the SSTable belong to a bucket with an average size of 15MB? Yes, because 10MB is more than ‘bucket_low’ * average bucket size (0.5 * 15MB = 7.5MB). So, the SSTable will be added to the bucket, and the bucket’s average size will be recalculated.


min_sstable_size (default: 50)

All SSTables smaller than this number of megabytes are put into the same bucket.

Unlike Apache Cassandra, Scylla uses uncompressed size when bucketing similar-sized tiers together. Since compaction works on uncompressed data, SSTables containing similar amounts of data should be compacted together, even when they have different compression ratios.


min_threshold (default: 4)

Minimum number of SSTable runs that need to belong to the same size bucket before compaction is triggered on that bucket.

Note

Enforcement of min_threshold is controlled by the compaction_enforce_min_threshold configuration option in the scylla.yaml configuration settings. By default, compaction_enforce_min_threshold=false, meaning the Incremental Compaction Strategy will compact any bucket containing 2 or more SSTable runs. Otherwise, if compaction_enforce_min_threshold=true, the value of min_threshold is considered and only those buckets that contain at least min_threshold SSTable runs will be compacted.


max_threshold (default: 32)

Maximum number of SSTables that will be compacted together in one compaction step.


sstable_size_in_mb (default: 1000)

This is the target size, in megabytes, used as the goal for an SSTable's size (fragment size) after compression.


space_amplification_goal (default: null)

ScyllaDB Enterprise

Added in version 2020.1.6.

This is a threshold on the ratio of the sum of the sizes of the two largest tiers to the size of the largest tier. Above this threshold, ICS automatically compacts the second-largest and largest tiers together to eliminate stale data that may have been overwritten, expired, or deleted. The space_amplification_goal is given as a double-precision floating-point number that must be greater than 1.0.
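As a formula, let S_1 be the size of the largest tier and S_2 the size of the second-largest tier. These symbols are introduced for illustration. The trigger condition and its equivalent form in terms of S_2 are shown below. The last line reproduces the arithmetic behind the 1.25 example.

```latex
% S_1: size of the largest tier; S_2: size of the second-largest tier (S_1 > 0)
\frac{S_1 + S_2}{S_1} > \text{space\_amplification\_goal}
\quad\Longleftrightarrow\quad
S_2 > (\text{space\_amplification\_goal} - 1)\, S_1

% Example: space_amplification_goal = 1.25, S_1 = 1000\,\text{GB}
(1.25 - 1) \times 1000\,\text{GB} = 250\,\text{GB}
```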

For example, if ‘space_amplification_goal = 1.25’ and the largest tier holds 1000GB, then once the second-largest tier accumulates SSTables with a total size of 250GB or more, the space_amplification_goal threshold is crossed and all the SSTables in the largest and second-largest tiers will be compacted together.
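The following sketch shows the ICS options in a single statement for a hypothetical table ks.events. It is valid only on Scylla Enterprise, and space_amplification_goal requires version 2020.1.6 or later. The values are the documented defaults plus the 1.25 goal from the example above.

```cql
-- Scylla Enterprise only. Hypothetical table ks.events.
ALTER TABLE ks.events
WITH compaction = {
    'class': 'IncrementalCompactionStrategy',
    'bucket_high': '1.5',
    'bucket_low': '0.5',
    'min_threshold': '4',
    'max_threshold': '32',
    'sstable_size_in_mb': '1000',
    'space_amplification_goal': '1.25'
};
```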


Time Window Compaction Strategy (TWCS)

The basic concept is that TimeWindowCompactionStrategy creates one SSTable per time window.

Caution

  • We strongly recommend using a single TTL value for any given table.

  • This means sticking to the default time to live as specified in the table’s schema.

  • Using multiple TTL values for a given table may lead to inefficiency when purging expired data, because an SSTable will remain until all of its data is expired.

  • Tombstone compaction can be enabled to remove data from partially expired SSTables, but this creates additional write amplification (WA).

Caution

Avoid overwriting data and deleting data explicitly at all costs, as this can potentially block an expired SSTable from being purged, due to the checks that are performed to avoid data resurrection.

TWCS options

compaction = {
  'class' : 'TimeWindowCompactionStrategy',
  'compaction_window_unit' : string,
  'compaction_window_size' : int,
  'expired_sstable_check_frequency_seconds' : int,
  'min_threshold' : num_sstables,
  'max_threshold' : num_sstables}
compaction_window_unit (default: DAYS)

The time unit used to define the window size. It can be one of the following:

  • 'MINUTES'

  • 'HOURS'

  • 'DAYS'


compaction_window_size (default: 1)

The number of units which will make up a window.


expired_sstable_check_frequency_seconds (default: 600)

Specifies (in seconds) how often Scylla will check for fully expired SSTables, which can be immediately dropped.


min_threshold (default: 4)

Minimum number of SSTables that need to belong to the same size bucket before compaction is triggered on that bucket.


max_threshold (default: 32)

Maximum number of SSTables that will be compacted together in one compaction step.
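The following sketch combines the TWCS options with a single table-wide TTL, in line with the caution above. The table ks.sensor_readings and its columns are hypothetical. The 30-day TTL is 30 * 86400 = 2592000 seconds, and the window settings are the defaults.

```cql
-- Hypothetical time-series table: one-day windows and one default TTL (30 days).
CREATE TABLE ks.sensor_readings (
    sensor_id uuid,
    ts        timestamp,
    value     double,
    PRIMARY KEY (sensor_id, ts)
) WITH default_time_to_live = 2592000
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'expired_sstable_check_frequency_seconds': '600'
};
```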


See Also

  • Apache Cassandra Query Language (CQL) Reference

  • Compaction Strategies

  • Compaction Overview

Copyright

© 2016, The Apache Software Foundation.

Apache®, Apache Cassandra®, Cassandra®, the Apache feather logo and the Apache Cassandra® Eye logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

© 2025, ScyllaDB. All rights reserved. ScyllaDB and ScyllaDB Cloud are registered trademarks of ScyllaDB, Inc.
Last updated on 08 May 2025.