Maximizing Scylla Performance

The purpose of this guide is to provide an overview of the best practices for maximizing the performance of Scylla, the next-generation NoSQL database. Even though Scylla auto-tunes for optimal performance, users still need to apply best practices in order to get the most out of their Scylla deployments.

Performance Tips Summary

If you do not plan to read this document in full, here are the most important takeaways:

  • Use the best hardware you can reasonably afford.

  • Install the Scylla Monitoring Stack.

  • Run the scylla_setup script.

  • Use cassandra-stress for benchmarking.

  • Expect at least 12.5K operations per second (OPS) per physical core for simple operations on suitably selected hardware.

Scylla Design Advantages

Scylla is different from other NoSQL databases. It achieves the highest levels of performance and takes full control of the hardware by utilizing all of the server cores in order to provide strict SLAs for low-latency operations. If you run Scylla in an over-committed environment, performance won’t just degrade linearly; it will drop drastically.

This is because Scylla has a reactor design that runs on all the (configured) cores and a scheduler that assumes a 0.5 ms tick. Scylla does everything it can to control queues in userspace rather than in the OS or drives, and it therefore relies on the disk bandwidth measured by scylla_setup.

It is not that difficult to get the best performance out of Scylla. It is mostly tuned automatically, as long as you do not work against the system. The remainder of this document contains the best practices to follow to make sure that Scylla keeps tuning itself and that you get the best possible performance.

Install Scylla Monitoring Stack

Install and use the Scylla Monitoring Stack; it gives excellent additional value beyond performance. If you don’t know what your bottleneck is, you have not configured your system correctly. The Scylla Monitoring Stack dashboards will help you sort this out.

With the recent addition of the Scylla Advisor to the Scylla Monitoring Stack, it is even easier to find potential issues.

Install Scylla Manager

Install and use Scylla Manager (https://manager.docs.scylladb.com) together with the Scylla Monitoring Stack. Scylla Manager provides automated backups and repairs of your database. It can manage multiple Scylla clusters and run cluster-wide tasks in a controlled and predictable way. For example, with Scylla Manager you can control the intensity of a repair, increasing it to speed up the process or lowering it to minimize the impact on ongoing operations.

Run scylla_setup

Before running Scylla, it is critical that the scylla_setup script has been executed. Scylla doesn’t require manual optimization; it is the task of the scylla_setup script to determine the optimal configuration. If scylla_setup has not run, the system won’t be configured optimally. Refer to the System Configuration guide for details.

Benchmarking Best Practices

Use a Representative Environment

  • Execute benchmarks on an environment that reflects your production.

  • Benchmarking on the wrong environment can easily lead to an order of magnitude performance difference. For example, on a laptop, you could do 20K OPS while on a dedicated server you could easily do 200K OPS. So unless you have your production system running on a laptop, do not benchmark on a laptop.

  • It is recommended to automate benchmarking with tools such as Terraform/Ansible so that tests can be repeated reliably.

  • If you are using shared hardware in a containerized/virtualized environment, note that a single guest can increase latency in the other guests.

  • Make sure that you do not underprovision your load generators; otherwise, the load generators themselves can become the bottleneck.

Use a Representative Data Model

Tools such as cassandra-stress use a default data model that does not reflect the operations you will perform in production. For example, the cassandra-stress default data model has a replication factor of 1 and uses LOCAL_ONE as the consistency level.

Although cassandra-stress is convenient for getting some initial performance impressions, it is critical to benchmark the same or a similar data model to the one used in production. Therefore, we recommend that you use a custom data model. For more information, refer to the user mode section.

Use Representative Datasets

If you run the benchmark with a dataset that is smaller than your production data, you may have misleading or incorrect results due to the reduced number of I/O operations. Therefore, it is critical to configure the size of the dataset to reflect your production dataset size.

Use a Representative Load

Run the benchmark using a load that represents, as closely as possible, the load you anticipate will be used in production. This includes the queries submitted by the load generator. The read/write ratio is important due to the overhead of compaction and finding the right data on disk.

Proper Warmup and Duration

When benchmarking, it is important that the system is given time to warm up. This allows the database to fill the cache. In addition, it is critical to run the benchmarks long enough so that at least one compaction is triggered.

Latency Tests vs. Throughput Tests

When performing a load test you need to differentiate between a latency test and a throughput test. With a throughput test, you measure the maximum throughput by sending a new request as soon as the previous request completes. With a latency test, you pin the throughput at a fixed rate. In both cases, latency is measured.

Most engineers start with a throughput test, but a latency test is often the better choice because the desired throughput is already known, e.g. 1M OPS. This is especially true if your production workload must meet an SLA, for example, that the 99.99th percentile latency stays below 10 ms.

Coordinated Omission

A common problem when measuring latencies is coordinated omission, which causes the worst latencies to be omitted from the metrics and, as a result, renders the higher percentiles useless. A tool such as cassandra-stress prevents coordinated omission from occurring. For more information, read this article.

Don’t Average Percentiles

Another typical problem with benchmarks is that when a load is generated by multiple load generators, the percentiles are averaged. The correct way to determine the percentiles over multiple load generators is to merge the latency distributions of the load generators and then determine the percentiles. If this isn’t an option, then the next best alternative is to take the maximum (of the p99 of each load generator, for example). The actual p99 will be equal to or smaller than that maximum p99. For more information on percentiles, read this blog.
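To see why averaging percentiles is misleading, here is a minimal Python sketch; the latency samples are made up purely for illustration:

# Minimal sketch: averaging per-generator p99s vs. computing the p99
# over the merged latency samples. The numbers are illustrative only.

def p99(samples):
    """Return the 99th percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return ordered[index]

# Two load generators with different latency distributions.
generator_a = [1.0] * 990 + [50.0] * 10   # p99 is about 50 ms
generator_b = [1.0] * 1000                # p99 is about 1 ms

average_of_p99s = (p99(generator_a) + p99(generator_b)) / 2  # ~25.5 ms, misleading
merged_p99 = p99(generator_a + generator_b)                  # the correct value (~1 ms)
max_of_p99s = max(p99(generator_a), p99(generator_b))        # safe upper bound (~50 ms)

print(average_of_p99s, merged_p99, max_of_p99s)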

Use Proven Benchmark Tools

Instead of rolling out custom benchmarks, use proven tools like cassandra-stress. It is very flexible and takes care of coordinated omission. Yahoo! Cloud Serving Benchmark (YCSB) is also an option, but it needs to be configured correctly to prevent coordinated omission. TLP-stress is not recommended because it suffers from coordinated omission. When benchmarking, make sure to use the cassandra-stress that is part of the Scylla distribution because it contains the shard-aware drivers.

Use the Same Benchmark Tool

When benchmarking with different tools, it is very easy to run into an apples vs. oranges comparison. When comparing products, use the same benchmark tool if possible.

Reproducible Results

Make sure that the outcomes of the benchmark are reproducible by executing your tests at least twice. If the outcomes differ, the benchmark results are unreliable. One potential cause is that the data set from a previous benchmark run has not been cleaned up, which can change write performance.

Query Recommendations

Correct Data Modeling

The key to a well-performing system is using a properly defined data model. A poorly structured data model can easily lead to an order of magnitude performance difference compared to that of a proper model.

A few of the most important tips:

  • Choose the right partition key and clustering keys so that you reduce or even eliminate the amount of data that needs to be scanned (see the schema sketch after this list).

  • Add indexes where appropriate.

  • Partitions that are accessed much more frequently than others (hot partitions) should be avoided because they cause load imbalances between CPUs and nodes.

  • Large partitions, large rows, and large cells should be avoided because they can cause high latencies.
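As a minimal sketch of these points, the following hypothetical schema (table and column names are made up, and it assumes a keyspace named metrics, whose creation is shown in the Replication Factor section below) uses a compound partition key to bound partition size and a clustering key that matches the query order. The same session object is reused in the later driver sketches.

# Minimal sketch using the Python driver (the scylla-driver package keeps
# the same "cassandra" module API and adds shard awareness).
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])  # illustrative contact point
session = cluster.connect()

# Hypothetical time-series table: the compound partition key (sensor_id, day)
# bounds the size of each partition and spreads load, while the clustering
# key (ts) keeps the rows of one sensor/day together in query order.
session.execute("""
    CREATE TABLE IF NOT EXISTS metrics.readings (
        sensor_id uuid,
        day       date,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")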

Use Prepared Statements

Prepared statements provide better performance because parsing is done once, requests can be routed in a token-aware/shard-aware manner, and less data is sent over the wire. Apart from the performance improvements, prepared statements also increase security because they prevent CQL injection. Read more in Stop Wasting Scylla’s CPU Time by Not Being Prepared.
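A minimal sketch of preparing a statement once and reusing it, assuming the session and the hypothetical metrics.readings table from the earlier sketch (incoming_readings is a hypothetical iterable of reading objects):

# Parsed once server-side; each execution only sends the bound values, and
# the driver can route it to the right node and shard.
insert = session.prepare(
    "INSERT INTO metrics.readings (sensor_id, day, ts, value) VALUES (?, ?, ?, ?)"
)

for r in incoming_readings:  # hypothetical iterable of reading objects
    session.execute(insert, (r.sensor_id, r.day, r.ts, r.value))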

Use Paged Queries

It is best to run queries that return a small number of rows. However, if a query can return many rows, then the unpaged query can lead to a huge memory bubble. This will eventually cause Scylla to kill the query. With a paged query, the execution collects a page’s worth of data and new pages are retrieved on demand, leading to smaller memory bubbles. Read about More Efficient Query Paging.
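A minimal sketch of a paged read with the Python driver, reusing the hypothetical table above; fetch_size sets the page size, and further pages are fetched on demand while iterating (sensor_id, day, and process are placeholders):

from cassandra.query import SimpleStatement

query = SimpleStatement(
    "SELECT ts, value FROM metrics.readings WHERE sensor_id = %s AND day = %s",
    fetch_size=1000,  # rows per page
)

# Only one page is held in memory at a time; subsequent pages are fetched
# transparently as the result set is iterated.
for row in session.execute(query, (sensor_id, day)):
    process(row)  # hypothetical per-row handler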

Don’t Use Reverse Queries

When using a query with an ORDER BY clause, make sure that the requested order matches the clustering order defined in the data model. Otherwise you run into reverse queries, which can cause unbounded memory usage and killed queries.
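With the hypothetical readings table above, which is defined WITH CLUSTERING ORDER BY (ts DESC), the first query below follows the stored order while the second is a reverse query:

# Same order as the table's clustering order (ts DESC): efficient.
session.execute(
    "SELECT ts, value FROM metrics.readings "
    "WHERE sensor_id = %s AND day = %s ORDER BY ts DESC", (sensor_id, day))

# Opposite of the clustering order: a reverse query, best avoided.
session.execute(
    "SELECT ts, value FROM metrics.readings "
    "WHERE sensor_id = %s AND day = %s ORDER BY ts ASC", (sensor_id, day))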

Use Workload Prioritization

In a typical application there are operational workloads that require low latency. Sometimes these run in parallel with analytic workloads that process high volumes of data and do not require low latency. With workload prioritization, you can prevent the analytic workloads from causing unwanted high latency on the operational workloads. Workload prioritization is only available with Scylla Enterprise.

Bypass Cache

There are certain workloads, e.g. analytical workloads, that scan through all the data. By default, ScyllaDB will try to use the cache, but since the data won’t be used again, this leads to cache pollution: good data gets pushed out of the cache and replaced by useless data.

As a consequence, operational workloads can experience bad latency due to an increased rate of cache misses. To prevent this problem, queries from analytical workloads can bypass the cache using the BYPASS CACHE option.

See Bypass Cache for more information.
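A minimal sketch of an analytical full scan that skips the cache, reusing the hypothetical table and session from above:

from cassandra.query import SimpleStatement

# Analytical full scan; BYPASS CACHE keeps it from evicting the operational
# workload's hot data, and paging keeps memory bounded.
scan = SimpleStatement(
    "SELECT sensor_id, day, ts, value FROM metrics.readings BYPASS CACHE",
    fetch_size=5000)
for row in session.execute(scan):
    process(row)  # hypothetical per-row handler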

Batching

Multiple CQL queries to the same partition can be batched into a single request. Imagine a query where the round-trip time is 0.9 ms and the service time is 0.1 ms. Without batching, executing 10 such queries one after the other would take 10 × (0.9 + 0.1) = 10.0 ms. If you instead send a single batch of 10 statements, the total time would be roughly 0.9 + 10 × 0.1 = 1.9 ms, i.e. 19% of the latency without batching.
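A minimal sketch of batching writes that target the same partition, reusing the hypothetical prepared insert from the prepared-statements sketch:

from cassandra.query import BatchStatement, BatchType

# Unlogged batch of writes that all target the same partition
# (same sensor_id and day), so a single replica set handles it.
batch = BatchStatement(batch_type=BatchType.UNLOGGED)
for r in readings_for_one_sensor_and_day:  # hypothetical iterable
    batch.add(insert, (r.sensor_id, r.day, r.ts, r.value))
session.execute(batch)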

Driver Guidelines

Use the Scylla drivers that are available for Java, Python, Go, and C/C++. They provide much better performance than third-party drivers because they are shard aware: they can route requests to the right CPU core (shard). When the driver starts, it retrieves the topology of the cluster and therefore knows exactly which CPU core should get a request. The latest shard-aware drivers also improve the efficiency of the Change Data Capture (CDC) feature. If the Scylla drivers are not an option, make sure that at least a token-aware driver is used so that an extra network hop is removed.

Check that the client creates a sufficient number of connections; otherwise, performance could suffer. The general rule is 1-3 connections per Scylla CPU per node.
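If you are using the plain cassandra-driver rather than the shard-aware scylla-driver package, a token-aware load balancing policy can be set explicitly. A minimal sketch, with illustrative contact points, keyspace, and data center name:

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

# Token-aware routing sends each request directly to a replica that owns the
# token, removing the extra hop through a non-replica coordinator.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc1"))
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("metrics")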

Hardware Guidelines

CPU Core Count Guidelines

By default, Scylla makes use of all CPU cores and is designed to perform well on powerful machines; as a consequence, fewer machines are needed. The recommended minimum number of CPU cores per node for operational workloads is 20.

The rule of thumb is that a single physical core can process 12.5K queries per second with a payload of up to 1 KB. If a single node should process 400K queries per second, at least 32 physical cores, or 64 hyper-threaded cores, are required. In cloud environments, hyper-threaded cores are often called virtual CPUs (vCPUs) or simply cores, so it is important to determine whether a vCPU corresponds to a physical core or to a hyper-threaded core.

Scylla relies on temporarily spinning the CPU instead of blocking and waiting for data to arrive. This reduces latency because context switching is avoided. The drawback is that the CPUs appear 100% utilized, and you might falsely conclude that Scylla can’t keep up with the load. Read more about Scylla System Requirements.

Memory Guidelines

During startup, Scylla claims nearly all of the available memory for itself. This is done for caching purposes to reduce the number of I/O operations. So the more memory available, the better the performance.

Scylla recommends at least 2 GB of memory per core and a minimum of 16 GB of memory per system (whichever is higher). This means that if you have a 64-core system, you should have at least 2 x 64 = 128 GB of memory.

The maximum recommended storage-to-memory ratio for good performance is 30:1. So for a system with 128 GB of memory, the recommended upper bound on storage capacity is about 3.8 TB of data per node. To store 6 TB of data per node, the minimum recommended amount of memory is 200 GB.

Read more about Scylla System Requirements or Starting Scylla in a Shared Environment.

Storage Guidelines

Scylla utilizes the full potential of modern NVMe SSDs; the faster the drive, the better the performance. If there is more than one SSD, it is recommended to use them in RAID 0 for the best performance. This is configured during scylla_setup, and Scylla will create the RAID device automatically. If SSD capacity is limited, the commit log should be placed on the SSD.

The recommended file system is XFS because of its asynchronous appending write support; it is also the primary file system ScyllaDB is tested with.

As SSDs wear out over time, it is recommended to re-run the iotune tool every few months. This helps Scylla’s I/O scheduler make the best-performing choices.

Read more about Scylla System Requirements.

Networking Guidelines

For operational workloads, the minimum recommended network bandwidth is 10 Gbps. The scylla_setup script takes care of optimizing the kernel parameters, IRQ handling, and so on.

Read more about Scylla Network Requirements.

Cloud Compute Instance Recommendations

Scylla is designed to utilize all hardware resources. Bare metal instances are preferred for best performance.

Read more about Starting Scylla in a Shared Environment.

Image Guidelines

Use the Scylla-provided AMI on AWS EC2 or the Google Cloud Platform (GCP) image, if possible. They have already been correctly configured for use in those public cloud environments.

AWS

AWS EC2 i3, i3en, and c5d bare-metal instances are highly recommended because they are optimized for high I/O.

Read more about Scylla Supported Platforms.

If bare metal isn’t an option, use Nitro-based instances and run with ‘host’ as the tenancy policy to prevent the instance from being shared with other VMs. If Nitro isn’t possible, prefer instance storage over EBS. If instance storage is not an option, use an io2 IOPS provisioned SSD for best performance. If there is only limited instance storage, place the commit log there. There is a newer instance type called r5b that has high EBS performance.

GCP

For GCP we recommend n1/n2-highmem with local SSDs.

Read more at: https://docs.scylladb.com/getting-started/system-requirements/#google-compute-engine-gce

Azure

For Azure we recommend the Lsv2 series. They feature high throughput and low latency and have local NVMe storage. Read more about Azure Requirements.

Docker

When running on the Docker platform, use CPU pinning and host networking for best performance. Read more about The Cost of Containerization for Your Scylla.

Kubernetes

Just as with Docker, CPU pinning should be used in a Kubernetes environment. In this case, the pod should be pinned to CPUs so that no sharing takes place.

Read more about Exploring Scylla on Kubernetes.

Data Compaction

When records get updated or deleted, the old data eventually needs to be removed. This is done using compaction, and the compaction settings can make a huge difference to performance.

  • Use the Compaction Strategy Matrix to choose the correct compaction strategy for your workload.

  • ICS (Incremental Compaction Strategy) combines the low space amplification of LCS with the low write amplification of STCS. It is only available with Scylla Enterprise.

  • If you have time-series data, use TWCS (Time Window Compaction Strategy).

Read more about Compaction Strategies

Consistency Level

The consistency level determines how many replicas the coordinator must wait for before a read or write is considered successful. The consistency level is chosen by the application based on its requirements for consistency, availability, and performance. The higher the consistency, the lower the availability and performance.

For single data center setups, a frequently used consistency level for both reads and writes is QUORUM. It gives a nice balance between consistency and availability/performance. For multi-data-center setups, it is best to use LOCAL_QUORUM.
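As a minimal sketch with the Python driver, the consistency level can be set per statement; the prepared insert below is the hypothetical one from the earlier sketches:

from cassandra import ConsistencyLevel

# Wait for a quorum of replicas in the local data center before the write
# is considered successful.
insert.consistency_level = ConsistencyLevel.LOCAL_QUORUM
session.execute(insert, (r.sensor_id, r.day, r.ts, r.value))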

Read more about Fault Tolerance

Replication Factor

The default replication factor is set to 3 and in most cases this is a sensible default because it provides a good balance between performance and availability. Keep in mind that a write will always be sent to all replicas, no matter the consistency level.
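A minimal sketch of setting the replication factor when creating a keyspace; the keyspace and data center names are illustrative, so substitute your own:

# Replication factor 3 in data center 'dc1' (illustrative DC name).
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS metrics
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")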

Asynchronous Requests

Using asynchronous requests can help increase the throughput of the system. If the latency is 1 ms, then a single thread issuing synchronous requests can do at most 1,000 QPS. But if the service time of an operation is only 100 µs, pipelining can increase the throughput to 10,000 QPS.

The drivers limit the number of pending asynchronous requests to prevent overloading the server.
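A minimal sketch of pipelining requests with the Python driver; execute_concurrent_with_args keeps a bounded number of requests in flight (the prepared insert and incoming_readings are the hypothetical names from the earlier sketches):

from cassandra.concurrent import execute_concurrent_with_args

parameters = [(r.sensor_id, r.day, r.ts, r.value) for r in incoming_readings]

# Keep up to 100 requests in flight at a time; results (or exceptions)
# are returned once all statements have completed.
results = execute_concurrent_with_args(session, insert, parameters, concurrency=100)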

Read more in Maximizing Performance via Concurrency While Minimizing Timeouts in Distributed Databases.

Conclusion

Maximizing Scylla performance does require some effort, even though Scylla does its best to minimize the amount of configuration needed. If the best practices above are applied correctly, most common performance problems will be prevented.
