Was this page helpful?
Caution
You're viewing documentation for an unstable version of ScyllaDB Open Source. Switch to the latest stable version.
This is an article on how to use the ScyllaDB Docker image to start up a ScyllaDB node, access nodetool
and cqlsh
utilities, start a cluster of ScyllaDB nodes, configure data volume for storage, configure resource limits of the Docker container, use additional command line flags and overwrite scylla.yaml
settings. Finally, there is an additional section with some basic usage of ScyllaDB within Docker.
See also the image description on Docker Hub or our original blog.
Please note that these instructions assume that you have configured Docker so that you can run it as a regular user. Usually, this is done by adding the user to a Docker group. See your platform-specific Docker installation documentation on how to do that (see, for example, instructions for Fedora and Ubuntu). If you have not configured a Docker group, you need to prefix the Docker commands with sudo to have sufficient permissions to run them.
NOTE: You should allocate a minimum of 1.5 GB of RAM per container.
To start a single ScyllaDB node instance in a Docker container, run:
docker run --name some-scylla -d scylladb/scylla
If you’re on macOS and plan to start a multi-node cluster (3 nodes or more), start ScyllaDB with
–reactor-backend=epoll
to override the default linux-aio
reactor backend:
docker run --name some-scylla -d scylladb/scylla --reactor-backend=epoll
The docker run
command starts a new Docker instance in the background named some-scylla that runs the ScyllaDB server:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
616ee646cb9d scylladb/scylla "/docker-entrypoint.p" 4 seconds ago Up 4 seconds 7000-7001/tcp, 9042/tcp, 9160/tcp, 10000/tcp some-scylla
As seen from the docker ps
output, the image exposes ports 7000-7001 (Inter-node RPC), 9042 (CQL), and 10000 (REST API).
To access ScyllaDB server logs, you can use the docker logs
command:
docker logs some-scylla | tail
INFO 2016-11-09 10:27:48,191 [shard 6] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 4] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 3] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 1] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
The Docker image also has ScyllaDB’s utilities installed. Nodetool is a command line tool for querying and managing a ScyllaDB cluster. The simplest nodetool
command is nodetool status
, which displays information about the cluster state:
docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.17.0.2 125 KB 256 100.0% c1906b2b-ce0c-4890-a9d4-8c360f111ad0 rack1
The cqlsh
tool (CQL Shell) is an interactive Cassandra Query Language (CQL) shell for querying and manipulating data in the ScyllaDB cluster.
To start an interactive session, run the following command:
docker exec -it some-scylla cqlsh
Connected to Test Cluster at 172.17.0.2:9042.
[cqlsh 5.0.1 | Cassandra 2.1.8 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
and then run CQL queries against the cluster:
cqlsh> SELECT cluster_name FROM system.local;
cluster_name
--------------
Test Cluster
(1 rows)
With a single some-scylla
instance running, joining new nodes to form a cluster is easy:
docker run --name some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
If you’re on macOS, ensure to add the –reactor-backend=epoll
option when adding new nodes:
docker run --name some-scylla2 -d scylladb/scylla --reactor-backend=epoll --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
To query when the node is up and running (and view the status of the entire cluster) use the nodetool status
command:
docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.17.0.3 177.48 KB 256 100.0% 097caff5-892d-412f-af78-11d572795d6f rack1
UN 172.17.0.2 125 KB 256 100.0% c1906b2b-ce0c-4890-a9d4-8c360f111ad0 rack1
The Docker image uses supervisord
to manage ScyllaDB processes. You can restart ScyllaDB in a Docker container using:
docker exec -it some-scylla supervisorctl restart scylla
The default filesystem in Docker is inadequate for anything else than just testing out ScyllaDB, but you can use Docker volumes for improving storage performance.
To use data volumes, ensure first that it’s on a ScyllaDB-supported filesystem like XFS, then create a ScyllaDB data directory /var/lib/scylla
on the host. This will be used by ScyllaDB container to store all data:
sudo mkdir -p /var/lib/scylla/data /var/lib/scylla/commitlog
Then launch ScyllaDB instances using Docker’s --volume
command line option to mount the created host directory as a data volume in the container and disable ScyllaDB’s developer mode to run I/O tuning before starting up the ScyllaDB node.
docker run --name some-scylla --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla --developer-mode=0
Sometimes, it’s not possible to adjust ScyllaDB-specific settings (including non-network properties, like cluster_name
) directly from the command line when ScyllaDB is running within Docker.
Instead, it may be necessary to incrementally override scylla.yaml
settings by passing an external, master ScyllaDB.yaml file when starting the Docker container for the node.
To do this, you can use the --volume (-v)
command as before to specify the overriding .yaml
file:
NOTE: you can create a master_scylla.yaml
in current host dir: just copy the file from https://github.com/scylladb/scylla/blob/master/conf/scylla.yaml
.
On the host, create and edit master_scylla.yaml
, for example. Uncomment and change the “cluster_name” parameter.
Start the ScyllaDB node, with the command to override scylla.yaml
with master_scylla.yaml
:
docker run --name some-scylla --volume ~/master_scylla.yaml:/etc/scylla/scylla.yaml -d scylladb/scylla
NOTE: You can start a Docker node with any other alternate parameter configured in scylla.yaml
using this technique.
Finally, you can check that the setting was changed:
docker exec -it some-scylla nodetool describecluster
Cluster Information:
Name: Doobie Snarf
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
34259144-0f3f-305f-a777-2811e30e17b3: [172.17.0.2]
By default, our Docker image defaults to a mode where ScyllaDB’s architectural optimizations are not enabled. With these command-line settings, you can introduce incremental changes that boost your ScyllaDB performance on Docker even more.
ScyllaDB uses all CPUs and memory by default. To configure resource limits for your Docker container, you can use the --smp
, --memory
, and --cpuset
command line options documented in the section “Network and command-line settings” below.
The recommended way to run multiple ScyllaDB instances on the same physical hardware is by statically partitioning all resources. For example, using the --cpuset
option to assign cores 0
and 1
to one instance, and 2
and 3
to another.
In scenarios in which static partitioning is not desired - like mostly-idle cluster without hard latency requirements, the --overprovisioned
command-line option is recommended. This enables certain optimizations for ScyllaDB to run efficiently in an overprovisioned environment.
NOTE: You should allocate a minimum of 1.5 GB of RAM per container.
The ScyllaDB image supports many command line options that are passed to the Docker run command. Keep in mind that these command-line settings override the corresponding settings in your scylla.yaml
.
The --seeds
command line option configures ScyllaDB’s seed nodes. If no --seeds
option is specified, ScyllaDB uses its own IP address as the seed.
For example, to configure ScyllaDB to run with two seed nodes 192.168.0.100
and 192.168.0.200
.
docker run --name some-scylla -d scylladb/scylla --seeds 192.168.0.100,192.168.0.200
The --listen-address
command line option configures the IP address the ScyllaDB instance listens for client connections.
For example, to configure ScyllaDB to use listen address 10.0.0.5
:
docker run --name some-scylla -d scylladb/scylla --listen-address 10.0.0.5
The --broadcast-address
command line option configures the IP address the ScyllaDB instance tells other ScyllaDB nodes in the cluster to connect to.
For example, to configure ScyllaDB to use broadcast address 10.0.0.5
:
docker run --name some-scylla -d scylladb/scylla --broadcast-address 10.0.0.5
The --broadcast-rpc-address
command line option configures the IP address the ScyllaDB instance tells clients to connect to.
For example, to configure ScyllaDB to use broadcast RPC address 10.0.0.5
:
docker run --name some-scylla -d scylladb/scylla --broadcast-rpc-address 10.0.0.5
The --smp
command line option restricts ScyllaDB to COUNT
number of CPUs. The option does not, however, mandate a specific placement of CPUs. See the --cpuset
command line option if you need ScyllaDB to run on specific CPUs.
For example, to restrict ScyllaDB to 2 CPUs:
docker run --name some-scylla -d scylladb/scylla --smp 2
The --memory
command line option restricts ScyllaDB to use up to AMOUNT
of memory. The AMOUNT
value supports both M
unit for megabytes and G
unit for gigabytes.
For example, to restrict ScyllaDB to 4 GB of memory:
docker run --name some-scylla -d scylladb/scylla --memory 4G
**NOTE: You should allocate a minimum of 1.5 GB of RAM per container.**
The --overprovisioned
command line option enables or disables optimizations for running ScyllaDB in an overprovisioned environment. If no --overprovisioned
option is specified, ScyllaDB defaults to running with optimizations enabled.
For example, to enable optimizations for running in an overprovisioned environment:
docker run --name some-scylla -d scylladb/scylla --overprovisioned 1
The --cpuset
command line option restricts ScyllaDB to run on only on CPUs specified by CPUSET
. The CPUSET
value is either a single CPU (e.g. --cpuset 1
), a range (e.g. --cpuset 2-3
), or a list (e.g. --cpuset 1,2,5
), or a combination of the last two options (e.g. --cpuset 1-2,5
).
For example, to restrict ScyllaDB to run on physical CPUs 0 to 2 and 4:
docker run --name some-scylla -d scylladb/scylla --cpuset 0-2,4
The --developer-mode
command line option enables ScyllaDB’s developer mode, which relaxes checks for things like XFS and enables ScyllaDB to run on unsupported configurations (which usually results in suboptimal performance). If no --developer-mode
command line option is defined, ScyllaDB defaults to running with developer mode enabled.
It is highly recommended to disable developer mode for production deployments to ensure ScyllaDB is able to run with maximum performance.
To disable developer mode:
docker run --name some-scylla -d scylladb/scylla --developer-mode 0
The --experimental-features
command line option enables ScyllaDB’s experimental feature individually. If no feature flags are specified, ScyllaDB runs with only stable features enabled.
Running experimental features in production environments is not recommended.
For example, to enable the User Defined Functions (UDF) feature:
docker run --name some-scylla -d scylladb/scylla --experimental-features=udf
docker exec -it some-scylla scylla --version
First, download the file locally to the node:
sudo docker exec -it some-scylla.2.0.1 curl -o file.csv https://<url>.com/<path>/<path>/<file>.csv
Once you have the .csv
downloaded, you can use the CQL COPY FROM
command as explained here to load the data into ScyllaDB.
Such a copy command might look like this:
cqlsh:my_keyspace> COPY <table_name> FROM 'file.csv' WITH HEADER=true;
scylla.yaml
can be found at /etc/scylla/scylla.yaml
. In this case, you can search for a specific entry in the file. For example, if you wanted to determine if a setup was experimental and were to search for experimental
in the file, you could try:
docker exec -it some-scylla grep -H 'experimental' /etc/scylla/scylla.yaml
Copyright
© 2016, The Apache Software Foundation.
Apache®, Apache Cassandra®, Cassandra®, the Apache feather logo and the Apache Cassandra® Eye logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Was this page helpful?