Warning
This statement is not final and is subject to change without notice and in backwards-incompatible ways.
Note
The target audience of this statement and therefore that of this document is people who are familiar with the internals of ScyllaDB.
The SELECT * FROM MUTATION_FRAGMENTS() statement allows reading the raw underlying mutations (data) from a table.
It is intended as a diagnostics tool for debugging performance or correctness issues, where inspecting the raw underlying data, as ScyllaDB stores it, is desired.
So far this was only possible with sstables, using a tool like ScyllaDB SStable.
This statement allows inspecting the content of the row-cache, as well as that of individual memtables, in addition to individual sstables.
The statement has to be used on an existing table, by using a regular SELECT query which wraps the table name in MUTATION_FRAGMENTS(). For example, to dump all mutations from my_keyspace.my_table:
SELECT * FROM MUTATION_FRAGMENTS(my_keyspace.my_table);
The schema of the statement, and therefore the set of columns available to select and to restrict, differs from that of the underlying table. The output schema is computed from the schema of the underlying table, as follows:
- The partition key columns are copied as-is.
- The clustering key is computed as follows:
  - mutation_source text
  - partition_region byte
  - The clustering columns of the underlying table
  - position_weight byte
- Regular columns:
  - mutation_fragment_kind text
  - metadata text
  - value text
So for a table with the following definition:
CREATE TABLE my_keyspace.my_table (
pk1 int,
pk2 text,
ck1 byte,
ck2 text,
col1 text,
col2 text,
PRIMARY KEY ((pk1, pk2), ck1, ck2)
);
The transformed schema would look like this:
CREATE TABLE "my_keyspace.my_table_$mutation_fragments"(
pk1 int,
pk2 text,
mutation_source text,
partition_region byte,
ck1 byte,
ck2 text,
position_weight byte,
mutation_fragment_kind text,
metadata text,
value text,
PRIMARY KEY ((pk1, pk2), mutation_source, partition_region, ck1, ck2, position_weight)
);
Note how the partition-key columns are identical, the clustering key columns are derived from those of the underlying table, and the regular columns are completely replaced.
Each row in the output represents a mutation-fragment in the underlying representation, and each partition in the output represents a mutation in the underlying representation.
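The schema transformation described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this document, not ScyllaDB's implementation; the function name mutation_fragments_schema and the (name, type) tuple representation are hypothetical.

```python
def mutation_fragments_schema(partition_key, clustering_key):
    """Return (partition_key, clustering_key, regular_columns) of the output schema.

    Each argument is a list of (name, type) pairs taken from the underlying table.
    """
    # The partition key columns are copied as-is.
    out_partition_key = list(partition_key)
    # The underlying clustering columns are wrapped by the generated columns.
    out_clustering_key = (
        [("mutation_source", "text"), ("partition_region", "byte")]
        + list(clustering_key)
        + [("position_weight", "byte")]
    )
    # The regular columns of the underlying table are dropped and replaced.
    out_regular = [
        ("mutation_fragment_kind", "text"),
        ("metadata", "text"),
        ("value", "text"),
    ]
    return out_partition_key, out_clustering_key, out_regular


# The my_keyspace.my_table definition used in this document.
pk, ck, regular = mutation_fragments_schema(
    partition_key=[("pk1", "int"), ("pk2", "text")],
    clustering_key=[("ck1", "byte"), ("ck2", "text")],
)
print([name for name, _ in ck])
# ['mutation_source', 'partition_region', 'ck1', 'ck2', 'position_weight']
```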
The mutation_source column identifies the mutation source the mutation originates from. It has the following format: ${mutation_source_kind}[:${mutation_source_id}], where mutation_source_kind is one of:
memtable
row-cache
sstable
And the mutation_source_id is used to distinguish individual mutation sources of the same kind, where applicable:
memtable - a numeric id, starting from 0
row-cache - N/A, there is only a single cache per table
sstable - the path of the sstable
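When post-processing query results in a client, the mutation_source value can be split back into its kind and id. The following is a minimal sketch based only on the format described above; the helper name parse_mutation_source is hypothetical.

```python
KNOWN_KINDS = {"memtable", "row-cache", "sstable"}


def parse_mutation_source(text):
    """Split '${kind}[:${id}]' into (kind, id); id is None when absent."""
    # partition() splits at the first ':' only, so any further ':' characters
    # in an sstable path stay part of the id.
    kind, sep, source_id = text.partition(":")
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown mutation source kind: {kind!r}")
    return kind, (source_id if sep else None)


print(parse_mutation_source("memtable:0"))  # ('memtable', '0')
print(parse_mutation_source("row-cache"))   # ('row-cache', None)
```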
The partition_region column holds the numeric representation of the enum with the same name:
enum class partition_region : uint8_t {
partition_start, // 0
static_row, // 1
clustered, // 2
partition_end, // 3
};
The reason for using the underlying numeric representation, instead of the name, is to sort mutation-fragments in their natural order.
The position_weight column holds the position weight of the underlying mutation fragment, which describes where the fragment's position lies relative to its clustering key. It is one of:
-1 - before
0 - at
1 - after
The reason for using the underlying numeric representation, instead of the human-readable text, is to sort mutation-fragments in their natural order.
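The effect of the numeric partition_region and position_weight values on ordering can be illustrated with a small Python sketch. The fragment tuples below mirror the pk = 0 partition from the example later in this document; the empty clustering key and the weight 0 used for partition start and partition end are placeholders chosen for this sketch (the query output leaves those columns empty).

```python
from enum import IntEnum


class PartitionRegion(IntEnum):
    PARTITION_START = 0
    STATIC_ROW = 1
    CLUSTERED = 2
    PARTITION_END = 3


# (partition_region, clustering key, position_weight, mutation_fragment_kind),
# deliberately shuffled.
fragments = [
    (2, (2,), 0, "clustering row"),
    (3, (), 0, "partition end"),
    (2, (0,), 1, "range tombstone change"),
    (0, (), 0, "partition start"),
    (2, (2,), -1, "range tombstone change"),
    (2, (1,), 0, "clustering row"),
    (2, (0,), 0, "clustering row"),
]

# Sorting on the numeric columns restores the natural fragment order.
ordered = sorted(fragments, key=lambda f: f[:3])
for region, ck, weight, kind in ordered:
    print(PartitionRegion(region).name, ck, weight, kind)

# Sorting regions by name instead would put CLUSTERED before PARTITION_START.
by_name = sorted(PartitionRegion, key=lambda r: r.name)
```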
The mutation_fragment_kind column holds the kind of the mutation fragment the row represents. It is one of:
partition start
static row
clustering row
range tombstone change
partition end
This is the text representation of the enum class mutation_fragment_v2_kind. Since this is a regular column, the human-readable name is used.
The metadata column holds the content of the mutation-fragment represented as JSON, without the values, if applicable.
It uses the same JSON schema as scylla sstable dump-data.
Content of the metadata column for the various mutation fragment kinds:
mutation fragment kind | Content |
---|---|
partition start | |
static row | |
clustering row | |
range tombstone change | |
partition end | N/A |
JSON symbols are represented as $SYMBOL_NAME; their definitions can be found in scylla sstable dump-data.
The value column holds the value of the mutation-fragment, represented as JSON, if applicable.
Only static row and clustering row fragments have values.
The JSON schema of both is that of the $COLUMNS JSON symbol.
See scylla sstable dump-data for the definition of these.
Only the value field is left in cell objects ($REGULAR_CELL, $COUNTER_SHARDS_CELL, $COUNTER_UPDATE_CELL and $FROZEN_COLLECTION) and the cells field in collection objects ($COLLECTION).
The reason for extracting this into a separate column is to allow deselecting the potentially large values, de-cluttering the CQL output and reducing the amount of data that has to be transferred.
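When both columns are selected, a client can stitch the values back into the per-column metadata. The sketch below assumes the flat shape seen in the example output of this document, where the value column maps each column name directly to its value (for example {"v":"0"}); other cell kinds, such as collections, may be shaped differently and are not handled. The helper name merge_cells is hypothetical.

```python
import json


def merge_cells(metadata_json, value_json):
    """Combine the metadata and value columns of one row into one dict per column."""
    # Both columns can be null, e.g. for partition end fragments.
    metadata = json.loads(metadata_json) if metadata_json else {}
    values = json.loads(value_json) if value_json else {}
    merged = {}
    for name, cell in metadata.get("columns", {}).items():
        merged[name] = dict(cell)
        if name in values:
            merged[name]["value"] = values[name]
    return merged


# Metadata and value of a clustering row, copied from the example output.
metadata = (
    '{"marker":{"timestamp":1688122873341627},'
    '"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}}'
)
print(merge_cells(metadata, '{"v":"0"}'))
```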
Data is read locally, from the node which receives the query, so the replica is always the same node as the coordinator. The query cannot be migrated between nodes. If a query is paged, all its pages have to be served by the same coordinator. This is enforced, and any attempt to migrate the query to another coordinator will result in the query being aborted. Note that by default, drivers use round-robin load balancing policies, and consequently they will attempt to read each page from a different coordinator.
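One way to keep all pages of a query on the same coordinator is to restrict the driver's load balancing to a single node. The following is a sketch for the Python driver (the cassandra package), using its WhiteListRoundRobinPolicy and execution profiles. It assumes that pinning the policy to one host is sufficient for this statement; the host address and the ks.tbl query are placeholders, and the helper names are hypothetical.

```python
QUERY = "SELECT * FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = %s"


def open_pinned_session(host):
    """Connect with a load balancing policy that only ever targets `host`."""
    # Imported here so the rest of the sketch loads without the driver installed.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import WhiteListRoundRobinPolicy

    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy([host])
    )
    cluster = Cluster([host], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    return cluster, cluster.connect()


def dump_partition(session, pk):
    """Fetch every fragment of one partition; iterating the result follows paging."""
    return list(session.execute(QUERY, (pk,)))
```

A caller would then run something like cluster, session = open_pinned_session("127.0.0.1") followed by dump_partition(session, 1), and call cluster.shutdown() when done.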
The statement can output rows with a non-full clustering prefix.
Given a table, with the following definition:
CREATE TABLE ks.tbl (
pk int,
ck int,
v int,
PRIMARY KEY (pk, ck)
);
And the following content:
cqlsh> DELETE FROM ks.tbl WHERE pk = 0;
cqlsh> DELETE FROM ks.tbl WHERE pk = 0 AND ck > 0 AND ck < 2;
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 0, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 1, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 2, 0);
cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (1, 0, 0);
cqlsh> SELECT * FROM ks.tbl;
pk | ck | v
----+----+---
1 | 0 | 0
0 | 0 | 0
0 | 1 | 0
0 | 2 | 0
(4 rows)
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl);
pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
1 | memtable:0 | 0 | | | {"tombstone":{}} | partition start | null
1 | memtable:0 | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
1 | memtable:0 | 3 | | | null | partition end | null
0 | memtable:0 | 0 | | | {"tombstone":{"timestamp":1688122848686316,"deletion_time":"2023-06-30 11:00:48z"}} | partition start | null
0 | memtable:0 | 2 | 0 | 0 | {"marker":{"timestamp":1688122860037077},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122860037077}}} | clustering row | {"v":"0"}
0 | memtable:0 | 2 | 0 | 1 | {"tombstone":{"timestamp":1688122853571709,"deletion_time":"2023-06-30 11:00:53z"}} | range tombstone change | null
0 | memtable:0 | 2 | 1 | 0 | {"marker":{"timestamp":1688122864641920},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122864641920}}} | clustering row | {"v":"0"}
0 | memtable:0 | 2 | 2 | -1 | {"tombstone":{}} | range tombstone change | null
0 | memtable:0 | 2 | 2 | 0 | {"marker":{"timestamp":1688122868706989},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122868706989}}} | clustering row | {"v":"0"}
0 | memtable:0 | 3 | | | null | partition end | null
(10 rows)
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = 1;
pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
1 | memtable:0 | 0 | | | {"tombstone":{}} | partition start | null
1 | memtable:0 | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
1 | memtable:0 | 3 | | | null | partition end | null
(3 rows)
This works just like selecting a partition from the base table.
Note how after insertion, all data is in the memtable (see above). After flushing the memtable, the same query returns:
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = 1;
pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+------------------------------------------------------------------------------------------------------------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
1 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 0 | | | {"tombstone":{}} | partition start | null
1 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
1 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 3 | | | null | partition end | null
(3 rows)
After executing a read on the queried partition of the underlying table, the data will also be included in the row-cache:
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = 1;
pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+------------------------------------------------------------------------------------------------------------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
1 | row-cache | 0 | | | {"tombstone":{}} | partition start | null
1 | row-cache | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
1 | row-cache | 3 | | | null | partition end | null
1 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 0 | | | {"tombstone":{}} | partition start | null
1 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
1 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 3 | | | null | partition end | null
(6 rows)
It is possible to restrict the output to a single mutation source, or mutation source kind:
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = 1 AND mutation_source = 'row-cache';
pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
1 | row-cache | 0 | | | {"tombstone":{}} | partition start | null
1 | row-cache | 2 | 0 | 0 | {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} | clustering row | {"v":"0"}
1 | row-cache | 3 | | | null | partition end | null
(3 rows)
Select only clustering elements:
cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = 0 AND partition_region = 2 ALLOW FILTERING;
pk | mutation_source | partition_region | ck | position_weight | metadata | mutation_fragment_kind | value
----+------------------------------------------------------------------------------------------------------------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+-----------
0 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 0 | 0 | {"marker":{"timestamp":1688122860037077},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122860037077}}} | clustering row | {"v":"0"}
0 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 0 | 1 | {"tombstone":{"timestamp":1688122853571709,"deletion_time":"2023-06-30 11:00:53z"}} | range tombstone change | null
0 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 1 | 0 | {"marker":{"timestamp":1688122864641920},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122864641920}}} | clustering row | {"v":"0"}
0 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 2 | -1 | {"tombstone":{}} | range tombstone change | null
0 | sstable:/var/lib/scylla/data/ks/tbl-259b2520104011ee822ed2e489876007/me-3g79_0ur3_48e402ejkwsvj7viqr-big-Data.db | 2 | 2 | 0 | {"marker":{"timestamp":1688122868706989},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122868706989}}} | clustering row | {"v":"0"}
(5 rows)
Count range tombstone changes:
cqlsh> SELECT COUNT(*) FROM MUTATION_FRAGMENTS(ks.tbl) WHERE pk = 0 AND mutation_fragment_kind = 'range tombstone change' ALLOW FILTERING;
count
-------
2
(1 rows)
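The same kind of tally can also be computed client-side over rows that were already fetched, which avoids a second filtering query. This is a minimal sketch that assumes rows are available as dictionaries keyed by column name; the helper name count_fragments is hypothetical, and the sample rows reproduce the fragment kinds of the pk = 0 partition shown earlier.

```python
from collections import Counter


def count_fragments(rows):
    """Tally fragments per (mutation_source, mutation_fragment_kind)."""
    return Counter(
        (row["mutation_source"], row["mutation_fragment_kind"]) for row in rows
    )


# Fragment kinds of the pk = 0 partition, in output order.
kinds = [
    "partition start",
    "clustering row",
    "range tombstone change",
    "clustering row",
    "range tombstone change",
    "clustering row",
    "partition end",
]
rows = [{"mutation_source": "memtable:0", "mutation_fragment_kind": k} for k in kinds]
counts = count_fragments(rows)
print(counts[("memtable:0", "range tombstone change")])  # 2
```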