Was this page helpful?
Caution
You're viewing documentation for a previous version of ScyllaDB Open Source. Switch to the latest stable version.
CQL supports two main categories of functions:
The scalar functions, which simply take a number of values and produce an output with it.
The aggregate functions, which are used to aggregate multiple rows of results from a
SELECT
statement.
In both cases, CQL provides a number of native “hard-coded” functions as well as the ability to create new user-defined functions.
Note
Although user-defined functions are sandboxed, protecting the system from a “rogue” function, user-defined functions are disabled by default for extra security.
See the enable_user_defined_functions
in scylla.yaml
to enable them.
Additionally, user-defined functions are still experimental and need to be explicitly enabled by adding udf
to the list of
experimental_features
configuration options in scylla.yaml
, or turning on the experimental
flag.
See Enabling Experimental Features for details.
Supported starting from ScyllaDB version 2.1
The cast
function can be used to convert one native datatype to another.
The following table describes the conversions supported by the cast
function. ScyllaDB will silently ignore any cast converting a cast datatype into its own datatype.
From |
To |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The conversions rely strictly on Java’s semantics. For example, the double value 1 will be converted to the text value ‘1.0’. For instance:
SELECT avg(cast(count as double)) FROM myTable
The token
function computes a token for a given partition key. The exact signature of the token function
depends on the table concerned and on the partitioner used by the cluster.
The arguments of the token
depend on the type of the partition key columns that are used. The return type depends on
the partitioner in use:
For Murmur3Partitioner, the return type is bigint
.
For instance, in a cluster using the default Murmur3Partitioner, if a table is defined by:
CREATE TABLE users (
userid text PRIMARY KEY,
username text,
)
The token
function accepts single argument of type text
(in that case, the partition key is userid
(there are no clustering columns, so the partition key is the same as the primary key)), and the return type will be
bigint
.
The uuid
function takes no parameters and generates a random type 4 uuid suitable for use in INSERT
or
UPDATE
statements.
Note
We called the values generated by minTimeuuid
and maxTimeuuid
fake UUID because they do no respect
the Time-Based UUID generation process specified by the RFC 4122. In
particular, the value returned by these two methods will not be unique. This means you should only use those methods
for querying (as in the example above). Inserting the result of those methods is almost certainly a bad idea.
Added in version 2.3.
The following functions can be used to retrieve the date/time at the time where the function is invoked:
Function name |
Output type |
---|---|
|
|
|
|
|
|
|
|
For example, to retrieve data up to today, run the following query:
SELECT * FROM myTable WHERE date >= currentDate()
A number of functions are provided to “convert” a timeuuid
, a timestamp
, or a date
into another native
type.
Function name |
Input type |
Description |
---|---|---|
|
|
Converts the |
|
|
Converts the |
|
|
Converts the |
|
|
Converts the |
|
|
Converts the |
|
|
Converts the |
|
|
Converts the |
|
|
Similar to |
|
|
Similar to |
A number of functions are provided to “convert” the native types into binary data (blob
). For every
<native-type>
type
supported by CQL (a notable exception is a blob
, for obvious reasons), the function
typeAsBlob
takes an argument of type type
and returns it as a blob
. Conversely, the function blobAsType
takes a 64-bit blob
argument and converts it to a bigint
value. For example, bigintAsBlob(3)
is
0x0000000000000003
and blobAsBigint(0x0000000000000003)
is 3
.
User-defined functions (UDFs) execute user-provided code in ScyllaDB. Supported languages are currently Lua and WebAssembly.
UDFs are part of the ScyllaDB schema and are automatically propagated to all nodes in the cluster. UDFs can be overloaded, so that multiple UDFs with different argument types can have the same function name, for example:
CREATE FUNCTION sample ( arg int ) ...;
CREATE FUNCTION sample ( arg text ) ...;
When calling a user-defined function, arguments can be literals or terms. Prepared statement placeholders can be used, too.
Creating a new user-defined function uses the CREATE FUNCTION
statement. For example:
CREATE OR REPLACE FUNCTION div(dividend double, divisor double)
RETURNS NULL ON NULL INPUT
RETURNS double
LANGUAGE LUA
AS 'return dividend/divisor;';
CREATE FUNCTION
with the optional OR REPLACE
keywords creates either a function
or replaces an existing one with the same signature. A CREATE FUNCTION
without OR REPLACE
fails if a function with the same signature already exists. If the optional IF NOT EXISTS
keywords are used, the function will only be created only if another function with the same
signature does not exist. OR REPLACE
and IF NOT EXISTS
cannot be used together.
Behavior for null input values must be defined for each function:
RETURNS NULL ON NULL INPUT
declares that the function will always return null (without being executed) if any of the input arguments is null.
CALLED ON NULL INPUT
declares that the function will always be executed.
Signatures are used to distinguish individual functions. The signature consists of a fully-qualified function name of the <keyspace>.<function_name> and a concatenated list of all the argument types.
Note that keyspace names, function names and argument types are subject to the default naming conventions and case-sensitivity rules.
Functions belong to a keyspace; if no keyspace is specified, the current keyspace is used. User-defined functions are not allowed in the system keyspaces.
Dropping a function uses the DROP FUNCTION
statement. For example:
DROP FUNCTION myfunction;
DROP FUNCTION mykeyspace.afunction;
DROP FUNCTION afunction ( int );
DROP FUNCTION afunction ( text );
You must specify the argument types of the function, the arguments_signature, in the drop command if there are multiple overloaded functions with the same name but different signatures.
DROP FUNCTION
with the optional IF EXISTS
keywords drops a function if it exists, but does not throw an error if it doesn’t.
Aggregate functions work on a set of rows. They receive values for each row and return one value for the whole set.
If normal
columns, scalar functions
, UDT
fields, writetime
, or ttl
are selected together with
aggregate functions, the values returned for them will be the ones of the first row matching the query.
Note
The LIMIT
and PER PARTITION LIMIT
used in the SELECT
query will have no effect if the limit is greater
than or equal to 1 because they are applied to the output of the aggregate functions (which return one value for
the whole set of rows).
The count
function can be used to count the rows returned by a query. Example:
SELECT COUNT (*) FROM plays;
SELECT COUNT (1) FROM plays;
It also can be used to count the non-null value of a given column:
SELECT COUNT (scores) FROM plays;
Note
Counting all rows in a table may be time-consuming and exceed the default timeout. In such a case, see Counting all rows in a table is slow for instructions.
The max
and min
functions can be used to compute the maximum and the minimum value returned by a query for a
given column. For instance:
SELECT MIN (players), MAX (players) FROM plays WHERE game = 'quake';
The sum
function can be used, to sum up all the values returned by a query for a given column. For instance:
SELECT SUM (players) FROM plays;
The avg
function can be used to compute the average of all the values returned by a query for a given column. For
instance:
SELECT AVG (players) FROM plays;
User-defined aggregates allow the creation of custom aggregate functions. User-defined aggregates can be used in SELECT statement.
Each aggregate requires an initial state of type STYPE
defined with the INITCOND
value (default value: null
). The first argument of the state function must have type STYPE. The remaining arguments of the state function must match the types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by the state function becomes the new state. After all rows are processed, the optional FINALFUNC is executed with the last state value as its argument.
The STYPE
value is mandatory in order to distinguish possibly overloaded versions of the state and/or final function, since the overload can appear after creation of the aggregate.
A complete working example for user-defined aggregates (assuming that a keyspace has been selected using the USE
statement):
CREATE FUNCTION accumulate_len(acc tuple<bigint,bigint>, a text)
RETURNS NULL ON NULL INPUT
RETURNS tuple<bigint,bigint>
LANGUAGE lua as 'return {acc[1] + 1, acc[2] + #a}';
CREATE OR REPLACE FUNCTION present(res tuple<bigint,bigint>)
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE lua as
'return "The average string length is " .. res[2]/res[1] .. "!"';
CREATE OR REPLACE AGGREGATE avg_length(text)
SFUNC accumulate_len
STYPE tuple<bigint,bigint>
FINALFUNC present
INITCOND (0,0);
The CREATE AGGREGATE
command with the optional OR REPLACE
keywords creates either an aggregate or replaces an existing one with the same signature. A CREATE AGGREGATE
without OR REPLACE
fails if an aggregate with the same signature already exists. The CREATE AGGREGATE
command with the optional IF NOT EXISTS
keywords creates an aggregate if it does not already exist. The OR REPLACE
and IF NOT EXISTS
phrases cannot be used together.
The STYPE
value defines the type of the state value and must be specified. The optional INITCOND
defines the initial state value for the aggregate; the default value is null. A non-null INITCOND
must be specified for state functions that are declared with RETURNS NULL ON NULL INPUT
.
The SFUNC
value references an existing function to use as the state-modifying function. The first argument of the state function must have type STYPE
. The remaining arguments of the state function must match the types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by the state function becomes the new state. State is not updated for state functions declared with RETURNS NULL ON NULL INPUT
and called with null. After all rows are processed, the optional FINALFUNC
is executed with last state value as its argument. It must take only one argument with type STYPE
, but the return type of the FINALFUNC
may be a different type. A final function declared with RETURNS NULL ON NULL INPUT
means that the aggregate’s return value will be null, if the last state is null.
If no FINALFUNC
is defined, the overall return type of the aggregate function is STYPE
. If a FINALFUNC
is defined, it is the return type of that function.
Dropping an user-defined aggregate function uses the DROP AGGREGATE statement. For example:
DROP AGGREGATE myAggregate;
DROP AGGREGATE myKeyspace.anAggregate;
DROP AGGREGATE someAggregate ( int );
DROP AGGREGATE someAggregate ( text );
The DROP AGGREGATE
statement removes an aggregate created using CREATE AGGREGATE
. You must specify the argument types of the aggregate to drop if there are multiple overloaded aggregates with the same name but a different signature.
The DROP AGGREGATE
command with the optional IF EXISTS
keywords drops an aggregate if it exists, and does nothing if a function with the signature does not exist.
Apache Cassandra Query Language
Copyright
© 2016, The Apache Software Foundation.
Apache®, Apache Cassandra®, Cassandra®, the Apache feather logo and the Apache Cassandra® Eye logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Was this page helpful?
On this page