ScyllaDB: High-Performance NoSQL Database
ScyllaDB is a high-performance, distributed NoSQL database that's compatible with Apache Cassandra and Amazon DynamoDB. It's designed to provide ultra-low latency and high throughput for modern applications that require massive scale.
Overview
ScyllaDB is a drop-in replacement for Apache Cassandra that offers significantly better performance through its C++ implementation and shared-nothing architecture. It's designed for applications that need:
- Ultra-low latency: Sub-millisecond response times
- High throughput: Millions of operations per second
- Linear scalability: Add nodes to increase capacity
- Fault tolerance: Built-in replication and consistency
- Cassandra compatibility: Drop-in replacement for existing Cassandra applications
Key Features
🚀 Performance Features
- C++ Implementation: Native performance without JVM overhead
- Shared-Nothing Architecture: Each CPU core runs its own shard with dedicated memory and data, so cores do not contend for shared resources
- Async I/O: Non-blocking operations for maximum concurrency
- Memory-First Design: Optimized for modern hardware
- Smart Caching: Intelligent data caching strategies
🔧 Operational Features
- Cassandra Compatibility: Drop-in replacement for existing applications
- DynamoDB Compatibility: DynamoDB-compatible API through the Alternator interface
- Multi-DC Support: Geographic distribution and disaster recovery
- Backup & Restore: Point-in-time recovery capabilities
- Monitoring: Built-in metrics and observability
📊 Data Model Features
- Wide-Column Store: Flexible schema design
- Time-Series Support: Optimized for time-based data
- JSON Support: INSERT JSON and SELECT JSON statements (CQL has no dedicated JSON column type)
- Counter Support: Distributed counters
- TTL Support: Automatic data expiration
Architecture
Core Components
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Client Node   │    │   Client Node   │    │   Client Node   │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                  ┌─────────────┴─────────────┐
                  │       Load Balancer       │
                  └─────────────┬─────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│  ScyllaDB Node  │    │  ScyllaDB Node  │    │  ScyllaDB Node  │
│  (Data Center)  │    │  (Data Center)  │    │  (Data Center)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Data Distribution
- Partitioning: Consistent hashing for data distribution
- Replication: Configurable replication factor per keyspace
- Consistency Levels: Tunable consistency for CAP theorem trade-offs
- Snitch: Network topology awareness for optimal routing
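The partitioning and replication bullets above can be illustrated with a small token ring. This is a minimal sketch, not ScyllaDB's implementation: MD5 stands in for the Murmur3 partitioner, and the `TokenRing` class, node names, and vnode count are hypothetical names chosen for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a 64-bit token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Toy consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Each node owns several tokens scattered around the ring.
        self._ring = sorted(
            (token_for(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key: str, replication_factor: int):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("sensor-42", replication_factor=3))
```

The same key always maps to the same replica list, and adding a node only moves the token ranges adjacent to its new tokens, which is why capacity can grow without reshuffling all data.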
Installation
Docker Installation
# Pull ScyllaDB image
docker pull scylladb/scylla:latest
# Run single-node cluster
docker run --name scylla-node \
-p 9042:9042 \
-p 7000:7000 \
-p 7001:7001 \
-p 9160:9160 \
-p 9180:9180 \
-p 10000:10000 \
scylladb/scylla:latest \
--smp 1 --memory 750M --overprovisioned 1
Multi-Node Cluster
# Create a user-defined network so the containers can resolve each other by name
docker network create scylla-net
# Node 1 (seed)
docker run -d --name scylla-node1 --network scylla-net \
  -p 9042:9042 \
  -p 7000:7000 \
  -p 7001:7001 \
  -p 9160:9160 \
  -p 9180:9180 \
  -p 10000:10000 \
  scylladb/scylla:latest \
  --seeds=scylla-node1 \
  --smp 1 --memory 750M --overprovisioned 1
# Node 2 (joins through node 1; host ports are shifted to avoid conflicts)
docker run -d --name scylla-node2 --network scylla-net \
  -p 9043:9042 \
  -p 7002:7000 \
  -p 7003:7001 \
  -p 9161:9160 \
  -p 9181:9180 \
  -p 10001:10000 \
  scylladb/scylla:latest \
  --seeds=scylla-node1 \
  --smp 1 --memory 750M --overprovisioned 1
Native Installation (Ubuntu/Debian)
# Add ScyllaDB repository
curl -fsSL https://downloads.scylladb.com/deb/ubuntu/scylladb-2023.1.key | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/scylladb-2023.1.gpg
echo "deb [arch=amd64] https://downloads.scylladb.com/deb/ubuntu jammy scylladb-2023.1" | sudo tee /etc/apt/sources.list.d/scylladb.list
# Install ScyllaDB
sudo apt update
sudo apt install scylla
# Configure and start
sudo scylla_setup
sudo systemctl start scylla-server
sudo systemctl enable scylla-server
Configuration
Basic Configuration (/etc/scylla/scylla.yaml)
# Cluster configuration
cluster_name: "MyCluster"
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.10,192.168.1.11,192.168.1.12"
# Network configuration
listen_address: 192.168.1.10
rpc_address: 192.168.1.10
broadcast_address: 192.168.1.10
broadcast_rpc_address: 192.168.1.10
# Performance tuning
num_tokens: 256
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /var/lib/scylla/data
commitlog_directory: /var/lib/scylla/commitlog
saved_caches_directory: /var/lib/scylla/saved_caches
# Memory configuration
memtable_total_space_in_mb: 2048
memtable_flush_writers: 2
# Compaction configuration
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: false
# Security
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
Advanced Configuration
# Performance optimizations (Cassandra-style settings; ScyllaDB largely tunes itself and may ignore some of them)
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
# Memory management
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
# Caching
key_cache_size_in_mb: 100
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
# Logging: ScyllaDB is not JVM-based and does not read a Logback file;
# set log levels with command-line options such as --default-log-level
Data Modeling
Keyspace Creation
-- Create keyspace with replication strategy
CREATE KEYSPACE my_keyspace
WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'datacenter1': 3,
  'datacenter2': 2
};
-- Use the keyspace
USE my_keyspace;
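The replication map in the keyspace definition above determines how many replicas must answer at each quorum-based consistency level. A minimal sketch of that arithmetic, assuming the usual majority rule (floor of half the replicas, plus one); the `quorum` helper is a name introduced here for illustration:

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


# Replication map copied from the CREATE KEYSPACE statement above.
replication = {"datacenter1": 3, "datacenter2": 2}

total_replicas = sum(replication.values())
print(f"QUORUM needs {quorum(total_replicas)} of {total_replicas} replicas")
for datacenter, rf in replication.items():
    print(f"LOCAL_QUORUM in {datacenter} needs {quorum(rf)} of {rf} replicas")
```

Note that with a replication factor of 2, a local quorum requires both replicas, so that data center cannot lose a node and still serve LOCAL_QUORUM requests.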
Table Design
-- User profiles table
CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  username text,
  email text,
  first_name text,
  last_name text,
  created_at timestamp,
  updated_at timestamp
);
-- User sessions with clustering key
CREATE TABLE user_sessions (
  user_id uuid,
  session_id uuid,
  login_time timestamp,
  logout_time timestamp,
  ip_address inet,
  user_agent text,
  PRIMARY KEY (user_id, session_id)
);
-- Time-series data
CREATE TABLE sensor_readings (
  sensor_id text,
  timestamp timestamp,
  temperature double,
  humidity double,
  pressure double,
  PRIMARY KEY (sensor_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Secondary Indexes
-- Create secondary index
CREATE INDEX ON users (email);
CREATE INDEX ON users (username);
-- Create a local secondary index, scoped to a single partition
-- (ScyllaDB does not support Cassandra's SASI custom indexes)
CREATE INDEX user_sessions_ip_idx ON user_sessions ((user_id), ip_address);
CQL (Cassandra Query Language)
Basic Operations
-- Insert data
INSERT INTO users (user_id, username, email, first_name, last_name, created_at)
VALUES (uuid(), 'john_doe', 'john@example.com', 'John', 'Doe', toTimestamp(now()));
-- Select data
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Update data
UPDATE users
SET email = 'john.doe@example.com', updated_at = toTimestamp(now())
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- Delete data
DELETE FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
Batch Operations
-- Batch insert
BEGIN BATCH
  INSERT INTO users (user_id, username, email) VALUES (uuid(), 'user1', 'user1@example.com');
  INSERT INTO users (user_id, username, email) VALUES (uuid(), 'user2', 'user2@example.com');
  INSERT INTO users (user_id, username, email) VALUES (uuid(), 'user3', 'user3@example.com');
APPLY BATCH;
Aggregation Queries
-- Count users
SELECT COUNT(*) FROM users;
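-- Group by with aggregation (no partition key in WHERE, so this scans
-- every partition and needs ALLOW FILTERING)
SELECT sensor_id, AVG(temperature) AS avg_temp, MAX(temperature) AS max_temp
FROM sensor_readings
WHERE timestamp > '2023-01-01'
GROUP BY sensor_id
ALLOW FILTERING;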
Application Integration
Python with ScyllaDB
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import uuid
from datetime import datetime
# Connect to ScyllaDB
cluster = Cluster(['localhost'], port=9042)
session = cluster.connect()
# Create keyspace and table
session.execute("""
CREATE KEYSPACE IF NOT EXISTS my_app
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("USE my_app")
session.execute("""
CREATE TABLE IF NOT EXISTS users (
user_id uuid PRIMARY KEY,
username text,
email text,
created_at timestamp
)
""")
# Insert data
user_id = uuid.uuid4()
session.execute("""
INSERT INTO users (user_id, username, email, created_at)
VALUES (%s, %s, %s, %s)
""", (user_id, 'john_doe', 'john@example.com', datetime.now()))
# Query data
rows = session.execute("SELECT * FROM users WHERE user_id = %s", (user_id,))
for row in rows:
    print(f"User: {row.username}, Email: {row.email}")
Java with ScyllaDB
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.*;
import java.time.Instant;
import java.util.UUID;
public class ScyllaDBExample {
    public static void main(String[] args) {
        // Connect to ScyllaDB (the driver defaults to 127.0.0.1:9042);
        // try-with-resources closes the session on exit
        try (CqlSession session = CqlSession.builder()
                .withKeyspace("my_app")
                .build()) {
            // Insert data
            UUID userId = UUID.randomUUID();
            PreparedStatement insertStmt = session.prepare(
                "INSERT INTO users (user_id, username, email, created_at) VALUES (?, ?, ?, ?)"
            );
            session.execute(insertStmt.bind(
                userId,
                "john_doe",
                "john@example.com",
                Instant.now()
            ));
            // Query data
            PreparedStatement selectStmt = session.prepare(
                "SELECT * FROM users WHERE user_id = ?"
            );
            ResultSet rs = session.execute(selectStmt.bind(userId));
            for (Row row : rs) {
                System.out.println("User: " + row.getString("username"));
            }
        }
    }
}
Node.js with ScyllaDB
const cassandra = require('cassandra-driver');
const { v4: uuidv4 } = require('uuid');
// Connect to ScyllaDB
const client = new cassandra.Client({
  contactPoints: ['localhost'],
  localDataCenter: 'datacenter1',
  keyspace: 'my_app'
});
async function main() {
  await client.connect();
  // Insert data
  const userId = uuidv4();
  const query = 'INSERT INTO users (user_id, username, email, created_at) VALUES (?, ?, ?, ?)';
  await client.execute(query, [userId, 'john_doe', 'john@example.com', new Date()], { prepare: true });
  // Query data
  const selectQuery = 'SELECT * FROM users WHERE user_id = ?';
  const result = await client.execute(selectQuery, [userId], { prepare: true });
  result.rows.forEach(row => {
    console.log(`User: ${row.username}, Email: ${row.email}`);
  });
  await client.shutdown();
}
main().catch(console.error);
Performance Optimization
Query Optimization
-- Use ALLOW FILTERING sparingly
SELECT * FROM users WHERE email = 'john@example.com' ALLOW FILTERING;
-- Use IN clause for multiple partition keys
SELECT * FROM users WHERE user_id IN (uuid1, uuid2, uuid3);
-- Use LIMIT for pagination
SELECT * FROM user_sessions
WHERE user_id = ?
ORDER BY session_id
LIMIT 10;
Indexing Strategies
-- Create materialized views for complex queries
CREATE MATERIALIZED VIEW user_by_email AS
SELECT user_id, username, email, created_at
FROM users
WHERE email IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (email, user_id);
-- ScyllaDB does not support SASI indexes; for substring matching,
-- the LIKE operator can be used with ALLOW FILTERING (this scans data,
-- so reserve it for small tables or occasional queries)
SELECT * FROM users WHERE username LIKE '%doe%' ALLOW FILTERING;
Consistency Levels
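from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
# Strong consistency: a majority of replicas must acknowledge the write
quorum_insert = SimpleStatement(
    "INSERT INTO users (user_id, username) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
session.execute(quorum_insert, (user_id, username))
# Eventual consistency for better performance: one replica is enough
fast_insert = SimpleStatement(
    "INSERT INTO users (user_id, username) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE)
session.execute(fast_insert, (user_id, username))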
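Whether a read is guaranteed to observe the latest acknowledged write depends on the read and write levels together: the two replica sets must overlap, which holds when R + W > RF. The following sketch checks that rule for a few combinations; it covers only ONE, QUORUM, and ALL, and the function names are hypothetical helpers rather than driver APIs.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when any read replica set must overlap any write replica set."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
    ok = reads_see_latest_write(write_level, read_level, replication_factor=3)
    print(f"write={write_level:<6} read={read_level:<6} overlap={ok}")
```

Writing and reading at ONE is the fastest combination but gives no overlap guarantee, which is the trade-off the "eventual consistency" example above accepts.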
Monitoring and Maintenance
Health Checks
# Check node status
nodetool status
# Check cluster information
nodetool info
# Check table statistics
nodetool tablestats my_keyspace.users
# Check compaction status
nodetool compactionstats
Backup and Restore
# Create snapshot
nodetool snapshot my_keyspace
# Enable incremental backups
nodetool enablebackup
# Restore from snapshot (-d names a live node; the path should end in <keyspace>/<table>)
sstableloader -d localhost /path/to/snapshot/files
Performance Monitoring
-- Check per-table size estimates (detailed metrics are served by the Prometheus endpoint on port 9180, not through CQL)
SELECT * FROM system.size_estimates;
-- Monitor query performance
SELECT * FROM system_traces.sessions;
-- Check cluster health
SELECT * FROM system.local;
Security
Authentication and Authorization
-- Create user
CREATE USER john_doe WITH PASSWORD 'secure_password';
-- Grant permissions
GRANT ALL PERMISSIONS ON KEYSPACE my_keyspace TO john_doe;
GRANT SELECT ON TABLE my_keyspace.users TO john_doe;
-- Create role
CREATE ROLE app_user;
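-- CQL grants one permission per statement; MODIFY covers INSERT, UPDATE, and DELETE
GRANT SELECT ON TABLE my_keyspace.users TO app_user;
GRANT MODIFY ON TABLE my_keyspace.users TO app_user;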
SSL/TLS Configuration
# Enable TLS (ScyllaDB reads PEM files rather than Java keystores; the paths below are examples)
server_encryption_options:
  internode_encryption: all
  certificate: /etc/scylla/db.crt
  keyfile: /etc/scylla/db.key
  truststore: /etc/scylla/cadb.pem
client_encryption_options:
  enabled: true
  certificate: /etc/scylla/db.crt
  keyfile: /etc/scylla/db.key
Troubleshooting
Common Issues
- Connection Issues
# Check if ScyllaDB is running
sudo systemctl status scylla-server
# Check network connectivity
telnet localhost 9042
- Performance Issues
# Check memory usage
nodetool info | grep "Heap Memory"
# Check disk usage
df -h /var/lib/scylla/
# Check compaction status
nodetool compactionstats
- Data Consistency Issues
# Repair data
nodetool repair my_keyspace
# Check data consistency
nodetool scrub my_keyspace
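The telnet connectivity check above can also be scripted, for example inside a health probe. A minimal sketch using only the Python standard library; the `cql_port_open` name and its defaults are assumptions for illustration, and a successful TCP connection only shows that the port is listening, not that the node is healthy.

```python
import socket


def cql_port_open(host: str = "localhost", port: int = 9042,
                  timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if cql_port_open() else "unreachable"
    print(f"CQL port 9042 on localhost is {state}")
```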
Log Analysis
# Check ScyllaDB logs
sudo tail -f /var/log/scylla/scylla.log
# Check system logs
sudo journalctl -u scylla-server -f
# ScyllaDB has no JVM and therefore no GC log; filter the journal for warnings and errors instead
sudo journalctl -u scylla-server -p warning
Best Practices
Data Modeling
- Design for Queries: Model data based on access patterns
- Avoid Wide Partitions: Keep partition sizes manageable
- Use Appropriate Data Types: Choose efficient data types
- Plan for Growth: Design for future data volume
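One common way to keep partitions from growing without bound is to add a time bucket to the partition key. The sketch below derives a per-day bucket for the sensor_readings example; the day granularity and the helper names are assumptions, and the matching table would declare something like PRIMARY KEY ((sensor_id, day), timestamp).

```python
from datetime import datetime, timezone


def day_bucket(ts: datetime) -> str:
    """Return the UTC calendar day of a reading as YYYY-MM-DD."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Compound partition key: one partition per sensor per day."""
    return (sensor_id, day_bucket(ts))


reading_time = datetime(2023, 1, 1, 23, 59, tzinfo=timezone.utc)
print(partition_key("sensor-42", reading_time))
```

The bucket size is a trade-off: smaller buckets cap partition size more tightly, but a query over a long time range then has to touch more partitions.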
Performance
- Use Prepared Statements: Avoid query parsing overhead
- Batch Operations: Group related operations
- Monitor Metrics: Track performance indicators
- Tune Consistency: Balance consistency vs performance
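For the batching advice above, batches that touch a single partition are generally cheaper than batches spread across many partitions. A minimal sketch of grouping rows by partition key before batching; the `group_by_partition` helper and the sample rows are hypothetical, and sending each group to the database is left to the driver of your choice.

```python
from collections import defaultdict


def group_by_partition(rows, partition_key="user_id"):
    """Group row dicts by partition key so each batch touches one partition."""
    batches = defaultdict(list)
    for row in rows:
        batches[row[partition_key]].append(row)
    return dict(batches)


sessions = [
    {"user_id": "u1", "session_id": "s1"},
    {"user_id": "u2", "session_id": "s2"},
    {"user_id": "u1", "session_id": "s3"},
]
for user_id, rows in group_by_partition(sessions).items():
    print(user_id, [row["session_id"] for row in rows])
```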
Operations
- Regular Backups: Implement automated backup strategies
- Monitor Health: Set up comprehensive monitoring
- Plan Scaling: Design for horizontal scaling
- Test Recovery: Regularly test backup and restore procedures
Resources and References
Official Resources
- Documentation: docs.scylladb.com
- GitHub: github.com/scylladb/scylla
- Community: community.scylladb.com
Related Tools
- ScyllaDB Cloud: Managed ScyllaDB service
- ScyllaDB Manager: Cluster management tool
- ScyllaDB Monitoring: Prometheus- and Grafana-based monitoring stack
- ScyllaDB Tools: Utility tools for operations
Learning Resources
- CQL Reference: Complete CQL documentation
- Performance Tuning Guide: Optimization best practices
- Migration Guide: Cassandra to ScyllaDB migration
- Architecture Guide: Deep dive into ScyllaDB architecture