Cosine: A Cloud-Cost Optimized NoSQL Storage Engine
Overview
Welcome to this demonstration of Cosine, a self-designing key-value storage engine that takes the shape of a close-to-“perfect” engine architecture given an input workload, a cloud budget, a target performance, and required cloud SLAs. By identifying and formalizing the first principles of storage engine layouts and core key-value algorithms, Cosine constructs a massive design space comprising 10^36 possible storage engine designs over a diverse space of hardware and cloud pricing policies for three cloud providers: AWS, GCP, and Azure. Cosine spans diverse designs such as Log-Structured Merge-trees, B-trees, Log-Structured Hash-tables, and in-memory accelerators for filters and indexes, as well as trillions of hybrid designs that do not appear in the literature or industry but emerge as valid combinations of the above.
At its core, Cosine includes a unified distribution-aware I/O model and a learned concurrency-aware CPU model, which can calculate, with high accuracy, the performance and cloud cost of any possible design on any workload and virtual machine. Cosine can then search through that space in a matter of seconds to find the best design.
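The idea of scoring every candidate with a cost model and then searching for the best one within a budget can be sketched as follows. This is a toy illustration only, not Cosine's actual model or search: the per-operation costs in `OP_COST`, the VM names, speeds, and prices in `VM_CATALOG`, and the functions `evaluate` and `best_design` are all hypothetical placeholders.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Hypothetical per-operation I/O costs (arbitrary units) for three design families.
OP_COST = {
    "lsm_tree": {"read": 3.0, "write": 1.0},
    "b_tree": {"read": 1.0, "write": 4.0},
    "lsh_table": {"read": 2.0, "write": 0.5},
}

# Hypothetical VM catalog: (provider, vm) -> (units processed per hour, USD per hour).
VM_CATALOG = {
    ("AWS", "vm_small"): (1_000.0, 1.0),
    ("GCP", "vm_large"): (4_000.0, 5.0),
    ("Azure", "vm_medium"): (2_000.0, 2.5),
}


@dataclass(frozen=True)
class Candidate:
    provider: str
    vm: str
    design: str
    latency_hours: float
    cost_usd: float


def evaluate(design: str, vm_key: tuple, reads: int, writes: int) -> Candidate:
    """Toy model: latency is total work divided by VM speed; cost is latency times hourly price."""
    speed, price = VM_CATALOG[vm_key]
    units = reads * OP_COST[design]["read"] + writes * OP_COST[design]["write"]
    latency = units / speed
    return Candidate(vm_key[0], vm_key[1], design, latency, latency * price)


def best_design(reads: int, writes: int, budget_usd: float) -> Optional[Candidate]:
    """Enumerate every (design, VM) pair and return the fastest one that fits the budget."""
    candidates = (evaluate(d, vm, reads, writes) for d, vm in product(OP_COST, VM_CATALOG))
    affordable = [c for c in candidates if c.cost_usd <= budget_usd]
    return min(affordable, key=lambda c: c.latency_hours, default=None)
```

Calling `best_design(1000, 9000, 10.0)` enumerates nine candidates and keeps the lowest-latency one whose cost stays within 10 USD; it returns `None` when nothing fits the budget. An exhaustive loop like this only works for a tiny space; a design space of the size described above would need a far more selective search.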
Try Out Cosine
Let Cosine design the best storage engine for you and decide which cloud provider and VMs you should use.
Cloud Provider + Hardware + Data Structure
Design Steps
1. Set inputs
2. Set SLA
3. Click the button to continue
4. Check suggested storage engine designs
5. Compare with existing storage engines
6. Like to explore more? Switch to Interactive Mode using the tab above
Service Level Agreement
Parameters
Requirements
Description
This is offered as a cloud service that you can purchase on top of the core storage and computing resources. You can subscribe to this service to migrate your data from one VM type to another as needed once per month.
This constitutes additional software solutions that you can deploy on top of your core data store. This includes building and testing, automated code-deploy, version control, and custom analytics.
This service includes a monthly backup of your storage charged on a per-GB basis.
This refers to the percentage of time during which the VMs of an application are promised to be up and running. The values specified to the left indicate the availability requirements in terms of monthly uptime percentage. Providers that do not meet your availability requirements are excluded from the optimal configurations.
This parameter refers to the durability guarantee, measured as the number of 9s after the decimal point in 99.9...%. Providers that do not meet your durability requirements are excluded from the optimal configurations.
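The exclusion rule described for availability and durability can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the `ProviderSLA` record, the `eligible` function, and the provider names and guarantee values in the usage note are hypothetical placeholders, not actual figures from AWS, GCP, or Azure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProviderSLA:
    name: str
    uptime_percent: float   # promised monthly uptime percentage, e.g. 99.95
    durability_nines: int   # number of 9s after the decimal point in 99.9...%


def durability_percent(nines: int) -> str:
    """Render a count of 9s after the decimal point as a percentage string."""
    return "99." + "9" * nines + "%"


def eligible(providers: List[ProviderSLA],
             min_uptime_percent: float,
             min_durability_nines: int) -> List[ProviderSLA]:
    """Keep only providers that meet both the availability and durability requirements."""
    return [
        p for p in providers
        if p.uptime_percent >= min_uptime_percent
        and p.durability_nines >= min_durability_nines
    ]
```

For example, with placeholder providers `ProviderSLA("provider_a", 99.99, 9)`, `ProviderSLA("provider_b", 99.9, 11)`, and `ProviderSLA("provider_c", 99.95, 7)`, requiring 99.95% uptime and 9 nines of durability keeps only `provider_a`.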
Inputs (The mandatory inputs are indicated by *)
data*
# Entries: Total data items to store.
Entry size (bytes): Key-value pair size in bytes (e.g., 16).
Key size (bytes): Key size in bytes (e.g., 8).
You can manually enter data specifications or alternatively upload a data file. A sample file is here.
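As a quick sanity check on the data inputs, the raw data size follows from the number of entries and the entry size. The sketch below is plain arithmetic; the function name `data_footprint` is hypothetical, and the result ignores any overhead a real storage engine adds (indexes, filters, fragmentation).

```python
def data_footprint(num_entries: int, entry_size_bytes: int, key_size_bytes: int) -> dict:
    """Derive the value size and raw data size from the three data inputs."""
    if num_entries <= 0:
        raise ValueError("num_entries must be positive")
    if not 0 < key_size_bytes <= entry_size_bytes:
        raise ValueError("key size must be positive and no larger than the entry size")
    total_bytes = num_entries * entry_size_bytes
    return {
        "value_size_bytes": entry_size_bytes - key_size_bytes,
        "total_bytes": total_bytes,
        "total_gib": total_bytes / 2**30,
    }
```

With the example sizes from the tooltips (16-byte entries, 8-byte keys) and an assumed one million entries, `data_footprint(1_000_000, 16, 8)` reports 8-byte values and 16,000,000 bytes of raw data.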
workload*
Number of queries: Number of operations in the workload.
Lookups %: Proportion of lookup operations.
Point Lookups %: Proportion of point lookups on existing keys.
Zero result Point Lookups %: Proportion of point lookups that return no result.
Writes %: Proportion of write operations.
Inserts %: Proportion of inserts in the workload.
Blind Updates %: Proportion of blind updates in the workload.
Read Modify Updates %: Proportion of read-modify updates in the workload.
Range Queries %: Proportion of range query operations.
Short Range Lookup %: Proportion of short range lookup operations.
Non-Empty Range Lookup %
Empty Range Lookup %
Target Range Size
You can manually enter workload specifications or alternatively upload a workload file. A sample file is here.
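A workload specification of this kind can be turned into per-operation counts with simple arithmetic. The sketch below assumes, without confirmation from this page, that the top-level proportions (lookups, writes, range queries) sum to 100%; the function name `operation_counts` and the dictionary keys are hypothetical.

```python
from typing import Dict


def operation_counts(num_queries: int, mix_percent: Dict[str, float]) -> Dict[str, int]:
    """Convert a percentage mix into operation counts (assumes the mix sums to 100)."""
    total = sum(mix_percent.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"proportions sum to {total}, expected 100")
    # Rounding is per operation type, so counts may differ from num_queries by a few operations.
    return {op: round(num_queries * pct / 100.0) for op, pct in mix_percent.items()}
```

For instance, `operation_counts(10_000, {"lookups": 50, "writes": 40, "range_queries": 10})` splits ten thousand queries into 5,000 lookups, 4,000 writes, and 1,000 range queries.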
budget*
Budget: The total amount of money you are willing to spend on the cloud to run this workload.
performance
Latency (hours): The target performance, in terms of latency, that you want to achieve for this workload.
cloud
CrimsonDB: Design Continuum
GCP AWS Azure
Growth Factor (T)
Hot merge threshold (K)
Cold merge threshold (Z)
Levels (L)
Mf (bits per entry)
Buffer size (MB)
Write
Range Lookup
Point Lookup
Zero-result Lookup
Storage Space
Memory Footprint
Throughput
Design Storage Engine
Interactive Mode: Off
Cosine Configuration vs Existing Systems
This mode only shows the optimal design configuration and enables comparison with existing storage engines. It does not allow interactive questioning.
Interactive What-If Design
This mode enables users to ask diverse what-if questions and check resulting configurations on the fly.
Statistical Analysis
This mode supports analysis of statistics over the entire search space of the input workload.
Top % cloud provider
cheapest budget
Break-up of I/O cost
Data structure participation
Top performance
Discovery of hybrid designs
Performance improvement over existing engines
Cost coverage
Latency coverage
Interactive Mode: Off
Compare with existing systems: Off
Question Panel
Design Steps
1. Select a what-if question from the question panel
2. Click the button, then check out the new results in the interactive panel
3. Turn on the radio button for "Explore More Designs"
What-If Questions
Statistical queries
Interactive Panel
Existing Storage Engines
Existing Systems
RocksDB
WiredTiger
FASTER-A
FASTER-H
Explore More Designs: Off
Browse the configuration continuum of the top designs within your budget