The Performance page in Cube Cloud displays charts that help
analyze the performance of your deployment and fine-tune its configuration.
It’s recommended to review Performance Insights when the workload changes
or if you face any performance-related issues with your deployment.
Charts provide insights into different aspects of your deployment.
API instances
The API instances chart shows the number of API instances
that served the queries sent to the deployment.
You can use this chart to fine-tune the
auto-scaling configuration of API instances, e.g.,
increase the minimum and maximum number of API instances.
For example, the following chart shows a deployment with sane auto-scaling
limits that don’t need adjusting. It looks like the deployment needs to
sustain just a few infrequent load bursts per day and auto-scaling to 3 API
instances does the job just fine:
The next chart shows a deployment with auto-scaling limits that definitely
need an adjustment. It looks like the load is so high that most of the time
this deployment has to use at least 4-6 API instances. So, it would be wise
to increase the minimum auto-scaling limit to 6 API instances:
When in doubt, consider using a higher minimum auto-scaling limit: when an
additional API instance starts, it needs some time to compile the data model
before it can serve requests. Over-provisioning API instances with a higher
minimum auto-scaling limit decreases the number of requests that have to wait
for data model compilation.
Also, you can use this chart to fine-tune the
auto-suspension configuration, e.g., by turning
auto-suspension off or increasing the auto-suspension threshold.
For example, the following chart shows a Shared
deployment that is only accessed a few times
a day and automatically suspends after a short period of inactivity:
The next chart shows a misconfigured Dedicated
deployment that serves requests throughout the whole
day but is configured to auto-suspend with a very low threshold:
Cache type
The Requests by cache type chart shows the number of API
requests that were fulfilled by specific cache types,
e.g., pre-aggregations, in-memory cache, no cache, etc. For example, the
following chart shows a deployment that fulfills about 50% of requests by
using pre-aggregations:
The Avg. response time by cache type chart shows the difference
in response time for requests that hit pre-aggregations, the in-memory cache,
or no cache (i.e., the upstream data source). The next chart shows that
pre-aggregations usually provide sub-second response times while queries to
the data source take much longer:
You can use these charts to see whether you'd like more queries to hit
the cache and get lower response times. In that case, consider adding more
pre-aggregations in Cube Store or fine-tuning the
existing ones, e.g., by using indexes to speed up
pre-aggregations with suboptimal query plans.
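As a sketch, a rollup pre-aggregation with an index might look like the following in a YAML data model. The orders cube, its members, and the status_idx index are hypothetical examples, not part of this deployment:

```yaml
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        type: count

    dimensions:
      - name: status
        sql: status
        type: string

      - name: created_at
        sql: created_at
        type: time

    pre_aggregations:
      # Rollup served from Cube Store instead of the upstream data source
      - name: orders_by_status
        measures:
          - CUBE.count
        dimensions:
          - CUBE.status
        time_dimension: CUBE.created_at
        granularity: day

        # Index to speed up queries that filter or group by status
        indexes:
          - name: status_idx
            columns:
              - CUBE.status
```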
Data model compilation
The Requests by data model compilation chart shows the
number of API requests that did or did not have to wait for data model
compilation. For example, the following chart shows a deployment where
only a tiny fraction of requests require the data model to be
compiled:
The Wait time for data model compilation chart
shows the total time requests had to wait for the data model compilation.
The next chart shows that at certain points in time requests had to wait
dozens of seconds while the data model was being compiled:
You can use these charts to fine-tune the auto-suspension
configuration (e.g., turn it off or increase the threshold so that API
instances suspend less frequently), identify multitenancy
misconfiguration (e.g., suboptimal bucketing via
context_to_app_id), or
consider using a Multi-cluster deployment to
distribute requests from different tenants across a number of Dedicated
deployments.
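As an illustration, bucketing tenants might look like the following sketch of a function for the Python configuration file (cube.py). The `tenant_tier` field in the security context is a hypothetical example: any attribute that groups tenants into a small, fixed set of buckets works.

```python
# A minimal sketch of context_to_app_id bucketing for cube.py.
# `tenant_tier` is an assumed security context attribute.
def context_to_app_id(ctx: dict) -> str:
    # Tenants in the same tier share one compiled data model, so the
    # number of compilations stays small no matter how many tenants exist.
    tier = ctx.get("securityContext", {}).get("tenant_tier", "default")
    return f"CUBE_APP_{tier}"
```

In an actual cube.py, a function like this would be registered via the `@config('context_to_app_id')` decorator from the `cube` package.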
Cube Store
The Saturation for queries by Cube Store workers chart
shows whether Cube Store workers are overloaded with serving queries.
High saturation for queries prevents Cube Store workers from fulfilling
requests and results in wait time, displayed on the Wait time for
queries by Cube Store workers chart.
For example, the following chart shows a deployment that uses 4 Cube Store
workers and almost never brings them to saturation, resulting in no wait
time for queries:
Similarly, the Saturation for jobs by Cube Store workers
and Wait time for jobs by Cube Store workers charts show whether
Cube Store workers are overloaded with serving jobs, i.e., building
pre-aggregations or performing internal tasks such as data compaction.
For example, the following chart shows a misconfigured deployment that uses
8 Cube Store workers and keeps them at full saturation for prolonged
intervals, resulting in huge wait times and, in the case of jobs, delayed
refresh of pre-aggregations:
The next chart shows that oversaturated Cube Store workers might yield
hours of wait time for queries and jobs:
You can use these charts to fine-tune the number of Cube Store
workers used by your deployment, e.g.,
increase it until you see that there’s no saturation and no wait time
for queries and jobs.