Tableau Course
Extracts in Tableau
An extract saves a compressed snapshot of your data as a local .hyper file. It gives you faster query performance and offline access, and it lets you shrink the data Tableau loads by filtering and aggregating when the extract is created.
Live Connection vs Extract
Every data connection in Tableau is either Live or an Extract. With a Live connection, every interaction in your worksheet sends a fresh query to the source — the database, the Excel file, the Google Sheet. The data is always current, but every query must wait for the source to respond.
An Extract takes a point-in-time copy of the data and stores it in Tableau's own highly optimised columnar format (the .hyper file). Queries run against the local .hyper file instead of the original source, which is typically much faster — especially for large datasets or slow database connections. The trade-off is that the extract reflects the data as it was when last refreshed, not the current live state.
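The difference between the two connection types can be modelled in a few lines of Python. This is a conceptual sketch, not Tableau code: the `LiveConnection` and `Extract` classes and the in-memory `source` list are hypothetical stand-ins for a database table. A live connection reads the source on every query. An extract answers from a copy taken at its last refresh.

```python
import copy

# Hypothetical stand-in for a database table: a list of row dictionaries.
source = [{"order_id": 1, "sales": 100}, {"order_id": 2, "sales": 250}]


class LiveConnection:
    """Illustrative model: every query reads the current source rows."""

    def __init__(self, source_rows):
        self.source_rows = source_rows

    def total_sales(self):
        return sum(r["sales"] for r in self.source_rows)


class Extract:
    """Illustrative model: queries read a point-in-time copy taken at refresh."""

    def __init__(self, source_rows):
        self.source_rows = source_rows
        self.refresh()

    def refresh(self):
        # Take a fresh snapshot of the source as it is right now.
        self.snapshot = copy.deepcopy(self.source_rows)

    def total_sales(self):
        return sum(r["sales"] for r in self.snapshot)


live = LiveConnection(source)
extract = Extract(source)

source.append({"order_id": 3, "sales": 50})  # a new row lands in the source

print(live.total_sales())     # 400: the live connection sees the new row at once
print(extract.total_sales())  # 350: the extract still reflects its last snapshot
extract.refresh()
print(extract.total_sales())  # 400: the refresh brings the snapshot up to date
```

The staleness shown between the append and the refresh is exactly the trade-off described above.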
Creating an Extract
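[Figure: Data Source tab mockup showing the Live / Extract connection toggle]
To create an extract, select Extract in the Connection area at the top right of the Data Source tab. Tableau builds the .hyper file when you next move to a worksheet and asks where to save it.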
Extract Data Dialog Options
Clicking Edit before creating the extract opens a dialog with four key settings that control how much data is included in the .hyper file:
| Option | What it does | When to use it |
|---|---|---|
| Filters | Restricts which rows are included in the extract — e.g. only rows where Year = 2024 | When you only need a subset of the data to keep the extract small and fast |
| Aggregation | Pre-aggregates measures to the level of the visible dimensions (optionally rolling dates up to a coarser level such as month) before storing | When the source has millions of rows but you only ever analyse summarised totals |
| Number of Rows | Limits the extract to a top N or sampled set of rows | During development — build and test with a sample, switch to full data for production |
| Hide All Unused Fields | Hides every field not used in the workbook so those columns are excluded from the .hyper file | When the source has many columns you do not need; this can reduce extract file size significantly |
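To make the effect of each option concrete, here is a minimal Python sketch that applies all four to a list of rows. Everything in it is hypothetical: `build_extract`, its parameters, and the column names are illustrative, and the order in which the options are applied is a simplification rather than a description of how Tableau builds a .hyper file.

```python
from collections import defaultdict

# Hypothetical source rows; column names are illustrative only.
rows = [
    {"year": 2023, "region": "East", "product": "A", "sales": 100, "notes": "x"},
    {"year": 2024, "region": "East", "product": "A", "sales": 120, "notes": "y"},
    {"year": 2024, "region": "East", "product": "B", "sales": 80, "notes": "z"},
    {"year": 2024, "region": "West", "product": "A", "sales": 200, "notes": "w"},
]


def build_extract(rows, row_filter=None, top_n=None, used_fields=None,
                  group_by=None, measure=None):
    """Model of the four Extract Data options applied to a list of rows."""
    # Filters: keep only rows that pass the predicate (e.g. Year = 2024).
    if row_filter is not None:
        rows = [r for r in rows if row_filter(r)]
    # Number of Rows: keep only the first N rows, like a "Top N" sample.
    if top_n is not None:
        rows = rows[:top_n]
    # Hide unused fields: drop every column the workbook does not use.
    if used_fields is not None:
        rows = [{k: v for k, v in r.items() if k in used_fields} for r in rows]
    # Aggregation: sum the measure for each combination of grouping dimensions.
    if group_by is not None and measure is not None:
        totals = defaultdict(int)
        for r in rows:
            totals[tuple(r[d] for d in group_by)] += r[measure]
        rows = [{**dict(zip(group_by, key)), measure: total}
                for key, total in totals.items()]
    return rows


extract = build_extract(
    rows,
    row_filter=lambda r: r["year"] == 2024,
    used_fields={"region", "sales"},
    group_by=["region"],
    measure="sales",
)
print(extract)  # [{'region': 'East', 'sales': 200}, {'region': 'West', 'sales': 200}]
```

Four source rows with five columns shrink to two rows with two columns, which is why combining these options can make an extract dramatically smaller.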
Refreshing an Extract
Once an extract exists, the data inside it does not update automatically. To bring it up to date you must refresh it. There are two refresh types:
Full refresh: Deletes the existing .hyper file entirely and rebuilds it from scratch by re-querying the source. Every row in the source is re-imported. Use this when the source data has deletions or corrections, not just new rows added.
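Incremental refresh: Adds only new rows to the existing .hyper file. You nominate a key field, typically an ID or date that increases over time, and Tableau imports only rows whose value in that field is greater than the highest value already in the extract. This is faster than a full refresh for large datasets where new records are only ever appended, but it does not pick up edits or deletions to rows that were already extracted.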
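The two refresh types can be contrasted with a small Python model. It is a sketch under stated assumptions: the functions `full_refresh` and `incremental_refresh` are hypothetical, and the key field is assumed to be an integer `order_id`. It shows why an incremental refresh misses corrections and deletions while a full refresh captures them.

```python
import copy


def full_refresh(source_rows):
    """Rebuild the extract from scratch: every current source row is re-imported."""
    return copy.deepcopy(source_rows)


def incremental_refresh(extract_rows, source_rows, key="order_id"):
    """Append only source rows whose key exceeds the highest key already extracted."""
    high_water = max((r[key] for r in extract_rows), default=None)
    new_rows = [r for r in source_rows if high_water is None or r[key] > high_water]
    return extract_rows + copy.deepcopy(new_rows)


# Hypothetical source table and an initial extract built from it.
source = [{"order_id": 1, "sales": 100}, {"order_id": 2, "sales": 250}]
extract = full_refresh(source)

# The source changes: row 1 is corrected, row 2 is deleted, row 3 is appended.
source[0]["sales"] = 110
del source[1]
source.append({"order_id": 3, "sales": 50})

incremental = incremental_refresh(extract, source)
full = full_refresh(source)
print(incremental)  # [{'order_id': 1, 'sales': 100}, {'order_id': 2, 'sales': 250}, {'order_id': 3, 'sales': 50}]
print(full)         # [{'order_id': 1, 'sales': 110}, {'order_id': 3, 'sales': 50}]
```

The incremental result keeps the stale value for order 1 and the deleted order 2, which is why a full refresh is the safer choice when existing rows can change.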
To trigger a manual refresh in Tableau Desktop, go to Data → [your data source] → Extract → Refresh. On Tableau Server or Tableau Cloud, you can schedule automatic refreshes so the extract updates on a set cadence, such as hourly, daily, or weekly.
The .twbx File and Embedded Extracts
When you save a workbook as a .twbx (Tableau Packaged Workbook), the .hyper extract file is bundled inside it alongside the workbook definition. This makes the .twbx self-contained: the recipient can open it and interact with the extracted data without access to the original source. This is the standard way to share Tableau workbooks with people outside your organisation or without database access.
A .twb file does not include the extract; it stores only the workbook instructions (as XML) and a pointer to the data source location. If the .hyper file moves or the database is unavailable, the .twb workbook cannot load its data until you reconnect it to the source.
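A .twbx is a ZIP archive, so you can look inside one with Python's standard `zipfile` module to confirm an extract is embedded. The helper `inspect_twbx` is hypothetical, and the demo builds a tiny stand-in archive rather than a real Tableau file. Because the folder layout inside a packaged workbook may vary between versions, the sketch simply matches any entry ending in `.twb` or `.hyper`.

```python
import os
import tempfile
import zipfile


def inspect_twbx(path):
    """List workbook definitions and embedded .hyper extracts inside a .twbx archive."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    workbooks = [n for n in names if n.lower().endswith(".twb")]
    extracts = [n for n in names if n.lower().endswith(".hyper")]
    return workbooks, extracts


# Demo: build a stand-in archive shaped like a packaged workbook.
# A real .twbx saved by Tableau holds a full XML workbook and a real .hyper file.
demo_path = os.path.join(tempfile.mkdtemp(), "Sales.twbx")
with zipfile.ZipFile(demo_path, "w") as archive:
    archive.writestr("Sales.twb", "<workbook/>")
    archive.writestr("Data/Extracts/Sales.hyper", b"placeholder bytes")

workbooks, extracts = inspect_twbx(demo_path)
print(workbooks)  # ['Sales.twb']
print(extracts)   # ['Data/Extracts/Sales.hyper']
```

An empty `extracts` list on a real file would mean the workbook relies on a live connection or an external data file.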
The extract-versus-live decision is one of the most practical choices you will make in every Tableau project. A simple rule: if your source has more than 500,000 rows or the database is slow, start with an extract. If your stakeholders need data that is current to the minute, use live. For most business reporting, such as dashboards that update daily or weekly, an extract on a scheduled refresh is the right default. The .hyper format is very fast for aggregations and filters, so dashboards that feel sluggish on a live connection often respond much faster after switching to an extract. Try it on any workbook that feels slow before reaching for other performance optimisations.
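The rule of thumb above can be written as a small decision function. This is a sketch: `recommend_connection` and its parameters are hypothetical. The rule does not say what to do when a large source also needs minute-level freshness, so this version assumes freshness takes priority.

```python
def recommend_connection(row_count, source_is_slow, needs_minute_freshness):
    """Encode the lesson's rule of thumb; returns (connection type, reason)."""
    # Assumption: a hard freshness requirement overrides size and speed concerns.
    if needs_minute_freshness:
        return ("Live", "stakeholders need data current to the minute")
    if row_count > 500_000:
        return ("Extract", "more than 500,000 rows")
    if source_is_slow:
        return ("Extract", "the source database is slow")
    # Daily or weekly business reporting.
    return ("Extract", "default for daily or weekly reporting on a scheduled refresh")


print(recommend_connection(2_000_000, False, False))  # ('Extract', 'more than 500,000 rows')
print(recommend_connection(10_000, False, True))      # ('Live', 'stakeholders need data current to the minute')
```

Returning a reason alongside the choice makes the trade-off explicit when you explain the decision to stakeholders.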
Practice Questions
1. What file extension does Tableau use when saving an extract to disk?
2. Which refresh type adds only new rows to an existing extract based on a key field, rather than rebuilding from scratch?
3. Which Tableau file format bundles the workbook and the extract together into a single self-contained file that can be shared without access to the original source?
Quiz
1. A workbook connected to a large, slow database takes several seconds to render every chart interaction. Which connection type should you switch to in order to improve performance?
2. The source data has had rows corrected and some deleted since the last extract was created. Which refresh type should you use?
3. What are the four main options available in the Extract Data dialog that control how much data is included in the .hyper file?
Next up — Lesson 19: Data source filters — restricting data at the connection level so filtered rows never enter Tableau at all.