Tableau Lesson 18 – Extracts | Dataplexa
Section II — Lesson 18

Extracts in Tableau

An extract saves a compressed snapshot of your data as a local .hyper file. This typically gives you faster queries, offline access, and the ability to reduce the data Tableau loads by filtering and aggregating when the extract is built.

Live Connection vs Extract

Every data connection in Tableau is either Live or an Extract. With a Live connection, every interaction in your worksheet sends a fresh query to the source — the database, the Excel file, the Google Sheet. The data is always current, but every query must wait for the source to respond.

An Extract takes a point-in-time copy of the data and stores it in Tableau's own highly optimised columnar format (the .hyper file). Queries run against the local .hyper file instead of the original source, which is typically much faster — especially for large datasets or slow database connections. The trade-off is that the extract reflects the data as it was when last refreshed, not the current live state.

Live Connection:
Always shows current data
No refresh needed
Query speed depends on the source
Requires access to the source at all times

Extract (.hyper):
Fast queries from local columnar storage
Works offline; the source is not needed after creation
Data is only as fresh as the last refresh
Requires scheduled or manual refreshes
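
The trade-off between the two connection types can be illustrated outside Tableau. The sketch below is not Tableau code: it uses Python's built-in sqlite3 module, with a hypothetical `orders` table standing in for the source and a plain list of rows standing in for the extract. All names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical source: an in-memory database standing in for the real one.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sales REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 250.0)])


def live_total(conn):
    """Live-style query: always asks the source, so it sees current data."""
    return conn.execute("SELECT SUM(sales) FROM orders").fetchone()[0]


def take_snapshot(conn):
    """Extract-style copy: rows are copied once and queried locally afterwards."""
    return conn.execute("SELECT id, sales FROM orders").fetchall()


def snapshot_total(snapshot):
    """Query the local copy without touching the source."""
    return sum(sales for _, sales in snapshot)


snapshot = take_snapshot(source)

# The source changes after the snapshot was taken.
source.execute("INSERT INTO orders VALUES (3, 50.0)")

live_after = live_total(source)         # includes the new row
stale_after = snapshot_total(snapshot)  # still reflects the old data

snapshot = take_snapshot(source)        # a refresh brings the copy up to date
fresh_after = snapshot_total(snapshot)
```

The live total picks up the new row immediately, while the snapshot total only changes once the copy is taken again. That is the same behaviour an extract shows between refreshes.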

Creating an Extract

1. On the Data Source tab, locate the connection toggle in the top-right corner and switch it from Live to Extract. The Extract radio button activates and an Edit link appears beside it.

2. Optionally, click Edit to open the Extract Data dialog. Here you can add extract filters, aggregate data, limit the number of rows, or hide unused fields to reduce the extract size. Click OK when done.

3. Click Sheet 1 or any worksheet tab. Tableau prompts you to save the extract as a .hyper file. Choose a location and click Save. Tableau queries the source, builds the .hyper file, and connects to it. All subsequent queries run against the extract.

4. The data source icon in the Data pane changes from a single cylinder to two cylinders joined by an arrow, which indicates that an extract is active. The connection toggle on the Data Source tab also shows Extract selected.

[Figure: Data Source tab connection toggle for Superstore (Excel), showing the Live and Extract options with an Edit link. Status line: Extract saved: Superstore.hyper, 9,994 rows, last refreshed today.]

Extract Data Dialog Options

Clicking Edit before creating the extract opens a dialog with four key settings that control how much data is included in the .hyper file:

| Option | What it does | When to use it |
| --- | --- | --- |
| Filters | Restricts which rows are included in the extract, e.g. only rows where Year = 2024 | When you only need a subset of the data and want to keep the extract small and fast |
| Aggregation | Pre-aggregates the data at a specified level of detail before storing it | When the source has millions of rows but you only ever analyse summarised totals |
| Number of Rows | Limits the extract to a top N or sampled set of rows | During development: build and test with a sample, then switch to full data for production |
| Hide All Unused Fields | Excludes columns that are hidden in the workbook from the .hyper file | When the source has many columns you do not need; this can reduce the extract file size significantly |
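
To make the effect of these four options concrete, here is a minimal Python sketch that applies the same ideas to a small in-memory dataset. It does not call Tableau; the rows, column names, and helper functions are all hypothetical and only model what each option does to the data that ends up in the extract.

```python
from collections import defaultdict

# Hypothetical source rows; the column names are illustrative only.
rows = [
    {"year": 2023, "region": "East", "sales": 120.0, "notes": "a"},
    {"year": 2024, "region": "East", "sales": 200.0, "notes": "b"},
    {"year": 2024, "region": "West", "sales": 80.0, "notes": "c"},
    {"year": 2024, "region": "East", "sales": 50.0, "notes": "d"},
]


def apply_filter(rows, predicate):
    """Filters: keep only the rows that satisfy the condition."""
    return [r for r in rows if predicate(r)]


def drop_fields(rows, hidden):
    """Hide unused fields: leave hidden columns out of the stored copy."""
    return [{k: v for k, v in r.items() if k not in hidden} for r in rows]


def aggregate(rows, dims, measure):
    """Aggregation: store one summed row per combination of dimensions."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return [dict(zip(dims, key), **{measure: total}) for key, total in totals.items()]


def top_n(rows, n):
    """Number of rows: keep only the first n rows."""
    return rows[:n]


filtered = apply_filter(rows, lambda r: r["year"] == 2024)
slim = drop_fields(filtered, {"notes"})
summary = aggregate(slim, ["year", "region"], "sales")
sample = top_n(rows, 2)
```

Each step shrinks the data in a different direction: filters and row limits remove rows, hiding fields removes columns, and aggregation replaces detail rows with totals.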

Refreshing an Extract

Once an extract exists, the data inside it does not update automatically. To bring it up to date you must refresh it. There are two refresh types:

Full Refresh

Replaces the entire contents of the existing .hyper file by re-querying the source and rebuilding the extract from scratch. Every row that passes the extract's filters is re-imported. Use this when the source data has deletions or corrections, not just new rows.

Incremental Refresh

Adds only new rows to the existing .hyper file based on a key field, typically an ID or date that increases over time. It is faster than a full refresh for large datasets where records are only ever appended, but it does not pick up changes to, or deletions of, rows that are already in the extract.
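
The difference between the two refresh types can be sketched in a few lines of Python. This is a simplified model, not Tableau's implementation: the `id` key, the row contents, and both function names are assumptions chosen for illustration.

```python
def full_refresh(source_rows):
    """Rebuild the extract from scratch by copying every current source row."""
    return [dict(row) for row in source_rows]


def incremental_refresh(extract_rows, source_rows, key="id"):
    """Append only source rows whose key exceeds the largest key in the extract."""
    high_water = max((row[key] for row in extract_rows), default=None)
    new_rows = [
        dict(row)
        for row in source_rows
        if high_water is None or row[key] > high_water
    ]
    return extract_rows + new_rows


source = [{"id": 1, "sales": 100}, {"id": 2, "sales": 250}]
extract = full_refresh(source)

# The source changes: row 1 is corrected, row 2 is deleted, row 3 is added.
source = [{"id": 1, "sales": 110}, {"id": 3, "sales": 50}]

incremental = incremental_refresh(extract, source)
full = full_refresh(source)
```

In this model the incremental result gains row 3 but still holds the old value for row 1 and the deleted row 2, while the full result matches the source exactly. This is why corrections and deletions call for a full refresh.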

To trigger a manual refresh in Tableau Desktop, open the Data menu, select the data source, and choose Extract → Refresh (Data → [data source name] → Extract → Refresh). On Tableau Server or Tableau Cloud, you can schedule automatic refreshes so the extract updates on a set cadence, such as hourly, daily, or weekly.

The .twbx File and Embedded Extracts

When you save a workbook as a .twbx (Tableau Packaged Workbook), the .hyper extract file is bundled inside it alongside the workbook definition. This makes the .twbx completely self-contained — the recipient can open it and interact with the full data without having access to the original source. This is the standard way to share Tableau workbooks with people outside your organisation or without database access.

A .twb file does not include the extract. It stores only the workbook instructions and a pointer to the data source location. If the .hyper file moves or the database is unavailable, the .twb workbook cannot load its data until the source is located again or becomes reachable.
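
Because a .twbx file is a ZIP archive, you can check whether a packaged workbook actually contains an extract without opening Tableau. The sketch below uses Python's standard zipfile module. The demo archive it builds is a stand-in, and the file names and folder layout inside it are hypothetical placeholders rather than a guaranteed Tableau layout.

```python
import os
import tempfile
import zipfile


def packaged_extracts(twbx_path):
    """Return the names of .hyper files bundled inside a packaged workbook."""
    with zipfile.ZipFile(twbx_path) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith(".hyper")
        ]


# Build a stand-in .twbx so the sketch runs without Tableau installed.
# The entries below are placeholders, not real workbook or extract contents.
workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "Sales.twbx")
with zipfile.ZipFile(demo_path, "w") as archive:
    archive.writestr("Sales.twb", "<workbook/>")
    archive.writestr("Data/Extracts/Superstore.hyper", b"placeholder bytes")

found = packaged_extracts(demo_path)
```

Pointing `packaged_extracts` at a real .twbx should list any bundled .hyper files. An empty list suggests the workbook was packaged without an extract.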

📌 Teacher's Note

The extract vs live decision is one of the most practical choices you will make in every Tableau project. A simple rule of thumb: if your source has more than 500,000 rows or the database is slow, start with an extract. If your stakeholders need data that is current to the minute, use live. For most business reporting, meaning dashboards that update daily or weekly, an extract on a scheduled refresh is the right default. The .hyper format is very fast for aggregations and filters, so dashboards that feel sluggish on a live connection often become far more responsive after switching to an extract. Try it on any workbook that feels slow before reaching for other performance optimisations.

Practice Questions

1. What file extension does Tableau use when saving an extract to disk?

2. Which refresh type adds only new rows to an existing extract based on a key field, rather than rebuilding from scratch?

3. Which Tableau file format bundles the workbook and the extract together into a single self-contained file that can be shared without access to the original source?

Quiz

1. A workbook connected to a large, slow database takes several seconds to render every chart interaction. Which connection type should you switch to in order to improve performance?

2. The source data has had rows corrected and some deleted since the last extract was created. Which refresh type should you use?

3. What are the four main options available in the Extract Data dialog that control how much data is included in the .hyper file?

Next up — Lesson 19: Data source filters — restricting data at the connection level so filtered rows never enter Tableau at all.