Metabase
Important Capabilities
Capability | Status | Notes |
---|---|---|
Platform Instance | ✅ | Enabled by default |
Table-Level Lineage | ✅ | Supported by default |
This plugin extracts charts, dashboards, and their associated metadata. It is in beta and has only been tested on PostgreSQL and H2 databases.
Collection
The /api/collection endpoint is used to retrieve the available collections.
The /api/collection/<COLLECTION_ID>/items?models=dashboard endpoint is used to retrieve a given collection and list its dashboards.
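The two calls above can be sketched as a small traversal. This is a minimal illustration, not the connector's actual code: `get_json` is a hypothetical callable that takes an API path and returns the decoded JSON body, and the response shapes (a list of collections with an `id` field, and items either as a bare list or wrapped in a `data` key) are assumptions that may differ between Metabase versions.

```python
from typing import Any, Callable, Iterator

# Hypothetical fetcher type: takes an API path, returns the decoded JSON body.
GetJson = Callable[[str], Any]


def iter_dashboard_ids(get_json: GetJson) -> Iterator[Any]:
    """Yield the id of every dashboard listed in every available collection."""
    for collection in get_json("/api/collection"):
        path = f"/api/collection/{collection['id']}/items?models=dashboard"
        items = get_json(path)
        # Assumption: items may be a bare list or wrapped as {"data": [...]}.
        entries = items.get("data", []) if isinstance(items, dict) else items
        for entry in entries:
            yield entry["id"]
```

Passing the fetcher in as a parameter keeps the traversal independent of any HTTP client or authentication scheme, so it can be exercised against an in-memory dictionary of canned responses.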
Dashboard
The /api/dashboard/<DASHBOARD_ID> endpoint is used to retrieve a given dashboard and the following information:
- Title and description
- Last edited by
- Owner
- Link to the dashboard in Metabase
- Associated charts
Chart
The /api/card endpoint is used to retrieve the following information:
- Title and description
- Last edited by
- Owner
- Link to the chart in Metabase
- Datasource and lineage
The following chart properties are ingested into DataHub:
Name | Description |
---|---|
Dimensions | Column names |
Filters | Any filters applied to the chart |
Metrics | All columns that are being used for aggregation |
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[metabase]'
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
connect_uri string | Metabase host URL. Default: localhost:3000 |
database_alias_map object | Database name map to use when constructing dataset URN. |
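database_id_to_instance_map map(str,string) | Custom mappings between Metabase database id and DataHub platform instance. |
default_schema string | Default schema name to use when the schema is not provided in an SQL query. Default: public |
engine_platform_map map(str,string) | Custom mappings between Metabase database engines and DataHub platforms. |
password string(password) | Metabase password. |
platform_instance_map map(str,string) | A holder for platform -> platform_instance mappings to generate correct dataset URNs. |
username string | Metabase username. |
env string | The environment that all assets produced by this connector belong to. Default: PROD |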
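As a rough illustration of how these fields fit into a recipe, the sketch below builds one as a Python dictionary. The field names come from the table above; the source type name `metabase`, the `datahub-rest` sink, and its server address are assumptions for illustration, and `build_metabase_recipe` and `run_recipe` are hypothetical helper names.

```python
from typing import Any, Dict


def build_metabase_recipe(connect_uri: str, username: str, password: str) -> Dict[str, Any]:
    """Return a recipe dict that uses only fields listed in the config table."""
    return {
        "source": {
            "type": "metabase",
            "config": {
                "connect_uri": connect_uri,
                "username": username,
                "password": password,
                "default_schema": "public",
                "env": "PROD",
                # Nested field: engine_platform_map.athena in dotted notation.
                "engine_platform_map": {"athena": "glue"},
            },
        },
        # Assumption: a DataHub REST sink at a placeholder local address.
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe programmatically; requires acryl-datahub to be installed."""
    # Imported lazily so this module loads even without the package present.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

In practice the password would normally come from an environment variable or a secret store rather than being written into the recipe.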
The JSONSchema for this configuration is inlined below.
{
  "title": "MetabaseConfig",
  "description": "Any non-Dataset source that produces lineage to Datasets should inherit this class.\ne.g. Orchestrators, Pipelines, BI Tools etc.",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "platform_instance_map": {
      "title": "Platform Instance Map",
      "description": "A holder for platform -> platform_instance mappings to generate correct dataset urns",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "connect_uri": {
      "title": "Connect Uri",
      "description": "Metabase host URL.",
      "default": "localhost:3000",
      "type": "string"
    },
    "username": {
      "title": "Username",
      "description": "Metabase username.",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Metabase password.",
      "type": "string",
      "writeOnly": true,
      "format": "password"
    },
    "database_alias_map": {
      "title": "Database Alias Map",
      "description": "Database name map to use when constructing dataset URN.",
      "type": "object"
    },
    "engine_platform_map": {
      "title": "Engine Platform Map",
      "description": "Custom mappings between metabase database engines and DataHub platforms",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "database_id_to_instance_map": {
      "title": "Database Id To Instance Map",
      "description": "Custom mappings between metabase database id and DataHub platform instance",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "default_schema": {
      "title": "Default Schema",
      "description": "Default schema name to use when schema is not provided in an SQL query",
      "default": "public",
      "type": "string"
    }
  },
  "additionalProperties": false
}
Metabase databases will be mapped to a DataHub platform based on the engine listed in the api/database response. This mapping can be customized by using the engine_platform_map config option. For example, to map databases using the athena engine to the underlying datasets in the glue platform, the following snippet can be used:
engine_platform_map:
  athena: glue
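The override behaviour can be pictured with a short sketch. This is a simplification under stated assumptions: `resolve_platform` is a hypothetical helper, and falling back to the engine name unchanged stands in for whatever built-in engine-to-platform defaults the connector actually applies.

```python
from typing import Dict, Optional


def resolve_platform(engine: str, engine_platform_map: Optional[Dict[str, str]] = None) -> str:
    """Pick the DataHub platform for a Metabase engine, honouring custom overrides."""
    overrides = engine_platform_map or {}
    # Assumption: with no override, the engine name is used as the platform.
    return overrides.get(engine, engine)
```

With the snippet above, `resolve_platform("athena", {"athena": "glue"})` yields `glue`, while an engine without an entry in the map is left as it is.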
DataHub tries to determine the database name from the Metabase api/database payload. However, the name can be overridden through database_alias_map for a given database connected to Metabase.
If several platform instances with the same platform (e.g. from several distinct clickhouse clusters) are present in DataHub, the mapping between database id in Metabase and platform instance in DataHub may be configured with the following map:
database_id_to_instance_map:
  "42": platform_instance_in_datahub
The key in this map must be a string, not an integer, even though the Metabase API returns the id as a number.
If database_id_to_instance_map is not specified, platform_instance_map is used for platform instance mapping. If neither is specified, no platform instance is used when constructing the URN while searching for dataset relations.
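The precedence described above can be sketched as follows. This is a literal reading of the text, not the connector's implementation: `resolve_platform_instance` is a hypothetical helper, and returning no instance when database_id_to_instance_map is specified but lacks the id is an assumption.

```python
from typing import Dict, Optional, Union


def resolve_platform_instance(
    database_id: Union[int, str],
    platform: str,
    database_id_to_instance_map: Optional[Dict[str, str]] = None,
    platform_instance_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the platform instance to use in the dataset URN, or None."""
    if database_id_to_instance_map is not None:
        # Keys are strings, so the numeric id from the API is converted first.
        return database_id_to_instance_map.get(str(database_id))
    if platform_instance_map is not None:
        return platform_instance_map.get(platform)
    # Neither map is specified: the URN is built without a platform instance.
    return None
```

Converting the id with `str()` before the lookup is what makes a numeric id such as 42 match the quoted key "42" in the recipe.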
Compatibility
Metabase version v0.48.3
Code Coordinates
- Class Name: datahub.ingestion.source.metabase.MetabaseSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Metabase, feel free to ping us on our Slack.