Metadata

Dagster uses metadata to communicate arbitrary user-specified information about structured events.

Refer to the Metadata documentation for more information.

class dagster.MetadataValue[source]

Utility class to wrap metadata values passed into Dagster events so that they can be displayed in the Dagster UI and other tooling.

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "my_text_label": "hello",
            "dashboard_url": MetadataValue.url("http://mycoolsite.com/my_dashboard"),
            "num_rows": 0,
        },
    )
static asset(asset_key)[source]

Static constructor for a metadata value referencing a Dagster asset, by key.

For example:

@op
def validate_table(context, df):
    yield AssetMaterialization(
        asset_key=AssetKey("my_table"),
        metadata={
            "Related asset": MetadataValue.asset(AssetKey('my_other_table')),
        },
    )
Parameters:

asset_key (AssetKey) – The asset key referencing the asset.

static bool(value)[source]

Static constructor for a metadata value wrapping a bool as BoolMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "num rows > 1000": MetadataValue.bool(len(df) > 1000),
        },
    )
Parameters:

value (bool) – The bool value for a metadata entry.

static column_lineage(lineage)[source]

Static constructor for a metadata value wrapping a column lineage as TableColumnLineageMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Parameters:

lineage (TableColumnLineage) – The column lineage for a metadata entry.

static dagster_run(run_id)[source]

Static constructor for a metadata value wrapping a reference to a Dagster run.

Parameters:

run_id (str) – The ID of the run.

static float(value)[source]

Static constructor for a metadata value wrapping a float as FloatMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "size (bytes)": MetadataValue.float(calculate_bytes(df)),
        }
    )
Parameters:

value (float) – The float value for a metadata entry.

static int(value)[source]

Static constructor for a metadata value wrapping an int as IntMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "number of rows": MetadataValue.int(len(df)),
        },
    )
Parameters:

value (int) – The int value for a metadata entry.

static job(job_name, location_name, *, repository_name=None)[source]

Static constructor for a metadata value referencing a Dagster job, by name.

For example:

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "Producing job": MetadataValue.job('my_other_job', 'my_location'),
        },
    )
Parameters:
  • job_name (str) – The name of the job.

  • location_name (Optional[str]) – The code location name for the job.

  • repository_name (Optional[str]) – The repository name of the job, if different from the default.

static json(data)[source]

Static constructor for a metadata value wrapping a json-serializable list or dict as JsonMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context):
    yield ExpectationResult(
        success=not missing_things,
        label="is_present",
        metadata={
            "about my dataset": MetadataValue.json({"missing_columns": missing_things})
        },
    )
Parameters:

data (Union[Sequence[Any], Mapping[str, Any]]) – The JSON data for a metadata entry.

static md(data)[source]

Static constructor for a metadata value wrapping markdown data as MarkdownMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context, md_str):
    yield AssetMaterialization(
        asset_key="info",
        metadata={
            'Details': MetadataValue.md(md_str)
        },
    )
Parameters:

data (str) – The markdown for a metadata entry.

static notebook(path)[source]

Static constructor for a metadata value wrapping a notebook path as NotebookMetadataValue.

Example

@op
def emit_metadata(context):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "notebook_path": MetadataValue.notebook("path/to/notebook.ipynb"),
        }
    )
Parameters:

path (str) – The path to a notebook for a metadata entry.

static null()[source]

Static constructor for a metadata value representing null. Can be used as the value type for the metadata parameter for supported events.

static path(path)[source]

Static constructor for a metadata value wrapping a path as PathMetadataValue.

Example

@op
def emit_metadata(context):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "filepath": MetadataValue.path("path/to/file"),
        }
    )
Parameters:

path (str) – The path for a metadata entry.

static python_artifact(python_artifact)[source]

Static constructor for a metadata value wrapping a python artifact as PythonArtifactMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "class": MetadataValue.python_artifact(MyClass),
            "function": MetadataValue.python_artifact(my_function),
        }
    )
Parameters:

python_artifact (Callable) – The python class or function for a metadata entry.

static table(records, schema=None)[source]

experimental This API may break in future versions, even between dot releases.

Static constructor for a metadata value wrapping arbitrary tabular data as TableMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context):
    yield ExpectationResult(
        success=not has_errors,
        label="is_valid",
        metadata={
            "errors": MetadataValue.table(
                records=[
                    TableRecord({"code": "invalid-data-type", "row": 2, "col": "name"}),
                ],
                schema=TableSchema(
                    columns=[
                        TableColumn(name="code", type="string"),
                        TableColumn(name="row", type="int"),
                        TableColumn(name="col", type="string"),
                    ]
                )
            ),
        },
    )
static table_schema(schema)[source]

Static constructor for a metadata value wrapping a table schema as TableSchemaMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

schema = TableSchema(
    columns = [
        TableColumn(name="id", type="int"),
        TableColumn(name="status", type="bool"),
    ]
)

DagsterType(
    type_check_fn=some_validation_fn,
    name='MyTable',
    metadata={
        'my_table_schema': MetadataValue.table_schema(schema),
    }
)
Parameters:

schema (TableSchema) – The table schema for a metadata entry.

static text(text)[source]

Static constructor for a metadata value wrapping text as TextMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context, df):
    yield AssetMaterialization(
        asset_key="my_dataset",
        metadata={
            "my_text_label": MetadataValue.text("hello")
        },
    )
Parameters:

text (str) – The text string for a metadata entry.

static timestamp(value)[source]

Static constructor for a metadata value wrapping a UNIX timestamp as a TimestampMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Parameters:

value (Union[float, datetime]) – The unix timestamp value for a metadata entry. If a datetime is provided, the timestamp will be extracted. Datetimes without timezones are not accepted, because their timestamps can be ambiguous.
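
The rule about timezones can be illustrated with the standard library alone. The sketch below is not Dagster's implementation: `to_unix_timestamp` is a hypothetical helper that mirrors the described behavior, accepting a float or a timezone-aware datetime and rejecting a naive one.

```python
from datetime import datetime, timezone
from typing import Union


def to_unix_timestamp(value: Union[float, datetime]) -> float:
    """Hypothetical helper mirroring the rule described above; not Dagster's code."""
    if isinstance(value, datetime):
        # A naive datetime carries no UTC offset, so its timestamp would depend
        # on the local timezone of whichever machine evaluates it.
        if value.tzinfo is None or value.utcoffset() is None:
            raise ValueError("datetime must be timezone-aware")
        return value.timestamp()
    return float(value)


# An aware datetime maps to exactly one point in time.
aware = datetime(2024, 1, 1, tzinfo=timezone.utc)
unix_seconds = to_unix_timestamp(aware)  # 1704067200.0
```

Passing `datetime(2024, 1, 1)` without a `tzinfo` to this helper raises a ValueError, which is the ambiguity the parameter description refers to.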

static url(url)[source]

Static constructor for a metadata value wrapping a URL as UrlMetadataValue. Can be used as the value type for the metadata parameter for supported events.

Example

@op
def emit_metadata(context):
    yield AssetMaterialization(
        asset_key="my_dashboard",
        metadata={
            "dashboard_url": MetadataValue.url("http://mycoolsite.com/my_dashboard"),
        }
    )
Parameters:

url (str) – The URL for a metadata entry.

abstract property value

The wrapped value.

class dagster.MetadataEntry(label, description=None, entry_data=None, value=None)[source]

deprecated This API will be removed in version 2.0.

Please use a dict with MetadataValue values instead.

A structure for describing metadata for Dagster events.

Note

This class is no longer usable in any Dagster API, and will be completely removed in 2.0.

Lists of objects of this type can be passed as arguments to Dagster events and will be displayed in the Dagster UI and other tooling.

Should be yielded from within an IO manager to append metadata for a given input/output event. For other event types, passing a dict with MetadataValue values to the metadata argument is preferred.

Parameters:
  • label (str) – Short display label for this metadata entry.

  • description (Optional[str]) – A human-readable description of this metadata entry.

  • value (MetadataValue) – Typed metadata entry data. The different types allow for customized display in tools like the Dagster UI.

Metadata types

All metadata types inherit from MetadataValue. The following types are defined:

class dagster.DagsterAssetMetadataValue(asset_key)[source]

Representation of a dagster asset.

Parameters:

asset_key (AssetKey) – The dagster asset key

property value

The wrapped AssetKey.

Type:

AssetKey

class dagster.DagsterRunMetadataValue(run_id)[source]

Representation of a dagster run.

Parameters:

run_id (str) – The run id

property value

The wrapped run id.

Type:

str

class dagster.FloatMetadataValue(value)[source]

Container class for float metadata entry data.

Parameters:

value (Optional[float]) – The float value.

class dagster.IntMetadataValue(value)[source]

Container class for int metadata entry data.

Parameters:

value (Optional[int]) – The int value.

class dagster.JsonMetadataValue(data)[source]

Container class for JSON metadata entry data.

Parameters:

data (Union[Sequence[Any], Dict[str, Any]]) – The JSON data.

property value

The wrapped JSON data.

Type:

Optional[Union[Sequence[Any], Dict[str, Any]]]

class dagster.MarkdownMetadataValue(md_str)[source]

Container class for markdown metadata entry data.

Parameters:

md_str (Optional[str]) – The markdown as a string.

property value

The wrapped markdown as a string.

Type:

Optional[str]

class dagster.PathMetadataValue(path)[source]

Container class for path metadata entry data.

Parameters:

path (Optional[str]) – The path as a string or conforming to os.PathLike.

property value

The wrapped path.

Type:

Optional[str]

class dagster.NotebookMetadataValue(path)[source]

Container class for notebook metadata entry data.

Parameters:

path (Optional[str]) – The path to the notebook as a string or conforming to os.PathLike.

property value

The wrapped path to the notebook as a string.

Type:

Optional[str]

class dagster.PythonArtifactMetadataValue(module, name)[source]

Container class for python artifact metadata entry data.

Parameters:
  • module (str) – The module where the python artifact can be found

  • name (str) – The name of the python artifact

property value

Identity function.

Type:

PythonArtifactMetadataValue

class dagster.TableColumnLineageMetadataValue(column_lineage)[source]

Representation of the lineage of column inputs to column outputs of arbitrary tabular data.

Parameters:

column_lineage (TableColumnLineage) – The lineage of column inputs to column outputs for the table.

property value

The wrapped TableColumnLineage.

Type:

TableColumnLineage

class dagster.TableMetadataValue(records, schema)[source]

experimental This API may break in future versions, even between dot releases.

Container class for table metadata entry data.

Parameters:
  • records (Sequence[TableRecord]) – The data as a list of records (i.e. rows).

  • schema (Optional[TableSchema]) – A schema for the table.

Example

from dagster import TableMetadataValue, TableRecord

TableMetadataValue(
    schema=None,
    records=[
        TableRecord({"column1": 5, "column2": "x"}),
        TableRecord({"column1": 7, "column2": "y"}),
    ]
)
static infer_column_type(value)[source]

Infer the TableSchema column type that will be used for a value.

Return type:

str

property value

Identity function.

Type:

TableMetadataValue

class dagster.TableSchemaMetadataValue(schema)[source]

Representation of a schema for arbitrary tabular data.

Parameters:

schema (TableSchema) – The table schema to wrap.

property value

The wrapped TableSchema.

Type:

TableSchema

class dagster.TextMetadataValue(text)[source]

Container class for text metadata entry data.

Parameters:

text (Optional[str]) – The text data.

property value

The wrapped text data.

Type:

Optional[str]

class dagster.TimestampMetadataValue(value)[source]

Container class for metadata value that’s a unix timestamp.

Parameters:

value (float) – Seconds since the unix epoch.

class dagster.UrlMetadataValue(url)[source]

Container class for URL metadata entry data.

Parameters:

url (Optional[str]) – The URL as a string.

property value

The wrapped URL.

Type:

Optional[str]

class dagster.CodeReferencesMetadataValue(*, code_references)[source]

experimental This API may break in future versions, even between dot releases.

Metadata value type which represents source locations (locally or otherwise) of the asset in question. For example, the file path and line number where the asset is defined.

code_references

A list of code references for the asset, such as file locations or references to source control.

Type:

List[Union[LocalFileCodeReference, SourceControlCodeReference]]

Tables

These APIs provide the ability to express column schemas (TableSchema), rows/records (TableRecord), and column lineage (TableColumnLineage) in Dagster as metadata.

class dagster.TableRecord(data)[source]

experimental This API may break in future versions, even between dot releases.

Represents one record in a table. Field keys are arbitrary strings; field values must be strings, integers, floats, or bools.

class dagster.TableSchema(columns, constraints=None)[source]

Representation of a schema for tabular data.

Schema is composed of two parts:

  • A required list of columns (TableColumn). Each column specifies a name, type, set of constraints, and (optional) description. type defaults to string if unspecified. Column constraints (TableColumnConstraints) consist of boolean properties unique and nullable, as well as a list of strings other containing string descriptions of all additional constraints (e.g. “<= 5”).

  • An optional list of table-level constraints (TableConstraints). A table-level constraint cannot be expressed in terms of a single column, e.g. col a > col b. Presently, all table-level constraints must be expressed as strings under the other attribute of a TableConstraints object.

# example schema
TableSchema(
    constraints = TableConstraints(
        other = [
            "foo > bar",
        ],
    ),
    columns = [
        TableColumn(
            name = "foo",
            type = "string",
            description = "Foo description",
            constraints = TableColumnConstraints(
                nullable = False,
                other = [
                    "starts with the letter 'a'",
                ],
            ),
        ),
        TableColumn(
            name = "bar",
            type = "string",
        ),
        TableColumn(
            name = "baz",
            type = "custom_type",
            constraints = TableColumnConstraints(
                unique = True,
            )
        ),
    ],
)
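Parameters:
  • columns (List[TableColumn]) – The columns of the table.

  • constraints (Optional[TableConstraints]) – The table-level constraints.
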
static from_name_type_dict(name_type_dict)[source]

Constructs a TableSchema from a dictionary whose keys are column names and values are the names of data types of those columns.

class dagster.TableConstraints(other)[source]

Descriptor for “table-level” constraints. Presently only one property, other, is supported. This contains strings describing arbitrary table-level constraints. A table-level constraint is a constraint defined in terms of multiple columns (e.g. col_A > col_B) or in terms of rows.

Parameters:

other (List[str]) – Descriptions of arbitrary table-level constraints.

class dagster.TableColumn(name, type='string', description=None, constraints=None, tags=None)[source]

Descriptor for a table column. The only property that must be specified by the user is name. If no type is specified, string is assumed. If no constraints are specified, the column is assumed to be nullable (i.e. required = False) and have no other constraints beyond the data type.

Parameters:
  • name (str) – The name of the column.

  • type (Optional[str]) – The type of the column. Can be an arbitrary string. Defaults to “string”.

  • description (Optional[str]) – Description of this column. Defaults to None.

  • constraints (Optional[TableColumnConstraints]) – Column-level constraints. If unspecified, column is nullable with no constraints.

  • tags (Optional[Mapping[str, str]]) – Tags for filtering or organizing columns.

class dagster.TableColumnConstraints(nullable=True, unique=False, other=None)[source]

Descriptor for a table column’s constraints. Nullability and uniqueness are specified with boolean properties. All other constraints are described using arbitrary strings under the other property.

Parameters:
  • nullable (Optional[bool]) – If true, this column can hold null values.

  • unique (Optional[bool]) – If true, all values in this column must be unique.

  • other (List[str]) – Descriptions of arbitrary column-level constraints not expressible by the predefined properties.

class dagster.TableColumnLineage(deps_by_column)[source]

experimental This API may break in future versions, even between dot releases.

Represents the lineage of column outputs to column inputs for a tabular asset.

Parameters:

deps_by_column (Mapping[str, Sequence[TableColumnDep]]) – A mapping from column names to the columns that the column depends on.

Examples

Defining column lineage at materialization time, where the resulting asset has two columns, new_column_foo and new_column_qux. The first column, new_column_foo, depends on column_bar in source_bar and column_baz in source_baz. The second column, new_column_qux, depends on column_quuz in source_bar.

from dagster import (
    AssetKey,
    MaterializeResult,
    TableColumnDep,
    TableColumnLineage,
    asset,
)


@asset(deps=[AssetKey("source_bar"), AssetKey("source_baz")])
def my_asset():
    yield MaterializeResult(
        metadata={
            "dagster/column_lineage": TableColumnLineage(
                deps_by_column={
                    "new_column_foo": [
                        TableColumnDep(
                            asset_key=AssetKey("source_bar"),
                            column_name="column_bar",
                        ),
                        TableColumnDep(
                            asset_key=AssetKey("source_baz"),
                            column_name="column_baz",
                        ),
                    ],
                    "new_column_qux": [
                        TableColumnDep(
                            asset_key=AssetKey("source_bar"),
                            column_name="column_quuz",
                        ),
                    ],
                }
            )
        }
    )
class dagster.TableColumnDep(asset_key, column_name)[source]

experimental This API may break in future versions, even between dot releases.

Object representing an identifier for a column in an asset.

Code references

The following functions are used to attach source code references to your assets. For more information, refer to the Linking to asset definition code with code references guide.

dagster.with_source_code_references(assets_defs)[source]

experimental This API may break in future versions, even between dot releases.

Wrapper function which attaches local code reference metadata to the provided asset definitions. This points to the filepath and line number where the asset body is defined.

Parameters:

assets_defs (Sequence[Union[AssetsDefinition, SourceAsset, CacheableAssetsDefinition]]) – The asset definitions to which source code metadata should be attached.

Returns:

The asset definitions with source code metadata attached.

Return type:

Sequence[AssetsDefinition]

dagster.link_code_references_to_git(assets_defs, git_url, git_branch, file_path_mapping)[source]

experimental This API may break in future versions, even between dot releases.

Wrapper function which converts local file path code references to source control URLs based on the provided source control URL and branch.

Parameters:
  • assets_defs (Sequence[Union[AssetsDefinition, SourceAsset, CacheableAssetsDefinition]]) – The asset definitions to which source control metadata should be attached. Only assets with local file code references (such as those created by with_source_code_references) will be converted.

  • git_url (str) – The base URL for the source control system. For example, “https://github.com/dagster-io/dagster”.

  • git_branch (str) – The branch in the source control system, such as “master”.

  • file_path_mapping (FilePathMapping) – Specifies the mapping between local file paths and their corresponding paths in a source control repository. Simple usage is to provide an AnchorBasedFilePathMapping instance, which specifies an anchor file in the repository and the corresponding local file path, which is extrapolated to all other local file paths. Alternatively, a custom function can be provided which takes a local file path and returns the corresponding path in the repository, allowing for more complex mappings.

Example

defs = Definitions(
    assets=link_code_references_to_git(
        with_source_code_references([my_dbt_assets]),
        git_url="https://github.com/dagster-io/dagster",
        git_branch="master",
        file_path_mapping=AnchorBasedFilePathMapping(
            local_file_anchor=Path(__file__),
            file_anchor_path_in_repository="python_modules/my_module/my-module/__init__.py",
        ),
    )
)
class dagster.FilePathMapping(*args, **kwargs)[source]

experimental This API may break in future versions, even between dot releases.

Base class which defines a file path mapping function. These functions are used to map local file paths to their corresponding paths in a source control repository.

In many cases where a source control repository is reproduced exactly on a local machine, the included AnchorBasedFilePathMapping class can be used to specify a direct mapping between the local file paths and the repository paths. However, in cases where the repository structure differs from the local structure, a custom mapping function can be provided to handle these cases.

abstract convert_to_source_control_path(local_path)[source]

Maps a local file path to the corresponding path in a source control repository.

Parameters:

local_path (Path) – The local file path to map.

Returns:

The corresponding path in the hosted source control repository, relative to the repository root.

Return type:

str

class dagster.AnchorBasedFilePathMapping(local_file_anchor, file_anchor_path_in_repository)[source]

experimental This API may break in future versions, even between dot releases.

Specifies the mapping between local file paths and their corresponding paths in a source control repository, using a specific file “anchor” as a reference point. All other paths are calculated relative to this anchor file.

For example, if the chosen anchor file is /Users/dagster/Documents/python_modules/my_module/my-module/__init__.py locally, and python_modules/my_module/my-module/__init__.py in a source control repository, in order to map a different file /Users/dagster/Documents/python_modules/my_module/my-module/my_asset.py to the repository path, the mapping function will position the file in the repository relative to the anchor file’s position in the repository, resulting in python_modules/my_module/my-module/my_asset.py.
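
The anchor-relative calculation described above can be sketched with pathlib. This is an illustration of the idea only, not Dagster's implementation: `map_to_repository_path` is a hypothetical helper, and it assumes the file being mapped lives under the local copy of the repository root.

```python
from pathlib import PurePosixPath


def map_to_repository_path(
    local_path: PurePosixPath,
    local_file_anchor: PurePosixPath,
    file_anchor_path_in_repository: str,
) -> str:
    """Hypothetical helper: place local_path in the repository relative to the anchor."""
    anchor_in_repo = PurePosixPath(file_anchor_path_in_repository)
    # The anchor file sits this many directories below the repository root.
    depth = len(anchor_in_repo.parts) - 1
    # Climbing the same number of levels from the local anchor yields the
    # local directory that corresponds to the repository root.
    local_root = local_file_anchor.parents[depth]
    return local_path.relative_to(local_root).as_posix()


base = PurePosixPath("/Users/dagster/Documents/python_modules/my_module/my-module")
repository_path = map_to_repository_path(
    local_path=base / "my_asset.py",
    local_file_anchor=base / "__init__.py",
    file_anchor_path_in_repository="python_modules/my_module/my-module/__init__.py",
)
# repository_path == "python_modules/my_module/my-module/my_asset.py"
```

PurePosixPath is used so the sketch behaves the same on every platform; real local paths would normally be pathlib.Path objects.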

Parameters:
  • local_file_anchor (Path) – The path to a local file that is present in the repository.

  • file_anchor_path_in_repository (str) – The path to the anchor file in the repository.

Example

mapping_fn = AnchorBasedFilePathMapping(
    local_file_anchor=Path(__file__),
    file_anchor_path_in_repository="python_modules/my_module/my-module/__init__.py",
)
convert_to_source_control_path(local_path)[source]

Maps a local file path to the corresponding path in a source control repository based on the anchor file and its corresponding path in the repository.

Parameters:

local_path (Path) – The local file path to map.

Returns:

The corresponding path in the hosted source control repository, relative to the repository root.

Return type:

str