tracdap.rt.api

Package Contents

Classes

TracContext

Interface that allows model components to interact with the platform at runtime

TracModel

Base class that model components inherit from to be recognised by the platform

Functions

A(attr_name, attr_value[, attr_type, categorical])

Shorthand alias for define_attribute()

F(field_name, field_type, label[, business_key, ...])

Shorthand alias for define_field()

P(param_name, param_type, label[, default_value])

Shorthand alias for define_parameter()

declare_field(field_name, field_type, label[, ...])

Deprecated since version 0.4.4.

declare_input_table(*fields)

Deprecated since version 0.4.4.

declare_output_table(*fields)

Deprecated since version 0.4.4.

declare_parameter(param_name, param_type, label[, ...])

Deprecated since version 0.4.4.

declare_parameters(*params)

Deprecated since version 0.4.4.

define_attribute(attr_name, attr_value[, attr_type, ...])

Define an individual model attribute

define_attributes(*attrs)

Define a set of attributes to catalogue and describe a model

define_field(field_name, field_type, label[, ...])

Define the schema for an individual field, which can be used in a model input or output schema.

define_input_table(*fields)

Define a model input with a table schema.

define_output_table(*fields)

Define a model output with a table schema.

define_parameter(param_name, param_type, label[, ...])

Define an individual model parameter

define_parameters(*params)

Define all the parameters used by a model

define_schema(*fields[, schema_type])

Create a SchemaDefinition from a list of fields.

load_schema(package, schema_file[, schema_type])

Load a SchemaDefinition from a CSV file or package resource.

class tracdap.rt.api.TracContext

Interface that allows model components to interact with the platform at runtime

TRAC supplies every model with a context when the model is run. The context allows models to access parameters, inputs, outputs and schemas, as well as other resources such as the Spark context (if the model is using Spark) and model logs.

TRAC guarantees that everything defined in the model (parameters, inputs and outputs) will be available in the context when the model is running. So, if a model defines a parameter called "param1" as an integer, the model will be able to call get_parameter("param1") and will receive an integer value.

When a model is running on a production deployment of the TRAC platform, parameters, inputs and outputs will be supplied by TRAC as part of the job. These could be coming from entries selected by a user in the UI or settings configured as part of a scheduled task. To develop models locally, a job config file can be supplied (typically in YAML or JSON) to set the required parameters, inputs and output locations. In either case, TRAC will validate the supplied configuration against the model definition to make sure the context always includes exactly what the model requires.

All the context API methods are validated at runtime and will raise ERuntimeValidation if a model tries to access an unknown identifier or perform some other invalid operation.

See also

TracModel

abstract get_pandas_table(dataset_name, use_temporal_objects=None)

Get the data for a model input or output as a Pandas dataframe

The data for both inputs and outputs can be retrieved as a Pandas dataframe using this method. Inputs must be defined in TracModel.define_inputs() and outputs in TracModel.define_outputs(). Input and output names are case-sensitive.

The TRAC runtime will handle loading the data and assembling it into a Pandas dataframe. This may happen before the model runs or when a dataset is requested. Models should take care not to request very large datasets as Pandas tables, as doing so is likely to cause a memory overflow. Use get_spark_table() instead to work with big data.

Model inputs are always available and can be queried by this method. Outputs are only available after they have been saved to the context using put_pandas_table() (or another put_XXX_table method). Attempting to retrieve an output before it has been saved will cause a runtime validation error.

Attempting to retrieve a dataset that is not defined as a model input or output will result in a runtime validation error, even if that dataset exists in the job config and is used by other models.

Parameters:
  • dataset_name (str) – The name of the model input or output to get data for

  • use_temporal_objects (Optional[bool]) – Use Python objects for date/time fields instead of the NumPy datetime64 type

Returns:

A pandas dataframe containing the data for the named dataset

Raises:

ERuntimeValidation

Return type:

pandas.DataFrame
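The availability rules described above can be pictured with a toy stand-in for the context. This is only a sketch of the documented contract, not the TRAC runtime: ToyContext, ToyValidationError and the method names get_table / put_table are hypothetical, and plain Python lists stand in for dataframes.

```python
class ToyValidationError(Exception):
    """Stand-in for ERuntimeValidation (hypothetical, for illustration only)."""


class ToyContext:
    """Toy model of the documented input/output availability rules."""

    def __init__(self, inputs, output_names):
        self._inputs = dict(inputs)             # inputs are available from the start
        self._output_names = set(output_names)  # outputs the model has defined
        self._outputs = {}                      # outputs appear here only once saved

    def get_table(self, name):
        if name in self._inputs:
            return self._inputs[name]
        if name in self._output_names:
            if name not in self._outputs:
                raise ToyValidationError(f"Output '{name}' has not been saved yet")
            return self._outputs[name]
        raise ToyValidationError(f"'{name}' is not a defined input or output")

    def put_table(self, name, data):
        if name not in self._output_names:
            raise ToyValidationError(f"'{name}' is not a defined output")
        if name in self._outputs:
            raise ToyValidationError(f"Output '{name}' has already been saved")
        self._outputs[name] = data


ctx = ToyContext(
    inputs={"customer_loans": [{"id": 1, "balance": 100.0}]},
    output_names=["profit_by_region"])

rows = ctx.get_table("customer_loans")       # inputs can always be read

try:
    ctx.get_table("profit_by_region")        # not saved yet, so this is rejected
    early_read_failed = False
except ToyValidationError:
    early_read_failed = True

ctx.put_table("profit_by_region", [{"region": "north", "profit": 12.5}])
saved = ctx.get_table("profit_by_region")    # readable once it has been saved
```

In a real model the same pattern applies with get_pandas_table() and put_pandas_table(): read inputs at any time, and read an output back only after saving it.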

abstract get_parameter(parameter_name)

Get the value of a model parameter

Model parameters defined in TracModel.define_parameters() can be retrieved at runtime by this method. Values are returned as native Python types. Parameter names are case-sensitive.

Attempting to retrieve parameters not defined by the model will result in a runtime validation error, even if those parameters are supplied in the job config and used by other models.

Parameters:

parameter_name (str) – The name of the parameter to get

Returns:

The parameter value, as a native Python data type

Raises:

ERuntimeValidation

Return type:

Any
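As a rough illustration of the lookup rules for get_parameter(), the sketch below uses a hypothetical stand-in (ToyParameterContext and ToyValidationError are not part of TRAC). It assumes a job config that supplies two values, while the model defines only one of them.

```python
class ToyValidationError(Exception):
    """Stand-in for ERuntimeValidation (hypothetical, for illustration only)."""


class ToyParameterContext:
    """Exposes only the parameters that the model itself defined."""

    def __init__(self, defined_names, job_values):
        # Values for parameters the model did not define are never exposed
        self._values = {name: job_values[name] for name in defined_names}

    def get_parameter(self, parameter_name):
        if parameter_name not in self._values:   # exact, case-sensitive match
            raise ToyValidationError(f"Unknown parameter '{parameter_name}'")
        return self._values[parameter_name]


# Values as they might be supplied by a job config (names are made up)
job_values = {"param1": 42, "discount_rate": 0.05}
ctx = ToyParameterContext(defined_names=["param1"], job_values=job_values)

value = ctx.get_parameter("param1")   # defined by the model, so it is returned


def lookup_fails(name):
    """Return True if the stand-in context rejects the given name."""
    try:
        ctx.get_parameter(name)
        return False
    except ToyValidationError:
        return True
```

Here lookup_fails("Param1") and lookup_fails("discount_rate") are both true: the first because names are case-sensitive, the second because the model did not define that parameter even though the job supplies it.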

abstract get_schema(dataset_name)

Get the schema of a model input or output

The schema of an input or output can be retrieved and examined at runtime using this method. Inputs must be defined in TracModel.define_inputs() and outputs in TracModel.define_outputs(). Input and output names are case-sensitive.

In the current version of the runtime, all model inputs and outputs are defined statically, so get_schema() will return the schema as it was defined.

Attempting to retrieve the schema for a dataset that is not defined as a model input or output will result in a runtime validation error, even if that dataset exists in the job config and is used by other models.

Parameters:

dataset_name (str) – The name of the input or output to get the schema for

Returns:

The schema definition for the named dataset

Return type:

SchemaDefinition

Raises:

ERuntimeValidation

abstract log()

Get a Python logger that can be used for writing model logs

Logs written to this logger are recorded by TRAC. When models are run on the platform, these logs are assembled and saved with the job outputs as a dataset that can be queried through the regular TRAC data and metadata APIs.

Returns:

A Python logger that can be used for writing model logs

Return type:

logging.Logger
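Because log() returns a standard logging.Logger, model code can use the usual logging calls. The sketch below is a minimal illustration using only the standard library: ToyLogContext, the logger name "toy_model" and ListHandler are hypothetical stand-ins so the example runs outside the platform, where TRAC itself would supply the context and record the logs.

```python
import logging


class ToyLogContext:
    """Stand-in exposing only log(); hypothetical, not the TRAC runtime."""

    def __init__(self):
        self._logger = logging.getLogger("toy_model")

    def log(self):
        return self._logger


def run_model(ctx):
    log = ctx.log()
    log.info("Model started")
    row_count = 3                               # placeholder for real model work
    log.info("Processed %d rows", row_count)    # %-style args are formatted lazily
    log.warning("No rows found for region '%s'", "south")


class ListHandler(logging.Handler):
    """Collects messages so the sketch can show what was logged."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


handler = ListHandler()
toy_logger = logging.getLogger("toy_model")
toy_logger.setLevel(logging.INFO)
toy_logger.addHandler(handler)

run_model(ToyLogContext())
```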

abstract put_pandas_table(dataset_name, dataset)

Save the data for a model output as a Pandas dataframe

The data for model outputs can be saved as a Pandas dataframe using this method. Outputs must be defined in TracModel.define_outputs(). Output names are case-sensitive.

The supplied data must match the schema of the named output. Missing fields or fields of the wrong type will result in a data validation error. Extra fields will be discarded with a warning. The schema of an output dataset can be checked using get_schema().

Each model output can only be saved once. Attempting to save the same output twice will cause a runtime validation error. Once an output has been saved, it can be retrieved by calling get_pandas_table() (or another get_XXX_table method). Attempting to save a dataset that is not defined as a model output will also cause a runtime validation error.

Parameters:
  • dataset_name (str) – The name of the model output to save data for

  • dataset (pandas.DataFrame) – A pandas dataframe containing the data for the named dataset

Raises:

ERuntimeValidation, EDataValidation
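The schema-matching rule for saved outputs (missing fields are an error, extra fields are dropped with a warning) can be sketched on column names alone. This is an illustration under simplifying assumptions, not the TRAC implementation: conform_columns and ToyDataValidationError are hypothetical names, type checking is left out, and keeping the supplied column order is an arbitrary choice.

```python
import warnings


class ToyDataValidationError(Exception):
    """Stand-in for EDataValidation (hypothetical, for illustration only)."""


def conform_columns(schema_fields, supplied_columns):
    """Return the supplied columns that belong to the schema."""
    missing = [f for f in schema_fields if f not in supplied_columns]
    if missing:
        raise ToyDataValidationError(f"Missing fields: {missing}")
    extra = [c for c in supplied_columns if c not in schema_fields]
    if extra:
        warnings.warn(f"Discarding extra fields: {extra}")
    return [c for c in supplied_columns if c in schema_fields]


schema = ["region", "profit"]

# An extra column is dropped, with a warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = conform_columns(schema, ["profit", "region", "debug_flag"])

# A missing column is rejected
try:
    conform_columns(schema, ["region"])
    missing_rejected = False
except ToyDataValidationError:
    missing_rejected = True
```

In model code, checking the dataframe's columns against get_schema() before calling put_pandas_table() is one way to catch such mismatches early.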

class tracdap.rt.api.TracModel

Base class that model components inherit from to be recognised by the platform

The modelling API is designed to be as simple and un-opinionated as possible. Models inherit from TracModel and implement the run_model() method to provide their model logic. run_model() has one parameter, a TracContext object which is supplied to the model at runtime, allowing it to access parameters, inputs and outputs.

As a minimum, models must also implement three methods to define the model schema: define_parameters(), define_inputs() and define_outputs(). The parameters, inputs and outputs that are defined will be available in the context at runtime. The tracdap.rt.api package includes a number of helper functions to implement these methods in a clear and robust way.

While model components can largely do what they like, there are three rules that should be followed to ensure models are deterministic. These are:

  1. No threading

  2. Use TRAC for random number generation

  3. Use TRAC to access the current time

Threading should never be needed in model code: Python only runs one execution thread at a time, and TRAC already handles IO masking and model ordering. Both Pandas and PySpark provide compute concurrency. Random numbers and time will be made available in the TracContext API in a future version of TRAC.

Models should also avoid making system calls, or using the Python builtins exec() or eval().

See also

TracContext

define_attributes()

Define attributes that will be associated with the model when it is loaded into the TRAC platform

Note

This is an experimental API that is not yet stabilised; expect changes in future versions of TRAC.

These attributes can be used to index or describe the model, and they will be available for metadata searches. Attributes must be primitive (scalar) values that can be expressed in the TRAC type system. Multivalued attributes can be supplied as lists, in which case the attribute type must be given explicitly. Controlled attributes (starting with trac_ or _) are not allowed and will fail validation.

To define attributes in code, always use the define_* functions in the tracdap.rt.api package. This will ensure attributes are defined in the correct format with all the required fields. Attributes that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns:

A set of attributes that will be applied to the model when it is loaded into the TRAC platform

Return type:

Dict[str, Value]
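The two attribute rules stated above (no controlled names, explicit type for multivalued attributes) can be expressed as a small check. This is a hypothetical helper written for illustration, not the validation TRAC performs; check_attribute is a made-up name, and the string "STRING" stands in for a BasicType value.

```python
def check_attribute(attr_name, attr_value, attr_type=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Controlled attributes start with trac_ or _ and are reserved
    if attr_name.startswith("trac_") or attr_name.startswith("_"):
        problems.append("controlled attribute names are not allowed")
    # Lists are multivalued attributes, which need an explicit type
    if isinstance(attr_value, list) and attr_type is None:
        problems.append("multivalued attributes need an explicit type")
    return problems


ok_scalar = check_attribute("model_owner", "risk_team")
bad_name = check_attribute("trac_owner", "risk_team")
bad_multi = check_attribute("regions", ["north", "south"])
ok_multi = check_attribute("regions", ["north", "south"], attr_type="STRING")
```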

abstract define_inputs()

Define data inputs that will be available to the model at runtime

Implement this method to define the model’s inputs. Every data input that the model uses must be defined. Models may choose to ignore some inputs; it is ok to define inputs that are not always used.

To define model inputs in code, always use the define_* functions in the tracdap.rt.api package. This will ensure inputs are defined in the correct format with all the required fields. Model inputs that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns:

The full set of inputs that will be available to the model at runtime

Return type:

Dict[str, ModelInputSchema]

abstract define_outputs()

Define data outputs that will be produced by the model at runtime

Implement this method to define the model’s outputs. Every data output that the model produces must be defined, and every output that is defined must be produced. If a model defines an output which is not produced, a runtime validation error will be raised after the model completes.

To define model outputs in code, always use the define_* functions in the tracdap.rt.api package. This will ensure outputs are defined in the correct format with all the required fields. Model outputs that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns:

The full set of outputs that will be produced by the model at runtime

Return type:

Dict[str, ModelOutputSchema]

abstract define_parameters()

Define parameters that will be available to the model at runtime

Implement this method to define the model’s parameters. Every parameter that the model uses must be defined. Models may choose to ignore some parameters; it is ok to define parameters that are not always used.

To define model parameters in code, always use the define_* functions in the tracdap.rt.api package. This will ensure parameters are defined in the correct format with all the required fields. Parameters that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns:

The full set of parameters that will be available to the model at runtime

Return type:

Dict[str, ModelParameter]

abstract run_model(ctx)

Entry point for running model code

Implement this method to provide the model logic. A TracContext is provided at runtime, which makes parameters and inputs available and provides a means to save outputs. All the outputs defined in define_outputs() must be saved before this method returns, otherwise a runtime validation error will be raised.

Model code can raise exceptions, either in a controlled way by detecting error conditions and raising errors explicitly, or in an uncontrolled way as a result of bugs in the model code. Exceptions may also originate inside libraries the model code is using. If an exception escapes from run_model(), TRAC will mark the model as failed and the job that contains the model will also fail.

Parameters:

ctx (TracContext) – A context used to access model inputs, outputs and parameters and communicate with the TRAC platform

tracdap.rt.api.A(attr_name, attr_value, attr_type=None, categorical=False)

Shorthand alias for define_attribute()

Note

This is an experimental API that is not yet stabilised; expect changes in future versions of TRAC.

Return type:

TagUpdate

Parameters:
  • attr_name (str) –

  • attr_value (Any) –

  • attr_type (Optional[BasicType]) –

  • categorical (bool) –

tracdap.rt.api.F(field_name, field_type, label, business_key=False, categorical=False, format_code=None, field_order=None)

Shorthand alias for define_field()

Return type:

FieldSchema

Parameters:
  • field_name (str) –

  • field_type (BasicType) –

  • label (str) –

  • business_key (bool) –

  • categorical (bool) –

  • format_code (Optional[str]) –

  • field_order (Optional[int]) –

tracdap.rt.api.P(param_name, param_type, label, default_value=None)

Shorthand alias for define_parameter()

Return type:

_Named[ModelParameter]

Parameters:
  • param_name (str) –

  • param_type (TypeDescriptor | BasicType) –

  • label (str) –

  • default_value (Optional[Any]) –

tracdap.rt.api.declare_field(field_name, field_type, label, business_key=False, categorical=False, format_code=None, field_order=None)

Deprecated since version 0.4.4: Use define_field() or F() instead.

Return type:

FieldSchema

Parameters:
  • field_name (str) –

  • field_type (BasicType) –

  • label (str) –

  • business_key (bool) –

  • categorical (bool) –

  • format_code (Optional[str]) –

  • field_order (Optional[int]) –

tracdap.rt.api.declare_input_table(*fields)

Deprecated since version 0.4.4: Use define_input_table() instead.

Return type:

ModelInputSchema

Parameters:

fields (FieldSchema | List[FieldSchema]) –

tracdap.rt.api.declare_output_table(*fields)

Deprecated since version 0.4.4: Use define_output_table() instead.

Return type:

ModelOutputSchema

Parameters:

fields (FieldSchema | List[FieldSchema]) –

tracdap.rt.api.declare_parameter(param_name, param_type, label, default_value=None)

Deprecated since version 0.4.4: Use define_parameter() or P() instead.

Return type:

_Named[ModelParameter]

Parameters:
  • param_name (str) –

  • param_type (TypeDescriptor | BasicType) –

  • label (str) –

  • default_value (Optional[Any]) –

tracdap.rt.api.declare_parameters(*params)

Deprecated since version 0.4.4: Use define_parameters() instead.

Return type:

Dict[str, ModelParameter]

Parameters:

params (_Named[ModelParameter] | List[_Named[ModelParameter]]) –

tracdap.rt.api.define_attribute(attr_name, attr_value, attr_type=None, categorical=False)

Define an individual model attribute

Note

This is an experimental API that is not yet stabilised; expect changes in future versions of TRAC.

Model attributes can be defined using this method (or trac.A). The attr_name and attr_value are always required to define an attribute. attr_type is always required for multivalued attributes but is optional otherwise. The categorical flag can be applied to STRING attributes if required.

Once defined, attributes can be passed to define_attributes(), either as a list or as individual arguments, to create the set of attributes for a model.

Parameters:
  • attr_name (str) – The attribute name

  • attr_value (Any) – The attribute value (as a raw Python value)

  • attr_type (Optional[BasicType]) – The TRAC type for this attribute (optional, except for multivalued attributes)

  • categorical (bool) – A flag to indicate whether this attribute is categorical

Returns:

An attribute for the model, ready for loading into the TRAC platform

Return type:

TagUpdate

tracdap.rt.api.define_attributes(*attrs)

Define a set of attributes to catalogue and describe a model

Note

This is an experimental API that is not yet stabilised; expect changes in future versions of TRAC.

Attributes can be supplied either as individual arguments to this function or as a list. In either case, each attribute should be defined using define_attribute() (or trac.A).

Parameters:

attrs (TagUpdate | List[TagUpdate]) – The attributes that will be defined, either as individual arguments or as a list

Returns:

A set of model attributes, in the correct format to return from TracModel.define_attributes()

Return type:

List[TagUpdate]

tracdap.rt.api.define_field(field_name, field_type, label, business_key=False, categorical=False, format_code=None, field_order=None)

Define the schema for an individual field, which can be used in a model input or output schema.

Individual fields in a dataset can be defined using this method or the shorthand alias F(). The name, type and label of a field are always required. The business_key and categorical flags are false by default. Format code is optional.

If no field ordering is supplied, fields will automatically be assigned a contiguous ordering starting at 0. In this case, care must be taken when creating an updated version of a model that the order of existing fields is not disturbed. Adding fields to the end of a list is always safe. If field orders are specified explicitly, they must form a contiguous ordering starting at 0.

Once defined, field schemas can be passed to define_input_table() or define_output_table(), either as a list or as individual arguments, to create the full schema for an input or output.

Parameters:
  • field_name (str) – The field’s name, used as the field identifier in code and queries (must be a valid identifier)

  • field_type (BasicType) – The data type of the field, only primitive types are allowed

  • label (str) – A descriptive label for the field (required)

  • business_key (bool) – Flag indicating whether this field is a business key for its dataset (default: False)

  • categorical (bool) – Flag indicating whether this is a categorical field (default: False)

  • format_code (Optional[str]) – A code that can be interpreted by client applications to format the field (optional)

  • field_order (Optional[int]) – Explicit field ordering (optional)

Returns:

A field schema, suitable for use in a schema definition

Return type:

FieldSchema
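The field ordering rules can be illustrated with two small helpers. These are hypothetical functions written for this sketch (auto_order and is_contiguous_from_zero are not part of the TRAC API); they only mirror the behaviour described above, with positions assigned from 0 when no order is given and a contiguity requirement for explicit orders.

```python
def auto_order(field_names):
    """Assign each field its position in the list, starting at 0."""
    return {name: index for index, name in enumerate(field_names)}


def is_contiguous_from_zero(orders):
    """True if the explicit orders are exactly 0, 1, ..., n-1 in any sequence."""
    return sorted(orders) == list(range(len(orders)))


# Version 1 of a schema, then two candidate updates (field names are made up)
v1 = auto_order(["id", "region", "balance"])
v2_appended = auto_order(["id", "region", "balance", "currency"])
v2_inserted = auto_order(["id", "currency", "region", "balance"])

# Appending keeps every existing field at its old position
existing_preserved = all(v2_appended[name] == v1[name] for name in v1)

# Inserting in the middle shifts the fields that follow
existing_disturbed = any(v2_inserted[name] != v1[name] for name in v1)
```

The sketch does not cover mixing explicit and automatic ordering within one schema, since the text above does not describe that case.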

tracdap.rt.api.define_input_table(*fields)

Define a model input with a table schema.

Fields can be supplied either as individual arguments to this function or as a list. Individual fields should be defined using define_field() or the shorthand alias F().

Parameters:

fields (FieldSchema | List[FieldSchema]) – A set of fields to make up a TableSchema

Returns:

A model input schema, suitable for returning from TracModel.define_inputs()

Return type:

ModelInputSchema

tracdap.rt.api.define_output_table(*fields)

Define a model output with a table schema.

Fields can be supplied either as individual arguments to this function or as a list. Individual fields should be defined using define_field() or the shorthand alias F().

Parameters:

fields (FieldSchema | List[FieldSchema]) – A set of fields to make up a TableSchema

Returns:

A model output schema, suitable for returning from TracModel.define_outputs()

Return type:

ModelOutputSchema

tracdap.rt.api.define_parameter(param_name, param_type, label, default_value=None)

Define an individual model parameter

Individual model parameters can be defined using this method (or trac.P). The name, type and label are required fields to define a parameter. Name is used as the identifier to work with the parameter in code, e.g. when calling get_parameter() or defining parameters in a job config.

If a default value is specified, the model parameter becomes optional. It is ok to omit optional parameters when running models or setting up jobs, in which case the default value will be used. If no default is specified, the model parameter is mandatory and a value must always be supplied in order to execute the model.

Once defined, model parameters can be passed to define_parameters(), either as a list or as individual arguments, to create the set of parameters for a model.

Parameters:
  • param_name (str) – The parameter name, used to identify the parameter in code (must be a valid identifier)

  • param_type (TypeDescriptor | BasicType) – The parameter type, expressed in the TRAC type system

  • label (str) – A descriptive label for the parameter (required)

  • default_value (Optional[Any]) – A default value to use if no explicit value is supplied (optional)

Returns:

A named model parameter, suitable for passing to define_parameters()

Return type:

_Named[ModelParameter]
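The optional/mandatory behaviour described above can be sketched as a simple resolution step. This is an illustration only, not how TRAC resolves job parameters internally: resolve_parameters is a hypothetical helper, the parameter names are made up, and None is used to mean "no default", mirroring the default_value=None signature.

```python
def resolve_parameters(defaults, supplied):
    """Combine supplied values with defaults; fail if a mandatory one is missing.

    defaults maps each defined parameter name to its default value,
    or to None if the parameter has no default (and so is mandatory).
    """
    resolved = {}
    for name, default_value in defaults.items():
        if name in supplied:
            resolved[name] = supplied[name]        # explicit value wins
        elif default_value is not None:
            resolved[name] = default_value         # optional: fall back to default
        else:
            raise ValueError(f"Missing value for mandatory parameter '{name}'")
    return resolved


defaults = {"year_end": None, "discount_rate": 0.05}

# Only the mandatory parameter is supplied, so the default fills the gap
resolved = resolve_parameters(defaults, {"year_end": 2023})

# An explicit value overrides the default
overridden = resolve_parameters(defaults, {"year_end": 2023, "discount_rate": 0.1})
```

Calling resolve_parameters(defaults, {}) raises ValueError, because year_end has no default.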

tracdap.rt.api.define_parameters(*params)

Define all the parameters used by a model

Parameters can be supplied either as individual arguments to this function or as a list. In either case, each parameter should be defined using define_parameter() (or trac.P).

Parameters:

params (_Named[ModelParameter] | List[_Named[ModelParameter]]) – The parameters that will be defined, either as individual arguments or as a list

Returns:

A set of model parameters, in the correct format to return from TracModel.define_parameters()

Return type:

Dict[str, ModelParameter]

tracdap.rt.api.define_schema(*fields, schema_type=SchemaType.TABLE)

Create a SchemaDefinition from a list of fields.

Fields can be supplied either as individual arguments to this function or as a list. Individual fields should be defined using define_field() or the shorthand alias F(). Schema type can be specified using the schema_type parameter, although currently only TABLE schemas are supported.

Model inputs and outputs must be specified as ModelInputSchema and ModelOutputSchema respectively. The input/output schema classes both require a schema definition that can be created with this method. Alternatively, you can use define_input_table() or define_output_table() to create the input/output schema classes directly.

Parameters:
  • fields (FieldSchema | List[FieldSchema]) – The list of fields to include in the schema

  • schema_type (SchemaType) – The type of schema to create (currently only TABLE schemas are supported)

Returns:

A schema definition built from the supplied fields and schema type

Return type:

SchemaDefinition

tracdap.rt.api.load_schema(package, schema_file, schema_type=SchemaType.TABLE)

Load a SchemaDefinition from a CSV file or package resource.

The schema CSV file must contain the following columns:

  • field_name (string, required)

  • field_type (BasicType, required)

  • label (string, required)

  • business_key (boolean, optional)

  • categorical (boolean, optional)

  • format_code (string, optional)

Field order is taken from the order in which the fields are listed. Schema type can be specified using the schema_type parameter, although currently only TABLE schemas are supported.

Model inputs and outputs must be specified as ModelInputSchema and ModelOutputSchema respectively. The input/output schema classes both require a schema definition that can be created with this method.

Parameters:
  • package (ModuleType | str) – Package (or package name) in the model repository that contains the schema file

  • schema_file (str) – Name of the schema file to load, which must be in the specified package

  • schema_type (SchemaType) – The type of schema to create (currently only TABLE schemas are supported)

Returns:

A schema definition loaded from the schema file

Return type:

SchemaDefinition
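To make the CSV layout concrete, the sketch below parses a sample schema file with the standard csv module. It does not call load_schema(); it only shows the column layout listed above and how field order follows row order. The field names and labels are made up, the type names INTEGER, STRING and FLOAT are assumed to be valid BasicType values, and the true/false spelling for booleans is an assumption.

```python
import csv
import io

# Sample schema file content (hypothetical fields; format_code left blank)
SCHEMA_CSV = """\
field_name,field_type,label,business_key,categorical,format_code
id,INTEGER,Customer account ID,true,false,
region,STRING,Customer home region,false,true,
balance,FLOAT,Account balance,false,false,
"""

rows = list(csv.DictReader(io.StringIO(SCHEMA_CSV)))

# Field order is taken from the order in which the rows are listed
field_names = [row["field_name"] for row in rows]
field_order = {name: index for index, name in enumerate(field_names)}

# The three required columns must have a value on every row
required = ["field_name", "field_type", "label"]
all_required_present = all(row[col] for row in rows for col in required)
```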