trac.rt.api

Package Contents

Classes

TracContext

Interface that allows model components to interact with the platform at runtime

TracModel

Base class that model components inherit from to be recognised by the platform

Functions

F(*args, **kwargs)

Shorthand alias for declare_field()

P(*args, **kwargs)

Shorthand alias for declare_parameter()

declare_field(field_name, field_type, label, business_key = False, categorical = False, format_code = None, field_order = None)

Declare an individual field, for use in a model input or output schema

declare_input_table(*fields)

Declare a model input with a table schema

declare_output_table(*fields)

Declare a model output with a table schema

declare_parameter(param_name, param_type, label, default_value = None)

Declare an individual model parameter

declare_parameters(*params)

Declare all the parameters used by a model

class trac.rt.api.TracContext

Interface that allows model components to interact with the platform at runtime

TRAC supplies every model with a context when the model is run. The context allows models to access parameters, inputs, outputs and schemas, as well as other resources such as the Spark context (if the model is using Spark) and model logs.

TRAC guarantees that everything defined in the model (parameters, inputs and outputs) will be available in the context when the model is running. So, if a model defines a parameter called "param1" as an integer, the model will be able to call get_parameter("param1") and will receive an integer value.

When a model is running on a production deployment of the TRAC platform, parameters, inputs and outputs will be supplied by TRAC as part of the job. These could come from entries selected by a user in the UI, or from settings configured as part of a scheduled task. To develop models locally, a job config file (typically in YAML or JSON) can be supplied to set the required parameters, inputs and output locations. In either case, TRAC will validate the supplied configuration against the model definition to make sure the context always includes exactly what the model requires.

All the context API methods are validated at runtime and will raise ERuntimeValidation if a model tries to access an unknown identifier or perform some other invalid operation.

See also

TracModel

abstract get_pandas_table(self, dataset_name)

Get the data for a model input or output as a Pandas dataframe

The data for both inputs and outputs can be retrieved as a Pandas dataframe using this method. Inputs must be defined in TracModel.define_inputs() and outputs in TracModel.define_outputs(). Input and output names are case sensitive.

The TRAC runtime will handle loading the data and assembling it into a Pandas dataframe. This may happen before the model runs or when a dataset is requested. Models should take care not to request very large datasets as Pandas tables; doing so is likely to exhaust the available memory. Use get_spark_table() instead to work with big data, once Spark support is available in the runtime.

Model inputs are always available and can be queried by this method. Outputs are only available after they have been saved to the context using put_pandas_table() (or another put_XXX_table method). Attempting to retrieve an output before it has been saved will cause a runtime validation error.

Attempting to retrieve a dataset that is not defined as a model input or output will result in a runtime validation error, even if that dataset exists in the job config and is used by other models.

Parameters

dataset_name (str) – The name of the model input or output to get data for

Returns

A pandas dataframe containing the data for the named dataset

Raises

ERuntimeValidation

Return type

pandas.DataFrame

abstract get_parameter(self, parameter_name)

Get the value of a model parameter

Model parameters defined in TracModel.define_parameters() can be retrieved at runtime by this method. Values are returned as native Python types. Parameter names are case sensitive.

Attempting to retrieve parameters not defined by the model will result in a runtime validation error, even if those parameters are supplied in the job config and used by other models.
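The scoping rule above can be illustrated with a small in-memory stand-in. This is a sketch, not TRAC's implementation: SketchContext and the local ERuntimeValidation class are hypothetical names defined here so the example runs without the platform, and it assumes the job config has already been validated so every defined parameter is present.

```python
class ERuntimeValidation(Exception):
    """Local stand-in for the runtime validation error named in this API."""


class SketchContext:
    """Hypothetical stand-in showing how parameter access is scoped to one model."""

    def __init__(self, defined_params, job_params):
        # Assumes the job config was already validated, so every defined
        # parameter is present. Parameters that belong to other models in the
        # job are simply not copied, so this model cannot see them.
        self._params = {name: job_params[name] for name in defined_params}

    def get_parameter(self, parameter_name):
        # Names are case sensitive: "Param1" and "param1" are different
        if parameter_name not in self._params:
            raise ERuntimeValidation(f"Parameter [{parameter_name}] is not defined by this model")
        return self._params[parameter_name]


ctx = SketchContext(defined_params={"param1"}, job_params={"param1": 42, "other_param": "x"})
value = ctx.get_parameter("param1")  # a native Python int
```

Requesting "other_param" from this context raises ERuntimeValidation even though the job config supplies it, matching the behaviour described above.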

Parameters

parameter_name (str) – The name of the parameter to get

Returns

The parameter value, as a native Python data type

Raises

ERuntimeValidation

Return type

Any

abstract get_schema(self, dataset_name)

Get the schema of a model input or output

The schema of an input or output can be retrieved and examined at runtime using this method. Inputs must be defined in TracModel.define_inputs() and outputs in TracModel.define_outputs(). Input and output names are case sensitive.

In the current version of the runtime, all model inputs and outputs are defined statically, so get_schema() will return the schema exactly as it was defined.

Attempting to retrieve the schema for a dataset that is not defined as a model input or output will result in a runtime validation error, even if that dataset exists in the job config and is used by other models.

Parameters

dataset_name (str) – The name of the input or output to get the schema for

Returns

The schema definition for the named dataset

Return type

SchemaDefinition

Raises

ERuntimeValidation

abstract get_spark_context(self)

Spark support is not available in the current version of the runtime

Return type

pyspark.SparkContext

abstract get_spark_sql_context(self)

Spark support is not available in the current version of the runtime

Return type

pyspark.sql.SQLContext

abstract get_spark_table(self, dataset_name)

Spark support is not available in the current version of the runtime

Parameters

dataset_name (str) –

Return type

pyspark.sql.DataFrame

abstract get_spark_table_rdd(self, dataset_name)

Spark support is not available in the current version of the runtime

Parameters

dataset_name (str) –

Return type

pyspark.RDD

abstract log(self)

Get a Python logger that can be used for writing model logs

Logs written to this logger are recorded by TRAC. When models are run on the platform, these logs are assembled and saved with the job outputs as a dataset that can be queried through the regular TRAC data and metadata APIs.
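Because log() returns a standard logging.Logger, model code can use the usual stdlib logging calls. In this sketch, StubContext and run_model_logic are hypothetical: StubContext.log() simply returns a named stdlib logger, whereas on the platform the logger is supplied by TRAC and its output is captured with the job.

```python
import logging


class StubContext:
    """Hypothetical stand-in: log() returns an ordinary stdlib logger."""

    def log(self) -> logging.Logger:
        return logging.getLogger("sketch_model")


def run_model_logic(ctx, rows):
    log = ctx.log()
    # Pass arguments separately so string formatting is deferred until a record is emitted
    log.info("Processing %d rows", len(rows))
    if not rows:
        log.warning("Input dataset is empty")
    return len(rows)
```

Lazy %-style arguments are the idiomatic choice here, since messages below the active log level are never formatted.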

Returns

A Python logger that can be used for writing model logs

Return type

logging.Logger

abstract put_pandas_table(self, dataset_name, dataset)

Save the data for a model output as a Pandas dataframe

The data for model outputs can be saved as a Pandas dataframe using this method. Outputs must be defined in TracModel.define_outputs(). Output names are case sensitive.

The supplied data must match the schema of the named output. Missing fields or fields of the wrong type will result in a data validation error. Extra fields will be discarded with a warning. The schema of an output dataset can be checked using get_schema().
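The validation rules above (missing or mistyped fields fail, extra fields are dropped with a warning) can be sketched with pandas. The schema format used here, a plain mapping from field name to a type label, the labels themselves and the function conform_to_schema are all hypothetical; TRAC's own checks work from the SchemaDefinition returned by get_schema() and may differ in detail.

```python
import warnings

import pandas as pd
from pandas.api import types as ptypes

# Hypothetical type labels, each mapped to a pandas dtype check
TYPE_CHECKS = {
    "INTEGER": ptypes.is_integer_dtype,
    "FLOAT": ptypes.is_float_dtype,
    "STRING": ptypes.is_string_dtype,
}


class EDataValidation(Exception):
    """Local stand-in for the data validation error named in this API."""


def conform_to_schema(df: pd.DataFrame, schema: dict) -> pd.DataFrame:
    # Missing fields are an error
    missing = [name for name in schema if name not in df.columns]
    if missing:
        raise EDataValidation(f"Missing fields: {missing}")

    # Fields of the wrong type are an error
    for name, type_label in schema.items():
        if not TYPE_CHECKS[type_label](df[name]):
            raise EDataValidation(f"Field [{name}] does not have type {type_label}")

    # Extra fields are discarded with a warning
    extra = [name for name in df.columns if name not in schema]
    if extra:
        warnings.warn(f"Discarding extra fields: {extra}")

    # Keep only the schema fields, in schema order
    return df[list(schema)]
```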

Each model output can only be saved once. Attempting to save the same output twice will cause a runtime validation error. Once an output has been saved, it can be retrieved by calling get_pandas_table() (or another get_XXX_table method). Attempting to save a dataset that is not defined as a model output will also cause a runtime validation error.
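The save-once and read-after-save rules for datasets can be modelled as a small state machine. DatasetStore is a hypothetical stand-in written for illustration; it stores any object rather than a real dataframe and uses a local ERuntimeValidation class.

```python
class ERuntimeValidation(Exception):
    """Local stand-in for the runtime validation error named in this API."""


class DatasetStore:
    """Hypothetical stand-in for the dataset rules of get/put_pandas_table()."""

    def __init__(self, inputs, output_names):
        self._inputs = dict(inputs)           # inputs are always available
        self._output_names = set(output_names)
        self._outputs = {}                    # filled in as outputs are saved

    def get_table(self, name):
        if name in self._inputs:
            return self._inputs[name]
        if name in self._outputs:
            return self._outputs[name]
        if name in self._output_names:
            raise ERuntimeValidation(f"Output [{name}] has not been saved yet")
        raise ERuntimeValidation(f"Dataset [{name}] is not a model input or output")

    def put_table(self, name, data):
        if name not in self._output_names:
            raise ERuntimeValidation(f"Dataset [{name}] is not a model output")
        if name in self._outputs:
            raise ERuntimeValidation(f"Output [{name}] has already been saved")
        self._outputs[name] = data
```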

Parameters
  • dataset_name (str) – The name of the model output to save data for

  • dataset (pandas.DataFrame) – A pandas dataframe containing the data for the named dataset

Raises

ERuntimeValidation, EDataValidation

abstract put_spark_table(self, dataset_name, dataset)

Spark support is not available in the current version of the runtime

Parameters
  • dataset_name (str) –

  • dataset (pyspark.sql.DataFrame) –

abstract put_spark_table_rdd(self, dataset_name, dataset)

Spark support is not available in the current version of the runtime

Parameters
  • dataset_name (str) –

  • dataset (pyspark.RDD) –

class trac.rt.api.TracModel

Base class that model components inherit from to be recognised by the platform

The modelling API is designed to be as simple and unopinionated as possible. Models inherit from TracModel and implement the run_model() method to provide their model logic. run_model() has one parameter, a TracContext object, which is supplied to the model at runtime and allows it to access parameters, inputs and outputs.

Models must also, as a minimum, implement three methods to define the model schema: define_parameters(), define_inputs() and define_outputs(). The parameters, inputs and outputs that are defined will be available in the context at runtime. The trac.rt.api package includes a number of helper functions to implement these methods in a clear and robust way.
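The shape of a model class can be sketched without the platform installed. ModelBase below is a hypothetical stand-in mirroring the four abstract methods of TracModel; a real model would inherit from trac.rt.api.TracModel and build its definitions with the declare_* helpers instead of the plain dictionaries used here.

```python
import abc


class ModelBase(abc.ABC):
    """Hypothetical stand-in with the same abstract methods as TracModel."""

    @abc.abstractmethod
    def define_parameters(self): ...

    @abc.abstractmethod
    def define_inputs(self): ...

    @abc.abstractmethod
    def define_outputs(self): ...

    @abc.abstractmethod
    def run_model(self, ctx): ...


class IncompleteModel(ModelBase):
    # Only run_model() is implemented, so this class stays abstract
    def run_model(self, ctx):
        pass


class ScaleModel(ModelBase):
    # Plain dicts stand in for the results of the declare_* helpers
    def define_parameters(self):
        return {"factor": "FLOAT"}

    def define_inputs(self):
        return {"raw_values": ["value"]}

    def define_outputs(self):
        return {"scaled_values": ["value"]}

    def run_model(self, ctx):
        # Uses only the TracContext methods documented above
        factor = ctx.get_parameter("factor")
        raw = ctx.get_pandas_table("raw_values")
        ctx.put_pandas_table("scaled_values", raw * factor)
```

Instantiating IncompleteModel raises TypeError, which is how abc enforces that every abstract method is implemented.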

While model components can largely do what they like, there are three rules that should be followed to ensure models are deterministic. These are:

  1. No threading

  2. Use TRAC for random number generation

  3. Use TRAC to access the current time

Threading should never be needed in model code: Python only runs one execution thread at a time, and TRAC already handles IO masking and model ordering. Both Pandas and PySpark provide compute concurrency. Random numbers and time will be made available in the TracContext API in a future version of TRAC.

Models should also avoid making system calls, or using the Python builtins exec() or eval().

See also

TracContext

abstract define_inputs(self)

Define data inputs that will be available to the model at runtime

Implement this method to define the model’s inputs; every data input that the model uses must be defined. Models may choose to ignore some inputs, so it is OK to define inputs that are not always used.

To declare model inputs in code, always use the declare_* functions in the trac.rt.api package. This will ensure inputs are defined in the correct format with all the required fields. Model inputs that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns

The full set of inputs that will be available to the model at runtime

Return type

Dict[str, trac.rt.metadata.ModelInputSchema]

abstract define_outputs(self)

Define data outputs that will be produced by the model at runtime

Implement this method to define the model’s outputs. Every data output that the model produces must be defined, and every output that is defined must be produced. If a model defines an output that is not produced, a runtime validation error will be raised after the model completes.

To declare model outputs in code, always use the declare_* functions in the trac.rt.api package. This will ensure outputs are defined in the correct format with all the required fields. Model outputs that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns

The full set of outputs that will be produced by the model at runtime

Return type

Dict[str, trac.rt.metadata.ModelOutputSchema]

abstract define_parameters(self)

Define parameters that will be available to the model at runtime

Implement this method to define the model’s parameters; every parameter that the model uses must be defined. Models may choose to ignore some parameters, so it is OK to define parameters that are not always used.

To declare model parameters in code, always use the declare_* functions in the trac.rt.api package. This will ensure parameters are defined in the correct format with all the required fields. Parameters that are defined in the wrong format or with required fields missing will result in a model validation failure.

Returns

The full set of parameters that will be available to the model at runtime

Return type

Dict[str, trac.rt.metadata.ModelParameter]

abstract run_model(self, ctx)

Entry point for running model code

Implement this method to provide the model logic. A TracContext is provided at runtime, which makes parameters and inputs available and provides a means to save outputs. All the outputs defined in define_outputs() must be saved before this method returns, otherwise a runtime validation error will be raised.

Model code can raise exceptions, either in a controlled way by detecting error conditions and raising errors explicitly, or in an uncontrolled way as a result of bugs in the model code. Exceptions may also originate inside libraries the model code is using. If an exception escapes from run_model(), TRAC will mark the model as failed, and the job that contains the model will also fail.
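The post-conditions on run_model() can be illustrated with a hypothetical driver. RecordingContext, run_and_check and the status strings are invented for this sketch; they show the described behaviour (an escaping exception fails the model, and every defined output must be saved) rather than how TRAC actually executes jobs.

```python
class RecordingContext:
    """Hypothetical context that only records which outputs were saved."""

    def __init__(self):
        self.saved = set()

    def put_pandas_table(self, dataset_name, dataset):
        self.saved.add(dataset_name)


def run_and_check(model, defined_outputs):
    ctx = RecordingContext()
    try:
        model.run_model(ctx)
    except Exception as error:
        # Any exception escaping run_model() fails the model
        return "FAILED", f"{type(error).__name__}: {error}"
    # Every defined output must have been saved before run_model() returned
    missing = set(defined_outputs) - ctx.saved
    if missing:
        return "FAILED", f"Outputs not saved: {sorted(missing)}"
    return "SUCCEEDED", None
```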

Parameters

ctx (TracContext) – A context used to access model inputs, outputs and parameters and to communicate with the TRAC platform

trac.rt.api.F(*args, **kwargs)

Shorthand alias for declare_field()

trac.rt.api.P(*args, **kwargs)

Shorthand alias for declare_parameter()

trac.rt.api.declare_field(field_name, field_type, label, business_key=False, categorical=False, format_code=None, field_order=None)

Declare an individual field, for use in a model input or output schema

Individual fields in a dataset can be declared using this method (or trac.F). The name, type and label of a field are required parameters. The business_key and categorical flags are false by default. Format code is optional.

If no field ordering is supplied, fields will automatically be assigned a contiguous ordering starting at 0. In this case, care must be taken when creating an updated version of a model not to disturb the order of existing fields. Adding fields to the end of a list is always safe. If field orders are specified explicitly, they must form a contiguous ordering starting at 0.
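The ordering rules can be expressed as a small function. resolve_field_orders is a hypothetical helper, not part of trac.rt.api; it takes one entry per field (an int or None) and treats a mix of explicit and implicit orders as an error, which is an assumption since the text above does not cover that case.

```python
from typing import List, Optional


def resolve_field_orders(orders: List[Optional[int]]) -> List[int]:
    # No explicit orders: assign 0, 1, 2, ... in declaration order
    if all(order is None for order in orders):
        return list(range(len(orders)))

    # Assumption: explicit and implicit orders are not mixed
    if any(order is None for order in orders):
        raise ValueError("Either all fields or no fields should have an explicit order")

    # Explicit orders must be exactly 0..n-1, with no gaps or duplicates
    if sorted(orders) != list(range(len(orders))):
        raise ValueError(f"Field orders {orders} are not a contiguous ordering starting at 0")
    return list(orders)
```

Appending a field to a list of implicitly ordered fields only adds a new index at the end and leaves every existing position unchanged, which is why that change is always safe.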

Declared fields should be passed to declare_input_table() or declare_output_table(), either individually or as a list, to create the full schema for an input or output.

Parameters
  • field_name (str) – The field’s name, used as the field identifier in code and queries (must be a valid identifier)

  • field_type (trac.rt.metadata.BasicType) – The data type of the field, only primitive types are allowed

  • label (str) – A descriptive label for the field (required)

  • business_key (bool) – Flag indicating whether this field is a business key for its dataset (default: False)

  • categorical (bool) – Flag indicating whether this is a categorical field (default: False)

  • format_code (Optional[str]) – A code that can be interpreted by client applications to format the field (optional)

  • field_order (Optional[int]) – Explicit field ordering (optional)

Returns

A field schema, suitable for use in a schema definition

Return type

trac.rt.metadata.FieldSchema

trac.rt.api.declare_input_table(*fields)

Declare a model input with a table schema

Fields can be supplied either as individual arguments to this function or as a list. In either case, each field should be declared using declare_field() (or trac.F).

Parameters

fields (Union[trac.rt.metadata.FieldSchema, List[trac.rt.metadata.FieldSchema]]) – A set of fields to make up a TableSchema

Returns

A model input schema, suitable for returning from TracModel.define_inputs()

Return type

trac.rt.metadata.ModelInputSchema

trac.rt.api.declare_output_table(*fields)

Declare a model output with a table schema

Fields can be supplied either as individual arguments to this function or as a list. In either case, each field should be declared using declare_field() (or trac.F).

Parameters

fields (Union[trac.rt.metadata.FieldSchema, List[trac.rt.metadata.FieldSchema]]) – A set of fields to make up a TableSchema

Returns

A model output schema, suitable for returning from TracModel.define_outputs()

Return type

trac.rt.metadata.ModelOutputSchema

trac.rt.api.declare_parameter(param_name, param_type, label, default_value=None)

Declare an individual model parameter

Individual model parameters can be declared using this method (or trac.P). The name, type and label are required fields to declare a parameter. Name is used as the identifier to work with the parameter in code, e.g. when calling get_parameter() or defining parameters in a job config.

If a default value is specified, the model parameter becomes optional. It is OK to omit optional parameters when running models or setting up jobs, in which case the default value will be used. If no default is specified, the model parameter becomes mandatory and a value must always be supplied in order to execute the model.
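The optional/mandatory rule can be sketched as follows. resolve_parameters, the (type label, default) tuples and the type labels are hypothetical simplifications rather than TRAC's own representation; treating None as "no default" mirrors the default_value=None signature above.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical declarations: name -> (type label, default value or None)
Declarations = Dict[str, Tuple[str, Optional[Any]]]


def resolve_parameters(declared: Declarations, supplied: Dict[str, Any]) -> Dict[str, Any]:
    resolved = {}
    # Supplied values not declared here are ignored; they may belong to other models in the job
    for name, (_type_label, default) in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default is not None:
            # Optional parameter: fall back to the declared default
            resolved[name] = default
        else:
            # Mandatory parameter: a value must always be supplied
            raise ValueError(f"Missing required parameter [{name}]")
    return resolved
```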

Declared parameters should be passed to declare_parameters(), either individually or as a list, to create the set of parameters for a model.

Parameters
  • param_name (str) – The parameter name, used to identify the parameter in code (must be a valid identifier)

  • param_type (Union[trac.rt.metadata.TypeDescriptor, trac.rt.metadata.BasicType]) – The parameter type, expressed in the TRAC type system

  • label (str) – A descriptive label for the parameter (required)

  • default_value (Optional[Any]) – A default value to use if no explicit value is supplied (optional)

Returns

A named model parameter, suitable for passing to declare_parameters()

Return type

_Named[trac.rt.metadata.ModelParameter]

trac.rt.api.declare_parameters(*params)

Declare all the parameters used by a model

Parameters can be supplied either as individual arguments to this function or as a list. In either case, each parameter should be declared using declare_parameter() (or trac.P).
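Accepting parameters "either as individual arguments or as a list" is a common varargs pattern. The sketch below uses a hypothetical NamedParam tuple in place of _Named[ModelParameter] and a hypothetical collect_params function to show how such arguments could be flattened into the Dict[str, ...] shape that define_parameters() returns; it is not the library's implementation.

```python
from typing import Any, Dict, List, NamedTuple, Union


class NamedParam(NamedTuple):
    """Hypothetical stand-in for _Named[ModelParameter]."""
    name: str
    param: Any


def collect_params(*params: Union[NamedParam, List[NamedParam]]) -> Dict[str, Any]:
    # A single list argument is unpacked; otherwise each argument is one parameter
    if len(params) == 1 and isinstance(params[0], list):
        params = tuple(params[0])
    # Key the result by parameter name, preserving declaration order
    return {p.name: p.param for p in params}
```

Both call styles give the same result, for example collect_params(a, b) and collect_params([a, b]).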

Parameters

params (Union[_Named[trac.rt.metadata.ModelParameter], List[_Named[trac.rt.metadata.ModelParameter]]]) – The parameters that will be defined, either as individual arguments or as a list

Returns

A set of model parameters, in the correct format to return from TracModel.define_parameters()

Return type

Dict[str, trac.rt.metadata.ModelParameter]