Chapter 3 - Inputs & Outputs

This tutorial is based on example code which can be found in the TRAC GitHub Repository under examples/models/python.

Optional Inputs & Outputs

Optional inputs and outputs provide a way for a model to react to the available data. If an input is marked as optional then it may not be supplied, so the model code must check at runtime whether it is available. When an output is marked as optional the model can choose whether or not to provide that output, for example in response to the input data or to a boolean flag supplied as a model parameter.

Here is an example of defining an optional input, using schemas read from schema files:

src/tutorial/optional_io.py

    def define_inputs(self) -> tp.Dict[str, trac.ModelInputSchema]:

        # Define an optional account filter input, using external schema files

        customer_loans = trac.load_schema(schemas, "customer_loans.csv")
        account_filter = trac.load_schema(schemas, "account_filter.csv")

        return {
            "customer_loans": trac.ModelInputSchema(customer_loans),
            "account_filter": trac.ModelInputSchema(account_filter, optional=True)
        }

Schemas defined in code can also be marked as optional. Let’s use that approach to define an optional output:

    def define_outputs(self) -> tp.Dict[str, trac.ModelOutputSchema]:

        # Define an optional output for stats on excluded accounts, using schema definitions in code

        profit_by_region = trac.define_output_table(
            trac.F("region", trac.STRING, label="Customer home region", categorical=True),
            trac.F("gross_profit", trac.DECIMAL, label="Total gross profit"))

        exclusions = trac.define_output_table(
            trac.F("reason", trac.STRING, label="Reason for exclusion"),
            trac.F("count", trac.INTEGER, label="Number of accounts"),
            optional=True)

        return {
            "profit_by_region": profit_by_region,
            "exclusions": exclusions
        }

Now let’s see how to use optional inputs and outputs in run_model(). Since the input is optional we will need to check if it is available before we can use it. TRAC provides the has_dataset() method for this purpose. If the optional dataset exists we will use it to apply some filtering to the customer accounts list, then produce the optional output dataset with some stats on the filtered accounts. Here is what that looks like:

        # The required input is always available
        customer_loans = ctx.get_pandas_table("customer_loans")

        # The optional input must be checked before it is used
        if ctx.has_dataset("account_filter"):

            # Filter out customer accounts with IDs in the filter set
            account_filter = ctx.get_pandas_table("account_filter")
            account_mask = customer_loans["id"].isin(account_filter["account_id"])
            customer_loans = customer_loans.loc[~account_mask]

            # Create an optional output counting the filter entries for each exclusion reason
            exclusions = account_filter.groupby(["reason"]).size().to_frame(name="count").reset_index()
            ctx.put_pandas_table("exclusions", exclusions)
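
To see what the filtering step and the exclusion stats produce, here is a minimal standalone sketch that uses pandas only and runs outside of TRAC. The sample rows and reason codes are invented for illustration; only the column names id, account_id and reason are taken from the model code above.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
customer_loans = pd.DataFrame({
    "id": ["A001", "A002", "A003", "A004"],
    "region": ["north", "south", "north", "east"],
})

account_filter = pd.DataFrame({
    "account_id": ["A002", "A004", "A999"],
    "reason": ["dormant", "fraud", "dormant"],
})

# True for every loan whose ID appears in the filter set
account_mask = customer_loans["id"].isin(account_filter["account_id"])

# Keep only the loans that are not in the filter set
filtered_loans = customer_loans.loc[~account_mask]

# One row per exclusion reason, counting the filter entries with that reason
exclusions = account_filter.groupby(["reason"]).size().to_frame(name="count").reset_index()

print(filtered_loans["id"].tolist())
print(exclusions.to_dict("records"))
```

With this sample data the loans A001 and A003 remain, and the stats show two entries for "dormant" and one for "fraud". Note that the counts are taken from the filter dataset itself, so they include filter entries such as A999 that do not match any loan.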

In this example the optional output is only produced when the optional input is supplied. That is not a requirement: the model can decide whether to provide optional outputs based on whatever criteria are appropriate. If an optional output is not going to be produced, simply do not output the dataset and TRAC will understand that it has been omitted. If an optional output is produced then it is subject to the same validation rules as any other dataset.
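
The decision logic can also be sketched without TRAC, using a small stand-in for the context object. Everything in this sketch is hypothetical: StubContext is not a TRAC class and only mimics the has_dataset() check, plain lists of dicts stand in for tables, and get_table(), put_table() and the report_exclusions flag are names invented for illustration. It shows one possible set of criteria, where the optional output depends on both the optional input and a boolean flag.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


class StubContext:
    """Hypothetical stand-in for the model context, NOT part of TRAC."""

    def __init__(self, inputs: Dict[str, List[Row]]):
        self.inputs = inputs
        self.outputs: Dict[str, List[Row]] = {}

    def has_dataset(self, name: str) -> bool:
        # Mimics the availability check for an optional input
        return name in self.inputs

    def get_table(self, name: str) -> List[Row]:
        return self.inputs[name]

    def put_table(self, name: str, rows: List[Row]) -> None:
        self.outputs[name] = rows


def run_model(ctx: StubContext, report_exclusions: bool) -> None:
    loans = ctx.get_table("customer_loans")
    reason_counts: Counter = Counter()

    # Optional input: only touch it if it was supplied
    if ctx.has_dataset("account_filter"):
        account_filter = ctx.get_table("account_filter")
        filter_ids = {row["account_id"] for row in account_filter}
        loans = [row for row in loans if row["id"] not in filter_ids]
        reason_counts = Counter(row["reason"] for row in account_filter)

    # Required output: always produced
    profit: Dict[str, float] = {}
    for row in loans:
        profit[row["region"]] = profit.get(row["region"], 0.0) + row["gross_profit"]
    ctx.put_table("profit_by_region",
                  [{"region": r, "gross_profit": p} for r, p in profit.items()])

    # Optional output: produced only if requested and there is something to report
    if report_exclusions and reason_counts:
        ctx.put_table("exclusions",
                      [{"reason": r, "count": c} for r, c in reason_counts.items()])


# Hypothetical sample data, invented for illustration only
loans = [
    {"id": "A001", "region": "north", "gross_profit": 10.0},
    {"id": "A002", "region": "south", "gross_profit": 5.0},
]
account_filter = [{"account_id": "A002", "reason": "dormant"}]

# Run 1: optional input not supplied, so the optional output is omitted
ctx1 = StubContext({"customer_loans": loans})
run_model(ctx1, report_exclusions=True)

# Run 2: optional input supplied and flag set, so the optional output is produced
ctx2 = StubContext({"customer_loans": loans, "account_filter": account_filter})
run_model(ctx2, report_exclusions=True)

# Run 3: optional input supplied but flag not set, so the optional output is omitted
ctx3 = StubContext({"customer_loans": loans, "account_filter": account_filter})
run_model(ctx3, report_exclusions=False)

print(sorted(ctx1.outputs), sorted(ctx2.outputs), sorted(ctx3.outputs))
```

In all three runs the required output is written, while the optional output appears only in the second run. Omitting an optional output takes no extra code: the model simply never writes it.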