Chapter 1 - Hello World

This tutorial is based on the hello_world.py example, which can be found in the TRAC GitHub Repository under examples/models/python.

Requirements

The TRAC runtime for Python has these requirements:

  • Python: 3.7 up to 3.11.x

  • Pandas: 1.2 up to 1.5.x

  • PySpark: 2.4.x, or 3.0 up to 3.3.x

Not every combination of versions will work, e.g. PySpark 3 requires Python 3.8.

Installing the runtime

The TRAC runtime package can be installed directly from PyPI:

pip install tracdap-runtime

The TRAC runtime depends on Pandas and PySpark, so these libraries will be pulled in as dependencies. If you want to target particular versions, install them explicitly first.
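
For example, to target particular versions within the supported ranges before installing the runtime (the versions shown here are illustrative, not a recommendation):

pip install "pandas==1.5.*" "pyspark==3.3.*"
pip install tracdap-runtime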

Writing a model

To write a model, start by importing the TRAC API package and inheriting from the TracModel base class. This class is the entry point for running code in TRAC, both on the platform and using the local development sandbox.

examples/models/python/src/tutorial/hello_world.py
import typing as tp
import tracdap.rt.api as trac


class HelloWorldModel(trac.TracModel):

The model can define any parameters it is going to need. In this example there is only a single parameter, so it can be declared in code (more complex models may wish to manage parameters in a parameters file). TRAC provides helper functions to ensure parameters are defined in the correct format.

    def define_parameters(self) -> tp.Dict[str, trac.ModelParameter]:

        return trac.define_parameters(
            trac.P(
                "meaning_of_life", trac.INTEGER,
                label="The answer to the ultimate question of life, the universe and everything"))

The model can also define inputs and outputs. In this case, since all we are going to do is write a message in the log, no inputs or outputs are needed. Even so, these methods are still required for the model to be valid.

    def define_inputs(self) -> tp.Dict[str, trac.ModelInputSchema]:
        return {}

    def define_outputs(self) -> tp.Dict[str, trac.ModelOutputSchema]:
        return {}

To write the model logic, implement the run_model() method. When run_model() is called, it receives a TracContext object, which allows models to interact with the TRAC platform.

    def run_model(self, ctx: trac.TracContext):

        ctx.log().info("Hello world model is running")

        meaning_of_life = ctx.get_parameter("meaning_of_life")
        ctx.log().info(f"The meaning of life is {meaning_of_life}")

There are two useful features of TracContext that can be seen in this example:

  • The log() method returns a standard Python logger that can be used for writing model logs. When models run on the platform, TRAC will capture any logs written to this logger and make them available with the job outputs as searchable datasets. Log outputs are available even if a job fails, so they can be used for debugging.

  • get_parameter() allows models to access any parameters defined in the define_parameters() method. Parameters are returned as native Python objects, so integers use the Python integer type, date and time values use the Python datetime classes, and so on (see the sketch after this list).
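
As a sketch of how this works for other parameter types (this model is not part of the hello_world.py example), the class below declares a DATE parameter and reads it back as a native Python date. It assumes trac.DATE is available in the API package in the same way as trac.INTEGER.

import typing as tp
import tracdap.rt.api as trac


class DateParamModel(trac.TracModel):

    # Hypothetical model, shown only to illustrate native parameter types
    def define_parameters(self) -> tp.Dict[str, trac.ModelParameter]:
        return trac.define_parameters(
            trac.P("reporting_date", trac.DATE, label="Reporting date for the run"))

    def define_inputs(self) -> tp.Dict[str, trac.ModelInputSchema]:
        return {}

    def define_outputs(self) -> tp.Dict[str, trac.ModelOutputSchema]:
        return {}

    def run_model(self, ctx: trac.TracContext):
        # get_parameter() returns a native Python object, here a datetime.date
        reporting_date = ctx.get_parameter("reporting_date")
        ctx.log().info(f"Reporting date is {reporting_date}")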

Supplying config

To run the model, we need to supply two configuration files:

  • Job config, which includes everything related to the models and the data and parameters that will be used to execute them.

  • System config, which includes everything related to storage locations, repositories, execution environment and other system settings.

When models are deployed to run on the platform, TRAC generates the job configuration according to scheduled instructions and/or user input. A full set of metadata is assembled for every object and setting that goes into a job, so that execution can be strictly controlled and validated. In development mode most of this configuration can be inferred, so the config needed to run models is kept short and readable.

For our Hello World model, we only need to supply a single parameter in the job configuration:

examples/models/python/config/hello_world.yaml
job:
  runModel:

    parameters:
      meaning_of_life: 42

Since this model is not using a Spark session or any storage, there is nothing that needs to be configured in the system config. We still need to supply a config file, even though it is effectively empty:

sys_config.yaml
# No system config needed!

Run the model

The easiest way to launch a model during development is to call launch_model() from the TRAC launch package. Make sure to guard the launch with a check on __name__ == "__main__", to prevent launching with a local config when the model is deployed to the platform (TRAC will not allow this, so an unguarded model will fail to deploy).

examples/models/python/src/tutorial/hello_world.py
if __name__ == "__main__":
    import tracdap.rt.launch as launch
    launch.launch_model(HelloWorldModel, "config/hello_world.yaml", "config/sys_config.yaml")

Paths for the system and job config files are resolved in the following order:

  1. If absolute paths are supplied, these take top priority (see the sketch after this list)

  2. Resolve relative to the current working directory

  3. Resolve relative to the directory containing the Python module of the model
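
For example, a launch script could build absolute config paths, which take priority regardless of where the script is run from. This is a sketch, not part of the example code; the module path and directory layout are assumptions.

import pathlib
import tracdap.rt.launch as launch

from tutorial.hello_world import HelloWorldModel  # assumed module path under src/

# Build absolute paths from this script's location (illustrative layout)
config_dir = pathlib.Path(__file__).resolve().parent / "config"

launch.launch_model(
    HelloWorldModel,
    str(config_dir / "hello_world.yaml"),
    str(config_dir / "sys_config.yaml"))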

Now you should be able to run your model script and see the model output in the logs:

2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - START RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1
2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - Hello world model is running
2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - The meaning of life is 42
2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - DONE RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1

See also

The full source code for this example is available on GitHub