Chapter 1 - Hello World

This tutorial is based on example code which can be found in the TRAC GitHub Repository under examples/models/python.

Requirements

The TRAC runtime for Python has these requirements:

  • Python: 3.8 up to 3.12

  • Pandas: 1.2 up to 2.2

  • PySpark: 3.0 up to 3.5 (optional)

Third-party libraries may impose additional constraints on the supported versions of Python, Pandas or PySpark. As of February 2024, the Python libraries for GCP do not yet support Python 3.12.
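As a quick sanity check before setting up a project, the Python range listed above can be tested from a script. This is only a minimal sketch: the names MIN_PYTHON, MAX_PYTHON and python_version_supported are illustrative and are not part of the TRAC runtime.

```python
import sys

# Supported range taken from the requirements list above (inclusive bounds).
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 12)


def python_version_supported(version=None):
    """Return True if the (major, minor) version is within the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Only the major and minor numbers are compared, so any patch release of a supported minor version passes the check.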

Setting up a new project

If you are starting a project from scratch, it’s a good idea to follow the standard Python conventions for package naming and folder layout. If you are working on an existing project or are already familiar with the Python conventions, then you can skip this section.

For this example we will create a project folder called example-project. Typically this will be a Git repository. You will also want to create a Python virtual environment for the project. Some IDEs will be able to do this for you, or you can do it from the command line using these commands:

Windows:

mkdir example-project
cd example-project
git init
python -m venv .\venv
venv\Scripts\activate

macOS / Linux:

mkdir example-project
cd example-project
git init
python -m venv ./venv
. venv/bin/activate

For this tutorial we want a single Python package that we will call “tutorial”. By convention Python source code goes in a folder called either “src” or the name of your project - we will use “src”. We are going to need some config files, which should live outside the source folder. We will also need a folder for tests and a few other common project files. Here is a very standard example of what that looks like:

example-project
├── config
│   ├── hello_world.yaml
│   └── sys_config.yaml
├── src
│   └── tutorial
│       ├── __init__.py
│       └── hello_world.py
├── test
│   └── tutorial_tests
│       ├── __init__.py
│       └── test_hello_world_model.py
├── venv
│   └── ...
├── .gitignore
├── README.txt
└── ...

Let’s quickly run through what these files are. First the src folder and the tutorial package. In this example “tutorial” is our root package, which means any import statements in our code should start with “import tutorial.” or “from tutorial.xxx import yyy”. To make the folder called “tutorial” into a Python package we have to add the special __init__.py file, which should initially be empty. We have created one module, hello_world, in the tutorial package and this is where we will add the code for our model.

It is important to note that the “src” folder is not a package, rather it is the folder where our packages live. This means that other folders and files (e.g. config, the .gitignore file and everything else) do not get muddled into the Python package tree. If you see code that says “import src.xxx” or “from src.xxx import yyy” then something has gone wrong!
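To make the role of the “src” folder concrete, here is a small sketch that puts src on the import path and then imports a module by its package name. The helper import_from_src is hypothetical and the throwaway package is created only so the sketch runs on its own; in a real project your IDE or packaging configuration would normally take care of the import path.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_src(project_root, module_name):
    """Import module_name with <project_root>/src as the import root."""
    src_dir = str(Path(project_root) / "src")
    if src_dir not in sys.path:
        sys.path.insert(0, src_dir)  # "src" is the import root, not a package
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    # Build a throwaway copy of the layout so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "src" / "tutorial"
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").touch()
        (package_dir / "hello_world.py").write_text("GREETING = 'hello'\n")

        module = import_from_src(tmp, "tutorial.hello_world")
        print(module.__name__)
```

The module is imported as tutorial.hello_world; the name “src” never appears in the import path.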

The test folder contains our test code, which is also arranged as a package. Notice that the package name is not the same (tutorial_tests instead of tutorial) - Python will not allow the same package to be defined in two places. Putting the test code in a separate test folder stops it getting mixed in with the code in src/, which is important when it comes to releasing code to production.

TRAC uses a few simple config files to control models during local development, so we have set up a config folder to put those in. The contents of these files are discussed later in the tutorial.

The venv/ folder is where Python puts any libraries your project uses, including the TRAC runtime library. Typically you want to ignore this folder in Git by adding it to the .gitignore file. Your IDE might do this automatically, otherwise you can create a file called .gitignore and add this line to it:

venv/**

The README.txt file is not required but it is usually a good idea to have one. You can add a brief description of the project, instructions for building and running the code, and so on. If you are using GitHub, the contents of this file will be displayed on the home page for your repository.
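If you prefer to script the setup, the layout described in this section can be created with a few lines of Python. This is a sketch only: create_project_skeleton and PROJECT_FILES are illustrative names, the files are created empty, and the venv folder is left to the python -m venv command shown earlier.

```python
from pathlib import Path

# Empty placeholder files that make up the layout shown in this section.
PROJECT_FILES = [
    "config/hello_world.yaml",
    "config/sys_config.yaml",
    "src/tutorial/__init__.py",
    "src/tutorial/hello_world.py",
    "test/tutorial_tests/__init__.py",
    "test/tutorial_tests/test_hello_world_model.py",
    "README.txt",
]


def create_project_skeleton(root):
    """Create the folders and empty files for the example project under root."""
    root = Path(root)
    for relative_path in PROJECT_FILES:
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch(exist_ok=True)  # the content of existing files is not changed

    # Keep the virtual environment out of Git, as described above.
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("venv/**\n", encoding="utf-8")
    return root
```

Calling create_project_skeleton("example-project") from the parent folder produces the tree shown above. Note that no __init__.py is created in src itself, because src is not a package.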

Installing the runtime

The TRAC runtime package can be installed directly from PyPI:

pip install tracdap-runtime

The TRAC runtime depends on Pandas and PySpark, so these libraries will be pulled in as dependencies. If you want to target particular versions, you can install them explicitly:

pip install "pandas == 2.1.4"

Alternatively, you can create a requirements.txt file in the root of your project folder and record the project’s requirements there.

Note

TRAC supports both Pandas 1.X and 2.X. Models written for 1.X might not work with 2.X and vice versa. From TRAC 0.6 onward, new installations default to Pandas 2.X. To change the version of Pandas in your sandbox environment, you can use the pip install command:

pip install "pandas == 1.5.3"
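To confirm which Pandas version ended up in your environment, you can query the installed package metadata using the Python standard library. This is a minimal sketch; installed_version and major_version are illustrative helper names, not TRAC functions.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version_string):
    """Extract the leading major number from a version string like '2.1.4'."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    pandas_version = installed_version("pandas")
    if pandas_version is None:
        print("pandas is not installed in this environment")
    else:
        print(f"pandas {pandas_version} (major version {major_version(pandas_version)})")
```

Run this inside the activated virtual environment, otherwise it reports on whichever Python installation happens to be first on your path.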

Writing a model

To write a model, start by importing the TRAC API package and inheriting from the TracModel base class. This class is the entry point for running code in TRAC, both on the platform and using the local development sandbox.

src/tutorial/hello_world.py
import typing as tp
import tracdap.rt.api as trac


class HelloWorldModel(trac.TracModel):

The model can define any parameters it is going to need. In this example there is only a single parameter so it can be declared in code (more complex models may wish to manage parameters in a parameters file). TRAC provides helper functions to ensure parameters are defined in the correct format.

    def define_parameters(self) -> tp.Dict[str, trac.ModelParameter]:

        return trac.define_parameters(
            trac.P(
                "meaning_of_life", trac.INTEGER,
                label="The answer to the ultimate question of life, the universe and everything"))

The model can also define inputs and outputs. In this case, since all we are going to do is write a message to the log, no inputs or outputs are needed. Still, these methods are required for the model to be valid.

    def define_inputs(self) -> tp.Dict[str, trac.ModelInputSchema]:
        return {}

    def define_outputs(self) -> tp.Dict[str, trac.ModelOutputSchema]:
        return {}

To write the model logic, implement the run_model() method. When run_model() is called it receives a TracContext object which allows models to interact with the TRAC platform.

    def run_model(self, ctx: trac.TracContext):

        ctx.log().info("Hello world model is running")

        meaning_of_life = ctx.get_parameter("meaning_of_life")
        ctx.log().info(f"The meaning of life is {meaning_of_life}")

There are two useful features of TracContext that can be seen in this example:

  • The log() method returns a standard Python logger that can be used for writing model logs. When models run on the platform, TRAC will capture any logs written to this logger and make them available with the job outputs as searchable datasets. Log outputs are available even if a job fails so they can be used for debugging.

  • get_parameter() allows models to access any parameters defined in the define_parameters() method. They are returned as native Python objects, so integers use the Python integer type, date and time values use the Python datetime classes and so on.
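Because the model logic only touches log() and get_parameter(), it can be exercised without the TRAC runtime by passing in a stand-in object. The sketch below assumes exactly that: StubContext, ListHandler, run_hello_world and capture_model_logs are hypothetical names written for this illustration and are not part of the TRAC API.

```python
import logging


class StubContext:
    """Hypothetical stand-in for trac.TracContext; not part of the TRAC API."""

    def __init__(self, parameters):
        self._parameters = dict(parameters)
        self._log = logging.getLogger("tutorial.hello_world.stub")

    def log(self):
        # Like the real context, hand back a standard Python logger.
        return self._log

    def get_parameter(self, name):
        # Parameters come back as native Python values.
        return self._parameters[name]


def run_hello_world(ctx):
    """Same statements as the body of HelloWorldModel.run_model() above."""
    ctx.log().info("Hello world model is running")
    meaning_of_life = ctx.get_parameter("meaning_of_life")
    ctx.log().info(f"The meaning of life is {meaning_of_life}")


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def capture_model_logs(parameters):
    """Run the model logic against a stub context and return its log messages."""
    ctx = StubContext(parameters)
    handler = ListHandler()
    logger = ctx.log()
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        run_hello_world(ctx)
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

A unit test in test/tutorial_tests could call capture_model_logs({"meaning_of_life": 42}) and assert on the returned messages. The stub covers only the two methods used in this chapter, so it is no substitute for running the model through the TRAC launcher.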

Supplying config

To run the model, we need to supply two configuration files:

  • Job config, which includes everything related to the models and the data and parameters that will be used to execute them.

  • System config, which includes everything related to storage locations, repositories, execution environment and other system settings.

When models are deployed to run on the platform, TRAC generates the job configuration according to scheduled instructions and/or user input. A full set of metadata is assembled for every object and setting that goes into a job, so that execution can be strictly controlled and validated. In development mode most of this configuration can be inferred, so the config needed to run models is kept short and readable.

For our Hello World model, we only need to supply a single parameter in the job configuration:

config/hello_world.yaml
job:
  runModel:

    parameters:
      meaning_of_life: 42

Since this model is not using a Spark session or any storage, there is nothing that needs to be configured in the system config. We still need to supply a config file though:

config/sys_config.yaml
# The file can be empty, but you need to supply it!

Run the model

The easiest way to launch a model during development is to call launch_model() from the TRAC launch package. Make sure to guard the launch by checking __name__ == "__main__", to prevent launching a local config when the model is deployed to the platform (TRAC will not allow this, so the model will fail to deploy)!

src/tutorial/hello_world.py
if __name__ == "__main__":
    import tracdap.rt.launch as launch
    launch.launch_model(HelloWorldModel, "config/hello_world.yaml", "config/sys_config.yaml")

Paths for the system and job config files are resolved in the following order:

  1. If absolute paths are supplied, these take top priority

  2. Resolve relative to the current working directory

  3. Resolve relative to the directory containing the Python module of the model
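The lookup order above can be pictured with a short sketch. This is an illustration of the three rules as stated, not TRAC's actual implementation: resolve_config_path is a hypothetical function, and it assumes that an absolute path is returned as given without checking that the file exists.

```python
from pathlib import Path


def resolve_config_path(config_path, working_dir, model_dir):
    """Apply the three lookup rules listed above and return the first match."""
    path = Path(config_path)

    # 1. Absolute paths are used as given.
    if path.is_absolute():
        return path

    # 2. Then try the current working directory.
    # 3. Then try the directory containing the model's Python module.
    for base_dir in (working_dir, model_dir):
        candidate = Path(base_dir) / path
        if candidate.exists():
            return candidate

    raise FileNotFoundError(f"Config file not found: {config_path}")
```

For the layout in this tutorial, launching from the project root means "config/hello_world.yaml" is found by the second rule, relative to the working directory.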

Now you should be able to run your model script and see the model output in the logs:

2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - START RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1
2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - Hello world model is running
2022-05-31 12:19:36,104 [engine] INFO __main__.HelloWorldModel - The meaning of life is 42
2022-05-31 12:19:36,104 [engine] INFO tracdap.rt.exec.engine.NodeProcessor - DONE RunModel [HelloWorldModel] / JOB-92df0bd5-50bd-4885-bc7a-3d4d95029360-v1

See also

Full source code is available for the Hello World example on GitHub