TracDataApi
- class tracdap.api.TracDataApi
Public API for creating, updating, reading and querying primary data stored in the TRAC platform.
The TRAC data API provides a standard mechanism for client applications to store and access data in the TRAC platform. Calls are translated into the underlying storage mechanisms, using push-down operations where possible for efficient queries on large datasets.
The data API includes format translation, so data can be uploaded and retrieved in any supported format. The back-end storage format is controlled by the platform. For example, if a user uploads a CSV file, TRAC will convert it to the platform's storage format (by default, the Arrow IPC file format). Later, a web application might ask for that data in JSON format, and TRAC would perform the conversion again. The platform uses Apache Arrow as an intermediate representation that other formats are converted to and from.
The data API uses streaming operations to support transfer of large datasets and can be used for both user-facing client applications and system-to-system integration. For regular, high-volume data transfers there are other options for integration, including data import jobs and direct back-end integration of the underlying storage technologies. These options can be faster and reduce storage requirements, at the expense of tighter coupling between systems. A compromise is to use direct integration and import jobs for a small number of critical feeds containing a high volume of data, and use the data API for client access and for integration of secondary and/or low-volume systems.
- createDataset(request)
Create a new dataset, supplying the schema and content as a data stream
This method creates a new dataset and a corresponding DATA object in the TRAC metadata store. Once a dataset is created, it can be used as an input to a model run and can be read and queried using the data API. Data can be supplied in any format supported by the platform.
Although large datasets can be uploaded using this call, data import jobs are normally used to bring in large volumes of data from external systems.
The request must specify a schema for the dataset; incoming data will be verified against the schema. Schemas can be specified using either:
A full schema definition - if a full schema is supplied, it will be embedded with the dataset and used for this dataset only
A schema ID - a tag selector for an existing SCHEMA object, which may be shared by multiple datasets
The “format” parameter describes the format used to upload data. For example, to upload a CSV file the format would be set to “text/csv” and the file content can be uploaded directly, or to upload the output of an editor grid in a web client the format can be set to “text/json” to upload a JSON representation of the editor contents. TRAC will apply format conversion before the data is processed and stored.
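As a rough illustration of the two upload formats mentioned above, the sketch below encodes the same rows as CSV and as JSON using only the Python standard library. The helper names `encode_csv` and `encode_json` are hypothetical, and the list-of-records JSON layout is an assumption, since the JSON structure TRAC expects is not described here.

```python
import csv
import io
import json


def encode_csv(field_names, rows):
    """Encode rows (a list of dicts) as UTF-8 CSV bytes with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def encode_json(rows):
    """Encode rows as a JSON array of records (this layout is an assumption)."""
    return json.dumps(rows).encode("utf-8")


# Example rows, such as the contents of an editor grid
rows = [
    {"customer_id": "C001", "balance": 1250.5},
    {"customer_id": "C002", "balance": 80.0},
]

# Each payload is paired with the "format" value that would describe it
csv_upload = ("text/csv", encode_csv(["customer_id", "balance"], rows))
json_upload = ("text/json", encode_json(rows))
```

Either payload could then be supplied as the dataset content, with the matching format string set in the request.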
Tag updates can be supplied to tag the newly created dataset. They behave exactly the same as tag updates in the createObject() call of TracMetadataApi.
This is a client streaming method. The first message in the request stream must contain all the request fields and metadata, including a schema specifier. After the first message all metadata fields should be omitted. Subsequent messages should contain the content of the dataset as a series of chunks, encoded as per the “format” field of the first message. Clients may choose whether or not to include a chunk of data in the first message and empty (i.e. zero-length) chunks are permitted at any point in the stream.
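The message ordering described above can be sketched as a Python generator. This is a minimal illustration under stated assumptions, not the TRAC client API: `WriteMessage` is a hypothetical stand-in for the real request message, and its field names (`tenant`, `schema_id`, `format`, `content`) are chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class WriteMessage:
    """Hypothetical stand-in for one message in the request stream."""
    tenant: Optional[str] = None
    schema_id: Optional[str] = None
    format: Optional[str] = None
    content: bytes = b""


def request_stream(tenant: str, schema_id: str, data_format: str,
                   payload: bytes, chunk_size: int = 64 * 1024) -> Iterator[WriteMessage]:
    """Yield a metadata message first, then content-only messages."""
    # First message: all request fields and metadata, with an empty chunk
    yield WriteMessage(tenant=tenant, schema_id=schema_id, format=data_format)
    # Subsequent messages: content only, metadata fields left unset
    for offset in range(0, len(payload), chunk_size):
        yield WriteMessage(content=payload[offset:offset + chunk_size])


# A small payload and a tiny chunk size, to make the chunking visible
payload = b"a,b\n1,2\n3,4\n"
messages = list(request_stream("acme", "schema-1", "text/csv", payload, chunk_size=5))
```

This sketch leaves the first chunk empty, which keeps the metadata handling separate from the content loop. Putting the first chunk of data in the first message is equally valid according to the description above.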
This method returns the header of the newly created DATA object. Error conditions include: Invalid request, unknown tenant, schema not found (if an external schema ID is used), format not supported, data does not match schema, corrupt or invalid data stream. Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (DataWriteRequest)
- Return type:
- createFile(request)
Upload a new file into TRAC, sending the content as a data stream
Calling this method will create a new FILE object in the metadata store. Tag updates can be supplied when creating a FILE and will be passed on to the metadata service. The semantics for tag updates are identical to the createObject() method in TracMetadataApi.
This is a client streaming method. The first message in the request stream must contain all the request fields and required metadata. The second and subsequent messages should contain the content of the file as a series of chunks (byte buffers); any other fields set after the first message will be ignored. Empty chunks can be included at any point in the stream and will be ignored. Clients may choose to include the first chunk in the first message along with the request metadata, or to put an empty chunk in the first message and start streaming content in the second message. For very small files, it is possible to put the entire content in one chunk in the first message, so there is only a single message in the stream. All of these approaches are supported.
Clients may specify the size of the file being created. When a size is supplied, TRAC will check the size against the number of bytes stored. If the stored file size does not match the supplied value, the error will be reported with an error status of DATA_LOSS. When no size is supplied the check cannot be performed.
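The size check described above amounts to a simple comparison. The sketch below mirrors that behaviour for illustration only; it is not TRAC's implementation, and `DataLossError` and `check_stored_size` are hypothetical names, with the exception standing in for the DATA_LOSS error status.

```python
from typing import Optional


class DataLossError(Exception):
    """Hypothetical error type standing in for the DATA_LOSS status."""


def check_stored_size(bytes_stored: int, expected_size: Optional[int] = None) -> None:
    """Raise DataLossError if a size was supplied and does not match."""
    # When no size is supplied, the check cannot be performed
    if expected_size is None:
        return
    if bytes_stored != expected_size:
        raise DataLossError(
            f"expected {expected_size} bytes but {bytes_stored} were stored")


# A client would typically take the size from the content it intends to send
content = b"example file content"
check_stored_size(bytes_stored=len(content), expected_size=len(content))
```

Supplying the size is optional, but it is the only way for a truncated upload to be detected by this check.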
The method returns the header of the newly created FILE object. Error conditions include: Invalid request, unknown tenant, validation failure and data loss (if the number of bytes stored does not match the number specified in the request). Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (FileWriteRequest)
- Return type:
- createSmallDataset(request)
Create a new dataset, supplying the schema and content as a single blob
This method creates a new dataset and a corresponding DATA object in the TRAC metadata store. Once a dataset is created, it can be used as an input to a model run and can be read and queried using the data API. Data can be supplied in any format supported by the platform.
The request must specify a schema for the dataset; incoming data will be verified against the schema. Schemas can be specified using either:
A full schema definition - if a full schema is supplied, it will be embedded with the dataset and used for this dataset only
A schema ID - a tag selector for an existing SCHEMA object, which may be shared by multiple datasets
The “format” parameter describes the format used to upload data. For example, to upload a CSV file the format would be set to “text/csv” and the file content can be uploaded directly, or to upload the output of an editor grid in a web client the format can be set to “text/json” to upload a JSON representation of the editor contents. TRAC will apply format conversion before the data is processed and stored.
Tag updates can be supplied to tag the newly created dataset. They behave exactly the same as tag updates in the createObject() call of TracMetadataApi.
This is a unary call. All the request fields and metadata (including the schema specifier) and the dataset content, encoded as per the “format” field, are supplied in a single message. It is intended for working with small datasets and for use in environments where client streaming is not available (particularly in gRPC-Web clients).
This method returns the header of the newly created DATA object. Error conditions include: Invalid request, unknown tenant, schema not found (if an external schema ID is used), format not supported, data does not match schema, corrupt or invalid data stream. Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (DataWriteRequest)
- Return type:
- createSmallFile(request)
Upload a new file into TRAC, sending the content as a single blob
Calling this method will create a new FILE object in the metadata store. Tag updates can be supplied when creating a FILE and will be passed on to the metadata service. The semantics for tag updates are identical to the createObject() method in TracMetadataApi.
This is a unary method. The request must contain all the relevant fields and the entire content of the file in a single message. Errors may occur if the file is too large to fit in a single message frame.
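A client can decide up front whether a file is small enough for the unary call. The sketch below is one possible approach and rests on assumptions: the 4 MiB limit is a common default message size for gRPC, but the limit that applies to a particular TRAC deployment may differ, and the headroom reserved for request fields is an arbitrary illustrative value. The function name `choose_create_method` is hypothetical.

```python
# Assumption: 4 MiB is a common default gRPC message limit; the limit
# that applies to a given TRAC deployment may be different.
ASSUMED_MAX_MESSAGE_BYTES = 4 * 1024 * 1024

# Assumption: leave some room for the request fields and tag updates
ASSUMED_HEADROOM_BYTES = 64 * 1024


def choose_create_method(file_size: int,
                         max_message_bytes: int = ASSUMED_MAX_MESSAGE_BYTES) -> str:
    """Return the name of the call to use for a file of the given size."""
    if file_size + ASSUMED_HEADROOM_BYTES <= max_message_bytes:
        return "createSmallFile"  # content fits in a single message
    return "createFile"           # fall back to the client streaming call


small_choice = choose_create_method(10 * 1024)          # 10 KiB file
large_choice = choose_create_method(50 * 1024 * 1024)   # 50 MiB file
```

Where client streaming is available, using createFile for everything avoids the decision altogether, at the cost of a slightly more involved call.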
Clients may specify the size of the file being created. When a size is supplied, TRAC will check the size against the number of bytes stored. If the stored file size does not match the supplied value, the error will be reported with an error status of DATA_LOSS. When no size is supplied the check cannot be performed.
The method returns the header of the newly created FILE object. Error conditions include: Invalid request, unknown tenant, validation failure, file too large and data loss (if the number of bytes stored does not match the number specified in the request). Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (FileWriteRequest)
- Return type:
- downloadFile(request)
Download a file as a data stream
This method is intended for use by browsers and other HTTP clients to download a file using an HTTP GET request.
- Parameters:
request (DownloadRequest)
- Return type:
- downloadLatestFile(request)
Download the latest version of a file as a data stream
This method is intended for use by browsers and other HTTP clients to download a file using an HTTP GET request.
- Parameters:
request (DownloadRequest)
- Return type:
- readDataset(request)
Read an existing dataset, returning the content as a data stream
This method reads the contents of an existing dataset and returns it in the requested format, along with a copy of the data schema. Data can be requested in any format supported by the platform.
The request uses a regular TagSelector to indicate which dataset and version to read. The format parameter is a mime type and must be a supported data format.
This is a server streaming method. The first message in the response stream will contain a schema definition for the dataset (this may come from an embedded schema or an external schema object). The second and subsequent messages will deliver the content of the dataset in the requested format. TRAC guarantees that the first message will always contain an empty chunk of content, which can be safely ignored.
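On the client side, the response stream described above can be consumed by treating the first message separately and then joining the content chunks. The sketch below is a minimal illustration: `ReadMessage` is a hypothetical stand-in for the real response message, and representing the schema as a plain dict is an assumption made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ReadMessage:
    """Hypothetical stand-in for one message in the response stream."""
    schema: Optional[dict] = None
    content: bytes = b""


def collect_dataset(responses: Iterable[ReadMessage]) -> Tuple[Optional[dict], bytes]:
    """Return the schema from the first message and the joined content."""
    iterator = iter(responses)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("response stream was empty") from None
    # The first message carries the schema; its chunk is expected to be empty
    chunks = [first.content]
    for message in iterator:
        # Empty chunks add nothing when joined, so they are skipped
        if message.content:
            chunks.append(message.content)
    return first.schema, b"".join(chunks)


# A simulated response stream for a dataset requested as "text/csv"
stream = [
    ReadMessage(schema={"fields": ["a", "b"]}),
    ReadMessage(content=b"a,b\n"),
    ReadMessage(),
    ReadMessage(content=b"1,2\n"),
]
schema, data = collect_dataset(stream)
```

For large datasets it would be preferable to process each chunk as it arrives rather than joining everything in memory; the join is used here only to keep the sketch short.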
Error conditions include: Invalid request, unknown tenant, object not found, format not supported. Storage errors may also be reported if there is a problem communicating with the underlying storage technology.
- Parameters:
request (DataReadRequest)
- Return type:
- readFile(request)
Download a file that has been stored in TRAC and return it as a data stream
The request uses a regular TagSelector to indicate which file to read. The semantics of the request are identical to the readObject() method in TracMetadataApi.
This is a server streaming method. The first message in the response stream will contain the response metadata (i.e. the file definition). The second and subsequent messages will deliver the content of the file as a stream of chunks (byte buffers). Empty chunks may be included at any point in the stream and should be ignored. In particular, TRAC guarantees that the chunk in the first message will always be an empty chunk. Clients are free to ignore this chunk, for example if they have a separate function for processing the first message in the response stream. Alternatively, clients may process the empty chunk in the first message in the same way as any other chunk. Both approaches are supported.
Error conditions include: Invalid request, unknown tenant, unknown object ID, object type does not match ID, unknown object version, unknown tag version. Storage errors may also be reported if there is a problem communicating with the underlying storage technology.
- Parameters:
request (FileReadRequest)
- Return type:
- readSmallDataset(request)
Read an existing dataset, returning the content as a single blob
This method reads the contents of an existing dataset and returns it in the requested format, along with a copy of the data schema. Data can be requested in any format supported by the platform.
The request uses a regular TagSelector to indicate which dataset and version to read. The format parameter is a mime type and must be a supported data format.
This is a unary call. Both the schema and the content of the dataset are returned in a single response message. The content of the dataset will be encoded in the requested format. Errors may occur if the content of the dataset is too large to fit in a single message frame.
Error conditions include: Invalid request, unknown tenant, object not found, format not supported. Storage errors may also be reported if there is a problem communicating with the underlying storage technology.
- Parameters:
request (DataReadRequest)
- Return type:
- readSmallFile(request)
Download a file that has been stored in TRAC and return it as a single blob
The request uses a regular TagSelector to indicate which file to read. The semantics of the request are identical to the readObject() method in TracMetadataApi.
This is a unary method. The response will contain the file definition and the whole content of the file in a single message. Errors may occur if the file is too large to fit in a single message frame.
Error conditions include: Invalid request, unknown tenant, unknown object ID, object type does not match ID, unknown object version, unknown tag version, file too large. Storage errors may also be reported if there is a problem communicating with the underlying storage technology.
- Parameters:
request (FileReadRequest)
- Return type:
- updateDataset(request)
Update an existing dataset, supplying the schema and content as a data stream
This method updates an existing dataset and the corresponding DATA object in the TRAC metadata store. As per the TRAC immutability guarantee, the original version of the dataset is not altered. After an update, both the original version and the new version are available for use as inputs to model runs and can be read and queried using the data API. Data can be supplied in any format supported by the platform.
Although large datasets can be uploaded using this call, data import jobs are normally used to bring in large volumes of data from external systems.
To update a dataset, the priorVersion field must indicate the dataset being updated. Only the latest version of a dataset can be updated.
The request must specify a schema for the new version of the dataset; incoming data will be verified against the schema. The new schema must be compatible with the schema of the previous version. Schemas can be specified using either:
A full schema definition - Datasets created using an embedded schema must supply a full schema for all subsequent versions and each schema version must be compatible with the version before. Fields may be added, but not removed or altered.
A schema ID - Datasets created using an external schema must use the same external schema ID for all subsequent versions. It is permitted for later versions of a dataset to use later versions of the external schema, but not earlier versions.
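The two compatibility rules above can be sketched as simple checks. This is an illustration under stated assumptions, not TRAC's validation logic: schemas are reduced to a mapping of field name to type name, only additions are treated as compatible, and other schema attributes such as field order are not considered. The function names are hypothetical.

```python
from typing import Dict


def is_compatible_schema(prior_fields: Dict[str, str], new_fields: Dict[str, str]) -> bool:
    """Fields may be added, but existing fields must not be removed or altered."""
    for name, field_type in prior_fields.items():
        if name not in new_fields:
            return False  # field was removed
        if new_fields[name] != field_type:
            return False  # field was altered
    return True


def is_compatible_schema_version(prior_schema_version: int, new_schema_version: int) -> bool:
    """For an external schema, later versions are allowed but not earlier ones."""
    return new_schema_version >= prior_schema_version


prior = {"customer_id": "STRING", "balance": "FLOAT"}

added = is_compatible_schema(prior, {**prior, "region": "STRING"})
removed = is_compatible_schema(prior, {"customer_id": "STRING"})
altered = is_compatible_schema(prior, {"customer_id": "STRING", "balance": "INTEGER"})
```

Running a check of this kind on the client before uploading can catch an incompatible schema early, although the platform's own validation remains authoritative.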
The “format” parameter describes the format used to upload data. For example, to upload a CSV file the format would be set to “text/csv” and the file content can be uploaded directly, or to upload the output of an editor grid in a web client the format can be set to “text/json” to upload a JSON representation of the editor contents. It is not necessary for different versions of the same dataset to be uploaded using the same format. TRAC will apply format conversion before the data is processed and stored.
Tag updates can be supplied to tag the new version of the dataset. They behave exactly the same as tag updates in the updateObject() call of TracMetadataApi.
This is a client streaming method. The first message in the request stream must contain all the request fields and metadata, including a schema specifier. After the first message all metadata fields should be omitted. Subsequent messages should contain the content of the dataset as a series of chunks, encoded as per the “format” field of the first message. Clients may choose whether or not to include a chunk of data in the first message and empty (i.e. zero-length) chunks are permitted at any point in the stream.
This method returns the header of the new version of the DATA object. Error conditions include: Invalid request, unknown tenant, schema not found (if an external schema ID is used), schema version not compatible, format not supported, data does not match schema, corrupt or invalid data stream. Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (DataWriteRequest)
- Return type:
- updateFile(request)
Upload a new version of an existing file into TRAC, sending the content as a data stream
Calling this method will update the relevant FILE object in the metadata store. The latest version of the FILE must be supplied in the priorVersion field of the request. For example, if the latest version of a FILE object is version 2, the priorVersion field should refer to version 2 and TRAC will create version 3 as a result of the update call. The metadata and content of prior versions remain unaltered. The file name may be changed between versions, but the extension and mime type must stay the same. Tag updates can be supplied when updating a FILE and will be passed on to the metadata service. The semantics for tag updates are identical to the updateObject() method in TracMetadataApi.
This is a client streaming method. The first message in the request stream must contain all the request fields and required metadata. The second and subsequent messages should contain the content of the file as a series of chunks (byte buffers); any other fields set after the first message will be ignored. Empty chunks can be included at any point in the stream and will be ignored. Clients may choose to include the first chunk in the first message along with the request metadata, or to put an empty chunk in the first message and start streaming content in the second message. For very small files, it is possible to put the entire content in one chunk in the first message, so there is only a single message in the stream. All of these approaches are supported.
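The preconditions for a file update can be sketched as follows. This is an illustration only: `FileVersion` and `check_file_update` are hypothetical names, the comparison of extensions is assumed to be case-sensitive, and the real checks are performed by the platform.

```python
import os
from dataclasses import dataclass


@dataclass
class FileVersion:
    """Hypothetical summary of one version of a FILE object."""
    version: int
    name: str
    mime_type: str


def check_file_update(latest: FileVersion, prior_version: int,
                      new_name: str, new_mime_type: str) -> int:
    """Return the version number the update would create, or raise ValueError."""
    if prior_version != latest.version:
        raise ValueError("priorVersion must refer to the latest version")
    # The file name may change, but the extension must stay the same
    if os.path.splitext(new_name)[1] != os.path.splitext(latest.name)[1]:
        raise ValueError("file extension must not change between versions")
    if new_mime_type != latest.mime_type:
        raise ValueError("mime type must not change between versions")
    return latest.version + 1


latest = FileVersion(version=2, name="report.csv", mime_type="text/csv")
new_version = check_file_update(latest, 2, "report_final.csv", "text/csv")
```

In this example the file is renamed but keeps its extension and mime type, so the update is accepted and would produce version 3.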
Clients may specify the size of the file being updated. When a size is supplied, TRAC will check the size against the number of bytes stored. If the stored file size does not match the supplied value, the error will be reported with an error status of DATA_LOSS. When no size is supplied the check cannot be performed.
The call returns the header for the new version of the FILE object. Error conditions include: Invalid request, unknown tenant, validation failure, failed preconditions (e.g. a change of extension or mime type) and data loss (if the number of bytes stored does not match the number specified in the request). Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (FileWriteRequest)
- Return type:
- updateSmallDataset(request)
Update an existing dataset, supplying the schema and content as a single blob
This method updates an existing dataset and the corresponding DATA object in the TRAC metadata store. As per the TRAC immutability guarantee, the original version of the dataset is not altered. After an update, both the original version and the new version are available for use as inputs to model runs and can be read and queried using the data API. Data can be supplied in any format supported by the platform.
To update a dataset, the priorVersion field must indicate the dataset being updated. Only the latest version of a dataset can be updated.
The request must specify a schema for the new version of the dataset; incoming data will be verified against the schema. The new schema must be compatible with the schema of the previous version. Schemas can be specified using either:
A full schema definition - Datasets created using an embedded schema must supply a full schema for all subsequent versions and each schema version must be compatible with the version before. Fields may be added, but not removed or altered.
A schema ID - Datasets created using an external schema must use the same external schema ID for all subsequent versions. It is permitted for later versions of a dataset to use later versions of the external schema, but not earlier versions.
The “format” parameter describes the format used to upload data. For example, to upload a CSV file the format would be set to “text/csv” and the file content can be uploaded directly, or to upload the output of an editor grid in a web client the format can be set to “text/json” to upload a JSON representation of the editor contents. It is not necessary for different versions of the same dataset to be uploaded using the same format. TRAC will apply format conversion before the data is processed and stored.
Tag updates can be supplied to tag the new version of the dataset. They behave exactly the same as tag updates in the updateObject() call of TracMetadataApi.
This is a unary call. All the request fields and metadata (including the schema specifier) and the dataset content, encoded as per the “format” field, are supplied in a single message. It is intended for working with small datasets and for use in environments where client streaming is not available (particularly in gRPC-Web clients).
This method returns the header of the new version of the DATA object. Error conditions include: Invalid request, unknown tenant, schema not found (if an external schema ID is used), schema version not compatible, format not supported, data does not match schema, corrupt or invalid data stream. Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (DataWriteRequest)
- Return type:
- updateSmallFile(request)
Upload a new version of an existing file into TRAC, sending the content as a single blob
Calling this method will update the relevant FILE object in the metadata store. The latest version of the FILE must be supplied in the priorVersion field of the request. For example, if the latest version of a FILE object is version 2, the priorVersion field should refer to version 2 and TRAC will create version 3 as a result of the update call. The metadata and content of prior versions remain unaltered. The file name may be changed between versions, but the extension and mime type must stay the same. Tag updates can be supplied when updating a FILE and will be passed on to the metadata service. The semantics for tag updates are identical to the updateObject() method in TracMetadataApi.
This is a unary call. The request must contain all the relevant fields and the entire content of the file in a single message. Errors may occur if the file is too large to fit in a single message frame.
Clients may specify the size of the file being updated. When a size is supplied, TRAC will check the size against the number of bytes stored. If the stored file size does not match the supplied value, the error will be reported with an error status of DATA_LOSS. When no size is supplied the check cannot be performed.
The call returns the header for the new version of the FILE object. Error conditions include: Invalid request, unknown tenant, validation failure, failed preconditions (e.g. a change of extension or mime type), file too large and data loss (if the number of bytes stored does not match the number specified in the request). Storage errors may also be reported if there is a problem communicating with the underlying storage technology. In the event of an error, TRAC will do its best to clean up any partially-written data in the storage layer.
- Parameters:
request (FileWriteRequest)
- Return type: