Pipelines API
Cortex’s Pipelines API provides programmatic access to any Machine Learning Pipeline built in your Cortex account. The API allows you to build automated workflows for managing, updating, and deploying your ML Pipelines. This means smoother integrations between Cortex and your business.
The following sections describe the various requests that may be made to the Pipelines API. Each description includes the optional and required parameters that should be submitted with the request.
All API requests should be made to api-us.vidora.com, using the methods listed below. Note that authentication is required for each of these methods.
Manage your ML Pipelines
The below methods allow you to list, filter, and fetch details for the ML pipelines that have been built in your Cortex account. This makes it easy for you to manage up to hundreds of unique pipelines.
List ML Pipelines
Returns an array of pipelines built in your Cortex account, ordered by how recently each pipeline was run. Optionally, the response can be filtered by the parameters listed below.
Request
Method | URL |
GET | /v1/api/pipelines |
Parameters
Param | Required? | Type | Description |
recurring | No | Boolean | Whether or not the pipeline is set to run on a recurring schedule. |
active | No | Boolean | Whether or not the pipeline is currently active. |
Example Request
http://api-us.vidora.com/v1/api/pipelines?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Example Response
[
  {
    id: "f90819314266a344",  # Unique ID for the pipeline
    name: "Sample Classification Pipeline",  # Name of the pipeline
    type: "Classification",  # Pipeline type
    active: true,  # Whether the pipeline is active
    recurring: true,  # Whether it runs on recurring schedule
    status: "complete",  # Status ("running|complete|failed")
    created_at: "2020-04-05T00:00:00",  # When the pipeline was created
    last_trained_at: "2020-04-05T05:00:00",  # When the pipeline last trained
    last_prediction_at: "2020-04-05T05:00:00"  # When the pipeline last made predictions
  },
  {
    id: "f8d544d30e43a550",
    name: "Sample Look Alike Pipeline",
    type: "Look Alike",
    active: true,
    recurring: true,
    status: "running",
    created_at: "2020-04-05T00:00:00",
    last_trained_at: "2020-04-05T05:00:00",
    last_prediction_at: "2020-04-05T05:00:00"
  }
]
The above example shows a signed GET request to return all pipelines. The response indicates that there are two pipelines in the account, and includes details for each pipeline such as id, name, type, and more.
Get ML Pipeline
Returns details for a given pipeline.
Request
Method | URL |
GET | /v1/api/pipelines/<PIPELINE_ID> |
Parameters
Param | Required? | Type | Description |
pipeline_id | Yes | string | Unique ID for the pipeline. |
Example Request
http://api-us.vidora.com/v1/api/pipelines/<PIPELINE_ID>?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Example Response
{
  id: "f90819314266a344",  # Unique ID for the pipeline
  name: "Sample Classification Pipeline",  # Name of the pipeline
  type: "Classification",  # Pipeline type
  active: true,  # Whether the pipeline is active
  recurring: true,  # Whether it runs on recurring schedule
  status: "complete",  # Status ("running|complete|failed")
  created_at: "2020-04-05T00:00:00",  # When the pipeline was created
  last_trained_at: "2020-04-05T05:00:00",  # When the pipeline last trained
  last_prediction_at: "2020-04-05T05:00:00",  # When the pipeline last made predictions
  active_label_set_id: "j8daw02sz57hjwze"  # Unique ID for the active label set
}
The above example shows a signed GET request to fetch details for pipeline <PIPELINE_ID>. If the pipeline requires uploaded labels (Classification, Look Alike, or Regression), these details will include a unique ID for the most recently uploaded label set assigned to the pipeline.
Update your ML Pipelines
The below methods allow you to update details of an existing pipeline. Most notably, you can add a new set of labels to trigger an automatic retraining of any Classification, Look Alike, or Regression pipeline. This process involves a POST request to create a new label set (Create Label Set), and a PUT request to assign that label set to a specific pipeline (Update Pipeline).
Create Label Set
Creates a new label set. Once a label set has been successfully validated, you may assign it to a pipeline (via the Update Pipeline method) in order to automatically begin retraining that pipeline with the new labels.
Request
Method | URL |
POST | /v1/api/label_sets |
Parameters
Param | Required? | Type | Description |
label_type | Yes | string | Pipeline type that the labels are intended for. Valid values are “classification”, “look alike”, or “regression”. |
name | Yes | string | A descriptive name for the label set. |
labels | Yes | file | A file containing the training labels. Uploaded files are required to be in CSV format (optionally gzipped), with an id column (string) and a label column (boolean if label_type is “classification” or “look alike”, float if label_type is “regression”). |
start_date | No | date | Earliest date for which the uploaded labels are valid (e.g. “2020-02-01”). Default value is 90 days before today. Along with end_date, this value will define the training window for your pipeline. |
end_date | No | date | Latest date for which the uploaded labels are valid (e.g. “2020-04-01”). Default value is today. Along with start_date, this value will define the training window for your pipeline. |
ancestor_set_id | No | string | The ID of another label set to compare labels with. If your new label set is meant to refresh a previously-defined label set, it is useful to pass in an ancestor_set_id to ensure at least 80% of your labels overlap. |
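As a rough illustration of the labels file described above, the following sketch writes a small CSV with an id column and a label column and gzips it in memory. The header names "id" and "label", the helper name labels_csv, and the sample user IDs are assumptions made for this example; confirm the exact file layout against the sample files in your Cortex account.

```ruby
require "zlib"

# Hypothetical helper: turns [id, label] pairs into CSV text.
# Assumes IDs contain no commas or quotes, so no CSV escaping is needed.
def labels_csv(rows)
  lines = ["id,label"]
  rows.each { |id, label| lines << "#{id},#{label}" }
  lines.join("\n") + "\n"
end

# Boolean labels, as used for "classification" or "look alike" label sets
csv = labels_csv([["user_1", true], ["user_2", false]])

# Optionally gzip the file contents before uploading
compressed = Zlib.gzip(csv)
puts "#{csv.bytesize} bytes of CSV, #{compressed.bytesize} bytes gzipped"
```

For a “regression” label set, the second element of each pair would be a float instead of a boolean.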
Example Request
Creating a new label set requires uploading a file, which must be signed in the POST body like any other Cortex API request. However, since you’re uploading a file, you cannot send JSON as the payload. Instead, the file must be signed and sent as form data. Below is an example which shows how to sign the request.
Payload of Form-Data
Just like signatures for any other Cortex API request, your POST body needs to be included in the signature generation process. The POST body for this request is sent as form-data, which requires a boundary to be defined between each parameter. In this example, we’ll use --BOUNDARY as the defined boundary. If we were to send a common CSV as the labels file, the payload would look like the following:
body = "--BOUNDARY\r\n" \
  "Content-Disposition: form-data; name=\"labels\"; filename=\"my_file.csv\"\r\n" \
  "Content-Type: text/csv\r\n" \
  "\r\n" \
  "#{file.read}\r\n" \
  "--BOUNDARY--\r\n"
A few things to note about the above:
- To end the form-data, append “--” at the end of your boundary. In this example, the end of the form-data is defined by --BOUNDARY--.
- The Content-Disposition is form-data.
- The file being uploaded is a common CSV, so the Content-Type is text/csv.
- The file contents are read directly into the payload so it can be signed. In this example, file contents are read using Ruby code.
- IMPORTANT: the form data must have the correct line breaks which includes both \r and \n.
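The notes above can be folded into a small helper. This is only a sketch: multipart_body is a hypothetical name, and the file contents are passed in as a string so that the same bytes can be both signed and sent.

```ruby
# Hypothetical helper: builds a multipart/form-data body with one "labels"
# file part plus optional plain-text fields. Every line ends in \r\n and
# the final boundary carries the closing "--".
def multipart_body(boundary, file_name, file_bytes, content_type, fields = {})
  body = String.new(encoding: Encoding::BINARY)
  body << "--#{boundary}\r\n"
  body << "Content-Disposition: form-data; name=\"labels\"; filename=\"#{file_name}\"\r\n"
  body << "Content-Type: #{content_type}\r\n\r\n"
  body << file_bytes.b << "\r\n"
  fields.each do |name, value|
    body << "--#{boundary}\r\n"
    body << "Content-Disposition: form-data; name=\"#{name}\"\r\n\r\n"
    body << value.to_s.b << "\r\n"
  end
  body << "--#{boundary}--\r\n"
end

body = multipart_body("BOUNDARY", "my_file.csv", "id,label\nuser_1,true\n",
                      "text/csv", label_type: "classification")
puts body.bytesize
```

Building the body in binary encoding keeps the bytes unchanged if the file contents are gzipped rather than plain text.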
Sending a gzipped file
It’s likely that you will want to gzip your labels file to speed up the upload. If sending a gzipped file, the content type must be set to application/gzip or application/x-gzip, as shown below:
body = "--BOUNDARY\r\n" \
  "Content-Disposition: form-data; name=\"labels\"; filename=\"my_file.csv.gz\"\r\n" \
  "Content-Type: application/gzip\r\n" \
  "\r\n" \
  "#{file.read}\r\n" \
  "--BOUNDARY--\r\n"
Adding more params to the payload
When adding other parameters to the request, you can either put them in the form data or append them to the URL. Below is an example of how you would add more parameters to the form data. Note that each parameter is separated using the --BOUNDARY from above.
body = "--BOUNDARY\r\n" \
  "Content-Disposition: form-data; name=\"labels\"; filename=\"my_file.csv\"\r\n" \
  "Content-Type: text/csv\r\n" \
  "\r\n" \
  "#{file.read}\r\n" \
  "--BOUNDARY\r\n" \
  "Content-Disposition: form-data; name=\"label_type\"\r\n" \
  "\r\n" \
  "classification\r\n" \
  "--BOUNDARY\r\n" \
  "Content-Disposition: form-data; name=\"name\"\r\n" \
  "\r\n" \
  "My Classification Label Set\r\n" \
  "--BOUNDARY--\r\n"
If you were to append the additional parameters to the URL, they would look like the below. Keep in mind that the labels file would still be sent in the payload as form-data.
POST https://api-us.vidora.com/v1/api/label_sets?label_type=classification&name=My%20Classification%20Label%20Set
Generating the signature
Please see the example code in your Cortex account for how to generate a signature for your upload. Like any signed request, you must join the following, in this order, to create the signature:
- Secret Key
- Method
- Path
- URL Params
- Body
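The sketch below shows one way to combine those five parts for a simple GET request. It follows the generate_signature function from the Ruby gzip example in this article (newline-joined parts, SHA-256, Base64, first 43 characters); the credentials are placeholders, and the example code in your Cortex account remains the authoritative reference.

```ruby
require "digest/sha2"
require "uri"

# Joins secret, method, path, sorted URL params, and body with newlines,
# then returns the first 43 characters of the Base64-encoded SHA-256 digest.
def generate_signature(secret, http_method, request_path, params = {}, body = nil)
  sorted_params = params.sort_by { |key, _| key.to_s }
                        .map { |key, value| "#{key}=#{value}" }
                        .join("&")
  string_to_sign = [secret, http_method, request_path, sorted_params, body].join("\n")
  # pack("m0") is strict Base64 without line breaks, like Base64.strict_encode64
  [Digest::SHA256.digest(string_to_sign)].pack("m0")[0, 43]
end

# Placeholder credentials; replace with the values from your Cortex account.
params = { api_key: "YOUR_KEY", expires: "2020-06-01T00:00", active: true }
params[:signature] = generate_signature("YOUR_SECRET", "GET", "/v1/api/pipelines", params)

# encode_www_form percent-encodes the values, e.g. ":" becomes "%3A"
url = "https://api-us.vidora.com/v1/api/pipelines?#{URI.encode_www_form(params)}"
puts url
```

For a POST or PUT request, the exact body string that will be sent is passed as the last argument so that it is covered by the signature.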
Sending the request
Since label uploads are sent as form-data, you cannot send a request header of Content-Type: application/json. Instead, it must be sent as form-data that defines the boundary. An example request header for the examples above would specify the following:
"Content-Type": "multipart/form-data; boundary=BOUNDARY"
Ruby gzip example
The below example shows how to create a label set with a gzipped file using Ruby. Ask your account manager for examples using other languages (e.g. Bash).
require "digest/sha2"
require "base64"
require "active_support/time"
require "active_support/core_ext/object/to_query" # provides Hash#to_query
require "rest-client"

# Function for generating signatures
def generate_signature(secret, http_method, request_path, params = {}, body = nil)
  string_to_sign = [
    secret,
    http_method,
    request_path,
    params.sort { |pair1, pair2| pair1[0] <=> pair2[0] }.map { |k, v| "#{k}=#{v}" }.join("&"),
    body,
  ].join("\n")
  Base64.strict_encode64(Digest::SHA256.digest(string_to_sign))[0, 43].chomp("=")
end

# Access your labels file on disk (binary mode, so the gzip bytes are read unchanged) and create body
labels_file = File.new("<GZIPPED_FILE>", "rb")
body = "--BOUNDARY\r\nContent-Disposition: form-data; name=\"labels\"; filename=\"<GZIPPED_FILE>\"\r\n" \
  "Content-Type: application/gzip\r\n\r\n#{labels_file.read}\r\n--BOUNDARY--\r\n"

params = {
  api_key: "<API_KEY>",
  expires: (Time.now.utc + 2.days.to_i).strftime("%Y-%m-%dT%H:%M"),
  name: "<LABEL_SET_NAME>",
  label_type: "classification",
}
secret = "<API_SECRET>"

# Sign the request (the body is part of the signature), then send it
params[:signature] = generate_signature(secret, "POST", "/v1/api/label_sets", params, body)
url = "https://api-us.vidora.com/v1/api/label_sets?#{params.to_query}"
request_headers = { "Content-Type": "multipart/form-data; boundary=BOUNDARY" }
response = RestClient.post(url, body, request_headers)
puts response
Example Response
{
  id: "0tbeb8mg9ljyod4h",  # Unique ID for the label set
  name: "My New Label Set",  # Name of the label set
  status: "validating",  # Status ("validating|validated|failed")
  label_type: "classification",  # Type of pipeline the labels will be applied to
  filename: "subscribers-data-2019.csv",  # Name of the uploaded file
  updated_at: "2020-04-05T00:00:00",  # When the label set was last updated
  start_date: "2020-02-01",  # Earliest date for which the labels are valid
  end_date: "2020-04-01"  # Last date for which the labels are valid
}
The response includes details about the new label set, including a status field which indicates that the set is currently being validated. When the status changes to “validated”, you may assign the label set to a pipeline via the Update Pipeline method. If the status shows “failed”, an error field in the response will show more information about the validation error.
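Because validation happens asynchronously, a client typically has to check the status more than once. The sketch below shows one possible polling loop; wait_for_validation is a hypothetical helper, and fetch stands in for whatever code performs the signed Get Label Set request and returns the parsed response as a Hash with string keys.

```ruby
# Hypothetical helper: calls `fetch` until the label set is no longer
# "validating". Returns the label set once validated, raises otherwise.
def wait_for_validation(fetch, interval: 30, max_attempts: 20)
  max_attempts.times do
    label_set = fetch.call
    case label_set["status"]
    when "validated"
      return label_set
    when "failed"
      raise "Label set failed validation: #{label_set.inspect}"
    end
    sleep(interval)
  end
  raise "Label set still validating after #{max_attempts} attempts"
end

# Stubbed responses stand in for real API calls in this example.
responses = [
  { "status" => "validating" },
  { "status" => "validated", "id" => "0tbeb8mg9ljyod4h" },
]
result = wait_for_validation(-> { responses.shift }, interval: 0)
puts result["id"]
```

The interval and attempt limit are arbitrary defaults; choose values that suit the size of your uploads.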
Get Label Set
Returns details for a given label set. This is useful for checking the status of a label set after you’ve created it.
Request
Method | URL |
GET | /v1/api/label_sets/<LABEL_SET_ID> |
Parameters
Param | Required? | Type | Description |
label_set_id | Yes | string | Unique ID for the label set. |
Example Request
http://api-us.vidora.com/v1/api/label_sets/<LABEL_SET_ID>?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Example Response
{
  id: "0tbeb8mg9ljyod4h",  # Unique ID for the label set
  name: "My New Label Set",  # Name of the label set
  status: "validated",  # Status ("validating|validated|failed")
  type: "classification",  # Type of pipeline the labels will be applied to
  filename: "subscribers-data-2019.csv",  # Name of the uploaded file
  updated_at: "2020-04-05T00:00:00",  # When the label set was last updated
  start_date: "2020-02-01",  # Earliest date for which the labels are valid
  end_date: "2020-04-01",  # Last date for which the labels are valid
  download_url: "http://s3...",  # URL to download the file
  errors: [],  # Array of validation error messages, if any
  warnings: []  # Array of validation warning messages, if any
}
The above example shows a signed GET request to fetch details for label set <LABEL_SET_ID>. The response includes details such as the set’s name, label_type, and more.
Update Label Set
Updates details for a given label set. A label set may only be updated if it has not yet been assigned to a pipeline. If the label set has already been assigned to a pipeline, you should create a new label set instead (via the Create Label Set method).
Request
Method | URL |
PUT | /v1/api/label_sets/<LABEL_SET_ID> |
Parameters
Param | Required? | Type | Description |
label_set_id | Yes | string | Unique ID for the label set that you would like to update. |
label_type | Yes | string | Pipeline type that the labels are intended for. Valid values are “classification”, “look alike”, or “regression”. |
name | Yes | string | A descriptive name for the label set. |
labels | Yes | file | A file containing the training labels. Uploaded files are required to be in CSV format (optionally gzipped), with an id column (string) and a label column (boolean if label_type is “classification” or “look alike”, float if label_type is “regression”). |
start_date | No | date | Earliest date for which the uploaded labels are valid (e.g. “2020-02-01”). Default value is 90 days before today. Along with end_date, this value will define the training window for your pipeline. |
end_date | No | date | Latest date for which the uploaded labels are valid (e.g. “2020-04-01”). Default value is today. Along with start_date, this value will define the training window for your pipeline. |
ancestor_set_id | No | string | The ID of another label set to compare labels with. If your new label set is meant to refresh a previously-defined label set, it is useful to pass in an ancestor_set_id to ensure at least 80% of your labels overlap. |
Example Request
PUT http://api-us.vidora.com/v1/api/label_sets/<LABEL_SET_ID>?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Content-Type: application/json

{ "start_date": "2020-03-01" }
Example Response
{
  id: "0tbeb8mg9ljyod4h",  # Unique ID for the label set
  name: "My New Label Set",  # Name of the label set
  status: "validating",  # Status ("validating|validated|failed")
  label_type: "classification",  # Type of pipeline the labels will be applied to
  filename: "subscribers-data-2019.csv",  # Name of the uploaded file
  updated_at: "2020-04-05T00:00:00",  # When the label set was last updated
  start_date: "2020-03-01",  # Earliest date for which the labels are valid
  end_date: "2020-04-01"  # Last date for which the labels are valid
}
The above example shows a signed PUT request to update the start date for label set <LABEL_SET_ID>. The response includes details for the label set such as its name, status, and more.
List Label Sets
Returns an array of label sets. Optionally, the response can be filtered by the parameters listed below.
Request
Method | URL |
GET | /v1/api/label_sets |
Parameters
Param | Required? | Type | Description |
status | No | string | Status of the label set (“validating”, “validated”, or “failed”). |
label_type | No | string | Pipeline type that the labels are intended for (“classification”, “look alike”, or “regression”). |
Example Request
http://api-us.vidora.com/v1/api/label_sets?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Example Response
[
  {
    id: "0tbeb8mg9ljyod4h",  # Unique ID for the label set
    name: "My New Label Set",  # Name of the label set
    status: "validating",  # Status ("validating|validated|failed")
    label_type: "classification",  # Type of pipeline the labels will be applied to
    filename: "subscribers-data-2019.csv",  # Name of the uploaded file
    updated_at: "2020-04-05T00:00:00"  # When the label set was last updated
  },
  {
    id: "p54avfxnd8soj96g",
    name: "Look Alike Label Set",
    status: "validated",
    label_type: "look alike",
    filename: "survey-data-2020.csv",
    updated_at: "2020-04-07T00:00:00"
  }
]
The above example shows a signed GET request to return all label sets in the account. The response indicates that there are two label sets, and includes details such as each set’s id, name, label_type, and more.
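Once the list has been fetched, a client may want to pick the newest label set that is ready to be assigned. The sketch below assumes the response has already been parsed into an array of Hashes with string keys; latest_validated is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the most recently updated label set of the
# given type whose status is "validated", or nil if there is none.
def latest_validated(label_sets, label_type)
  label_sets
    .select { |set| set["status"] == "validated" && set["label_type"] == label_type }
    .max_by { |set| set["updated_at"] } # ISO 8601 timestamps sort correctly as strings
end

label_sets = [
  { "id" => "0tbeb8mg9ljyod4h", "status" => "validating",
    "label_type" => "classification", "updated_at" => "2020-04-05T00:00:00" },
  { "id" => "p54avfxnd8soj96g", "status" => "validated",
    "label_type" => "look alike", "updated_at" => "2020-04-07T00:00:00" },
]

chosen = latest_validated(label_sets, "look alike")
puts chosen ? chosen["id"] : "no validated label set"
```

The same filtering can also be requested from the API through the status and label_type parameters; filtering locally is simply convenient when the full list is already in hand.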
Update Pipeline
Updates details for a given pipeline. Editable information includes the pipeline’s name, and its active label set. If the active label set is updated, the pipeline will automatically begin retraining using the new label set.
Request
Method | URL |
PUT | /v1/api/pipelines/<PIPELINE_ID> |
Parameters
Param | Required? | Type | Description |
pipeline_id | Yes | string | Unique ID for the pipeline. |
name | No | string | Edits the name of the pipeline. |
active_label_set_id | No | string | Sets the active label set for the pipeline and triggers an automatic retraining. This option applies only to Classification, Look Alike, and Regression pipelines. |
Example Request
PUT http://api-us.vidora.com/v1/api/pipelines/<PIPELINE_ID>?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Content-Type: application/json

{ "active_label_set_id": "0tbeb8mg9ljyod4h" }
Example Response
{
  id: "f90819314266a344",  # Unique ID for the pipeline
  name: "Sample Classification Pipeline",  # Name of the pipeline
  type: "Classification",  # Pipeline type
  active: true,  # Whether the pipeline is active
  recurring: true,  # Whether it runs on recurring schedule
  status: "running",  # Status ("running|complete|failed")
  created_at: "2020-04-05T00:00:00",  # When the pipeline was created
  last_trained_at: "2020-04-05T05:00:00",  # When the pipeline last trained
  last_prediction_at: "2020-04-05T05:00:00",  # When the pipeline last made predictions
  active_label_set_id: "0tbeb8mg9ljyod4h"  # Unique ID for the active label set
}
The above example shows a signed PUT request to update the active label set for pipeline <PIPELINE_ID>. The response includes details for the pipeline such as its name, type, status, and more.
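A request like this could be assembled with Ruby's standard Net::HTTP library, as sketched below. build_update_request is a hypothetical helper, and signed_query is assumed to be an already-signed query string (api_key, expires, and a signature computed over the same JSON body that is sent).

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: prepares (but does not send) the PUT request.
# The body passed in must be the exact string used when signing.
def build_update_request(pipeline_id, body, signed_query)
  uri = URI("https://api-us.vidora.com/v1/api/pipelines/#{pipeline_id}?#{signed_query}")
  request = Net::HTTP::Put.new(uri)
  request["Content-Type"] = "application/json"
  request.body = body
  request
end

body = JSON.generate("active_label_set_id" => "0tbeb8mg9ljyod4h")
# Placeholder query string; a real one carries your key and signature.
request = build_update_request("f90819314266a344", body,
                               "api_key=YOUR_KEY&expires=2020-06-01T00%3A00&signature=YOUR_SIGNATURE")
puts "#{request.method} #{request.path}"

# To send it:
# Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) { |http| http.request(request) }
```

Generating the JSON once and reusing that string for both signing and sending avoids mismatches caused by differences in key order or whitespace.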
Deploying Pipeline Predictions
The below methods allow you to access prediction files that you have exported from any existing pipeline. Most notably, you can fetch a download link for a given set of exported predictions, allowing you to deploy any pipeline in an automated workflow.
List Prediction Exports for a Pipeline
Returns an array of prediction exports for a given pipeline.
The ID for a pipeline can be fetched by making a request to the List ML Pipelines method.
Request
Method | URL |
GET | /v1/api/pipelines/<PIPELINE_ID>/prediction_exports |
Parameters
Param | Required? | Type | Description |
pipeline_id | Yes | string | Unique ID for the pipeline. |
recurring | No | boolean | Whether the prediction export is set to run on a recurring schedule. |
exported_since | No | integer | A Unix timestamp which limits the response to Prediction Exports that have been exported at or after this point in time. |
Example Request
http://api-us.vidora.com/v1/api/pipelines/<PIPELINE_ID>/prediction_exports?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Example Response
[
  {
    id: "vrhmjwf10b66zl9k",  # Unique ID for the prediction export
    name: "Sample Recurring Export",  # Name of the prediction export
    status: "complete"  # Status ("exporting|complete|failed")
  },
  {
    id: "q86yqblw2ejp39bd",
    name: "Sample One-Time Export",
    status: "exporting"
  }
]
The above example shows a signed GET request to return all prediction exports for pipeline <PIPELINE_ID>. The response indicates that there are two such exports, and includes details for each one such as its name and status.
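Since exported_since expects a Unix timestamp, it usually has to be computed from a date. The sketch below shows one way to build the optional filters for exports from the last seven days; the helper name and the seven-day window are illustrative choices only.

```ruby
require "uri"

# Hypothetical helper: Unix timestamp for `days_back` days before `now`.
def exported_since(now, days_back)
  now.to_i - days_back * 24 * 60 * 60
end

filters = { recurring: true, exported_since: exported_since(Time.now, 7) }

# These filters are URL params, so they are signed together with api_key and expires.
puts URI.encode_www_form(filters)
```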
Get Prediction Export
Returns details for a given prediction export from a given pipeline.
The ID for a pipeline can be fetched by making a request to the List ML Pipelines method. The ID for a prediction export can be fetched by making a request to the List Prediction Exports for a Pipeline method.
Request
Method | URL |
GET | /v1/api/pipelines/<PIPELINE_ID>/prediction_exports/<EXPORT_ID> |
Parameters
Param | Required? | Type | Description |
pipeline_id | Yes | string | Unique ID for the pipeline. |
export_id | Yes | string | Unique ID for the prediction export. |
recurring | No | boolean | Whether the prediction export is set to run on a recurring schedule. |
Example Request
http://api-us.vidora.com/v1/api/pipelines/<PIPELINE_ID>/prediction_exports/<EXPORT_ID>?api_key=<YOUR_KEY>&expires=2020-06-01T00%3A00&signature=<YOUR_SIGNATURE>
Example Response
{
  id: "vrhmjwf10b66zl9k", # Unique ID for the prediction export
  name: "Sample Recurring Export", # Name of the prediction export
  recurring: true, # Whether the export runs on a recurring basis
  status: "complete", # Status ("exporting|complete|failed")
  last_exported: "2020-04-05T00:00:00", # When the export last ran
  columns: ["User ID", "Conversion Probability"], # Columns included in the exported file
  total: 2581, # Number of predictions exported
  download_url: "http://s3..." # Download URL for accessing the exported file
}
The above example shows a signed GET request to fetch details for prediction export <EXPORT_ID> from pipeline <PIPELINE_ID>. The response includes details for the export such as its name, number of predictions, and a link to download the exported file.
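In an automated workflow, the download link is only useful once the export has finished. The following sketch assumes the response has been parsed into a Hash with string keys; download_url_for is a hypothetical helper.

```ruby
# Hypothetical helper: returns the download URL for a completed export,
# nil while it is still exporting, and raises for any other status.
def download_url_for(export)
  case export["status"]
  when "complete"
    export["download_url"]
  when "exporting"
    nil
  else
    raise "Prediction export #{export["id"]} has status #{export["status"].inspect}"
  end
end

# Sample data only; the URL below is a made-up placeholder.
export = { "id" => "vrhmjwf10b66zl9k", "status" => "complete",
           "download_url" => "https://example.com/predictions.csv" }
puts download_url_for(export)
```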
Response Codes
Response | Description |
200 OK | The request was successful. |
400 Bad Request | The request was invalid, possibly due to malformed parameters. |
401 Unauthorized | The api_key and/or signature was invalid. |
404 Not Found | The id used in the request was not found or the request URI does not exist. |
500 Internal Error | There was a server side error, and we cannot serve the request at the current time. |
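A client can translate these codes into next steps. The mapping below is only a suggestion (the symbol names are invented for this example), with server errors treated as worth retrying later.

```ruby
# Hypothetical helper: maps an HTTP status code to a suggested action.
# Accepts Integers or Strings (Net::HTTP reports the code as a String).
def classify_response(code)
  case code.to_i
  when 200 then :ok
  when 400 then :fix_request        # malformed or missing parameters
  when 401 then :check_credentials  # invalid api_key and/or signature
  when 404 then :not_found          # unknown id or request URI
  when 500 then :retry_later        # server-side error
  else :unexpected
  end
end

puts classify_response("401")
```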
Still have questions? Reach out to support@mparticle.com for more info!