create(projectId, body, x__xgafv=None)
Creates a Cloud Dataflow job from a template.
get(projectId, gcsPath=None, location=None, x__xgafv=None, view=None)
Gets the template stored at the given Cloud Storage path.
Launch a template.
create(projectId, body, x__xgafv=None)
Creates a Cloud Dataflow job from a template. Args: projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required) body: object, The request body. (required) The object takes the form of: { # A request to create a Cloud Dataflow job from a template. "environment": { # The environment values to set at runtime. # The runtime environment for the job. "machineType": "A String", # The machine type to use for the job. Defaults to the value from the # template if not specified. "network": "A String", # Network to which VMs will be assigned. If empty or unspecified, # the service will use the network "default". "zone": "A String", # The Compute Engine [availability # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones) # for launching worker instances to run your pipeline. "additionalUserLabels": { # Additional user labels to be specified for the job. # Keys and values should follow the restrictions specified in the [labeling # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions) # page. "a_key": "A String", }, "additionalExperiments": [ # Additional experiment flags for the job. "A String", ], "bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory. # Use with caution. "tempLocation": "A String", # The Cloud Storage path to use for temporary files. # Must be a valid Cloud Storage URL, beginning with `gs://`. "serviceAccountEmail": "A String", # The email address of the service account to run the job as. "numWorkers": 42, # The initial number of Google Compute Engine instances for the job. "maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made # available to your pipeline during execution, from 1 to 1000. "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of # the form "regions/REGION/subnetworks/SUBNETWORK". },
"gcsPath": "A String", # Required. A Cloud Storage path to the template from which to # create the job. # Must be a valid Cloud Storage URL, beginning with `gs://`. "location": "A String", # The [regional endpoint] # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to # which to direct the request. "parameters": { # The runtime parameters to pass to the job. "a_key": "A String", }, "jobName": "A String", # Required. The job name to use for the created job. } x__xgafv: string, V1 error format. Allowed values 1 - v1 error format 2 - v2 error format Returns: An object of the form: { # Defines a job to be run by the Cloud Dataflow service. "labels": { # User-defined labels for this job. # # The labels map can contain no more than 64 entries. Entries of the labels # map are UTF8 strings that comply with the following restrictions: # # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62} # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63} # * Both keys and values are additionally constrained to be <= 128 bytes in # size. "a_key": "A String", }, "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # ListJob response and Job SUMMARY view. # This field is populated by the Dataflow service to support filtering jobs # by the metadata values provided here. Populated for ListJobs and all GetJob # views SUMMARY and higher. "sdkVersion": { # The version of the SDK used to run the job. "versionDisplayName": "A String", # A readable string describing the version of the SDK. "version": "A String", # The version of the SDK used to run the job. "sdkSupportStatus": "A String", # The support status for this SDK version. }, "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job. { # Metadata for a PubSub connector used by the job. "topic": "A String", # Topic accessed in the connection. "subscription": "A String", # Subscription used in the connection. 
}, ], "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job. { # Metadata for a Datastore connector used by the job. "projectId": "A String", # ProjectId accessed in the connection. "namespace": "A String", # Namespace used in the connection. }, ], "fileDetails": [ # Identification of a File source used in the Dataflow job. { # Metadata for a File connector used by the job. "filePattern": "A String", # File Pattern used to access files by the connector. }, ], "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job. { # Metadata for a Spanner connector used by the job. "instanceId": "A String", # InstanceId accessed in the connection. "projectId": "A String", # ProjectId accessed in the connection. "databaseId": "A String", # DatabaseId accessed in the connection. }, ], "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job. { # Metadata for a BigTable connector used by the job. "instanceId": "A String", # InstanceId accessed in the connection. "projectId": "A String", # ProjectId accessed in the connection. "tableId": "A String", # TableId accessed in the connection. }, ], "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job. { # Metadata for a BigQuery connector used by the job. "projectId": "A String", # Project accessed in the connection. "dataset": "A String", # Dataset accessed in the connection. "table": "A String", # Table accessed in the connection. "query": "A String", # Query used to access data in the connection. }, ], }, "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time. # A description of the user pipeline and stages through which it is executed. # Created by Cloud Dataflow service. Only retrieved with # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL. # form. 
This data is provided by the Dataflow service for ease of visualizing # the pipeline and interpreting Dataflow provided metrics. "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them. { # Description of the type, names/ids, and input/outputs for a transform. "kind": "A String", # Type of transform. "name": "A String", # User provided name for this transform instance. "inputCollectionName": [ # User names for all collection inputs to this transform. "A String", ], "displayData": [ # Transform-specific display data. { # Data provided with a pipeline or transform to provide descriptive info. "shortStrValue": "A String", # A possible additional shorter value to display. # For example a java_class_name_value of com.mypackage.MyDoFn # will be stored with MyDoFn as the short_str_value and # com.mypackage.MyDoFn as the java_class_name value. # short_str_value can be displayed and java_class_name_value # will be displayed as a tooltip. "durationValue": "A String", # Contains value if the data is of duration type. "url": "A String", # An optional full URL. "floatValue": 3.14, # Contains value if the data is of float type. "namespace": "A String", # The namespace for the key. This is usually a class name or programming # language namespace (i.e. python module) which defines the display data. # This allows a dax monitoring system to specially handle the data # and perform custom rendering. "javaClassValue": "A String", # Contains value if the data is of java class type. "label": "A String", # An optional label to display in a dax UI for the element. "boolValue": True or False, # Contains value if the data is of a boolean type. "strValue": "A String", # Contains value if the data is of string type. "key": "A String", # The key identifying the display data. # This is intended to be used as a label for the display data # when viewed in a dax monitoring system. 
"int64Value": "A String", # Contains value if the data is of int64 type. "timestampValue": "A String", # Contains value if the data is of timestamp type. }, ], "outputCollectionName": [ # User names for all collection outputs to this transform. "A String", ], "id": "A String", # SDK generated id of this transform instance. }, ], "executionPipelineStage": [ # Description of each stage of execution of the pipeline. { # Description of the composing transforms, names/ids, and input/outputs of a # stage of execution. Some composing transforms and sources may have been # generated by the Dataflow service during execution planning. "componentSource": [ # Collections produced and consumed by component transforms of this stage. { # Description of an interstitial value between transforms in an execution # stage. "userName": "A String", # Human-readable name for this transform; may be user or system generated. "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this # source is most closely associated. "name": "A String", # Dataflow service generated name for this source. }, ], "kind": "A String", # Type of transform this stage is executing. "name": "A String", # Dataflow service generated name for this stage. "outputSource": [ # Output sources for this stage. { # Description of an input or output of an execution stage. "userName": "A String", # Human-readable name for this source; may be user or system generated. "sizeBytes": "A String", # Size of the source, if measurable. "name": "A String", # Dataflow service generated name for this source. "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this # source is most closely associated. }, ], "inputSource": [ # Input sources for this stage. { # Description of an input or output of an execution stage. "userName": "A String", # Human-readable name for this source; may be user or system generated. 
"sizeBytes": "A String", # Size of the source, if measurable. "name": "A String", # Dataflow service generated name for this source. "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this # source is most closely associated. }, ], "componentTransform": [ # Transforms that comprise this execution stage. { # Description of a transform executed as part of an execution stage. "userName": "A String", # Human-readable name for this transform; may be user or system generated. "originalTransform": "A String", # User name for the original user transform with which this transform is # most closely associated. "name": "A String", # Dataflow service generated name for this source. }, ], "id": "A String", # Dataflow service generated id for this stage. }, ], "displayData": [ # Pipeline level display data. { # Data provided with a pipeline or transform to provide descriptive info. "shortStrValue": "A String", # A possible additional shorter value to display. # For example a java_class_name_value of com.mypackage.MyDoFn # will be stored with MyDoFn as the short_str_value and # com.mypackage.MyDoFn as the java_class_name value. # short_str_value can be displayed and java_class_name_value # will be displayed as a tooltip. "durationValue": "A String", # Contains value if the data is of duration type. "url": "A String", # An optional full URL. "floatValue": 3.14, # Contains value if the data is of float type. "namespace": "A String", # The namespace for the key. This is usually a class name or programming # language namespace (i.e. python module) which defines the display data. # This allows a dax monitoring system to specially handle the data # and perform custom rendering. "javaClassValue": "A String", # Contains value if the data is of java class type. "label": "A String", # An optional label to display in a dax UI for the element. "boolValue": True or False, # Contains value if the data is of a boolean type. 
"strValue": "A String", # Contains value if the data is of string type. "key": "A String", # The key identifying the display data. # This is intended to be used as a label for the display data # when viewed in a dax monitoring system. "int64Value": "A String", # Contains value if the data is of int64 type. "timestampValue": "A String", # Contains value if the data is of timestamp type. }, ], }, "stageStates": [ # This field may be mutated by the Cloud Dataflow service; # callers cannot mutate it. { # A message describing the state of a particular execution stage. "executionStageName": "A String", # The name of the execution stage. "executionStageState": "A String", # Executions stage states allow the same set of values as JobState. "currentStateTime": "A String", # The time at which the stage transitioned to this state. }, ], "id": "A String", # The unique ID of this job. # # This field is set by the Cloud Dataflow service when the Job is # created, and is immutable for the life of the job. "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in # `JOB_STATE_UPDATED`), this field contains the ID of that job. "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to. "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the # corresponding name prefixes of the new job. "a_key": "A String", }, "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job. "version": { # A structure describing which components and their versions of the service # are required in order to run the job. "a_key": "", # Properties of the object. }, "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in. "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data # at rest, AKA a Customer Managed Encryption Key (CMEK). 
# # Format: # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY "internalExperiments": { # Experimental settings. "a_key": "", # Properties of the object. Contains field @type with type URL. }, "dataset": "A String", # The dataset for the current project where various workflow # related tables are stored. # # The supported resource type is: # # Google BigQuery: # bigquery.googleapis.com/{dataset} "experiments": [ # The list of experiments to enable. "A String", ], "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account. "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These # options are passed through the service and are used to recreate the # SDK pipeline options on the worker in a language agnostic and platform # independent way. "a_key": "", # Properties of the object. }, "userAgent": { # A description of the process that generated the request. "a_key": "", # Properties of the object. }, "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or # unspecified, the service will attempt to choose a reasonable # default. This should be in the form of the API service name, # e.g. "compute.googleapis.com". "workerPools": [ # The worker pools. At least one "harness" worker pool must be # specified in order for the job to have workers. { # Describes one particular pool of Cloud Dataflow workers to be # instantiated by the Cloud Dataflow service in order to perform the # computations required by a job. Note that a workflow job may use # multiple pools, in order to match the various computational # requirements of the various stages of the job. "diskSourceImage": "A String", # Fully qualified source image for disks. "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when # using the standard Dataflow task runner. Users should ignore # this field. 
"workflowFileName": "A String", # The file to store the workflow in. "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs # will not be uploaded. # # The supported resource type is: # # Google Cloud Storage: # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} "commandlinesFileName": "A String", # The file to store preprocessing commands in. "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness. "reportingEnabled": True or False, # Whether to send work progress updates to the service. "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example, # "shuffle/v1beta1". "workerId": "A String", # The ID of the worker running this pipeline. "baseUrl": "A String", # The base URL for accessing Google Cloud APIs. # # When workers access Google Cloud APIs, they logically do so via # relative URLs. If this field is specified, it supplies the base # URL to use for resolving these relative URLs. The normative # algorithm used is defined by RFC 1808, "Relative Uniform Resource # Locators". # # If not specified, the default value is "http://www.googleapis.com/" "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example, # "dataflow/v1b3/projects". "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary # storage. # # The supported resource type is: # # Google Cloud Storage: # # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} }, "vmId": "A String", # The ID string of the VM. "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories. "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit. "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to # access the Cloud Dataflow API. 
"A String", ], "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by # taskrunner; e.g. "root". "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs. # # When workers access Google Cloud APIs, they logically do so via # relative URLs. If this field is specified, it supplies the base # URL to use for resolving these relative URLs. The normative # algorithm used is defined by RFC 1808, "Relative Uniform Resource # Locators". # # If not specified, the default value is "http://www.googleapis.com/" "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by # taskrunner; e.g. "wheel". "languageHint": "A String", # The suggested backend language. "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial # console. "streamingWorkerMainClass": "A String", # The streaming worker main class name. "logDir": "A String", # The directory on the VM to store logs. "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3" "harnessCommand": "A String", # The command to launch the worker harness. "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for # temporary storage. # # The supported resource type is: # # Google Cloud Storage: # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr. }, "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle` # are supported. "packages": [ # Packages to be installed on workers. { # The packages that must be installed in order for a worker to run the # steps of the Cloud Dataflow job that will be assigned to its worker # pool. # # This is the mechanism by which the Cloud Dataflow SDK causes code to # be loaded onto the workers. 
For example, the Cloud Dataflow Java SDK # might use this to install jars containing the user's code and all of the # various dependencies (libraries, data files, etc.) required in order # for that code to run. "location": "A String", # The resource to read the package from. The supported resource type is: # # Google Cloud Storage: # # storage.googleapis.com/{bucket} # bucket.storage.googleapis.com/ "name": "A String", # The name of the package. }, ], "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the # service will attempt to choose a reasonable default. "network": "A String", # Network to which VMs will be assigned. If empty or unspecified, # the service will use the network "default". "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service # will attempt to choose a reasonable default. "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will # attempt to choose a reasonable default. "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool. # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and # `TEARDOWN_NEVER`. # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn # down. # # If the workers are not torn down by the service, they will # continue to run and use Google Compute Engine VM resources in the # user's project until they are explicitly terminated by the user. # Because of this, Google recommends using the `TEARDOWN_ALWAYS` # policy except for small, manually supervised test jobs. # # If unknown or unspecified, the service will attempt to choose a reasonable # default. "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google # Compute Engine API. 
"ipConfiguration": "A String", # Configuration for VM IPs. "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the # service will choose a number of threads (according to the number of cores # on the selected machine type for batch, or 1 by convention for streaming). "poolArgs": { # Extra arguments for this worker pool. "a_key": "", # Properties of the object. Contains field @type with type URL. }, "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to # execute the job. If zero or unspecified, the service will # attempt to choose a reasonable default. "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker # harness, residing in Google Container Registry. "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of # the form "regions/REGION/subnetworks/SUBNETWORK". "dataDisks": [ # Data disks that are used by a VM in this workflow. { # Describes the data disk used by a workflow job. "mountPoint": "A String", # Directory in a VM where disk is mounted. "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will # attempt to choose a reasonable default. "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This # must be a disk type appropriate to the project and zone in which # the workers will run. If unknown or unspecified, the service # will attempt to choose a reasonable default. # # For example, the standard persistent disk type is a resource name # typically ending in "pd-standard". If SSD persistent disks are # available, the resource name typically ends with "pd-ssd". The # actual valid values are defined by the Google Compute Engine API, # not by the Cloud Dataflow API; consult the Google Compute Engine # documentation for more information about determining the set of # available disk types for a particular project and zone. 
# # Google Compute Engine Disk types are local to a particular # project in a particular zone, and so the resource name will # typically look something like this: # # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard }, ], "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool. "maxNumWorkers": 42, # The maximum number of workers to cap scaling at. "algorithm": "A String", # The algorithm to use for autoscaling. }, "defaultPackageSet": "A String", # The default package set to install. This allows the service to # select a default set of packages which are useful to worker # harnesses written in a particular language. "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will # attempt to choose a reasonable default. "metadata": { # Metadata to set on the Google Compute Engine VMs. "a_key": "A String", }, }, ], "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary # storage. The system will append the suffix "/temp-{JOBNAME}" to # this resource prefix, where {JOBNAME} is the value of the # job_name field. The resulting bucket and object prefix is used # as the prefix of the resources used to store temporary data # needed during the job execution. NOTE: This will override the # value in taskrunner_settings. # The supported resource type is: # # Google Cloud Storage: # # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} }, "location": "A String", # The [regional endpoint] # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that # contains this job. "tempFiles": [ # A set of files the system should be aware of that are used # for temporary storage. These temporary files will be # removed on job completion. # No duplicates are allowed. # No file patterns are supported. 
# # The supported files are: # # Google Cloud Storage: # # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} "A String", ], "type": "A String", # The type of Cloud Dataflow job. "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts. # If this field is set, the service will ensure its uniqueness. # The request to create a job will fail if the service has knowledge of a # previously submitted job with the same client's ID and job name. # The caller may use this field to ensure idempotence of job # creation across retried attempts to create a job. # By default, the field is empty and, in that case, the service ignores it. "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given # snapshot. "stepsLocation": "A String", # The GCS location where the steps are stored. "currentStateTime": "A String", # The timestamp associated with the current state. "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING). # Flexible resource scheduling jobs are started with some delay after job # creation, so start_time is unset before start and is updated when the # job is started by the Cloud Dataflow service. For other jobs, start_time # is always equal to create_time, is immutable, and is set by the Cloud Dataflow # service. "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the # Cloud Dataflow service. "requestedState": "A String", # The job's requested state. # # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may # also be used to directly set a job's requested state to # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the # job if it has not already reached a terminal state. "name": "A String", # The user-specified Cloud Dataflow job name. 
# # Only one Job with a given name may exist in a project at any # given time. If a caller attempts to create a Job with the same # name as an already-existing Job, the attempt returns the # existing Job. # # The name must match the regular expression # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?` "steps": [ # Exactly one of step or steps_location should be specified. # # The top-level steps that constitute the entire job. { # Defines a particular step within a Cloud Dataflow job. # # A job consists of multiple steps, each of which performs some # specific operation as part of the overall job. Data is typically # passed from one step to another as part of the job. # # Here's an example of a sequence of steps which together implement a # Map-Reduce job: # # * Read a collection of data from some source, parsing the # collection's elements. # # * Validate the elements. # # * Apply a user-defined function to map each element to some value # and extract an element-specific key value. # # * Group elements with the same key into a single element with # that key, transforming a multiply-keyed collection into a # uniquely-keyed collection. # # * Write the elements out to some data sink. # # Note that the Cloud Dataflow service may be used to run many different # types of jobs, not just Map-Reduce. "kind": "A String", # The kind of step in the Cloud Dataflow job. "properties": { # Named properties associated with the step. Each kind of # predefined step has its own required set of properties. # Must be provided on Create. Only retrieved with JOB_VIEW_ALL. "a_key": "", # Properties of the object. }, "name": "A String", # The name that identifies the step. This must be unique for each # step with respect to all other steps in the Cloud Dataflow job. }, ], "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID # of the job it replaced. # # When sending a `CreateJobRequest`, you can update a job by specifying it # here. 
The job named here is stopped, and its intermediate state is # transferred to this job. "currentState": "A String", # The current state of the job. # # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise # specified. # # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a # terminal state. After a job has reached a terminal state, no # further state updates may be made. # # This field may be mutated by the Cloud Dataflow service; # callers cannot mutate it. "executionInfo": { # Deprecated. # Additional information about how a Cloud Dataflow job will be executed that # isn't contained in the submitted job. "stages": { # A mapping from each stage to the information about that stage. "a_key": { # Contains information about how a particular # google.dataflow.v1beta3.Step will be executed. "stepName": [ # The steps associated with the execution stage. # Note that stages may have several steps, and that a given step # might be run by more than one stage. "A String", ], }, }, }, }
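The request body for create() can be assembled and sanity-checked locally before it is sent. The sketch below is illustrative only: `build_create_body` is a hypothetical helper, not part of the client library, and the bucket, template, and parameter names are placeholders. It assumes the job-name pattern quoted for the returned job's "name" field also applies to "jobName" in the request, and it checks only the constraints stated above (`gs://` prefixes and the 1 to 1000 range for "maxWorkers").

```python
import re

# Pattern quoted above for the returned job's "name" field; applying it to
# the request's "jobName" is an assumption made for this sketch.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def build_create_body(job_name, gcs_path, parameters=None, environment=None,
                      location=None):
    """Return a dict usable as `body` for create(); raise ValueError on bad input.

    Hypothetical helper, not part of google-api-python-client.
    """
    if not JOB_NAME_RE.fullmatch(job_name):
        raise ValueError(f"invalid jobName: {job_name!r}")
    if not gcs_path.startswith("gs://"):
        raise ValueError("gcsPath must begin with gs://")

    env = dict(environment or {})
    temp_location = env.get("tempLocation")
    if temp_location is not None and not temp_location.startswith("gs://"):
        raise ValueError("environment.tempLocation must begin with gs://")
    max_workers = env.get("maxWorkers")
    if max_workers is not None and not 1 <= max_workers <= 1000:
        raise ValueError("environment.maxWorkers must be between 1 and 1000")

    # Only the two required fields are always present; the rest are optional.
    body = {"jobName": job_name, "gcsPath": gcs_path}
    if parameters:
        body["parameters"] = dict(parameters)
    if env:
        body["environment"] = env
    if location:
        body["location"] = location
    return body


# Placeholder values, chosen only for illustration.
body = build_create_body(
    "wordcount-1",
    "gs://my-bucket/templates/wordcount",
    parameters={"inputFile": "gs://my-bucket/input.txt"},
    environment={"maxWorkers": 5, "tempLocation": "gs://my-bucket/tmp"},
)
print(sorted(body))

# Sending the request needs google-api-python-client and credentials, so it
# is left commented out here:
# from googleapiclient.discovery import build
# service = build("dataflow", "v1b3")
# job = service.projects().templates().create(
#     projectId="my-project", body=body).execute()
```

Validating locally is a convenience, not a substitute for the service's own checks; the service remains the authority on which values it accepts.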
get(projectId, gcsPath=None, location=None, x__xgafv=None, view=None)
Gets the template stored at the given Cloud Storage path. Args: projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required) gcsPath: string, Required. A Cloud Storage path to the template from which to create the job. Must be a valid Cloud Storage URL, beginning with 'gs://'. location: string, The [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to which to direct the request. x__xgafv: string, V1 error format. Allowed values 1 - v1 error format 2 - v2 error format view: string, The view to retrieve. Defaults to METADATA_ONLY. Returns: An object of the form: { # The response to a GetTemplate request. "status": { # The status of the get template request. Any problems with the # request will be indicated in the error_details. # The `Status` type defines a logical error model that is suitable for # different programming environments, including REST APIs and RPC APIs. It is # used by [gRPC](https://github.com/grpc). The error model is designed to be: # # - Simple to use and understand for most users # - Flexible enough to meet unexpected needs # # # Overview # # The `Status` message contains three pieces of data: error code, error # message, and error details. The error code should be an enum value of # google.rpc.Code, but it may accept additional error codes if needed. The # error message should be a developer-facing English message that helps # developers *understand* and *resolve* the error. If a localized user-facing # error message is needed, put the localized message in the error details or # localize it in the client. The optional error details may contain arbitrary # information about the error. There is a predefined set of error detail types # in the package `google.rpc` that can be used for common error conditions. # # # Language mapping # # The `Status` message is the logical representation of the error model, but it # is not necessarily the actual wire format. 
When the `Status` message is # exposed in different client libraries and different wire protocols, it can be # mapped differently. For example, it will likely be mapped to some exceptions # in Java, but more likely mapped to some error codes in C. # # # Other uses # # The error model and the `Status` message can be used in a variety of # environments, either with or without APIs, to provide a # consistent developer experience across different environments. # # Example uses of this error model include: # # - Partial errors. If a service needs to return partial errors to the client, # it may embed the `Status` in the normal response to indicate the partial # errors. # # - Workflow errors. A typical workflow has multiple steps. Each step may # have a `Status` message for error reporting. # # - Batch operations. If a client uses batch request and batch response, the # `Status` message should be used directly inside batch response, one for # each error sub-response. # # - Asynchronous operations. If an API call embeds asynchronous operation # results in its response, the status of those operations should be # represented directly using the `Status` message. # # - Logging. If some API errors are stored in logs, the message `Status` could # be used directly after any stripping needed for security/privacy reasons. "message": "A String", # A developer-facing error message, which should be in English. Any # user-facing error message should be localized and sent in the # google.rpc.Status.details field, or localized by the client. "code": 42, # The status code, which should be an enum value of google.rpc.Code. "details": [ # A list of messages that carry the error details. There is a common set of # message types for APIs to use. { "a_key": "", # Properties of the object. Contains field @type with type URL. }, ], }, "metadata": { # Metadata describing a template. # The template metadata describing the template name, available # parameters, etc. "name": "A String", # Required. 
The name of the template. "parameters": [ # The parameters for the template. { # Metadata for a specific parameter. "regexes": [ # Optional. Regexes that the parameter must match. "A String", ], "helpText": "A String", # Required. The help text to display for the parameter. "name": "A String", # Required. The name of the parameter. "isOptional": True or False, # Optional. Whether the parameter is optional. Defaults to false. "label": "A String", # Required. The label to display for the parameter. }, ], "description": "A String", # Optional. A description of the template. }, }
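The response shape above can be inspected with a few lines of Python. The following is a minimal sketch, not a definitive implementation: the helper names `required_parameters` and `fetch_template` are hypothetical, and `fetch_template` assumes that the `google-api-python-client` package is installed, that Application Default Credentials are configured, and that this method is reachable as `projects().templates().get(...)` on the `dataflow` `v1b3` service. The sketch also assumes that a missing or zero `status.code` means the request succeeded (0 is `OK` in `google.rpc.Code`).

```python
def required_parameters(response):
    """Return the names of template parameters that are not marked optional.

    `response` is a dict shaped like the GetTemplate response described above.
    """
    status = response.get("status") or {}
    # Assumption: an absent or zero code means OK (google.rpc.Code 0).
    if status.get("code", 0) != 0:
        raise RuntimeError(status.get("message", "GetTemplate request failed"))
    parameters = response.get("metadata", {}).get("parameters", [])
    # isOptional defaults to false, so a missing key counts as required.
    return [p["name"] for p in parameters if not p.get("isOptional", False)]


def fetch_template(project_id, gcs_path):
    """Hypothetical wrapper around the get() call documented above."""
    # Imported lazily so the pure helper above works without the client library.
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3")
    request = service.projects().templates().get(
        projectId=project_id,
        gcsPath=gcs_path,
        view="METADATA_ONLY",
    )
    return request.execute()
```

A caller might pass the result of `fetch_template("my-project", "gs://my-bucket/templates/my-template")` (both values are placeholders) to `required_parameters` to learn which entries the `parameters` map of a launch request must contain.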
launch(projectId, body, dynamicTemplate_gcsPath=None, x__xgafv=None, dynamicTemplate_stagingLocation=None, location=None, gcsPath=None, validateOnly=None)
Launch a template. Args: projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required) body: object, The request body. (required) The object takes the form of: { # Parameters to provide to the template being launched. "environment": { # The environment values to set at runtime. # The runtime environment for the job. "machineType": "A String", # The machine type to use for the job. Defaults to the value from the # template if not specified. "network": "A String", # Network to which VMs will be assigned. If empty or unspecified, # the service will use the network "default". "zone": "A String", # The Compute Engine [availability # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones) # for launching worker instances to run your pipeline. "additionalUserLabels": { # Additional user labels to be specified for the job. # Keys and values should follow the restrictions specified in the [labeling # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions) # page. "a_key": "A String", }, "additionalExperiments": [ # Additional experiment flags for the job. "A String", ], "bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory. # Use with caution. "tempLocation": "A String", # The Cloud Storage path to use for temporary files. # Must be a valid Cloud Storage URL, beginning with `gs://`. "serviceAccountEmail": "A String", # The email address of the service account to run the job as. "numWorkers": 42, # The initial number of Google Compute Engine instances for the job. "maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made # available to your pipeline during execution, from 1 to 1000. "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of # the form "regions/REGION/subnetworks/SUBNETWORK". }, "parameters": { # The runtime parameters to pass to the job. 
"a_key": "A String", }, "jobName": "A String", # Required. The job name to use for the created job. } dynamicTemplate_gcsPath: string, Path to dynamic template spec file on GCS. The file must be a Json serialized DynamicTemplateFieSpec object. x__xgafv: string, V1 error format. Allowed values 1 - v1 error format 2 - v2 error format dynamicTemplate_stagingLocation: string, Cloud Storage path for staging dependencies. Must be a valid Cloud Storage URL, beginning with `gs://`. location: string, The [regional endpoint] (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to which to direct the request. gcsPath: string, A Cloud Storage path to the template from which to create the job. Must be valid Cloud Storage URL, beginning with 'gs://'. validateOnly: boolean, If true, the request is validated but not actually executed. Defaults to false. Returns: An object of the form: { # Response to the request to launch a template. "job": { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and # the job was successfully launched. "labels": { # User-defined labels for this job. # # The labels map can contain no more than 64 entries. Entries of the labels # map are UTF8 strings that comply with the following restrictions: # # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62} # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63} # * Both keys and values are additionally constrained to be <= 128 bytes in # size. "a_key": "A String", }, "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs # by the metadata values provided here. Populated for ListJobs and all GetJob # views SUMMARY and higher. # ListJob response and Job SUMMARY view. "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job. 
"versionDisplayName": "A String", # A readable string describing the version of the SDK. "version": "A String", # The version of the SDK used to run the job. "sdkSupportStatus": "A String", # The support status for this SDK version. }, "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job. { # Metadata for a PubSub connector used by the job. "topic": "A String", # Topic accessed in the connection. "subscription": "A String", # Subscription used in the connection. }, ], "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job. { # Metadata for a Datastore connector used by the job. "projectId": "A String", # ProjectId accessed in the connection. "namespace": "A String", # Namespace used in the connection. }, ], "fileDetails": [ # Identification of a File source used in the Dataflow job. { # Metadata for a File connector used by the job. "filePattern": "A String", # File Pattern used to access files by the connector. }, ], "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job. { # Metadata for a Spanner connector used by the job. "instanceId": "A String", # InstanceId accessed in the connection. "projectId": "A String", # ProjectId accessed in the connection. "databaseId": "A String", # DatabaseId accessed in the connection. }, ], "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job. { # Metadata for a BigTable connector used by the job. "instanceId": "A String", # InstanceId accessed in the connection. "projectId": "A String", # ProjectId accessed in the connection. "tableId": "A String", # TableId accessed in the connection. }, ], "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job. { # Metadata for a BigQuery connector used by the job. "projectId": "A String", # Project accessed in the connection. "dataset": "A String", # Dataset accessed in the connection. "table": "A String", # Table accessed in the connection. 
"query": "A String", # Query used to access data in the connection. }, ], }, "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time. # A description of the user pipeline and stages through which it is executed. # Created by Cloud Dataflow service. Only retrieved with # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL. # form. This data is provided by the Dataflow service for ease of visualizing # the pipeline and interpreting Dataflow provided metrics. "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them. { # Description of the type, names/ids, and input/outputs for a transform. "kind": "A String", # Type of transform. "name": "A String", # User provided name for this transform instance. "inputCollectionName": [ # User names for all collection inputs to this transform. "A String", ], "displayData": [ # Transform-specific display data. { # Data provided with a pipeline or transform to provide descriptive info. "shortStrValue": "A String", # A possible additional shorter value to display. # For example a java_class_name_value of com.mypackage.MyDoFn # will be stored with MyDoFn as the short_str_value and # com.mypackage.MyDoFn as the java_class_name value. # short_str_value can be displayed and java_class_name_value # will be displayed as a tooltip. "durationValue": "A String", # Contains value if the data is of duration type. "url": "A String", # An optional full URL. "floatValue": 3.14, # Contains value if the data is of float type. "namespace": "A String", # The namespace for the key. This is usually a class name or programming # language namespace (i.e. python module) which defines the display data. # This allows a dax monitoring system to specially handle the data # and perform custom rendering. "javaClassValue": "A String", # Contains value if the data is of java class type. 
"label": "A String", # An optional label to display in a dax UI for the element. "boolValue": True or False, # Contains value if the data is of a boolean type. "strValue": "A String", # Contains value if the data is of string type. "key": "A String", # The key identifying the display data. # This is intended to be used as a label for the display data # when viewed in a dax monitoring system. "int64Value": "A String", # Contains value if the data is of int64 type. "timestampValue": "A String", # Contains value if the data is of timestamp type. }, ], "outputCollectionName": [ # User names for all collection outputs to this transform. "A String", ], "id": "A String", # SDK generated id of this transform instance. }, ], "executionPipelineStage": [ # Description of each stage of execution of the pipeline. { # Description of the composing transforms, names/ids, and input/outputs of a # stage of execution. Some composing transforms and sources may have been # generated by the Dataflow service during execution planning. "componentSource": [ # Collections produced and consumed by component transforms of this stage. { # Description of an interstitial value between transforms in an execution # stage. "userName": "A String", # Human-readable name for this transform; may be user or system generated. "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this # source is most closely associated. "name": "A String", # Dataflow service generated name for this source. }, ], "kind": "A String", # Type of tranform this stage is executing. "name": "A String", # Dataflow service generated name for this stage. "outputSource": [ # Output sources for this stage. { # Description of an input or output of an execution stage. "userName": "A String", # Human-readable name for this source; may be user or system generated. "sizeBytes": "A String", # Size of the source, if measurable. 
"name": "A String", # Dataflow service generated name for this source. "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this # source is most closely associated. }, ], "inputSource": [ # Input sources for this stage. { # Description of an input or output of an execution stage. "userName": "A String", # Human-readable name for this source; may be user or system generated. "sizeBytes": "A String", # Size of the source, if measurable. "name": "A String", # Dataflow service generated name for this source. "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this # source is most closely associated. }, ], "componentTransform": [ # Transforms that comprise this execution stage. { # Description of a transform executed as part of an execution stage. "userName": "A String", # Human-readable name for this transform; may be user or system generated. "originalTransform": "A String", # User name for the original user transform with which this transform is # most closely associated. "name": "A String", # Dataflow service generated name for this source. }, ], "id": "A String", # Dataflow service generated id for this stage. }, ], "displayData": [ # Pipeline level display data. { # Data provided with a pipeline or transform to provide descriptive info. "shortStrValue": "A String", # A possible additional shorter value to display. # For example a java_class_name_value of com.mypackage.MyDoFn # will be stored with MyDoFn as the short_str_value and # com.mypackage.MyDoFn as the java_class_name value. # short_str_value can be displayed and java_class_name_value # will be displayed as a tooltip. "durationValue": "A String", # Contains value if the data is of duration type. "url": "A String", # An optional full URL. "floatValue": 3.14, # Contains value if the data is of float type. "namespace": "A String", # The namespace for the key. 
This is usually a class name or programming # language namespace (i.e. python module) which defines the display data. # This allows a dax monitoring system to specially handle the data # and perform custom rendering. "javaClassValue": "A String", # Contains value if the data is of java class type. "label": "A String", # An optional label to display in a dax UI for the element. "boolValue": True or False, # Contains value if the data is of a boolean type. "strValue": "A String", # Contains value if the data is of string type. "key": "A String", # The key identifying the display data. # This is intended to be used as a label for the display data # when viewed in a dax monitoring system. "int64Value": "A String", # Contains value if the data is of int64 type. "timestampValue": "A String", # Contains value if the data is of timestamp type. }, ], }, "stageStates": [ # This field may be mutated by the Cloud Dataflow service; # callers cannot mutate it. { # A message describing the state of a particular execution stage. "executionStageName": "A String", # The name of the execution stage. "executionStageState": "A String", # Executions stage states allow the same set of values as JobState. "currentStateTime": "A String", # The time at which the stage transitioned to this state. }, ], "id": "A String", # The unique ID of this job. # # This field is set by the Cloud Dataflow service when the Job is # created, and is immutable for the life of the job. "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in # `JOB_STATE_UPDATED`), this field contains the ID of that job. "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to. "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the # corresponding name prefixes of the new job. "a_key": "A String", }, "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job. 
"version": { # A structure describing which components and their versions of the service # are required in order to run the job. "a_key": "", # Properties of the object. }, "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in. "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data # at rest, AKA a Customer Managed Encryption Key (CMEK). # # Format: # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY "internalExperiments": { # Experimental settings. "a_key": "", # Properties of the object. Contains field @type with type URL. }, "dataset": "A String", # The dataset for the current project where various workflow # related tables are stored. # # The supported resource type is: # # Google BigQuery: # bigquery.googleapis.com/{dataset} "experiments": [ # The list of experiments to enable. "A String", ], "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account. "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These # options are passed through the service and are used to recreate the # SDK pipeline options on the worker in a language agnostic and platform # independent way. "a_key": "", # Properties of the object. }, "userAgent": { # A description of the process that generated the request. "a_key": "", # Properties of the object. }, "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or # unspecified, the service will attempt to choose a reasonable # default. This should be in the form of the API service name, # e.g. "compute.googleapis.com". "workerPools": [ # The worker pools. At least one "harness" worker pool must be # specified in order for the job to have workers. { # Describes one particular pool of Cloud Dataflow workers to be # instantiated by the Cloud Dataflow service in order to perform the # computations required by a job. 
Note that a workflow job may use # multiple pools, in order to match the various computational # requirements of the various stages of the job. "diskSourceImage": "A String", # Fully qualified source image for disks. "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when # using the standard Dataflow task runner. Users should ignore # this field. "workflowFileName": "A String", # The file to store the workflow in. "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs # will not be uploaded. # # The supported resource type is: # # Google Cloud Storage: # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} "commandlinesFileName": "A String", # The file to store preprocessing commands in. "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness. "reportingEnabled": True or False, # Whether to send work progress updates to the service. "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example, # "shuffle/v1beta1". "workerId": "A String", # The ID of the worker running this pipeline. "baseUrl": "A String", # The base URL for accessing Google Cloud APIs. # # When workers access Google Cloud APIs, they logically do so via # relative URLs. If this field is specified, it supplies the base # URL to use for resolving these relative URLs. The normative # algorithm used is defined by RFC 1808, "Relative Uniform Resource # Locators". # # If not specified, the default value is "http://www.googleapis.com/" "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example, # "dataflow/v1b3/projects". "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary # storage. 
# # The supported resource type is: # # Google Cloud Storage: # # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} }, "vmId": "A String", # The ID string of the VM. "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories. "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit. "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to # access the Cloud Dataflow API. "A String", ], "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by # taskrunner; e.g. "root". "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs. # # When workers access Google Cloud APIs, they logically do so via # relative URLs. If this field is specified, it supplies the base # URL to use for resolving these relative URLs. The normative # algorithm used is defined by RFC 1808, "Relative Uniform Resource # Locators". # # If not specified, the default value is "http://www.googleapis.com/" "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by # taskrunner; e.g. "wheel". "languageHint": "A String", # The suggested backend language. "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial # console. "streamingWorkerMainClass": "A String", # The streaming worker main class name. "logDir": "A String", # The directory on the VM to store logs. "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3" "harnessCommand": "A String", # The command to launch the worker harness. "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for # temporary storage. 
# # The supported resource type is: # # Google Cloud Storage: # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr. }, "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle` # are supported. "packages": [ # Packages to be installed on workers. { # The packages that must be installed in order for a worker to run the # steps of the Cloud Dataflow job that will be assigned to its worker # pool. # # This is the mechanism by which the Cloud Dataflow SDK causes code to # be loaded onto the workers. For example, the Cloud Dataflow Java SDK # might use this to install jars containing the user's code and all of the # various dependencies (libraries, data files, etc.) required in order # for that code to run. "location": "A String", # The resource to read the package from. The supported resource type is: # # Google Cloud Storage: # # storage.googleapis.com/{bucket} # bucket.storage.googleapis.com/ "name": "A String", # The name of the package. }, ], "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the # service will attempt to choose a reasonable default. "network": "A String", # Network to which VMs will be assigned. If empty or unspecified, # the service will use the network "default". "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service # will attempt to choose a reasonable default. "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will # attempt to choose a reasonable default. "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool. # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and # `TEARDOWN_NEVER`. # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether # the job succeeds. 
`TEARDOWN_ON_SUCCESS` means workers are torn down # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn # down. # # If the workers are not torn down by the service, they will # continue to run and use Google Compute Engine VM resources in the # user's project until they are explicitly terminated by the user. # Because of this, Google recommends using the `TEARDOWN_ALWAYS` # policy except for small, manually supervised test jobs. # # If unknown or unspecified, the service will attempt to choose a reasonable # default. "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google # Compute Engine API. "ipConfiguration": "A String", # Configuration for VM IPs. "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the # service will choose a number of threads (according to the number of cores # on the selected machine type for batch, or 1 by convention for streaming). "poolArgs": { # Extra arguments for this worker pool. "a_key": "", # Properties of the object. Contains field @type with type URL. }, "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to # execute the job. If zero or unspecified, the service will # attempt to choose a reasonable default. "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker # harness, residing in Google Container Registry. "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of # the form "regions/REGION/subnetworks/SUBNETWORK". "dataDisks": [ # Data disks that are used by a VM in this workflow. { # Describes the data disk used by a workflow job. "mountPoint": "A String", # Directory in a VM where disk is mounted. "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will # attempt to choose a reasonable default. 
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This # must be a disk type appropriate to the project and zone in which # the workers will run. If unknown or unspecified, the service # will attempt to choose a reasonable default. # # For example, the standard persistent disk type is a resource name # typically ending in "pd-standard". If SSD persistent disks are # available, the resource name typically ends with "pd-ssd". The # actual valid values are defined the Google Compute Engine API, # not by the Cloud Dataflow API; consult the Google Compute Engine # documentation for more information about determining the set of # available disk types for a particular project and zone. # # Google Compute Engine Disk types are local to a particular # project in a particular zone, and so the resource name will # typically look something like this: # # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard }, ], "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool. "maxNumWorkers": 42, # The maximum number of workers to cap scaling at. "algorithm": "A String", # The algorithm to use for autoscaling. }, "defaultPackageSet": "A String", # The default package set to install. This allows the service to # select a default set of packages which are useful to worker # harnesses written in a particular language. "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will # attempt to choose a reasonable default. "metadata": { # Metadata to set on the Google Compute Engine VMs. "a_key": "A String", }, }, ], "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary # storage. The system will append the suffix "/temp-{JOBNAME} to # this resource prefix, where {JOBNAME} is the value of the # job_name field. 
The resulting bucket and object prefix is used # as the prefix of the resources used to store temporary data # needed during the job execution. NOTE: This will override the # value in taskrunner_settings. # The supported resource type is: # # Google Cloud Storage: # # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} }, "location": "A String", # The [regional endpoint] # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that # contains this job. "tempFiles": [ # A set of files the system should be aware of that are used # for temporary storage. These temporary files will be # removed on job completion. # No duplicates are allowed. # No file patterns are supported. # # The supported files are: # # Google Cloud Storage: # # storage.googleapis.com/{bucket}/{object} # bucket.storage.googleapis.com/{object} "A String", ], "type": "A String", # The type of Cloud Dataflow job. "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts. # If this field is set, the service will ensure its uniqueness. # The request to create a job will fail if the service has knowledge of a # previously submitted job with the same client's ID and job name. # The caller may use this field to ensure idempotence of job # creation across retried attempts to create a job. # By default, the field is empty and, in that case, the service ignores it. "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given # snapshot. "stepsLocation": "A String", # The GCS location where the steps are stored. "currentStateTime": "A String", # The timestamp associated with the current state. "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING). 
# Flexible resource scheduling jobs are started with some delay after job # creation, so start_time is unset before start and is updated when the # job is started by the Cloud Dataflow service. For other jobs, start_time # always equals create_time and is immutable and set by the Cloud Dataflow # service. "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the # Cloud Dataflow service. "requestedState": "A String", # The job's requested state. # # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may # also be used to directly set a job's requested state to # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the # job if it has not already reached a terminal state. "name": "A String", # The user-specified Cloud Dataflow job name. # # Only one Job with a given name may exist in a project at any # given time. If a caller attempts to create a Job with the same # name as an already-existing Job, the attempt returns the # existing Job. # # The name must match the regular expression # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?` "steps": [ # Exactly one of step or steps_location should be specified. # # The top-level steps that constitute the entire job. { # Defines a particular step within a Cloud Dataflow job. # # A job consists of multiple steps, each of which performs some # specific operation as part of the overall job. Data is typically # passed from one step to another as part of the job. # # Here's an example of a sequence of steps which together implement a # Map-Reduce job: # # * Read a collection of data from some source, parsing the # collection's elements. # # * Validate the elements. # # * Apply a user-defined function to map each element to some value # and extract an element-specific key value. 
# # * Group elements with the same key into a single element with # that key, transforming a multiply-keyed collection into a # uniquely-keyed collection. # # * Write the elements out to some data sink. # # Note that the Cloud Dataflow service may be used to run many different # types of jobs, not just Map-Reduce. "kind": "A String", # The kind of step in the Cloud Dataflow job. "properties": { # Named properties associated with the step. Each kind of # predefined step has its own required set of properties. # Must be provided on Create. Only retrieved with JOB_VIEW_ALL. "a_key": "", # Properties of the object. }, "name": "A String", # The name that identifies the step. This must be unique for each # step with respect to all other steps in the Cloud Dataflow job. }, ], "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID # of the job it replaced. # # When sending a `CreateJobRequest`, you can update a job by specifying it # here. The job named here is stopped, and its intermediate state is # transferred to this job. "currentState": "A String", # The current state of the job. # # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise # specified. # # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a # terminal state. After a job has reached a terminal state, no # further state updates may be made. # # This field may be mutated by the Cloud Dataflow service; # callers cannot mutate it. "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # isn't contained in the submitted job. # Deprecated. "stages": { # A mapping from each stage to the information about that stage. "a_key": { # Contains information about how a particular # google.dataflow.v1beta3.Step will be executed. "stepName": [ # The steps associated with the execution stage. # Note that stages may have several steps, and that a given step # might be run by more than one stage. 
"A String", ], }, }, }, }, }