apache_beam.runners.dataflow.test_dataflow_runner module

Wrapper of Beam runners that’s built for running and verifying e2e tests.

class apache_beam.runners.dataflow.test_dataflow_runner.TestDataflowRunner(cache=None)[source]

Bases: apache_beam.runners.dataflow.dataflow_runner.DataflowRunner

run_pipeline(pipeline, options)[source]

Execute test pipeline and verify test matcher

build_console_url(options)[source]

Build a console url of Dataflow job.

wait_until_in_state(expected_state, timeout=600)[source]

Wait until Dataflow pipeline enters a certain state.

class CombineValuesPTransformOverride

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for CombineValues.

The DataflowRunner expects that the CombineValues PTransform acts as a primitive. So this override replaces the CombineValues with a primitive.

get_replacement_inputs(applied_ptransform)

Provides inputs that will be passed to the replacement PTransform.

Parameters:applied_ptransform – Original AppliedPTransform containing the PTransform to be replaced.
Returns:An iterable of PValues that will be passed to the expand() method of the replacement PTransform.
get_replacement_transform(ptransform)
get_replacement_transform_for_applied_ptransform(applied_ptransform)

Provides a runner specific override for a given AppliedPTransform.

Parameters:applied_ptransformAppliedPTransform containing the PTransform to be replaced.
Returns:A PTransform that will be the replacement for the PTransform inside the AppliedPTransform given as an argument.
matches(applied_ptransform)
class CreatePTransformOverride

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for Create in streaming mode.

get_replacement_inputs(applied_ptransform)

Provides inputs that will be passed to the replacement PTransform.

Parameters:applied_ptransform – Original AppliedPTransform containing the PTransform to be replaced.
Returns:An iterable of PValues that will be passed to the expand() method of the replacement PTransform.
get_replacement_transform(ptransform)

Provides a runner specific override for a given PTransform.

Parameters:ptransform – PTransform to be replaced.
Returns:A PTransform that will be the replacement for the PTransform given as an argument.
get_replacement_transform_for_applied_ptransform(applied_ptransform)
matches(applied_ptransform)
class JrhReadPTransformOverride

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for Read(BoundedSource)

get_replacement_inputs(applied_ptransform)

Provides inputs that will be passed to the replacement PTransform.

Parameters:applied_ptransform – Original AppliedPTransform containing the PTransform to be replaced.
Returns:An iterable of PValues that will be passed to the expand() method of the replacement PTransform.
get_replacement_transform(ptransform)

Provides a runner specific override for a given PTransform.

Parameters:ptransform – PTransform to be replaced.
Returns:A PTransform that will be the replacement for the PTransform given as an argument.
get_replacement_transform_for_applied_ptransform(applied_ptransform)
matches(applied_ptransform)
class NativeReadPTransformOverride

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for Read using native sources.

The DataflowRunner expects that the Read PTransform using native sources act as a primitive. So this override replaces the Read with a primitive.

get_replacement_inputs(applied_ptransform)

Provides inputs that will be passed to the replacement PTransform.

Parameters:applied_ptransform – Original AppliedPTransform containing the PTransform to be replaced.
Returns:An iterable of PValues that will be passed to the expand() method of the replacement PTransform.
get_replacement_transform(ptransform)
get_replacement_transform_for_applied_ptransform(applied_ptransform)

Provides a runner specific override for a given AppliedPTransform.

Parameters:applied_ptransformAppliedPTransform containing the PTransform to be replaced.
Returns:A PTransform that will be the replacement for the PTransform inside the AppliedPTransform given as an argument.
matches(applied_ptransform)
class ReadPTransformOverride

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for Read(BoundedSource)

get_replacement_inputs(applied_ptransform)

Provides inputs that will be passed to the replacement PTransform.

Parameters:applied_ptransform – Original AppliedPTransform containing the PTransform to be replaced.
Returns:An iterable of PValues that will be passed to the expand() method of the replacement PTransform.
get_replacement_transform(ptransform)

Provides a runner specific override for a given PTransform.

Parameters:ptransform – PTransform to be replaced.
Returns:A PTransform that will be the replacement for the PTransform given as an argument.
get_replacement_transform_for_applied_ptransform(applied_ptransform)
matches(applied_ptransform)
add_pcoll_with_auto_sharding(applied_ptransform)
apply(transform, input, options)
apply_GroupByKey(transform, pcoll, options)
apply_PTransform(transform, input, options)
static byte_array_to_json_string(raw_bytes)

Implements org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString.

static combinefn_visitor()
classmethod deserialize_windowing_strategy(serialized_data)
static flatten_input_visitor()
get_default_gcp_region()

Get a default value for Google Cloud region according to https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. If no default can be found, returns None.

get_pcoll_with_auto_sharding()
is_fnapi_compatible()
static json_string_to_byte_array(encoded_string)

Implements org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray.

static poll_for_job_completion(runner, result, duration)

Polls for the specified job to finish running (successfully or not).

Updates the result with the new job information before returning.

Parameters:
  • runner – DataflowRunner instance to use for polling job state.
  • result – DataflowPipelineResult instance used for job information.
  • duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to None, it will wait indefinitely until the job is finished.
run(transform, options=None)

Run the given transform or callable with this runner.

Blocks until the pipeline is complete. See also PipelineRunner.run_async.

run_CombineValuesReplacement(transform_node, options)
run_ExternalTransform(transform_node, options)
run_Flatten(transform_node, options)
run_GroupByKey(transform_node, options)
run_Impulse(transform_node, options)
run_ParDo(transform_node, options)
run_Read(transform_node, options)
run_TestStream(transform_node, options)
run__NativeWrite(transform_node, options)
run_async(transform, options=None)

Run the given transform or callable with this runner.

May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.

run_transform(transform_node, options)

Runner callback for a pipeline.run call.

Parameters:transform_node – transform node for the transform to run.

A concrete implementation of the Runner class must implement run_Abc for some class Abc in the method resolution order for every non-composite transform Xyz in the pipeline.

classmethod serialize_windowing_strategy(windowing, default_environment)
static side_input_visitor(use_unified_worker=False, use_fn_api=False, deterministic_key_coders=True)
visit_transforms(pipeline, options)