apache_beam.transforms.combiners module

A library of basic combiner PTransform subclasses.

class apache_beam.transforms.combiners.Mean[source]

Bases: object

Combiners for computing arithmetic means of elements.

class Globally(has_defaults=True)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

combiners.Mean.Globally computes the arithmetic mean of the elements.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class PerKey(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Mean.PerKey finds the means of the values for each key.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
class apache_beam.transforms.combiners.Count[source]

Bases: object

Combiners for counting elements.

class Globally(has_defaults=True)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

combiners.Count.Globally counts the total number of elements.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class PerKey(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Count.PerKey counts how many elements each unique key has.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
class PerElement(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Count.PerElement counts how many times each element occurs.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
class apache_beam.transforms.combiners.Top[source]

Bases: object

Combiners for obtaining extremal elements.

class Of(n, **kwargs)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

Obtain a list of the compare-most N elements in a PCollection.

This transform will retrieve the n greatest elements in the PCollection to which it is applied, where “greatest” is determined by the comparator function supplied as the compare argument.

Creates a global Top operation.

The arguments ‘key’ and ‘reverse’ may be passed as keyword arguments, and have the same meaning as for Python’s sort functions.

Parameters:
  • pcoll – PCollection to process.
  • n – number of elements to extract from pcoll.
  • **kwargs – may contain ‘key’ and/or ‘reverse’
default_label()[source]
expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class PerKey(n, **kwargs)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Identifies the compare-most N elements associated with each key.

This transform will produce a PCollection mapping unique keys in the input PCollection to the n greatest elements with which they are associated, where “greatest” is determined by the comparator function supplied as the compare argument in the initializer.

Creates a per-key Top operation.

The arguments ‘key’ and ‘reverse’ may be passed as keyword arguments, and have the same meaning as for Python’s sort functions.

Parameters:
  • pcoll – PCollection to process.
  • n – number of elements to extract from pcoll.
  • **kwargs – may contain ‘key’ and/or ‘reverse’
default_label()[source]
expand(pcoll)[source]

Expands the transform.

Raises TypeCheckError: If the output type of the input PCollection is not compatible with Tuple[A, B].

Parameters:pcoll – PCollection to process
Returns:the PCollection containing the result.
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
static Largest(pcoll, n, has_defaults=True)[source]

Obtain a list of the greatest N elements in a PCollection.

static Smallest(pcoll, n, has_defaults=True)[source]

Obtain a list of the least N elements in a PCollection.

static LargestPerKey(pcoll, n)[source]

Identifies the N greatest elements associated with each key.

static SmallestPerKey(pcoll, n, reverse=True)[source]

Identifies the N least elements associated with each key.

class apache_beam.transforms.combiners.Sample[source]

Bases: object

Combiners for sampling n elements without replacement.

class FixedSizeGlobally(n)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

Sample n elements from the input PCollection without replacement.

expand(pcoll)[source]
display_data()[source]
default_label()[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_type_hints()
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class FixedSizePerKey(n)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Sample n elements associated with each key without replacement.

expand(pcoll)[source]
display_data()[source]
default_label()[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_type_hints()
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
class apache_beam.transforms.combiners.ToList(has_defaults=True)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

A global CombineFn that condenses a PCollection into a single list.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class apache_beam.transforms.combiners.ToDict(has_defaults=True)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

A global CombineFn that condenses a PCollection into a single dict.

PCollections should consist of 2-tuples, notionally (key, value) pairs. If multiple values are associated with the same key, only one of the values will be present in the resulting dict.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class apache_beam.transforms.combiners.ToSet(has_defaults=True)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

A global CombineFn that condenses a PCollection into a set.

expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class apache_beam.transforms.combiners.Latest[source]

Bases: object

Combiners for computing the latest element

class Globally(has_defaults=True)[source]

Bases: apache_beam.transforms.combiners.CombinerWithoutDefaults

Compute the element with the latest timestamp from a PCollection.

static add_timestamp(element, timestamp=TimestampParam)[source]
expand(pcoll)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_defaults(has_defaults=True)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
without_defaults()
class PerKey(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Compute elements with the latest timestamp for each key from a keyed PCollection

static add_timestamp(element, timestamp=TimestampParam)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated to a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns:A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type:Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters:input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises:TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters:type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises:TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns:A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type:PTransform
expand(pcoll)[source]