apache_beam.runners.dataflow.native_io.iobase module¶
Dataflow native sources and sinks.
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.runners.dataflow.native_io.iobase.
NativeSource
[source]¶ Bases:
apache_beam.io.iobase.SourceBase
A source implemented by the Dataflow service.
This class is meant to be inherited only by sources natively implemented by the Cloud Dataflow service, so users should not subclass it.
This class is deprecated and should not be used to define new sources.
-
coder
= None¶
-
display_data
()¶ Returns the display data associated with a pipeline component.
Pipeline components that wish to have static display data should reimplement it.
Returns: A dictionary containing key:value pairs. The value might be an integer, float, or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{ 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent }
Return type: Dict[str, Any]
-
classmethod
from_runner_api
(fn_proto, context)¶ Converts from a FunctionSpec to a Fn object.
Prefer registering a urn with its parameter type and constructor.
-
classmethod
register_pickle_urn
(pickle_urn)¶ Registers and implements the given urn via pickling.
-
classmethod
register_urn
(urn, parameter_type, fn=None)¶ Registers a urn with a constructor.
For example, if 'beam:fn:foo' had parameter type FooPayload, one could write RunnerApiFn.register_urn('beam:fn:foo', FooPayload, foo_from_proto), where foo_from_proto takes a FooPayload and a PipelineContext as arguments. This function can also be used as a decorator rather than passing the callable in as the final parameter.
A corresponding to_runner_api_parameter method would be expected to return the tuple ('beam:fn:foo', FooPayload).
-
to_runner_api
(context)¶ Returns a FunctionSpec encoding this Fn.
Prefer overriding self.to_runner_api_parameter.
-
to_runner_api_parameter
(context)¶
-
-
class
apache_beam.runners.dataflow.native_io.iobase.
NativeSourceReader
[source]¶ Bases:
object
A reader for a source implemented by Dataflow service.
-
returns_windowed_values
¶ Returns whether this reader returns windowed values.
-
get_progress
()[source]¶ Returns a representation of how far the reader has read.
Returns: A ReaderProgress object that gives the current progress of the reader.
-
request_dynamic_split
(dynamic_split_request)[source]¶ Attempts to split the input in two parts.
The two parts are named the “primary” part and the “residual” part. The current ‘NativeSourceReader’ keeps processing the primary part, while the residual part will be processed elsewhere (e.g. perhaps on a different worker).
The primary and residual parts, if concatenated, must represent the same input as the current input of this ‘NativeSourceReader’ before this call.
The boundary between the primary part and the residual part is specified in a framework-specific way using ‘DynamicSplitRequest’. For example, if the framework supports the notion of positions, the request might be a position at which the input is asked to split itself (not necessarily the position at which it will actually split); it might also be an approximate fraction of the input, or something else.
This function returns a ‘DynamicSplitResult’, which encodes, in a framework-specific way, the information sufficient to construct a description of the resulting primary and residual inputs. For example, it might, again, be a position demarcating these parts, or it might be a pair of fully-specified input descriptions, or something else.
After a successful call to ‘request_dynamic_split()’, subsequent calls should be interpreted relative to the new primary.
Parameters: dynamic_split_request – A ‘DynamicSplitRequest’ describing the split request.
Returns: ‘None’ if the ‘DynamicSplitRequest’ cannot be honored (in that case the input represented by this ‘NativeSourceReader’ stays the same), or a ‘DynamicSplitResult’ describing how the input was split into a primary and residual part.
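The primary/residual contract above can be illustrated with a toy reader over a list of records. This is a minimal sketch under stated assumptions, not the Dataflow implementation: `ToyRangeReader` is a hypothetical class, and it takes a plain record index as the split request and returns a plain index in place of ‘DynamicSplitRequest’ and ‘DynamicSplitResult’.

```python
from typing import List, Optional


class ToyRangeReader:
    """Reads records[0:stop]; a stand-in for illustration, not a Beam class."""

    def __init__(self, records: List[str]):
        self._records = records
        self._next = 0              # index of the next record to read
        self._stop = len(records)   # exclusive end of the primary part

    def read_one(self) -> Optional[str]:
        """Returns the next record of the primary part, or None when done."""
        if self._next >= self._stop:
            return None
        record = self._records[self._next]
        self._next += 1
        return record

    def request_dynamic_split(self, split_index: int) -> Optional[int]:
        """Tries to shrink the primary part so that it ends at split_index.

        Returns the new stop index on success, or None if the request
        cannot be honored (the input then stays unchanged).
        """
        # A split point must lie strictly after the read cursor and
        # strictly before the current end of the primary part.
        if split_index <= self._next or split_index >= self._stop:
            return None
        self._stop = split_index
        return split_index


records = ['a', 'b', 'c', 'd', 'e', 'f']
reader = ToyRangeReader(records)
reader.read_one()
reader.read_one()

stop = reader.request_dynamic_split(4)   # primary: [0, 4), residual: [4, 6)
residual = records[stop:]                # to be processed elsewhere

# Later requests are interpreted relative to the new primary, so a
# split at index 5 now falls outside it and is refused.
refused = reader.request_dynamic_split(5)

primary_rest = []
record = reader.read_one()
while record is not None:
    primary_rest.append(record)
    record = reader.read_one()
```

Concatenating what the reader yields with `residual` reproduces the original input, which is the invariant the method description requires.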
-
-
class
apache_beam.runners.dataflow.native_io.iobase.
ReaderProgress
(position=None, percent_complete=None, remaining_time=None, consumed_split_points=None, remaining_split_points=None)[source]¶ Bases:
object
A representation of how far a NativeSourceReader has read.
-
position
¶ Returns progress, represented as a ReaderPosition object.
-
percent_complete
¶ Returns progress, represented as a fraction of total work.
Progress ranges from 0.0 (beginning, nothing complete) to 1.0 (end of the work range, entire WorkItem complete).
Returns: Progress represented as a fraction of total work.
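As a worked illustration of the 0.0 to 1.0 range, the following sketch computes such a value for a reader that consumes a byte range. The helper `fraction_complete` and its offset parameters are hypothetical and are not part of this module.

```python
def fraction_complete(start_offset: int, end_offset: int,
                      current_offset: int) -> float:
    """Returns progress through [start_offset, end_offset) as 0.0 to 1.0."""
    if end_offset <= start_offset:
        raise ValueError('end_offset must be greater than start_offset')
    fraction = (current_offset - start_offset) / (end_offset - start_offset)
    # Clamp so that positions outside the range still report a valid value.
    return min(1.0, max(0.0, fraction))


quarter_done = fraction_complete(100, 300, 150)   # 50 of 200 bytes consumed
```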
-
remaining_time
¶ Returns progress, represented as an estimated time remaining.
-
consumed_split_points
¶
-
remaining_split_points
¶
-
-
class
apache_beam.runners.dataflow.native_io.iobase.
ReaderPosition
(end=None, key=None, byte_offset=None, record_index=None, shuffle_position=None, concat_position=None)[source]¶ Bases:
object
A representation of position in an iteration of a ‘NativeSourceReader’.
Initializes ReaderPosition.
A ReaderPosition may be instantiated with one of these position types; only one of them should be specified.
Parameters: - end – position is past all other positions. For example, this may be used to represent the end position of an unbounded range.
- key – position is a string key.
- byte_offset – position is a byte offset.
- record_index – position is a record index.
- shuffle_position – position is a base64 encoded shuffle position.
- concat_position – position is a ‘ConcatPosition’.
-
class
apache_beam.runners.dataflow.native_io.iobase.
ConcatPosition
(index, position)[source]¶ Bases:
object
A position that encapsulates an inner position and an index.
This is used to represent the position of a source that encapsulates several other sources.
Initializes ConcatPosition.
Parameters: - index – index of the source currently being read.
- position – inner position within the source currently being read.
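The relationship between the two position classes, and the rule that only one position type should be specified, can be sketched with stand-in dataclasses. `ToyPosition` and `ToyConcatPosition` are hypothetical names, and the explicit validation in `__post_init__` is an assumption of this sketch; the documentation above does not say that the real classes enforce the rule.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ToyConcatPosition:
    """Stand-in: an index into a list of sources plus an inner position."""
    index: int
    position: 'ToyPosition'


@dataclass(frozen=True)
class ToyPosition:
    """Stand-in: exactly one of the position types must be set."""
    end: Optional[bool] = None
    key: Optional[str] = None
    byte_offset: Optional[int] = None
    record_index: Optional[int] = None
    shuffle_position: Optional[str] = None
    concat_position: Optional[ToyConcatPosition] = None

    def __post_init__(self):
        # Validation added for this sketch (an assumption, see lead-in).
        if len(self._specified()) != 1:
            raise ValueError('exactly one position type must be specified')

    def _specified(self):
        return [f.name for f in fields(self)
                if getattr(self, f.name) is not None]

    def kind(self) -> str:
        """Returns the name of the single position type that is set."""
        return self._specified()[0]


# Record 42 of the third source (index 2) in a concatenated source.
inner = ToyPosition(record_index=42)
outer = ToyPosition(concat_position=ToyConcatPosition(index=2, position=inner))
```

Here `outer` locates a record by first selecting a source by index and then applying the inner position within that source.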
-
class
apache_beam.runners.dataflow.native_io.iobase.
DynamicSplitRequest
(progress)[source]¶ Bases:
object
Specifies how ‘NativeSourceReader.request_dynamic_split’ should split.
-
class
apache_beam.runners.dataflow.native_io.iobase.
DynamicSplitResultWithPosition
(stop_position)[source]¶ Bases:
apache_beam.runners.dataflow.native_io.iobase.DynamicSplitResult
-
class
apache_beam.runners.dataflow.native_io.iobase.
NativeSink
[source]¶ Bases:
apache_beam.transforms.display.HasDisplayData
A sink implemented by the Dataflow service.
This class is meant to be inherited only by sinks natively implemented by the Cloud Dataflow service, so users should not subclass it.
-
display_data
()¶ Returns the display data associated with a pipeline component.
Pipeline components that wish to have static display data should reimplement it.
Returns: A dictionary containing key:value pairs. The value might be an integer, float, or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{ 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent }
Return type: Dict[str, Any]
-