apache_beam.io.restriction_trackers module

iobase.RestrictionTracker implementations provided with Apache Beam.

class apache_beam.io.restriction_trackers.OffsetRange(start, stop)[source]

Bases: object

split(desired_num_offsets_per_split, min_num_offsets_per_split=1)[source]
split_at(split_pos)[source]
new_tracker()[source]
size()[source]
class apache_beam.io.restriction_trackers.OffsetRestrictionTracker(offset_range)[source]

Bases: apache_beam.io.iobase.RestrictionTracker

An iobase.RestrictionTracker implementation for an offset range.

The offset range is represented as an OffsetRange.

check_done()[source]
current_restriction()[source]
current_progress()[source]
start_position()[source]
stop_position()[source]
try_claim(position)[source]
try_split(fraction_of_remainder)[source]
is_bounded()[source]
class apache_beam.io.restriction_trackers.UnsplittableRestrictionTracker(underling_tracker)[source]

Bases: apache_beam.io.iobase.RestrictionTracker

An iobase.RestrictionTracker that wraps another but does not split.

try_split(fraction_of_remainder)[source]
check_done()

Checks whether the restriction has been fully processed.

Called by the SDK harness after the iterator returned by DoFn.process() has been fully read.

This method must raise a ValueError if any unclaimed work remains in the restriction when it is invoked. The raised exception must have an informative error message.

This API is required to be implemented to ensure that no data is lost during SDK processing.

Returns: True if the current restriction has been fully processed.

Raises: ValueError – if there is still any unclaimed work remaining.
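The check_done() contract can be illustrated with a minimal pure-Python sketch. SketchOffsetTracker is a hypothetical stand-in written for this example, not Beam's OffsetRestrictionTracker; it assumes integer positions over a half-open range [start, stop) and tracks only the last successfully claimed position.

```python
class SketchOffsetTracker:
    """Hypothetical stand-in for a tracker over the half-open range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None  # nothing claimed yet

    def try_claim(self, position):
        # A claim succeeds only for positions inside the range.
        if position >= self.stop:
            return False
        self.last_claimed = position
        return True

    def check_done(self):
        # An empty range has no work, so it is trivially done.
        if self.start >= self.stop:
            return True
        # Work remains if nothing was claimed or the final offset was not reached.
        if self.last_claimed is None or self.last_claimed < self.stop - 1:
            raise ValueError(
                "Last claimed position %r does not reach the end of range "
                "[%r, %r); unclaimed work remains."
                % (self.last_claimed, self.start, self.stop))
        return True


tracker = SketchOffsetTracker(0, 3)
tracker.try_claim(0)
try:
    tracker.check_done()  # offsets 1 and 2 are still unclaimed
    error_message = None
except ValueError as exc:
    error_message = str(exc)

tracker.try_claim(1)
tracker.try_claim(2)
done = tracker.check_done()  # every offset has now been claimed
```

In this sketch the first check_done() call raises because offsets 1 and 2 are unclaimed, and the second returns True once the whole range has been claimed.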

current_progress()

Returns a RestrictionProgress object representing the current progress.

Implementing this API is recommended: the runner can parallelize work more effectively when it receives accurate progress signals.
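As a rough illustration of what a progress signal might carry, the sketch below reports completed and remaining work for an offset range. SketchProgress and sketch_current_progress are hypothetical names introduced for this example and are not Beam's RestrictionProgress. The sketch assumes the most recently claimed position is still in progress and therefore counts as remaining work.

```python
from collections import namedtuple

# Hypothetical progress record; not Beam's RestrictionProgress.
SketchProgress = namedtuple("SketchProgress", ["completed", "remaining"])


def sketch_current_progress(start, stop, last_claimed):
    """Reports progress over [start, stop) given the last claimed position."""
    if last_claimed is None:
        # Nothing claimed yet, so the whole range remains.
        return SketchProgress(completed=0, remaining=stop - start)
    # Assumption: the claimed position itself is still being processed.
    return SketchProgress(
        completed=last_claimed - start, remaining=stop - last_claimed)


progress = sketch_current_progress(start=0, stop=10, last_claimed=4)
fraction_done = progress.completed / (progress.completed + progress.remaining)
```

A runner holding such a signal could, for instance, estimate how much work is left before deciding whether a split is worthwhile.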

current_restriction()

Returns the current restriction.

Returns a restriction accurately describing the full range of work the current DoFn.process() call will do, including already completed work.

The current restriction returned by this method may be updated dynamically due to concurrent invocation of other methods of the RestrictionTracker, for example split().

This API is required to be implemented.

Returns: a restriction object.
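The following pure-Python sketch illustrates both points: the restriction covers completed work as well as pending work, and it can change when a split happens. SketchTracker and its split_at helper are hypothetical and written only for this example. The restriction is assumed to be a (start, stop) tuple over a half-open range, and the split validation a real tracker would perform is omitted.

```python
class SketchTracker:
    """Hypothetical tracker whose restriction is a (start, stop) tuple."""

    def __init__(self, start, stop):
        self._start = start
        self._stop = stop
        self._last_claimed = None

    def current_restriction(self):
        # Describes the full range of this call, including claimed offsets.
        return (self._start, self._stop)

    def try_claim(self, position):
        if position >= self._stop:
            return False
        self._last_claimed = position
        return True

    def split_at(self, split_pos):
        # Hypothetical split: keep [start, split_pos), give away the rest.
        residual = (split_pos, self._stop)
        self._stop = split_pos
        return residual


tracker = SketchTracker(0, 100)
for offset in range(10):
    tracker.try_claim(offset)

before = tracker.current_restriction()  # still starts at 0 after ten claims
residual = tracker.split_at(50)
after = tracker.current_restriction()   # the stop has shrunk to the split point
```

Here the start of the restriction stays at 0 even though offsets 0 through 9 are already claimed, while the stop moves from 100 to 50 once the residual range is handed off.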

is_bounded()

Returns whether the amount of work represented by the current restriction is bounded.

The boundedness of the restriction is used to determine the default behavior of how to truncate restrictions when a pipeline is being drained. If the restriction is bounded, the entire restriction will be processed; otherwise, the restriction will be processed until a checkpoint is possible.

The API is required to be implemented.

Returns: True if the restriction represents a finite amount of work. Otherwise, returns False.

try_claim(position)

Attempts to claim the block of work in the current restriction identified by the given position. Each claimed position MUST be a valid split point.

If this succeeds, the DoFn MUST execute the entire block of work. If it fails, DoFn.process() MUST return None without performing any additional work or emitting output (note that emitting output or performing work from DoFn.process() is also not allowed before the first call of this method).

The API is required to be implemented.

Parameters: position – the position to be claimed.

Returns: True if the position can be claimed as current_position. Otherwise, returns False.
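The claim-then-work pattern described for try_claim() can be sketched as a simple loop. MiniTracker and the free function process are hypothetical stand-ins for a tracker and a DoFn.process() body; they assume integer positions over a half-open range and do not use Beam itself.

```python
class MiniTracker:
    """Hypothetical tracker over the half-open range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None

    def try_claim(self, position):
        # Refuse any position at or beyond the end of the range.
        if position >= self.stop:
            return False
        self.last_claimed = position
        return True


def process(element, tracker):
    """Generator mimicking the body of a splittable DoFn.process()."""
    position = tracker.start
    # No output is produced before the first claim, and the loop exits
    # without further work as soon as a claim fails.
    while tracker.try_claim(position):
        yield element[position]  # the block of work for the claimed position
        position += 1


outputs = list(process("abcdef", MiniTracker(1, 4)))
```

Each iteration emits output only for a position that was successfully claimed, so the failed claim at position 4 ends processing with nothing emitted for it.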