apache_beam.io.tfrecordio module¶
TFRecord sources and sinks.
-
class
apache_beam.io.tfrecordio.ReadFromTFRecord(file_pattern, coder=BytesCoder, compression_type='auto', validate=True)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransformTransform for reading TFRecord sources.
Initialize a ReadFromTFRecord transform.
Parameters: - file_pattern – A file glob pattern to read TFRecords from.
- coder – Coder used to decode each record.
- compression_type – Used to handle compressed input files. Default value is CompressionTypes.AUTO, in which case the file_path’s extension will be used to detect the compression.
- validate – Boolean flag to verify that the files exist during the pipeline creation time.
Returns: A ReadFromTFRecord transform object.
-
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]¶
-
default_label()¶
-
default_type_hints()¶
-
display_data()¶ Returns the display data associated to a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
Returns: A dictionary containing key:valuepairs. The value might be an integer, float or string value; aDisplayDataItemfor values that have more data (e.g. short value, label, url); or aHasDisplayDatainstance that has more display data that should be picked up. For example:{ 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent }
Return type: Dict[str, Any]
-
classmethod
from_runner_api(proto, context)¶
-
get_type_hints()¶ Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.
-
get_windowing(inputs)¶ Returns the window function to be associated with transform’s output.
By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).
-
infer_output_type(unused_input_type)¶
-
label¶
-
pipeline= None¶
-
classmethod
register_urn(urn, parameter_type, constructor=None)¶
-
runner_api_requires_keyed_input()¶
-
side_inputs= ()¶
-
to_runner_api(context, has_parts=False, **extra_kwargs)¶
-
to_runner_api_parameter(unused_context)¶
-
to_runner_api_pickled(unused_context)¶
-
type_check_inputs(pvalueish)¶
-
type_check_inputs_or_outputs(pvalueish, input_or_output)¶
-
type_check_outputs(pvalueish)¶
-
with_input_types(input_type_hint)¶ Annotates the input type of a
PTransformwith a type-hint.Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.Raises: TypeError– If input_type_hint is not a valid type-hint. Seeapache_beam.typehints.typehints.validate_composite_type_param()for further details.Returns: A reference to the instance of this particular PTransformobject. This allows chaining type-hinting related methods.Return type: PTransform
-
with_output_types(type_hint)¶ Annotates the output type of a
PTransformwith a type-hint.Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.Raises: TypeError– If type_hint is not a valid type-hint. Seevalidate_composite_type_param()for further details.Returns: A reference to the instance of this particular PTransformobject. This allows chaining type-hinting related methods.Return type: PTransform
-
class
apache_beam.io.tfrecordio.WriteToTFRecord(file_path_prefix, coder=BytesCoder, file_name_suffix='', num_shards=0, shard_name_template=None, compression_type='auto')[source]¶ Bases:
apache_beam.transforms.ptransform.PTransformTransform for writing to TFRecord sinks.
Initialize WriteToTFRecord transform.
Parameters: - file_path_prefix – The file path to write to. The files written will begin with this prefix, followed by a shard identifier (see num_shards), and end in a common extension, if given by file_name_suffix.
- coder – Coder used to encode each record.
- file_name_suffix – Suffix for the files written.
- num_shards – The number of files (shards) used for output. If not set, the default value will be used.
- shard_name_template – A template string containing placeholders for the shard number and shard count. When constructing a filename for a particular shard number, the upper-case letters ‘S’ and ‘N’ are replaced with the 0-padded shard number and shard count respectively. This argument can be ‘’ in which case it behaves as if num_shards was set to 1 and only one file will be generated. The default pattern used is ‘-SSSSS-of-NNNNN’ if None is passed as the shard_name_template.
- compression_type – Used to handle compressed output files. Typical value is CompressionTypes.AUTO, in which case the file_path’s extension will be used to detect the compression.
Returns: A WriteToTFRecord transform object.
-
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]¶
-
default_label()¶
-
default_type_hints()¶
-
display_data()¶ Returns the display data associated to a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
Returns: A dictionary containing key:valuepairs. The value might be an integer, float or string value; aDisplayDataItemfor values that have more data (e.g. short value, label, url); or aHasDisplayDatainstance that has more display data that should be picked up. For example:{ 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent }
Return type: Dict[str, Any]
-
classmethod
from_runner_api(proto, context)¶
-
get_type_hints()¶ Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: - Using self.default_type_hints(). - Using self.__class__ type hints.
-
get_windowing(inputs)¶ Returns the window function to be associated with transform’s output.
By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).
-
infer_output_type(unused_input_type)¶
-
label¶
-
pipeline= None¶
-
classmethod
register_urn(urn, parameter_type, constructor=None)¶
-
runner_api_requires_keyed_input()¶
-
side_inputs= ()¶
-
to_runner_api(context, has_parts=False, **extra_kwargs)¶
-
to_runner_api_parameter(unused_context)¶
-
to_runner_api_pickled(unused_context)¶
-
type_check_inputs(pvalueish)¶
-
type_check_inputs_or_outputs(pvalueish, input_or_output)¶
-
type_check_outputs(pvalueish)¶
-
with_input_types(input_type_hint)¶ Annotates the input type of a
PTransformwith a type-hint.Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.Raises: TypeError– If input_type_hint is not a valid type-hint. Seeapache_beam.typehints.typehints.validate_composite_type_param()for further details.Returns: A reference to the instance of this particular PTransformobject. This allows chaining type-hinting related methods.Return type: PTransform
-
with_output_types(type_hint)¶ Annotates the output type of a
PTransformwith a type-hint.Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.Raises: TypeError– If type_hint is not a valid type-hint. Seevalidate_composite_type_param()for further details.Returns: A reference to the instance of this particular PTransformobject. This allows chaining type-hinting related methods.Return type: PTransform