apache_beam.ml.gcp.visionml module¶
A connector for sending API requests to the GCP Vision API.
class apache_beam.ml.gcp.visionml.AnnotateImage(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, context_side_input=None, metadata=None)

Bases: apache_beam.transforms.ptransform.PTransform
A PTransform for annotating images using the GCP Vision API.
ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch of elements to the GCP Vision API. Each element is a Union[text_type, binary_type]: either a URI (e.g. a GCS URI) or base64-encoded image data. Accepts an AsDict side input that maps each image to an image context.

Parameters:
- features – (List[vision.types.Feature.enums.Feature]) Required. The Vision API features to detect.
- retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
- timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
- max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. Default is 5, which is also the Vision API maximum. This parameter is primarily intended for testing.
- min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
- client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. The API endpoint should be set through client_options.
- context_side_input – (beam.pvalue.AsDict) Optional. An AsDict of a PCollection to be passed to the _ImageAnnotateFn as the image context mapping, containing additional image context and/or feature-specific parameters. Example usage:

    image_contexts = [
        ('gs://cloud-samples-data/vision/ocr/sign.jpg',
         Union[dict, vision.types.ImageContext()]),
        ('gs://cloud-samples-data/vision/ocr/sign.jpg',
         Union[dict, vision.types.ImageContext()]),
    ]
    context_side_input = (
        p
        | "Image contexts" >> beam.Create(image_contexts)
    )
    visionml.AnnotateImage(
        features,
        context_side_input=beam.pvalue.AsDict(context_side_input))

- metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
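The batching behavior described above, where elements are grouped into requests of at most max_batch_size images before being sent to the API, can be sketched in plain Python without the Beam runtime. This is only an illustration of the grouping logic, not the SDK's util.BatchElements implementation:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batch_elements(elements: Iterable[T], max_batch_size: int = 5) -> Iterator[List[T]]:
    """Greedily group elements into batches of at most max_batch_size,
    mimicking how batches are formed for each Vision API request."""
    batch: List[T] = []
    for element in elements:
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch

# Twelve image URIs become three requests: 5 + 5 + 2 images.
uris = ["gs://bucket/img%02d.jpg" % i for i in range(12)]
batches = list(batch_elements(uris, max_batch_size=5))
```

Note that the real transform also honors min_batch_size and Beam's own batching heuristics; this sketch shows only the max_batch_size cap.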
MAX_BATCH_SIZE = 5

MIN_BATCH_SIZE = 1

annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]

default_label()

default_type_hints()

display_data()
  Returns the display data associated with a pipeline component.
  It should be reimplemented in pipeline components that wish to have static display data.
  Returns: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:

    {
      'key1': 'string_value',
      'key2': 1234,
      'key3': 3.14159265,
      'key4': DisplayDataItem('apache.org', url='http://apache.org'),
      'key5': subComponent,
    }

  Return type: Dict[str, Any]
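The contract above can be illustrated without the Beam SDK: a component that wants static display data simply returns such a dictionary. The class and keys below are made up for illustration, and DisplayDataItem entries are omitted since they require the SDK:

```python
class MyTransform:
    """Stand-in for a pipeline component that reimplements display_data."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout

    def display_data(self) -> dict:
        # Plain integer/float/string values, per the contract above.
        return {
            "timeout": self.timeout,
            "api": "cloud.google.com/vision",
        }

dd = MyTransform(timeout=120).display_data()
```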
classmethod from_runner_api(proto, context)

get_type_hints()
  Gets and/or initializes type hints for this object.
  If type hints have not been set, attempts to initialize type hints in this order:
  - Using self.default_type_hints().
  - Using self.__class__ type hints.
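The lookup order above amounts to a fallback chain: instance hints if already set, then default_type_hints(), then class-level hints. A minimal plain-Python sketch of that chain (names and attributes here are illustrative, not the SDK implementation):

```python
class Base:
    # Class-level hints: the last fallback in the chain.
    _type_hints = {"input": "Any", "output": "Any"}

    def default_type_hints(self):
        return None  # subclasses may override to supply defaults

    def get_type_hints(self):
        # 1. Instance hints, if they have already been set.
        hints = getattr(self, "_instance_hints", None)
        if hints is not None:
            return hints
        # 2. self.default_type_hints().
        hints = self.default_type_hints()
        if hints is not None:
            self._instance_hints = hints
            return hints
        # 3. self.__class__ type hints.
        return type(self)._type_hints

class WithDefaults(Base):
    def default_type_hints(self):
        return {"input": "str", "output": "bytes"}
```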
get_windowing(inputs)
  Returns the window function to be associated with the transform's output.
  By default most transforms just return the windowing function associated with the input PCollection (or the first input if there are several).

infer_output_type(unused_input_type)

label

pipeline = None

classmethod register_urn(urn, parameter_type, constructor=None)

runner_api_requires_keyed_input()

side_inputs = ()

to_runner_api(context, has_parts=False, **extra_kwargs)

to_runner_api_parameter(unused_context)

to_runner_api_pickled(unused_context)

type_check_inputs(pvalueish)

type_check_inputs_or_outputs(pvalueish, input_or_output)

type_check_outputs(pvalueish)

with_input_types(input_type_hint)
  Annotates the input type of a PTransform with a type-hint.
  Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
  Raises: TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
  Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
  Return type: PTransform

with_output_types(type_hint)
  Annotates the output type of a PTransform with a type-hint.
  Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
  Raises: TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
  Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
  Return type: PTransform
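The "returns a reference to the instance ... allows chaining" behavior is the fluent-interface pattern: each method mutates the object and returns self. A minimal stand-in (not the PTransform implementation, and with a simplified isinstance-based validity check in place of validate_composite_type_param()):

```python
class FluentTransform:
    """Toy transform whose type-hint methods chain by returning self."""

    def __init__(self):
        self.input_type = None
        self.output_type = None

    def with_input_types(self, input_type_hint):
        if not isinstance(input_type_hint, type):
            raise TypeError("not a valid type-hint: %r" % (input_type_hint,))
        self.input_type = input_type_hint
        return self  # returning self is what enables chaining

    def with_output_types(self, type_hint):
        if not isinstance(type_hint, type):
            raise TypeError("not a valid type-hint: %r" % (type_hint,))
        self.output_type = type_hint
        return self

# Both annotations applied in one chained expression.
t = FluentTransform().with_input_types(str).with_output_types(bytes)
```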
class apache_beam.ml.gcp.visionml.AnnotateImageWithContext(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, metadata=None)

Bases: apache_beam.ml.gcp.visionml.AnnotateImage
A PTransform for annotating images using the GCP Vision API.
ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch of elements to the GCP Vision API. Each element is a tuple of:

  (Union[text_type, binary_type], Optional[vision.types.ImageContext])

where the former is either a URI (e.g. a GCS URI) or base64-encoded image data.

Parameters:
- features – (List[vision.types.Feature.enums.Feature]) Required. The Vision API features to detect.
- retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
- timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
- max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. Default is 5, which is also the Vision API maximum. This parameter is primarily intended for testing.
- min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
- client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. The API endpoint should be set through client_options.
- metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
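A sketch of assembling such (image, context) element tuples. The helper name is an assumption, and since vision.types.ImageContext requires the GCP client library, a plain dict stands in for the context here:

```python
import base64

def to_element(image, context=None):
    """Build an (image, context) tuple matching the element contract
    above: the image is either a URI string passed through unchanged,
    or raw bytes that get base64-encoded."""
    if isinstance(image, str):       # e.g. a GCS URI
        payload = image
    else:                            # raw bytes -> base64-encoded image data
        payload = base64.b64encode(image)
    return (payload, context)

uri_elem = to_element("gs://cloud-samples-data/vision/ocr/sign.jpg")
data_elem = to_element(b"\x89PNG...", context={"language_hints": ["en"]})
```

A PCollection of such tuples is what this transform consumes, in contrast to AnnotateImage, which takes bare images and receives contexts via a side input.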
MAX_BATCH_SIZE = 5

MIN_BATCH_SIZE = 1

annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]

default_label()

default_type_hints()

display_data()
  Returns the display data associated with a pipeline component.
  It should be reimplemented in pipeline components that wish to have static display data.
  Returns: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:

    {
      'key1': 'string_value',
      'key2': 1234,
      'key3': 3.14159265,
      'key4': DisplayDataItem('apache.org', url='http://apache.org'),
      'key5': subComponent,
    }

  Return type: Dict[str, Any]

classmethod from_runner_api(proto, context)

get_type_hints()
  Gets and/or initializes type hints for this object.
  If type hints have not been set, attempts to initialize type hints in this order:
  - Using self.default_type_hints().
  - Using self.__class__ type hints.

get_windowing(inputs)
  Returns the window function to be associated with the transform's output.
  By default most transforms just return the windowing function associated with the input PCollection (or the first input if there are several).

infer_output_type(unused_input_type)

label

pipeline = None

classmethod register_urn(urn, parameter_type, constructor=None)

runner_api_requires_keyed_input()

side_inputs = ()

to_runner_api(context, has_parts=False, **extra_kwargs)

to_runner_api_parameter(unused_context)

to_runner_api_pickled(unused_context)

type_check_inputs(pvalueish)

type_check_inputs_or_outputs(pvalueish, input_or_output)

type_check_outputs(pvalueish)

with_input_types(input_type_hint)
  Annotates the input type of a PTransform with a type-hint.
  Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
  Raises: TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
  Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
  Return type: PTransform

with_output_types(type_hint)
  Annotates the output type of a PTransform with a type-hint.
  Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
  Raises: TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
  Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
  Return type: PTransform