apache_beam.ml.gcp.visionml module¶
A connector for sending API requests to the GCP Vision API.
-
class apache_beam.ml.gcp.visionml.AnnotateImage(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, context_side_input=None, metadata=None)[source]¶
Bases: apache_beam.transforms.ptransform.PTransform
A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/
Batches elements together using the util.BatchElements PTransform and sends each batch of elements to the GCP Vision API. Each element is a Union[text_type, binary_type] of either a URI (e.g. a GCS URI) or base64-encoded image data. Accepts an AsDict side input that maps each image to an image context.
Parameters:
- features – (List[vision.types.Feature.enums.Feature]) Required. The Vision API features to detect.
- retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
- timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
- max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. Default is 5 (which is also the Vision API maximum). This parameter is primarily intended for testing.
- min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
- client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. The API endpoint should be set through client_options.
- context_side_input – (beam.pvalue.AsDict) Optional. An AsDict of a PCollection to be passed to the _ImageAnnotateFn as the image context mapping, containing additional image context and/or feature-specific parameters. Example usage:

image_contexts = [
    ('gs://cloud-samples-data/vision/ocr/sign.jpg', Union[dict, ``vision.types.ImageContext()``]),
    ('gs://cloud-samples-data/vision/ocr/sign.jpg', Union[dict, ``vision.types.ImageContext()``]),
]
context_side_input = (
    p
    | "Image contexts" >> beam.Create(image_contexts)
)
visionml.AnnotateImage(
    features,
    context_side_input=beam.pvalue.AsDict(context_side_input))

- metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
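The max_batch_size / min_batch_size behaviour described above can be sketched in plain Python. The helper below is a simplified stand-in for Beam's util.BatchElements (greedy grouping only, no adaptive sizing) and is not the actual implementation; it just illustrates how a stream of images turns into Vision API requests of at most MAX_BATCH_SIZE = 5 images each.

```python
def batch_elements(elements, max_batch_size=5):
    """Greedily group elements into batches of at most max_batch_size,
    mimicking how each Vision API request is capped at 5 images."""
    batches = []
    current = []
    for element in elements:
        current.append(element)
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # trailing partial batch
    return batches

# Twelve image URIs yield three requests: 5 + 5 + 2 images.
uris = ['gs://bucket/img_%d.jpg' % i for i in range(12)]
batches = batch_elements(uris, max_batch_size=5)
print([len(b) for b in batches])  # [5, 5, 2]
```

In the real transform, BatchElements also uses min_batch_size to avoid sending underfilled requests; both knobs exist mainly so tests can force deterministic batch shapes.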
- MAX_BATCH_SIZE = 5¶
- MIN_BATCH_SIZE = 1¶
- annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]¶
- default_label()¶
- default_type_hints()¶
- display_data()¶
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
Returns: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:

{ 'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent }

Return type: Dict[str, Any]
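A minimal sketch of such an override, using a hypothetical component and only the plain int/str value shapes listed above (richer entries would use DisplayDataItem from apache_beam.transforms.display; it is omitted here so the sketch stays dependency-free):

```python
class MyAnnotateTransform:
    """Hypothetical pipeline component carrying static display data."""

    def __init__(self, timeout, endpoint):
        self.timeout = timeout
        self.endpoint = endpoint

    def display_data(self):
        # Plain int/str values are accepted as-is; entries needing a label
        # or URL would be wrapped in a DisplayDataItem instead.
        return {
            'timeout': self.timeout,
            'endpoint': self.endpoint,
        }

t = MyAnnotateTransform(timeout=120, endpoint='vision.googleapis.com')
print(t.display_data())
```

Runners surface this dictionary in monitoring UIs, which is why static configuration (timeouts, endpoints, feature lists) is the typical content.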
- classmethod from_runner_api(proto, context)¶
- get_type_hints()¶
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order:
- Using self.default_type_hints().
- Using self.__class__ type hints.
- get_windowing(inputs)¶
Returns the window function to be associated with the transform’s output.
By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).
- infer_output_type(unused_input_type)¶
- label¶
- pipeline = None¶
- classmethod register_urn(urn, parameter_type, constructor=None)¶
- runner_api_requires_keyed_input()¶
- side_inputs = ()¶
- to_runner_api(context, has_parts=False, **extra_kwargs)¶
- to_runner_api_parameter(unused_context)¶
- to_runner_api_pickled(unused_context)¶
- type_check_inputs(pvalueish)¶
- type_check_inputs_or_outputs(pvalueish, input_or_output)¶
- type_check_outputs(pvalueish)¶
- with_input_types(input_type_hint)¶
Annotates the input type of a PTransform with a type-hint.
Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises: TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform
- with_output_types(type_hint)¶
Annotates the output type of a PTransform with a type-hint.
Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises: TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform
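Because both methods return the transform instance itself, type-hinting calls can be chained fluently. The stand-in class below is hypothetical (not the real PTransform implementation) and only illustrates why returning self enables this style:

```python
class FluentTransform:
    """Minimal stand-in showing the return-self chaining pattern."""

    def __init__(self):
        self.input_type = None
        self.output_type = None

    def with_input_types(self, input_type_hint):
        self.input_type = input_type_hint
        return self  # returning self is what makes chaining possible

    def with_output_types(self, type_hint):
        self.output_type = type_hint
        return self

# Both hints set in a single expression, as with a real PTransform.
t = FluentTransform().with_input_types(str).with_output_types(bytes)
print(t.input_type, t.output_type)  # <class 'str'> <class 'bytes'>
```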
-
class apache_beam.ml.gcp.visionml.AnnotateImageWithContext(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, metadata=None)[source]¶
Bases: apache_beam.ml.gcp.visionml.AnnotateImage
A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/
Batches elements together using the util.BatchElements PTransform and sends each batch of elements to the GCP Vision API.
Each element is a tuple of:
(Union[text_type, binary_type], Optional[``vision.types.ImageContext``])
where the former is either a URI (e.g. a GCS URI) or base64-encoded image data.
Parameters:
- features – (List[vision.types.Feature.enums.Feature]) Required. The Vision API features to detect.
- retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
- timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
- max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. Default is 5 (which is also the Vision API max). This parameter is primarily intended for testing.
- min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
- client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. API Endpoint should be set through client_options.
- metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
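Unlike AnnotateImage, which receives image contexts through a side input, here each element carries its own optional context. A sketch of building such elements follows; the context is shown as a plain dict with an ImageContext-style field (language_hints) rather than a real vision.types.ImageContext, so the example stays dependency-free:

```python
# Each element pairs an image (URI or raw bytes) with an optional context.
elements = [
    # Context supplied as a plain dict of ImageContext-style fields.
    ('gs://cloud-samples-data/vision/ocr/sign.jpg', {'language_hints': ['en']}),
    # No context for this image: the second member may be None.
    ('gs://cloud-samples-data/vision/ocr/sign.jpg', None),
]

# Downstream code can treat a missing context as an empty mapping.
hints = [(context or {}).get('language_hints', []) for _, context in elements]
print(hints)  # [['en'], []]
```

A PCollection of such tuples is what this transform expects as input, e.g. beam.Create(elements) | AnnotateImageWithContext(features).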
- MAX_BATCH_SIZE = 5¶
- MIN_BATCH_SIZE = 1¶
- annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]¶
- default_label()¶
- default_type_hints()¶
- display_data()¶
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
Returns: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:

{ 'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent }

Return type: Dict[str, Any]
- classmethod from_runner_api(proto, context)¶
- get_type_hints()¶
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order:
- Using self.default_type_hints().
- Using self.__class__ type hints.
- get_windowing(inputs)¶
Returns the window function to be associated with the transform’s output.
By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).
- infer_output_type(unused_input_type)¶
- label¶
- pipeline = None¶
- classmethod register_urn(urn, parameter_type, constructor=None)¶
- runner_api_requires_keyed_input()¶
- side_inputs = ()¶
- to_runner_api(context, has_parts=False, **extra_kwargs)¶
- to_runner_api_parameter(unused_context)¶
- to_runner_api_pickled(unused_context)¶
- type_check_inputs(pvalueish)¶
- type_check_inputs_or_outputs(pvalueish, input_or_output)¶
- type_check_outputs(pvalueish)¶
- with_input_types(input_type_hint)¶
Annotates the input type of a PTransform with a type-hint.
Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises: TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform
- with_output_types(type_hint)¶
Annotates the output type of a PTransform with a type-hint.
Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises: TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform