apache_beam.ml.gcp.visionml module

A connector for sending API requests to the GCP Vision API.

class apache_beam.ml.gcp.visionml.AnnotateImage(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, context_side_input=None, metadata=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch of elements to the GCP Vision API. Each element is a Union[text_type, binary_type]: either a URI (e.g. a GCS URI) or base64-encoded image data as binary_type. Accepts an AsDict side input that maps each image to an image context.

Parameters:
  • features – (List[vision.types.Feature.enums.Feature]) Required. The Vision API features to detect.
  • retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
  • timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
  • max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. Default is 5 (which is also the Vision API max). This parameter is primarily intended for testing.
  • min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
  • client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. API Endpoint should be set through client_options.
  • context_side_input –

    (beam.pvalue.AsDict) Optional. An AsDict view of a PCollection passed to the _ImageAnnotateFn as the image context mapping; it maps each image to additional image context and/or feature-specific parameters. Example usage:

    image_contexts = [
        ('gs://cloud-samples-data/vision/ocr/sign.jpg',
         vision.types.ImageContext()),  # a dict is also accepted here
        ('gs://cloud-samples-data/vision/ocr/sign.jpg',
         vision.types.ImageContext()),
    ]

    context_side_input = (
        p
        | "Image contexts" >> beam.Create(image_contexts)
    )

    visionml.AnnotateImage(
        features,
        context_side_input=beam.pvalue.AsDict(context_side_input))
    
  • metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
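
For orientation, here is a minimal end-to-end sketch of applying this transform. It assumes a pre-2.0 google-cloud-vision client library (where vision.types and vision.enums are available); the step labels and the feature choice are illustrative, not prescribed by this module:

# A minimal sketch, assuming google-cloud-vision < 2.0 (vision.types API).
import apache_beam as beam
from apache_beam.ml.gcp import visionml
from google.cloud import vision

# TEXT_DETECTION is one of the feature types the Vision API offers.
features = [
    vision.types.Feature(type=vision.enums.Feature.Type.TEXT_DETECTION)]

with beam.Pipeline() as p:
  _ = (
      p
      | 'Create image URIs' >> beam.Create(
          ['gs://cloud-samples-data/vision/ocr/sign.jpg'])
      | 'Annotate' >> visionml.AnnotateImage(features)
      | 'Print responses' >> beam.Map(print))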
MAX_BATCH_SIZE = 5
MIN_BATCH_SIZE = 1
expand(pvalue)[source]
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated with a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type: Dict[str, Any]
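
As an illustration of reimplementing this method, here is a minimal sketch; MyTransform and threshold are hypothetical names used only for this example:

# A minimal sketch of a PTransform exposing static display data.
# MyTransform and threshold are hypothetical, for illustration only.
import apache_beam as beam
from apache_beam.transforms.display import DisplayDataItem

class MyTransform(beam.PTransform):
  def __init__(self, threshold):
    super().__init__()
    self.threshold = threshold

  def expand(self, pcoll):
    return pcoll | beam.Filter(lambda x: x > self.threshold)

  def display_data(self):
    # Static display data picked up by monitoring UIs.
    return {
        'threshold': self.threshold,
        'homepage': DisplayDataItem('apache.org', url='http://apache.org'),
    }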
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order:
  • Using self.default_type_hints().
  • Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with the transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises: TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises: TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform
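
Since both with_input_types() and with_output_types() return the transform itself, the calls can be chained. A small sketch, not specific to visionml:

# A minimal sketch of chaining type-hint methods on a transform.
import apache_beam as beam

lengths = beam.Map(len).with_input_types(str).with_output_types(int)

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create(['a', 'bb', 'ccc'])
      | 'Lengths' >> lengths
      | beam.Map(print))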
class apache_beam.ml.gcp.visionml.AnnotateImageWithContext(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, metadata=None)[source]

Bases: apache_beam.ml.gcp.visionml.AnnotateImage

A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch of elements to the GCP Vision API.

Each element is a tuple of:

(Union[text_type, binary_type], Optional[vision.types.ImageContext])

where the first item is either a URI (e.g. a GCS URI) or base64-encoded image data as binary_type.

Parameters:
  • features – (List[vision.types.Feature.enums.Feature]) Required. The Vision API features to detect.
  • retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
  • timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
  • max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. Default is 5 (which is also the Vision API max). This parameter is primarily intended for testing.
  • min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
  • client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. API Endpoint should be set through client_options.
  • metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
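
A minimal usage sketch, again assuming a pre-2.0 google-cloud-vision client library; the language_hints context and the step labels are illustrative assumptions:

# A minimal sketch, assuming google-cloud-vision < 2.0 (vision.types API).
import apache_beam as beam
from apache_beam.ml.gcp import visionml
from google.cloud import vision

features = [
    vision.types.Feature(type=vision.enums.Feature.Type.TEXT_DETECTION)]

# Each element pairs an image URI with an optional ImageContext.
context = vision.types.ImageContext(language_hints=['en'])
elements = [('gs://cloud-samples-data/vision/ocr/sign.jpg', context)]

with beam.Pipeline() as p:
  _ = (
      p
      | 'Create (URI, context) pairs' >> beam.Create(elements)
      | 'Annotate' >> visionml.AnnotateImageWithContext(features)
      | 'Print responses' >> beam.Map(print))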
expand(pvalue)[source]
MAX_BATCH_SIZE = 5
MIN_BATCH_SIZE = 1
annotations() → Dict[str, Union[bytes, str, google.protobuf.message.Message]]
default_label()
default_type_hints()
display_data()

Returns the display data associated with a pipeline component.

It should be reimplemented in pipeline components that wish to have static display data.

Returns: A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example:
{
  'key1': 'string_value',
  'key2': 1234,
  'key3': 3.14159265,
  'key4': DisplayDataItem('apache.org', url='http://apache.org'),
  'key5': subComponent
}
Return type: Dict[str, Any]
classmethod from_runner_api(proto, context)
get_type_hints()

Gets and/or initializes type hints for this object.

If type hints have not been set, attempts to initialize type hints in this order:
  • Using self.default_type_hints().
  • Using self.__class__ type hints.

get_windowing(inputs)

Returns the window function to be associated with the transform’s output.

By default most transforms just return the windowing function associated with the input PCollection (or the first input if several).

infer_output_type(unused_input_type)
label
pipeline = None
classmethod register_urn(urn, parameter_type, constructor=None)
runner_api_requires_keyed_input()
side_inputs = ()
to_runner_api(context, has_parts=False, **extra_kwargs)
to_runner_api_parameter(unused_context)
to_runner_api_pickled(unused_context)
type_check_inputs(pvalueish)
type_check_inputs_or_outputs(pvalueish, input_or_output)
type_check_outputs(pvalueish)
with_input_types(input_type_hint)

Annotates the input type of a PTransform with a type-hint.

Parameters: input_type_hint (type) – An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint.
Raises: TypeError – If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform
with_output_types(type_hint)

Annotates the output type of a PTransform with a type-hint.

Parameters: type_hint (type) – An instance of an allowed built-in type, a custom class, or a TypeConstraint.
Raises: TypeError – If type_hint is not a valid type-hint. See validate_composite_type_param() for further details.
Returns: A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods.
Return type: PTransform