apache_beam.io.hadoopfilesystem module
FileSystem implementation for accessing Hadoop Distributed File System files.
-
class apache_beam.io.hadoopfilesystem.HadoopFileSystem(pipeline_options)
Bases: apache_beam.io.filesystem.FileSystem
FileSystem implementation that supports HDFS.
URL arguments to methods expect strings starting with hdfs://.
Initializes a connection to HDFS.
Connection configuration is done by passing pipeline options. See HadoopFileSystemOptions.
-
join(base_url, *paths)
Join two or more pathname components.
Parameters:
- base_url – string path of the first component of the path. Must start with hdfs://.
- paths – path components to be added
Returns: Full URL after combining all the passed components.
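The joining behavior described above can be approximated with the standard library. The following is a minimal sketch, not Beam's implementation: `join_hdfs_url` is a hypothetical helper, and it assumes that everything after the hdfs:// prefix is treated as a POSIX-style path.

```python
import posixpath

HDFS_PREFIX = "hdfs://"  # prefix required for base_url, per the text above


def join_hdfs_url(base_url, *paths):
    """Hypothetical helper: join path components onto an hdfs:// URL."""
    if not base_url.startswith(HDFS_PREFIX):
        raise ValueError("base_url must start with hdfs://, got %r" % base_url)
    # Assumption: the part after the prefix behaves like a POSIX path.
    base_path = base_url[len(HDFS_PREFIX):]
    return HDFS_PREFIX + posixpath.join(base_path, *paths)


print(join_hdfs_url("hdfs://data/input", "2024", "part-0.txt"))
# prints hdfs://data/input/2024/part-0.txt
```

Note that `posixpath.join` discards earlier components when a later component is absolute (starts with `/`), so in this sketch the extra components should normally be relative.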
-
create(url, mime_type='application/octet-stream', compression_type='auto')
Returns: A Python file-like object opened for writing.
-
open(url, mime_type='application/octet-stream', compression_type='auto')
Returns: A Python file-like object opened for reading.
-
copy(source_file_names, destination_file_names)
It is an error if any file to copy already exists at the destination.
Raises BeamIOError if any error occurred.
Parameters:
- source_file_names – iterable of URLs.
- destination_file_names – iterable of URLs.
-
exists(url)
Checks existence of url in HDFS.
Parameters: url – String in the form hdfs://…
Returns: True if url exists as a file or directory in HDFS.
-
checksum(url)
Fetches a checksum description for a URL.
Returns: String describing the checksum.
-
CHUNK_SIZE = 1
-
classmethod get_all_plugin_paths()
Get full import paths of the BeamPlugin subclass.
-
classmethod get_all_subclasses()
Get all the subclasses of the BeamPlugin class.
-
match(patterns, limits=None)
Find all matching paths to the patterns provided.
See also: translate_pattern()
Patterns ending with ‘/’ or ‘\’ will be appended with ‘*’.
Parameters:
- patterns – list of strings, the file path patterns to match against
- limits – list of the maximum number of responses to fetch for each pattern
Returns: list of MatchResult objects.
Raises: BeamIOError – if any of the pattern match operations fail
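As an illustration of the pattern handling described above, here is a small sketch of how patterns and limits might be paired before matching. `prepare_match_requests` is a hypothetical helper, not part of Beam. The assumption that `limits` holds one entry per pattern, and the `ValueError` raised on a length mismatch, are choices made for this sketch only.

```python
def prepare_match_requests(patterns, limits=None):
    """Hypothetical helper: pair each pattern with its limit."""
    if limits is None:
        limits = [None] * len(patterns)  # None means no cap on results
    if len(limits) != len(patterns):
        # Assumption of this sketch: exactly one limit per pattern.
        raise ValueError("patterns and limits must have the same length")
    requests = []
    for pattern, limit in zip(patterns, limits):
        # Patterns ending with a path separator get '*' appended.
        if pattern.endswith("/") or pattern.endswith("\\"):
            pattern += "*"
        requests.append((pattern, limit))
    return requests


requests = prepare_match_requests(["hdfs://logs/", "hdfs://data/*.csv"], [10, None])
```

Here the first pattern becomes `hdfs://logs/*` with a limit of 10, while the second is left unchanged and has no limit.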
-
match_files(file_metas, pattern)
Filter FileMetadata objects by pattern.
Parameters:
- file_metas (list of FileMetadata) – Files to consider when matching
- pattern (str) – File pattern
See also: translate_pattern()
Returns: Generator of matching FileMetadata
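The filtering described above can be sketched as a generator. This is an illustration under stated assumptions rather than Beam's code: `FileMeta` is a hypothetical stand-in for FileMetadata, `match_files_sketch` is a hypothetical name, and the standard library's `fnmatch.translate` is swapped in for the pattern translation, which means `*` here also matches `/`.

```python
import fnmatch
import re
from collections import namedtuple

# Hypothetical stand-in for FileMetadata; only `path` is used for matching.
FileMeta = namedtuple("FileMeta", ["path", "size_in_bytes"])


def match_files_sketch(file_metas, pattern):
    """Lazily yield the metadata objects whose path matches the pattern."""
    # Stand-in translation: plain fnmatch semantics, not Beam's variant.
    regex = re.compile(fnmatch.translate(pattern))
    for meta in file_metas:
        if regex.match(meta.path):
            yield meta


metas = [FileMeta("hdfs://d/a.txt", 10), FileMeta("hdfs://d/b.csv", 20)]
matched = list(match_files_sketch(metas, "hdfs://d/*.txt"))
```

Because the function is a generator, nothing is filtered until the result is iterated, which is why the usage line wraps the call in `list()`.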
-
static translate_pattern(pattern)
Translate a pattern to a regular expression. There is no way to quote meta-characters.
Pattern syntax:
The pattern syntax is based on the fnmatch syntax, with the following differences:
- * is equivalent to [^/\]* rather than .*.
- ** is equivalent to .*.
See also: match() uses this method.
This method is based on Python 2.7’s fnmatch.translate. The code in this method is licensed under PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2.
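To make the two differences from fnmatch concrete, the following is a simplified sketch of such a translation. It is not the method's actual code: `translate_pattern_sketch` is a hypothetical name, `?` is assumed to match any single character as in fnmatch, and bracket expressions such as `[abc]` are deliberately not handled (they are escaped as literals here).

```python
import re


def translate_pattern_sketch(pattern):
    """Translate a glob-like pattern into a regular expression string."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # '**' may cross path separators
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/\\]*")  # '*' stops at '/' and '\'
            i += 1
        elif pattern[i] == "?":
            parts.append(".")         # assumption: any single character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    # (?s) lets '.' match newlines; \Z anchors the match at the end.
    return "(?s)" + "".join(parts) + r"\Z"


single = re.compile(translate_pattern_sketch("hdfs://logs/*.txt"))
double = re.compile(translate_pattern_sketch("hdfs://logs/**.txt"))
```

With these two compiled expressions, `single` matches `hdfs://logs/a.txt` but not `hdfs://logs/sub/a.txt`, while `double` matches both, because only `**` is allowed to span a `/`.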
-