apache_beam.io.filesystemio module

Utilities for FileSystem implementations.

class apache_beam.io.filesystemio.Downloader[source]

Bases: object

Download interface for a single file.

Implementations should support random access reads.

size

Size of file to download.

get_range(start, end)[source]

Retrieve a given byte range [start, end) from this download.

The range must satisfy 0 <= start < end; the bytes from start (inclusive) to end (exclusive) are fetched.
Parameters:
  • start – (int) Initial byte offset.
  • end – (int) Final byte offset, exclusive.
Returns: (string) A buffer containing the requested data.
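
As an illustration of this interface, the following minimal sketch backs a downloader with an in-memory bytes object. The class name InMemoryDownloader is hypothetical, and the class is duck-typed so the snippet runs without Beam installed; a real implementation would presumably subclass apache_beam.io.filesystemio.Downloader. Raising ValueError on a malformed range is an assumption, not documented behavior.

```python
class InMemoryDownloader:
    """Hypothetical downloader over an in-memory bytes object.

    Mirrors the Downloader interface (size, get_range); not part of Beam.
    """

    def __init__(self, data: bytes):
        self._data = data

    @property
    def size(self) -> int:
        # Size of the "file" to download.
        return len(self._data)

    def get_range(self, start: int, end: int) -> bytes:
        # Enforce the documented form 0 <= start < end (assumption: raise on violation).
        if not 0 <= start < end:
            raise ValueError(f"invalid range [{start}, {end})")
        # Slicing stops at the end of the data, so an oversized `end` is harmless here.
        return self._data[start:end]


downloader = InMemoryDownloader(b"hello world")
first_word = downloader.get_range(0, 5)  # bytes 0..4, since `end` is exclusive
```

Because get_range takes arbitrary offsets, an object like this supports the random-access reads the interface asks for.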

class apache_beam.io.filesystemio.Uploader[source]

Bases: object

Upload interface for a single file.

put(data)[source]

Write data to file sequentially.

Parameters: data – (memoryview) Data to write.
finish()[source]

Signal to upload any remaining data and close the file.

File should be fully written upon return from this method.

Raises: Any error encountered during the upload.
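
The put/finish contract can be sketched with an uploader that simply collects data in memory. InMemoryUploader and its result and finished attributes are hypothetical names introduced for illustration; the class is duck-typed rather than derived from apache_beam.io.filesystemio.Uploader so that it runs without Beam installed.

```python
import io


class InMemoryUploader:
    """Hypothetical uploader that collects data in memory; not part of Beam."""

    def __init__(self):
        self._buffer = io.BytesIO()
        self.finished = False
        self.result = None

    def put(self, data: memoryview) -> None:
        # Data arrives sequentially; copy it because the caller may reuse its buffer.
        self._buffer.write(bytes(data))

    def finish(self) -> None:
        # After this returns, the "file" is fully written.
        self.result = self._buffer.getvalue()
        self.finished = True


uploader = InMemoryUploader()
uploader.put(memoryview(b"hello "))
uploader.put(memoryview(b"world"))
uploader.finish()
```

A remote-storage implementation would likely buffer or stream the chunks in put() and surface any upload error from finish().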
class apache_beam.io.filesystemio.DownloaderStream(downloader, read_buffer_size=8192, mode='rb')[source]

Bases: io.RawIOBase

Provides a stream interface for Downloader objects.

Initializes the stream.

Parameters:
  • downloader – (Downloader) Filesystem dependent implementation.
  • read_buffer_size – (int) Buffer size to use during read operations.
  • mode – (string) Python mode attribute for this stream.
readinto(b)[source]

Read up to len(b) bytes into b.

Returns the number of bytes read (0 for EOF).

Parameters: b – (bytearray/memoryview) Buffer to read into.
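
To show how readinto() can sit on top of a downloader's get_range(), here is a minimal sketch of a read-only stream. It is not Beam's DownloaderStream: the names BytesDownloader and SketchDownloaderStream are hypothetical, and the sketch omits buffering, seeking, and closed-stream checks.

```python
import io


class BytesDownloader:
    """Hypothetical stand-in exposing size and get_range(start, end)."""

    def __init__(self, data):
        self._data = data
        self.size = len(data)

    def get_range(self, start, end):
        return self._data[start:end]


class SketchDownloaderStream(io.RawIOBase):
    """Illustrative read-only stream; not Beam's DownloaderStream."""

    def __init__(self, downloader):
        super().__init__()
        self._downloader = downloader
        self._position = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Never request bytes past the end of the file.
        start = self._position
        end = min(start + len(b), self._downloader.size)
        if start >= end:
            return 0  # EOF (or an empty buffer)
        data = self._downloader.get_range(start, end)
        b[:len(data)] = data
        self._position += len(data)
        return len(data)


stream = SketchDownloaderStream(BytesDownloader(b"0123456789"))
buf = bytearray(4)
n = stream.readinto(buf)
rest = stream.read()  # with no size, RawIOBase.read() reads until EOF
```

Only readinto() is implemented here; io.RawIOBase supplies read() and readall() in terms of it.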
seek(offset, whence=0)[source]

Set the stream’s current offset.

Note: if the new offset is out of bounds, it is adjusted to either 0 or EOF.

Parameters:
  • offset – seek offset as number.
  • whence – seek mode. Supported modes are os.SEEK_SET (absolute seek), os.SEEK_CUR (seek relative to the current position), and os.SEEK_END (seek relative to the end, offset should be negative).
Raises: ValueError – When this stream is closed or if whence is invalid.
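
The offset arithmetic described above can be sketched as a small pure function. The helper name clamped_seek and its position and size parameters are hypothetical; this illustrates the documented behavior and is not Beam's implementation. It also omits the closed-stream check.

```python
import os


def clamped_seek(position, size, offset, whence=os.SEEK_SET):
    """Return the new offset for a stream of `size` bytes (illustrative helper)."""
    if whence == os.SEEK_SET:
        target = offset  # absolute seek
    elif whence == os.SEEK_CUR:
        target = position + offset  # relative to the current position
    elif whence == os.SEEK_END:
        target = size + offset  # relative to the end; offset is normally negative
    else:
        raise ValueError(f"unsupported whence value: {whence!r}")
    # Out-of-bounds targets are adjusted to 0 or EOF.
    return max(0, min(target, size))


past_end = clamped_seek(position=0, size=10, offset=50)
before_start = clamped_seek(position=3, size=10, offset=-5, whence=os.SEEK_CUR)
from_end = clamped_seek(position=3, size=10, offset=-2, whence=os.SEEK_END)
```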

tell()[source]

Tell the stream’s current offset.

Returns: current offset in reading this stream.
Raises: ValueError – When this stream is closed.
seekable()[source]
readable()[source]
readall()[source]

Read until EOF, using multiple read() calls.

close()

Flush and close the IO object.

This method has no effect if the file is already closed.

closed
fileno()

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

flush()

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

read()
readline()

Read and return a line from the stream.

If size is specified, at most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newlines argument to open can be used to select the line terminator(s) recognized.

readlines()

Return a list of lines from the stream.

hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

truncate()

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()

Return whether object was opened for writing.

If False, write() will raise OSError.

write()
writelines()

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

class apache_beam.io.filesystemio.UploaderStream(uploader, mode='wb')[source]

Bases: io.RawIOBase

Provides a stream interface for Uploader objects.

Initializes the stream.

Parameters:
  • uploader – (Uploader) Filesystem dependent implementation.
  • mode – (string) Python mode attribute for this stream.
tell()[source]
write(b)[source]

Write bytes from b.

Returns the number of bytes written (<= len(b)).

Parameters: b – (memoryview) Buffer with data to write.
close()[source]

Complete the upload and close this stream.

This method has no effect if the stream is already closed.

Raises: Any error encountered by the uploader.
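
The write/tell/close behavior described above can be sketched with a small write-only stream over an uploader-like object. ListUploader, SketchUploaderStream, and the finish_calls counter are hypothetical names used only for illustration; this is not Beam's UploaderStream.

```python
import io


class ListUploader:
    """Hypothetical uploader that records chunks in a list."""

    def __init__(self):
        self.chunks = []
        self.finish_calls = 0

    def put(self, data):
        self.chunks.append(bytes(data))

    def finish(self):
        self.finish_calls += 1


class SketchUploaderStream(io.RawIOBase):
    """Illustrative write-only stream; not Beam's UploaderStream."""

    def __init__(self, uploader):
        super().__init__()
        self._uploader = uploader
        self._position = 0

    def writable(self):
        return True

    def tell(self):
        return self._position

    def write(self, b):
        if self.closed:
            raise ValueError("write to closed stream")
        data = memoryview(b)
        self._uploader.put(data)
        self._position += data.nbytes
        return data.nbytes

    def close(self):
        # Closing twice is a no-op; finish() runs only once.
        if not self.closed:
            self._uploader.finish()
        super().close()


uploader = ListUploader()
stream = SketchUploaderStream(uploader)
written = stream.write(b"abc") + stream.write(b"de")
offset = stream.tell()
stream.close()
stream.close()  # no effect the second time
```

In this sketch tell() simply reports the number of bytes handed to the uploader so far, which matches a sequential, non-seekable upload.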
writable()[source]
closed
fileno()

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

flush()

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

read()
readable()

Return whether object was opened for reading.

If False, read() will raise OSError.

readall()

Read until EOF, using multiple read() calls.

readinto()
readline()

Read and return a line from the stream.

If size is specified, at most size bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newlines argument to open can be used to select the line terminator(s) recognized.

readlines()

Return a list of lines from the stream.

hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

seek()

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

  • 0 – start of stream (the default); offset should be zero or positive
  • 1 – current stream position; offset may be negative
  • 2 – end of stream; offset is usually negative

Return the new absolute position.

seekable()

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise OSError. This method may need to do a test seek().

truncate()

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writelines()

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

class apache_beam.io.filesystemio.PipeStream(recv_pipe)[source]

Bases: object

A class that presents a pipe connection as a readable stream.

Not thread-safe.

Remembers the last size bytes read and allows rewinding the stream by that amount exactly. See BEAM-6380 for more.

read(size)[source]

Read data from the wrapped pipe connection.

Parameters: size – Number of bytes to read. The actual number of bytes read is always equal to size unless EOF is reached.
Returns: data read as str.
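
The "exactly size bytes unless EOF" guarantee can be sketched as a loop that keeps receiving messages from the connection and carries any surplus over to the next call. This sketch assumes recv_pipe is a multiprocessing connection object with a recv_bytes() method. The class name SketchPipeReader is hypothetical, and the rewind support mentioned for PipeStream is deliberately left out.

```python
import multiprocessing


class SketchPipeReader:
    """Illustrative reader over a multiprocessing connection; not Beam's PipeStream."""

    def __init__(self, recv_pipe):
        self._recv_pipe = recv_pipe
        self._leftover = b""  # bytes received but not yet returned to the caller
        self._position = 0

    def read(self, size):
        chunks = []
        remaining = size
        while remaining > 0:
            if not self._leftover:
                try:
                    self._leftover = self._recv_pipe.recv_bytes()
                except EOFError:
                    break  # the sending end was closed
            piece = self._leftover[:remaining]
            self._leftover = self._leftover[len(piece):]
            chunks.append(piece)
            remaining -= len(piece)
        data = b"".join(chunks)
        self._position += len(data)
        return data

    def tell(self):
        return self._position


recv_conn, send_conn = multiprocessing.Pipe(duplex=False)
send_conn.send_bytes(b"hello ")
send_conn.send_bytes(b"world")
send_conn.close()

reader = SketchPipeReader(recv_conn)
first = reader.read(8)   # spans both messages
second = reader.read(8)  # short read: EOF is reached after 3 bytes
```

Message boundaries on the pipe do not line up with read sizes, which is why the leftover buffer is needed.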
tell()[source]

Tell the file’s current offset.

Returns: current offset in reading this file.
Raises: ValueError – When this stream is closed.
seek(offset, whence=0)[source]