Bug 1080 - Add support for memory mapped big file I/O via specialized InputStream and OutputStream, incl. mark/reset
Alias: None
Product: Gluegen
Classification: JogAmp
Component: core
Version: 2.3.0
Hardware: All all
Importance: --- enhancement
Assignee: Sven Gothel
Depends on:
Reported: 2014-09-25 23:42 CEST by Sven Gothel
Modified: 2019-03-29 17:54 CET
0 users

See Also:
SCM Refs:
ae17a5895088e321bc373318cc1e144a2f822f29 95c4a3c7b6b256de4293ed1b31380d6af5ab59d0 92a6d2c1476fd562721f231f89afba9342ed8a20 00a9ee70054872712017b5a14b19aa92068c8420 a7a3d5ab98ee0ad33fdef50bf081afeb8295ebe4 bd240ebfe09b7c7a21689dee8be0cc673eb7f340
Workaround: ---


Description Sven Gothel 2014-09-25 23:42:36 CEST
It is desired to read and write big files via InputStream and OutputStream
while supporting mark/reset in the most efficient way.

BufferedInputStream, which does support mark/reset,
can only handle files up to 2 GiB, since it buffers into a byte[].

This is even more restricted on some platforms,
since it uses heap memory, which might not be available.

Further, performance is not ideal.


Add memory mapped InputStream and OutputStream implementations
supporting mark/reset.
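A minimal sketch of why mark/reset is cheap on a buffer-backed stream: marking just records the buffer position and reset restores it, without copying data. The class name SimpleByteBufferInputStream is hypothetical; this is an illustration, not the GlueGen ByteBufferInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical minimal InputStream over a ByteBuffer with mark/reset support.
// Positions are int based, so one buffer can address at most ~2 GiB.
class SimpleByteBufferInputStream extends InputStream {
    private final ByteBuffer buf;
    private int mark = -1; // -1 means no mark has been set

    SimpleByteBufferInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    @Override
    public int read() {
        // Return 0..255, or -1 at end of stream, per the InputStream contract.
        return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        Objects.checkFromIndexSize(off, len, b.length);
        if (len == 0) {
            return 0;
        }
        if (!buf.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buf.remaining());
        buf.get(b, off, n); // bulk copy, no per-byte calls
        return n;
    }

    @Override
    public int available() {
        return buf.remaining();
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void mark(int readlimit) {
        // All data stays addressable in the buffer, so readlimit is irrelevant.
        mark = buf.position();
    }

    @Override
    public synchronized void reset() throws IOException {
        if (mark < 0) {
            throw new IOException("Mark not set");
        }
        buf.position(mark); // rewind without copying any data
    }
}
```

Since ByteBuffer positions are int values, a single buffer tops out at 2 GiB, which is what motivates slicing big files into several mapped buffers.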
Comment 1 Sven Gothel 2014-09-25 23:52:13 CEST
commit ae17a5895088e321bc373318cc1e144a2f822f29

Add read support for memory mapped big file I/O via specialized InputStream impl., incl. mark/reset

- ByteBufferInputStream simply implements InputStream for an arbitrary ByteBuffer,
  which is restricted to 2 GiB.
  - Users may only need this smaller implementation for 'smaller' file sizes
    or for streaming a [native] ByteBuffer.

- MappedByteBufferInputStream implements InputStream for any file size
  by slicing the total size into memory mapped buffers via the given FileChannel.
  The slices are mapped lazily, and different flush/cache methods are supported
  to ease virtual memory usage.

- TestByteBufferInputStream: Unit test for basic functionality and performance stats.
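The slicing approach can be sketched roughly as follows, assuming a fixed slice size and read-only mapping. SlicedMappedFile and its methods are hypothetical names, not the GlueGen API; the sketch only shows the index arithmetic and lazy mapping via FileChannel.map(..).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch: view a file of any length as fixed-size memory
// mapped slices, each mapped lazily on first access (read-only here).
class SlicedMappedFile {
    private final FileChannel channel;
    private final long sliceSize;
    private final long totalSize;
    private final MappedByteBuffer[] slices; // null entries are not mapped yet

    SlicedMappedFile(FileChannel channel, long sliceSize) throws IOException {
        if (sliceSize <= 0 || sliceSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("sliceSize must be in 1..Integer.MAX_VALUE");
        }
        this.channel = channel;
        this.sliceSize = sliceSize;
        this.totalSize = channel.size();
        long count = (totalSize + sliceSize - 1) / sliceSize; // ceiling division
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many slices, use a larger sliceSize");
        }
        this.slices = new MappedByteBuffer[(int) count];
    }

    int sliceCount() {
        return slices.length;
    }

    // Returns the slice covering absolute file position pos, mapping it if needed.
    private MappedByteBuffer sliceFor(long pos) throws IOException {
        int idx = (int) (pos / sliceSize);
        if (slices[idx] == null) {
            long start = idx * sliceSize;                       // long arithmetic
            long len = Math.min(sliceSize, totalSize - start); // last slice may be shorter
            slices[idx] = channel.map(FileChannel.MapMode.READ_ONLY, start, len);
        }
        return slices[idx];
    }

    // Reads the byte at absolute file position pos as 0..255.
    int get(long pos) throws IOException {
        if (pos < 0 || pos >= totalSize) {
            throw new IndexOutOfBoundsException("pos " + pos);
        }
        return sliceFor(pos).get((int) (pos % sliceSize)) & 0xFF;
    }
}
```

Only the slices actually touched are mapped, which keeps virtual memory usage proportional to the accessed window rather than the file size.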
Comment 2 Sven Gothel 2014-09-26 12:30:30 CEST

Fix TestByteBufferInputStream: Handle OutOfMemoryError cause in IOException (Add note to FLUSH_NONE); Reduce test load / duration.
Comment 3 Sven Gothel 2014-09-26 12:31:10 CEST

Bug 1080 - Add write support for memory mapped big file I/O via specialized OutputStream impl.

Added MappedByteBufferOutputStream as a child instance of MappedByteBufferInputStream,
since the latter already manages the file's mapped buffer slices.

Current design is:
  - MappedByteBufferInputStream (parent)
    - MappedByteBufferOutputStream

This is because InputStream and OutputStream are abstract classes, not interfaces,
so one class cannot extend both; hence most functionality is provided in one class.

We could redesign both as follows:
  - MappedByteBufferIOStream (parent)
    - MappedByteBufferInputStream
    - MappedByteBufferOutputStream

This might make the structure clearer, though I'm not sure whether it's worth
the extra indirection.
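To illustrate the "one parent, two stream views" option, here is a sketch assuming a plain heap ByteBuffer in place of the mapped slices. SharedBufferIO, inputView() and outputView() are hypothetical names. Since a Java class has a single superclass, the shared state lives in the parent and the InputStream/OutputStream views are inner classes delegating to it.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch of a shared parent owning the buffer state and handing
// out InputStream and OutputStream views. A heap ByteBuffer stands in for the
// mapped slices; the views are anonymous inner classes sharing one position.
class SharedBufferIO {
    private final ByteBuffer buf;

    SharedBufferIO(int capacity) {
        this.buf = ByteBuffer.allocate(capacity);
    }

    int position() {
        return buf.position();
    }

    void position(int newPosition) {
        buf.position(newPosition);
    }

    InputStream inputView() {
        return new InputStream() {
            @Override
            public int read() {
                return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
            }
        };
    }

    OutputStream outputView() {
        return new OutputStream() {
            @Override
            public void write(int b) {
                buf.put((byte) b); // BufferOverflowException once capacity is reached
            }
        };
    }
}
```

Whether the views live in a common parent or the output stream is a child of the input stream, the key point is the same: only one object owns the position and the mapped buffers.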


  - Adding [file] resize support via custom FileResizeOp
  - All construction happens via ctors
  - Handle refCount, incr. by ctor and getOutputStream(..), decr by close
  - Check whether stream is closed already -> IOException
  - Simplify / Reuse code

  - Adding simple write operations
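The refCount and closed-check items can be pictured with this hypothetical helper (RefCountedResource is not part of GlueGen). It assumes each stream view calls release() exactly once from its close().

```java
import java.io.IOException;

// Hypothetical reference-counting helper: the constructor holds the first
// reference, each additional stream view acquires one, and each view's close()
// is assumed to call release() exactly once.
class RefCountedResource {
    private int refCount = 1;
    private boolean released = false;

    // Any operation on a fully closed resource reports an IOException.
    synchronized void ensureOpen() throws IOException {
        if (released) {
            throw new IOException("Stream closed");
        }
    }

    synchronized void acquire() throws IOException {
        ensureOpen();
        refCount++;
    }

    synchronized void release() throws IOException {
        ensureOpen();
        if (--refCount == 0) {
            released = true;
            // This is where shared resources (e.g. mapped buffers, FileChannel) would be freed.
        }
    }

    synchronized boolean isReleased() {
        return released;
    }
}
```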
Comment 4 Sven Gothel 2014-09-26 12:32:20 CEST
Basic functionality has now been added, incl. unit tests,
which pass on Windows and GNU/Linux (32- and 64-bit)
using JRE7 and JRE8 (Oracle/OpenJDK).

Further refinements may happen via a follow-up bug report.
Comment 5 Sven Gothel 2014-09-27 19:56:08 CEST
To make the MappedByteBuffer*Stream more useful,
we might add JNI native mmap and munmap?

This would improve 'flushing' of a mapped buffer slice
and hopping to the next one.
Right now, we use an array of slices,
but native mmap/munmap could remove the need for it,
map the current 'window' directly
and also guarantee the unmap and hence the release.
Currently the unmap is only implemented in a fuzzy way,
i.e. via GC or the private 'cleaner' method.


A r/w method using ByteBuffers might be useful as well.

Comment 6 Sven Gothel 2014-09-29 03:58:46 CEST
commit 00a9ee70054872712017b5a14b19aa92068c8420
  Refine MappedByteBuffer*Stream impl. and API [doc], 
  adding stream to stream copy 
  as well as direct memory mapped ByteBuffer access
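A plain stream-to-stream copy loop, for illustration only: StreamCopy is a hypothetical helper, and on Java 9+ InputStream.transferTo(OutputStream) does the same job. The count is a long so that copies beyond 2 GiB are reported correctly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies all remaining bytes from in to out in chunks.
class StreamCopy {
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] chunk = new byte[bufferSize];
        long total = 0; // long, since mapped streams may exceed 2 GiB
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }
}
```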
Comment 7 Sven Gothel 2014-10-03 03:24:38 CEST

- Validate active and GC'ed mapped-buffer count
  in cleanAllSlices() via close() ..

- Fix missing unmapping of the last buffer in notifyLengthChangeImpl();
  the branch criterion was off by one.

- cleanSlice(..) now also issues cleanBuffer(..) on the GC'ed entry,
  hence if the WeakReference is still alive, its release is enforced.

- cleanBuffer(..) reverts FLUSH_PRE_HARD -> FLUSH_PRE_SOFT
  in case of an error.

- flush() -> flush(boolean metaData) to expose FileChannel.force(metaData).

- Add synchronous mode: flush/sync the mapped buffers when
  in READ_WRITE mapping mode and issue FileChannel.force() if not READ_ONLY.

  Above is implemented via flush()/flushImpl(..) for buffers and FileChannel,
  as well as in syncSlice(..) for buffers only.

  flush*()/syncSlice() is covered by:
    - setLength()
    - notifyLengthChange*(..)
    - nextSlice()

  Always issue flushImpl() in close().

- Windows: Clean all buffers in setLength(),
  otherwise Windows will report:

- Windows: Catch MappedByteBuffer.force() IOException

- Optimization of position(..)
  position(..) is now standalone to allow issuing flushSlice(..)
  before gathering the new mapped buffer.
  This shall avoid one extra cache miss.

  Hence rename positionImpl(..) -> position2(..).

- All MappedByteBufferOutputStream.write(..) methods
  issue syncSlice(..) on the last written current slice
  to ensure new 'synchronous' mode is honored.
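A sketch of flush(boolean metaData) and the synchronous mode under simplifying assumptions: SyncingMappedWriter is a hypothetical class holding a flat list of mapped slices, and the mode checks follow the description above (force the buffers in READ_WRITE, force the FileChannel unless READ_ONLY). MappedByteBuffer.force() and FileChannel.force(boolean) are the standard JDK calls.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of flush(metaData) plus an optional 'synchronous' mode.
class SyncingMappedWriter {
    private final FileChannel channel;
    private final FileChannel.MapMode mode;
    private final boolean synchronous;
    private final List<MappedByteBuffer> slices = new ArrayList<>();

    SyncingMappedWriter(FileChannel channel, FileChannel.MapMode mode, boolean synchronous) {
        this.channel = channel;
        this.mode = mode;
        this.synchronous = synchronous;
    }

    // Maps a region and remembers it so flush() can reach it later.
    MappedByteBuffer map(long position, long size) throws IOException {
        MappedByteBuffer slice = channel.map(mode, position, size);
        slices.add(slice);
        return slice;
    }

    // Writes one byte; in synchronous mode the touched slice is synced at once.
    void write(MappedByteBuffer slice, int index, byte value) {
        slice.put(index, value);
        syncSlice(slice);
    }

    // Buffers only: force modified pages of this slice back to the file.
    void syncSlice(MappedByteBuffer slice) {
        if (synchronous && mode == FileChannel.MapMode.READ_WRITE) {
            slice.force();
        }
    }

    // Buffers and channel: metaData is passed through to FileChannel.force(..).
    void flush(boolean metaData) throws IOException {
        if (mode == FileChannel.MapMode.READ_WRITE) {
            for (MappedByteBuffer slice : slices) {
                slice.force();
            }
        }
        if (mode != FileChannel.MapMode.READ_ONLY) {
            channel.force(metaData);
        }
    }
}
```

Syncing after every write trades throughput for durability, which is why it is presented as an optional mode rather than the default.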


Unit tests:

- Ensure test files are being deleted

- TestByteBufferCopyStream: Reduced test file size to more sensible values.
Comment 8 Sven Gothel 2014-10-03 04:18:00 CEST

MappedByteBufferInputStream: Default CacheMode is FLUSH_PRE_HARD now (was FLUSH_PRE_SOFT)

FLUSH_PRE_SOFT cannot be handled by some platforms, e.g. Windows 32bit.

FLUSH_PRE_HARD is the most reliable caching mode
and it will fall back to FLUSH_PRE_SOFT if no method for 'cleaner' exists.

Further, FLUSH_PRE_HARD turns out to be the fastest mode as well.
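The fallback rule can be expressed as a small pure function. The enum below only mirrors the three mode names mentioned in this report (the actual GlueGen CacheMode may differ), CacheModes is a hypothetical helper, and how cleaner availability is detected is deliberately left out as an assumption.

```java
// Mirrors the mode names mentioned in this report; the real enum may differ.
enum CacheMode { FLUSH_NONE, FLUSH_PRE_SOFT, FLUSH_PRE_HARD }

// Hypothetical helper expressing the default and the fallback rule.
class CacheModes {
    static final CacheMode DEFAULT = CacheMode.FLUSH_PRE_HARD;

    // FLUSH_PRE_HARD needs a 'cleaner' to unmap explicitly; without one,
    // degrade to FLUSH_PRE_SOFT. Other modes are returned unchanged.
    static CacheMode effective(CacheMode requested, boolean cleanerAvailable) {
        if (requested == CacheMode.FLUSH_PRE_HARD && !cleanerAvailable) {
            return CacheMode.FLUSH_PRE_SOFT;
        }
        return requested;
    }
}
```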