|Summary:||Add support for memory mapped big file I/O via specialized InputStream and OutputStream, incl. mark/reset|
|Product:||[JogAmp] Gluegen||Reporter:||Sven Gothel <sgothel>|
|Component:||core||Assignee:||Sven Gothel <sgothel>|
ae17a5895088e321bc373318cc1e144a2f822f29 95c4a3c7b6b256de4293ed1b31380d6af5ab59d0 92a6d2c1476fd562721f231f89afba9342ed8a20 00a9ee70054872712017b5a14b19aa92068c8420 a7a3d5ab98ee0ad33fdef50bf081afeb8295ebe4 bd240ebfe09b7c7a21689dee8be0cc673eb7f340
Description Sven Gothel 2014-09-25 23:42:36 CEST
It is desired to read and write big files via InputStream and OutputStream, with mark/reset supported as efficiently as possible. BufferedInputStream, which does support mark/reset, can only handle files up to 2 GiB because its buffer is an int-indexed byte array. This is even more restricted on some platforms, since it uses heap memory, which might not be available. Further, its performance is not ideal. +++ Add memory mapped InputStream and OutputStream implementations supporting mark/reset.
Comment 1 Sven Gothel 2014-09-25 23:52:13 CEST
commit ae17a5895088e321bc373318cc1e144a2f822f29
Add read support for memory mapped big file I/O via specialized InputStream impl., incl. mark/reset
- ByteBufferInputStream simply impl. InputStream for an arbitrary ByteBuffer, which is restricted to 2 GiB
- Users may only need a smaller implementation for 'smaller' file sizes or for streaming a [native] ByteBuffer.
- MappedByteBufferInputStream impl. InputStream for any file size, slicing the total size into memory mapped buffers via the given FileChannel. The latter are mapped lazily and diff. flush/cache methods are supported to ease virtual memory usage.
- TestByteBufferInputStream: Unit test for basic functionality and perf. stats.
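The lazy slicing described in this commit can be illustrated roughly as follows. This is only a minimal sketch under assumptions, not the GlueGen implementation: the class name `SlicedMappedInputStream`, the caller-chosen `sliceSize` and the single-byte `read()` are made up for illustration, and no flush/cache handling is shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: an InputStream over lazily mapped, fixed-size file slices. */
class SlicedMappedInputStream extends InputStream {
    private final FileChannel channel;
    private final long length;               // total file size, may exceed 2 GiB
    private final int sliceSize;             // assumed slice size in bytes, must be > 0
    private final MappedByteBuffer[] slices; // each entry is mapped on first access
    private long position = 0;
    private long markPosition = -1;

    SlicedMappedInputStream(FileChannel channel, int sliceSize) throws IOException {
        this.channel = channel;
        this.length = channel.size();
        this.sliceSize = sliceSize;
        this.slices = new MappedByteBuffer[(int) ((length + sliceSize - 1) / sliceSize)];
    }

    /** Maps the slice on demand; the last slice may be shorter than sliceSize. */
    private MappedByteBuffer slice(int index) throws IOException {
        if (slices[index] == null) {
            long offset = (long) index * sliceSize;
            long size = Math.min(sliceSize, length - offset);
            slices[index] = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
        }
        return slices[index];
    }

    @Override
    public int read() throws IOException {
        if (position >= length) {
            return -1; // end of file
        }
        int index = (int) (position / sliceSize);
        int offset = (int) (position % sliceSize);
        position++;
        return slice(index).get(offset) & 0xFF; // absolute get, returned as unsigned byte
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void mark(int readLimit) {
        markPosition = position; // readLimit is ignored, the whole file stays reachable
    }

    @Override
    public synchronized void reset() throws IOException {
        if (markPosition < 0) {
            throw new IOException("mark not set");
        }
        position = markPosition;
    }

    @Override
    public int available() {
        return (int) Math.min(Integer.MAX_VALUE, length - position);
    }
}
```

Since the position is kept as a long and only the per-slice offset is an int, mark/reset works across slice boundaries without buffering anything on the heap. The sketch leaves closing the FileChannel to the caller.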
Comment 2 Sven Gothel 2014-09-26 12:30:30 CEST
95c4a3c7b6b256de4293ed1b31380d6af5ab59d0 Fix TestByteBufferInputStream: Handle OutOfMemoryError cause in IOException (Add note to FLUSH_NONE); Reduce test load / duration.
Comment 3 Sven Gothel 2014-09-26 12:31:10 CEST
92a6d2c1476fd562721f231f89afba9342ed8a20
Bug 1080 - Add write support for memory mapped big file I/O via specialized OutputStream impl.

Added MappedByteBufferOutputStream as a child instance of MappedByteBufferInputStream, since the latter already manages the file's mapped buffer slices.

Current design is:
- MappedByteBufferInputStream (parent)
  - MappedByteBufferOutputStream

This is due to InputStream and OutputStream being classes rather than interfaces, while most functionality is provided in one class.

We could redesign both as follows:
- MappedByteBufferIOStream (parent)
  - MappedByteBufferInputStream
  - MappedByteBufferOutputStream

This might visualize things better .. dunno whether it's worth the extra redirection.

+++

MappedByteBufferInputStream:
- Adding [file] resize support via custom FileResizeOp
- All construction happens via ctors
- Handle refCount, incr. by ctor and getOutputStream(..), decr. by close
- Check whether stream is closed already -> IOException
- Simplify / Reuse code

MappedByteBufferOutputStream:
- Adding simple write operations
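The refCount handling listed for MappedByteBufferInputStream might look roughly like the sketch below. It is an assumption-laden illustration, not the committed code: `SharedSlices`, `getOutputView()` and `isReleased()` are hypothetical names, and the actual unmapping is reduced to a comment.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical sketch: a parent resource shared with child views via a reference count. */
class SharedSlices implements Closeable {
    private int refCount = 1;          // the constructor holds the first reference
    private boolean parentClosed = false;

    /** Hands out a child view; each view holds one additional reference. */
    synchronized Closeable getOutputView() throws IOException {
        if (refCount == 0) {
            throw new IOException("stream closed");
        }
        refCount++;
        return new Closeable() {
            private boolean closed = false;

            @Override
            public void close() {
                synchronized (SharedSlices.this) {
                    if (!closed) {     // closing a view twice must not release twice
                        closed = true;
                        release();
                    }
                }
            }
        };
    }

    @Override
    public synchronized void close() {
        if (!parentClosed) {
            parentClosed = true;
            release();
        }
    }

    /** Drops one reference; the last one would unmap the buffers and close the channel. */
    private void release() {
        if (--refCount == 0) {
            // placeholder: the shared mapped buffers would be released here
        }
    }

    synchronized boolean isReleased() {
        return refCount == 0;
    }
}
```

With this arrangement the shared slices stay usable until both the parent and every child view have been closed, in whatever order that happens.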
Comment 4 Sven Gothel 2014-09-26 12:32:20 CEST
Basic functionality is now added, incl. unit tests, which pass on Windows and GNU/Linux, 32- and 64-bit, using JRE7 and JRE8 (Oracle/OpenJDK). Further refinements may happen via a follow-up bug report.
Comment 5 Sven Gothel 2014-09-27 19:56:08 CEST
To render the MappedByteBuffer*Stream more useful, we might add JNI native mmap and munmap? This would enhance 'flushing' of a mapped buffer slice and hopping to the next. Right now, we use an array of slices, but native mmap/munmap could remove such use, map the current 'window' directly and also ensure the unmap and hence the release. Currently the unmap is only impl. in a fuzzy way, i.e. via GC or the private 'cleaner' method. +++ A r/w method using ByteBuffers might be useful as well.
Comment 6 Sven Gothel 2014-09-29 03:58:46 CEST
commit 00a9ee70054872712017b5a14b19aa92068c8420 Refine MappedByteBuffer*Stream impl. and API [doc], adding stream to stream copy as well as direct memory mapped ByteBuffer access
Comment 7 Sven Gothel 2014-10-03 03:24:38 CEST
a7a3d5ab98ee0ad33fdef50bf081afeb8295ebe4
- Validate active and GC'ed mapped-buffer count in cleanAllSlices() via close() ..
- Fix missing unmapping of the last buffer in notifyLengthChangeImpl(), branch criteria was off by one.
- cleanSlice(..) now also issues cleanBuffer(..) on the GC'ed entry, hence if the WeakReference is still alive, enforce its release.
- cleanBuffer(..) reverts FLUSH_PRE_HARD -> FLUSH_PRE_SOFT in case of an error.
- flush() -> flush(boolean metaData) to expose FileChannel.force(metaData).
- Add synchronous mode, flushing/syncing the mapped buffers when in READ_WRITE mapping mode and issuing FileChannel.force() if not READ_ONLY.
  Above is implemented via flush()/flushImpl(..) for buffers and FileChannel, as well as in syncSlice(..) for buffers only.
  flush*()/syncSlice() is covered by:
  - setLength()
  - notifyLengthChange*(..)
  - nextSlice()
  Always issue flushImpl() in close().
- Windows: Clean all buffers in setLength(), otherwise Windows will report:
- Windows: Catch MappedByteBuffer.force() IOException
- Optimization of position(..)
  position(..) is now standalone to allow issuing flushSlice(..) before gathering the new mapped buffer. This shall avoid one extra cache miss. Hence rename positionImpl(..) -> position2(..).
- All MappedByteBufferOutputStream.write(..) methods issue syncSlice(..) on the last written current slice to ensure the new 'synchronous' mode is honored.

+++

Unit tests:
- Ensure test files are being deleted
- TestByteBufferCopyStream: Reduced test file size to more sensible values.
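For illustration, the flush order named in this commit (mapped buffers first, then FileChannel.force(metaData)) can be sketched with plain NIO calls. This is a minimal sketch under assumptions, not the GlueGen flush()/flushImpl(..) code: `MappedFlushSketch` and `writeAndFlush` are hypothetical names, and only a single mapping is handled.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: write through a READ_WRITE mapping, then sync buffer and channel. */
class MappedFlushSketch {
    static void writeAndFlush(Path file, byte[] data, boolean metaData) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current size grows the file to fit.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buf.put(data);
            buf.force();        // push changes made through this mapping to storage
            ch.force(metaData); // then sync the channel, optionally incl. file metadata
        }
    }
}
```

Passing `metaData = false` asks only for the file content to be forced, which may save an extra I/O operation on some systems.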
Comment 8 Sven Gothel 2014-10-03 04:18:00 CEST
bd240ebfe09b7c7a21689dee8be0cc673eb7f340
MappedByteBufferInputStream: Default CacheMode is FLUSH_PRE_HARD now (was FLUSH_PRE_SOFT)

FLUSH_PRE_SOFT cannot be handled by some platforms, e.g. Windows 32bit. FLUSH_PRE_HARD is the most reliable caching mode and it will fall back to FLUSH_PRE_SOFT if no method for 'cleaner' exists. Further, FLUSH_PRE_HARD turns out to be the fastest mode as well.