An Eager Avocado

Read/Write data with rhdf5

The inconsistency

Original matrix a

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    6   11   16   21   26   31   36
## [2,]    2    7   12   17   22   27   32   37
## [3,]    3    8   13   18   23   28   33   38
## [4,]    4    9   14   19   24   29   34   39
## [5,]    5   10   15   20   25   30   35   40

Matrix created and written with rhdf5

The matrix written to a data set created by rhdf5

## Error in h5checktypeOrOpenLoc(file): Error in h5checktypeOrOpenLoc(). Cannot open file. File 'rhdf5_demo.h5' does not exist.
## Error in h5checktypeOrOpenLoc(file, readonly = TRUE): Error in h5checktypeOrOpenLoc(). Cannot open file. File 'rhdf5_demo.h5' does not exist.

When shown in HDFView, this matrix looks like this:

[Figure: "a-rhdf5"]

Matrix created in HDFView and filled by rhdf5

The matrix written to a data set created by HDFView, with Dimensions = 5 x 8

[Figure: "a-gui"]

And read into R

h5write(a,hf,"matrices/a-gui")
## Error in h5checktypeOrOpenLoc(file): Error in h5checktypeOrOpenLoc(). Cannot open file. File 'rhdf5_demo.h5' does not exist.
(a.gui = h5read(hf,"matrices/a-gui"))
## Error in h5checktypeOrOpenLoc(file, readonly = TRUE): Error in h5checktypeOrOpenLoc(). Cannot open file. File 'rhdf5_demo.h5' does not exist.

Cause and Fix

The inconsistency comes from how h5write and h5read interact with the HDF5 file. An n-dimensional array is flattened by row in HDFView and by column in R. So when h5write is called, R first flattens the input matrix by column, and the result is then written to the file by row. Conversely, when h5read is called, the data set is read by row and R rebuilds it by column. In effect, any read into R transposes the matrix stored in the file (as seen in HDFView), so to get back the original we should always transpose the data set after reading.
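The flattening mismatch can be reproduced in base R alone, without touching a file. The following is a minimal sketch, assuming the same 5 x 8 matrix `a` as above; `flat` and `as_seen_by_c` are illustrative names, and `byrow = TRUE` stands in for a row-major reader such as HDFView.

```r
# Original 5 x 8 matrix, filled column by column (R's default).
a <- matrix(1:40, nrow = 5, ncol = 8)

# R stores the matrix column by column, so the flattened values are 1, 2, ..., 40.
flat <- as.vector(a)

# A row-major reader that sees the dimensions reversed (8 x 5) walks the
# same values row by row. The result is the transpose of a.
as_seen_by_c <- matrix(flat, nrow = 8, ncol = 5, byrow = TRUE)

print(identical(as_seen_by_c, t(a)))
```

The values never change order; only the interpretation of rows and columns does, which is why a single transpose on either side is enough to compensate.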

To preserve the input matrix when writing to an HDF5 file, we need to transpose it before writing, so that it shows up correctly in C programs (like HDFView):

[Figure: "a-gui-fixed"]

Then transpose it again after reading it into R to get back the original dimensions:

h5write(t(a),hf,"matrices/a-gui")
## Error in h5checktypeOrOpenLoc(file): Error in h5checktypeOrOpenLoc(). Cannot open file. File 'rhdf5_demo.h5' does not exist.
a.gui = h5read(hf,"matrices/a-gui")
## Error in h5checktypeOrOpenLoc(file, readonly = TRUE): Error in h5checktypeOrOpenLoc(). Cannot open file. File 'rhdf5_demo.h5' does not exist.
t(a.gui)
## Error in t(a.gui): object 'a.gui' not found
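Since the output above only shows errors from a missing file, here is a self-contained sketch of the same transpose-write, read-transpose round trip. It assumes the rhdf5 package is installed and uses a temporary file and a data set name ("matrices/a-fixed") chosen purely for illustration; unlike the example above, the data set is created by h5write itself rather than in HDFView.

```r
library(rhdf5)

# Hypothetical scratch file, so the example does not depend on 'rhdf5_demo.h5'.
hf <- tempfile(fileext = ".h5")
h5createFile(hf)
h5createGroup(hf, "matrices")

a <- matrix(1:40, nrow = 5, ncol = 8)

# Transpose before writing so that row-major tools display a 5 x 8 layout.
h5write(t(a), hf, "matrices/a-fixed")

# Transpose again after reading to recover the original orientation in R.
a_back <- t(h5read(hf, "matrices/a-fixed"))

print(dim(a_back))
print(all.equal(a_back, a))

h5closeAll()
```

Creating the file and group first avoids the "Cannot open file" error seen in the output above.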