Hdf2nc

From LOFS
Revision as of 18:38, 7 October 2019 by Orf (talk | contribs) (More descriptions of arguments)

Converts any subset of LOFS raw data to a single netCDF file.

Typical usage: hdf2nc --time=[time] --histpath=[histpath] --base=[ncbase] --x0=[X0] --y0=[Y0] --x1=[X1] --y1=[Y1] --z0=[Z0] --z1=[Z1] [varname1 ... varnameN]

--time=[time] (required): The model time requested, in seconds

--histpath=[histpath] (required): Top level directory that contains the 3D model data

--base=[ncbase] (required): Base name of the netCDF file

--x0=X0 (optional) westmost index in X. Defaults to westmost index in X found in the saved data.

--x1=X1 (optional) eastmost index in X. Defaults to eastmost index in X found in the saved data.

--y0=Y0 (optional) southmost index in Y. Defaults to southmost index in Y found in the saved data.

--y1=Y1 (optional) northmost index in Y. Defaults to northmost index in Y found in the saved data.

--z0=Z0 (optional) bottommost index in Z. Defaults to 0.

--z1=Z1 (optional) topmost index in Z. Defaults to topmost index in Z found in the saved data.

(X0,Y0,Z0)(X1,Y1,Z1): This defines the volume (or space) that you wish to convert to netCDF data, with respect to integer array indices that span the full model domain, namely (0,0,0) to (nx-1,ny-1,nz-1). Each of these is optional. If none of these are passed to hdf2nc, the full 3D range of data saved will be converted to netCDF. If only some of them are provided, the remainder default to the min/max values from the saved data (the code will extract these values from what has been saved, making no assumptions).
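The defaulting rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not code from hdf2nc: the function name, the dictionaries, and the saved extents used in the example are all hypothetical.

```python
def resolve_bounds(requested, saved):
    """Fill in any index the user did not supply.

    requested: dict with any subset of the keys x0, x1, y0, y1, z0, z1
    saved:     dict with the extents found in the saved data
               (hypothetical container, not an hdf2nc structure)
    """
    bounds = dict(saved)      # start from the saved extents
    bounds["z0"] = 0          # Z0 defaults to 0, per the option list
    # Anything the user actually passed overrides the default.
    bounds.update({k: v for k, v in requested.items() if v is not None})
    return bounds


# Made-up saved extents, combined with the indices used in the example run.
saved = {"x0": 2000, "x1": 2999, "y0": 2000, "y1": 2999, "z1": 299}
requested = {"x0": 2300, "y0": 2300, "x1": 2400, "y1": 2400, "z1": 10}
print(resolve_bounds(requested, saved))
```

Here Z0 is the only index left unspecified, so it falls back to 0, matching the "Setting Z0 to default value of 0" message in the example output.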

varname1...varnameN (optional) The list of variables, separated by whitespace, that you wish to convert.

--debug (optional) Turn on debugging output.

--recache (optional) Force regeneration of cache files.

--swaths (optional) Read and write all 2D swath data to netCDF files.

--compress (optional) Turn on lossless gzip compression for 3D data saved in netCDF files.

--offset (optional) Supplied X0,X1,Y0,Y1 values are with respect to what was saved, not (0,0). For instance, if only a subset of the domain was saved, say ranging from 500 to 1000 in X and 600 to 1100 in Y, the following two sets of arguments would produce identical results:

--x0=50 --y0=100 --x1=250 --y1=300 --offset
--x0=550 --y0=700 --x1=750 --y1=900
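
The arithmetic behind this equivalence is just an addition of the saved subdomain's origin. A minimal sketch, assuming the saved data starts at (500, 600) as in the example above (the function name is hypothetical and is not part of hdf2nc):

```python
def offset_to_absolute(x0, y0, x1, y1, saved_x0, saved_y0):
    """Convert indices given relative to the saved subdomain into
    indices relative to the full model domain."""
    return (x0 + saved_x0, y0 + saved_y0, x1 + saved_x0, y1 + saved_y0)


# Indices passed with --offset, for data saved from X=500 and Y=600 onward.
print(offset_to_absolute(50, 100, 250, 300, saved_x0=500, saved_y0=600))
# -> (550, 700, 750, 900)
```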

hdf2nc Example

h2ologin2:~/project-bagm/brainstorm2017/15m/history.fs8-15m-a% hdf2nc  --histpath=3D --base=frob --time=7101 --x0=2300 --y0=2300 --x1=2400 --y1=2400 --z1=10 uinterp winterp dbz
histpath = 3D
ncbase = frob
time = 7101.000000
X0 = 2300
Y0 = 2300
X1 = 2400
Y1 = 2400
Z1 = 10
Setting Z0 to default value of 0
Read cached num_time_dirs of 146
ntimedirs: 146
Read cached sorted time dirs
Read cached num node dirs
Read cached nodedir
Read cached firstfilename and all times

We are requesting the following fields: uinterp winterp dbz 

Working on surface 2D thrhopert and dbz (
acca
czza
czza
aaaa

acca
czza
czza
aaaa
)
Working on uinterp (
acca
czza
czza
aaaa
)
Working on winterp (
acca
czza
czza
aaaa
)
Working on dbz (
acca
czza
czza
aaaa
)
h2ologin2:~/project-bagm/brainstorm2017/15m/history.fs8-15m-a% lt
total 168656
drwxr-xr-x  10 orf PRAC_bagm      4096 Jul 31 11:12 ./
-rw-r--r--   1 orf PRAC_bagm   1476356 Jul 31 11:12 frob.07101.000000.nc
-rw-r--r--   1 orf PRAC_bagm       116 Jul 31 11:12 frob.07101.000000.nc.cmd

Discussion

If no array index options are provided on the command line, hdf2nc will attempt to convert the entire model domain to a netCDF file. In other words, X0,Y0,Z0,X1,Y1,Z1 are optional arguments to hdf2nc. However, for typical use cases with large amounts of data, you will want to convert only a subset of the full model domain!

The output of hdf2nc includes information on some basic metadata, plus some output that tracks the reading of the individual hdf5 files that comprise LOFS. Each letter that comprises the output that looks like

acca
czza
czza
aaaa

represents the successful reading of data from a single hdf5 file. It's kind of a 'base 26' representation of the percentage of data (in the horizontal) requested from each file. If a z is printed, that means the full horizontal range of data was requested. If a is printed, a tiny piece of the horizontal range was selected. All intermediate letters represent the space between these two extremes. This output is for your entertainment only; you are essentially watching the assembly of the netCDF file from LOFS data in real time.
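
To make the 'base 26' idea concrete, here is one way such a letter could be derived from the fraction of a file's horizontal extent that was requested. The exact binning used by hdf2nc is not documented here, so the linear mapping and the function name below are assumptions for illustration only.

```python
import string


def coverage_letter(fraction):
    """Map a horizontal coverage fraction in (0, 1] to a letter 'a'..'z'.

    Assumed linear binning: tiny fractions give 'a', full coverage gives 'z'.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    index = min(25, int(fraction * 26))  # clamp 1.0 into the last bin
    return string.ascii_lowercase[index]


for f in (0.01, 0.1, 0.5, 1.0):
    print(f, coverage_letter(f))
```

Under this assumed mapping, a file that contributes its full horizontal range prints z and a file that contributes only a sliver prints a, which is consistent with interior files showing z and edge files showing a or c in the example output.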

hdf2nc always writes 2D surface fields of density potential temperature perturbation from base state (proportional to buoyancy) and surface (calculated) radar reflectivity. These fields are used so much that they are always written to the netCDF files whether they are requested or not.

Regarding the mention of cached data, the LOFS read routines will look for existing cache files before going out and getting all the metadata from hdf5 files, which, for large amounts of data, is very expensive. Since the data layout never changes (unless you change it) the cached files speed things up quite a bit. If you ever change your LOFS data (say, adding new time directories), you must remove the cache files (or pass --recache, which forces regeneration) so that LOFS regenerates them with the new information. Cache files are all prefixed with .cm1hdf5_ and can always be removed, as they will always be regenerated.
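
If you prefer to clear the cache from a script, something like the following would do it. This is a sketch only: the page does not say which directory the cache files are written to, so the directory is left as a parameter you must supply, and the helper names are hypothetical. The only detail taken from LOFS is the .cm1hdf5_ prefix.

```python
from pathlib import Path

CACHE_PREFIX = ".cm1hdf5_"  # prefix stated above; everything else is assumed


def find_cache_files(directory):
    """Return the cache files found directly inside `directory`."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(CACHE_PREFIX)
    )


def remove_cache_files(directory, dry_run=True):
    """Delete cache files; with dry_run=True only report what would go."""
    names = []
    for path in find_cache_files(directory):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names


if __name__ == "__main__":
    # Dry run in the current directory: lists matches without deleting them.
    print(remove_cache_files(".", dry_run=True))
```

The dry run is the default so that nothing is deleted until you have confirmed the list looks right.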

In this example, the output file name is frob.07101.000000.nc, indicating data that was retrieved at t=07101.000000 seconds. Note that LOFS allows for the saving and retrieval of data saved in intervals of less than one second, as time is represented as a floating point variable.
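
The file name pattern can be reproduced as below if you need to predict the output name in a script. The field width is inferred from the single example on this page (frob.07101.000000.nc), so treat the format string as an assumption rather than a statement of what hdf2nc does internally; the function name is hypothetical.

```python
def netcdf_name(base, time_seconds):
    """Build a name of the form base.TTTTT.ffffff.nc (assumed pattern)."""
    # 12 characters in total: zero-padded seconds, a dot, six decimals.
    return f"{base}.{float(time_seconds):012.6f}.nc"


print(netcdf_name("frob", 7101))    # frob.07101.000000.nc
print(netcdf_name("frob", 7101.5))  # frob.07101.500000.nc
```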

Rationale

LOFS splits the model domain and times into files spread across hundreds of directories in large simulations. Often you may wish to analyze, plot, or visualize a subset of the full model domain at a given time, perhaps to make plots or to feed into visualization software that understands the netCDF format, which is one of the most commonly used data formats in atmospheric science.