Hdf2nc

From LOFS
Revision as of 19:31, 7 October 2019 by Orf (talk | contribs)


Converts any subset of LOFS raw data to a single netCDF file.

Typical usage: hdf2nc --time=[time] --histpath=[histpath] --base=[ncbase] --x0=[X0] --y0=[Y0] --x1=[X1] --y1=[Y1] --z0=[Z0] --z1=[Z1] [varname1 ... varnameN]

--time=[time] (required): The model time requested, in seconds

--histpath=[histpath] (required): Top level directory that contains the 3D model data

--base=[ncbase] (required): Base name of the netCDF file

--x0=X0 (optional) westmost index in X. Defaults to westmost index in X found in the saved data.

--x1=X1 (optional) eastmost index in X. Defaults to eastmost index in X found in the saved data.

--y0=Y0 (optional) southmost index in Y. Defaults to southmost index in Y found in the saved data.

--y1=Y1 (optional) northmost index in Y. Defaults to northmost index in Y found in the saved data.

--z0=Z0 (optional) bottommost index in Z. Defaults to 0.

--z1=Z1 (optional) topmost index in Z. Defaults to topmost index in Z found in the saved data.

(X0,Y0,Z0)(X1,Y1,Z1): These define the volume that you wish to convert to netCDF data, with respect to integer array indices that span the full model domain, namely (0,0,0) to (nx-1,ny-1,nz-1). Each of these is optional. If none of them is passed to hdf2nc, the full 3D range of saved data will be converted to netCDF. If only some of them are provided, the remainder default to the min/max values from the saved data (the code extracts these values from what has been saved, making no assumptions).

varname1...varnameN (optional) The list of variables, separated by whitespace, that you wish to convert.

--debug (optional) Turn on debugging output.

--recache (optional) Force regeneration of cache files.

--swaths (optional) Read and write all 2D swath data to netCDF files.

--compress (optional) Turn on lossless gzip compression for 3D data saved in netCDF files.

--nc3 (optional) Save NetCDF version 3 files with 64-bit offsets instead of the default NetCDF version 4 files (which are HDF5).

--interp (optional) Read uinterp, vinterp, winterp directly from LOFS (for cases when u, v, w were not saved). NOTE! Diagnostics such as vorticity cannot be calculated with this option!

--allvars (optional) Convert all saved 3D LOFS variables (same result as listing them all at the command line)

--offset (optional) Supplied X0,X1,Y0,Y1 values are with respect to what was saved, not (0,0). For instance, if only a subset of the domain was saved, say ranging from 500 to 1000 in X and 600 to 1100 in Y, the following two sets of arguments would produce identical results:

 --x0=50 --y0=100 --x1=250 --y1=300 --offset
 --x0=550 --y0=700 --x1=750 --y1=900
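The defaulting and --offset rules described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code taken from hdf2nc; the function name resolve_bounds and the dictionary layout are hypothetical.

```python
def resolve_bounds(saved, requested=None, offset=False):
    """Return absolute x0, x1, y0, y1 indices for a requested subdomain.

    saved:     extents found in the saved data (hypothetical layout)
    requested: any subset of the keys x0, x1, y0, y1
    offset:    if True, requested values are relative to the saved origin
    """
    requested = requested or {}
    resolved = {}
    for key in ("x0", "x1", "y0", "y1"):
        if key not in requested:
            # Missing bounds default to what was actually saved.
            resolved[key] = saved[key]
            continue
        value = requested[key]
        if offset:
            # Shift by the westmost (X) or southmost (Y) saved index.
            value += saved["x0"] if key.startswith("x") else saved["y0"]
        resolved[key] = value
    return resolved


saved = {"x0": 500, "x1": 1000, "y0": 600, "y1": 1100}
relative = resolve_bounds(saved, {"x0": 50, "y0": 100, "x1": 250, "y1": 300}, offset=True)
absolute = resolve_bounds(saved, {"x0": 550, "y0": 700, "x1": 750, "y1": 900})
print(relative == absolute)  # True
```

Both calls resolve to the same absolute indices, which is the equivalence the two argument sets above express.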

NOTE: u, v, and w (corresponding to the ua, va, and wa CM1 3D arrays that exist on the Arakawa C staggered mesh) must be saved in LOFS files for diagnostics to be calculated. This is enforced in order to achieve the highest order of accuracy possible with diagnostics (specifically those involving derivatives of velocity components), rather than using velocity variables that have already been averaged to the scalar mesh. However, if u, v, and w have been saved and uinterp, vinterp, and winterp have been requested, the code will interpolate those values from ua, va, and wa so that, for post-processing and visualization, all variables lie on the same mesh. It is strongly recommended that LOFS save only the native staggered velocity variables if you wish to calculate diagnostics. Alternatively, you may wish to calculate diagnostics from within CM1 and save interpolated velocity data. We choose to save the minimum amount of data possible to save on disk space, and push the diagnostic calculations to the post-processing stage.
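To make the idea of interpolating staggered velocities to the scalar mesh concrete, here is a minimal 1D sketch. It assumes the usual two-point average between neighboring faces on an Arakawa C grid; the actual interpolation inside hdf2nc is not shown on this page, and the function name is hypothetical.

```python
def interp_to_scalar(staggered):
    """Average adjacent face values to cell centers (1D for brevity)."""
    return [0.5 * (a + b) for a, b in zip(staggered, staggered[1:])]


ua = [0.0, 2.0, 4.0, 10.0]      # n + 1 values on the staggered faces
uinterp = interp_to_scalar(ua)  # n values at the scalar points
print(uinterp)  # [1.0, 3.0, 7.0]
```

Note that the averaged array has one fewer point than the staggered one, and that the averaging smooths the field, which is why derivatives are better taken from the native staggered values.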

The following is a list of available additional fields that can be calculated within the hdf2nc code (u, v, w must be saved, as opposed to uinterp, vinterp, winterp):

Calculated fields
uinterp u component of wind interpolated to scalar mesh
vinterp v component of wind interpolated to scalar mesh
winterp w component of wind interpolated to scalar mesh
xvort x component of vorticity
yvort y component of vorticity
zvort z component of vorticity
vortmag 3D vorticity vector magnitude
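
As a small illustration of the last entry in the list, the magnitude of the 3D vorticity vector is the Euclidean norm of its three components. The sketch below works on single values; the function name is hypothetical and this is not the hdf2nc source.

```python
import math


def vorticity_magnitude(xvort, yvort, zvort):
    """Euclidean norm of the vorticity vector at one grid point."""
    return math.sqrt(xvort ** 2 + yvort ** 2 + zvort ** 2)


print(vorticity_magnitude(3.0, 4.0, 12.0))  # 13.0
```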


hdf2nc Example

h2ologin2:~/project-bagm/brainstorm2017/15m/history.fs8-15m-a% hdf2nc  --histpath=3D --base=frob --time=7101 --x0=2300 --y0=2300 --x1=2400 --y1=2400 --z1=10 uinterp winterp dbz
histpath = 3D
ncbase = frob
time = 7101.000000
X0 = 2300
Y0 = 2300
X1 = 2400
Y1 = 2400
Z1 = 10
Setting Z0 to default value of 0
Read cached num_time_dirs of 146
ntimedirs: 146
Read cached sorted time dirs
Read cached num node dirs
Read cached nodedir
Read cached firstfilename and all times

We are requesting the following fields: uinterp winterp dbz 

Working on surface 2D thrhopert and dbz (
acca
czza
czza
aaaa

acca
czza
czza
aaaa
)
Working on uinterp (
acca
czza
czza
aaaa
)
Working on winterp (
acca
czza
czza
aaaa
)
Working on dbz (
acca
czza
czza
aaaa
)
h2ologin2:~/project-bagm/brainstorm2017/15m/history.fs8-15m-a% lt
total 168656
drwxr-xr-x  10 orf PRAC_bagm      4096 Jul 31 11:12 ./
-rw-r--r--   1 orf PRAC_bagm   1476356 Jul 31 11:12 frob.07101.000000.nc
-rw-r--r--   1 orf PRAC_bagm       116 Jul 31 11:12 frob.07101.000000.nc.cmd
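
If you drive hdf2nc from a script, the command line used in this example can be assembled programmatically. The sketch below uses only options documented on this page; the helper name build_hdf2nc_command is hypothetical, and it only builds the argument list rather than running anything.

```python
import shlex


def build_hdf2nc_command(time, histpath, base, bounds=None, variables=(), flags=()):
    """Build an hdf2nc argument list from the documented options."""
    cmd = ["hdf2nc", f"--histpath={histpath}", f"--base={base}", f"--time={time}"]
    for name in ("x0", "y0", "x1", "y1", "z0", "z1"):
        # Only pass the bounds that were given; hdf2nc defaults the rest.
        if bounds and name in bounds:
            cmd.append(f"--{name}={bounds[name]}")
    cmd.extend(f"--{flag}" for flag in flags)  # e.g. "compress", "offset"
    cmd.extend(variables)
    return cmd


cmd = build_hdf2nc_command(
    7101, "3D", "frob",
    bounds={"x0": 2300, "y0": 2300, "x1": 2400, "y1": 2400, "z1": 10},
    variables=["uinterp", "winterp", "dbz"],
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd) on a machine where hdf2nc is installed.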

Discussion

If no array index options are provided on the command line, hdf2nc will attempt to convert the entire model domain to a netCDF file. In other words, X0,Y0,Z0,X1,Y1,Z1 are optional arguments to hdf2nc. However, for typical use cases with large amounts of data, you will want to convert only a subset of the full model domain!

The output of hdf2nc includes some basic metadata, plus some output that tracks the reading of the individual hdf5 files that comprise LOFS. Each letter in output that looks like

acca
czza
czza
aaaa

represents the successful reading of data from a single hdf5 file. It's kind of a 'base 26' representation of the percentage of data (in the horizontal) requested from each file. If a z is printed, that means the full horizontal range of data was requested. If a is printed, a tiny piece of the horizontal range was selected. All intermediate letters represent the space between these two extremes. This output is for your entertainment only; you are essentially watching the assembly of the netCDF file from LOFS data in real time.
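
One way to picture the letter code is as a mapping from the fraction of a file's horizontal extent that was requested to a letter from a to z. The exact rule used by hdf2nc is not given here, so the scaling below is an assumption chosen only to match the description (z for the full range, a for a tiny piece); the function name is hypothetical.

```python
import string


def coverage_letter(fraction):
    """Map a horizontal coverage fraction in (0, 1] to a letter a..z."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Assumed scaling: only full coverage reaches index 25 ('z').
    return string.ascii_lowercase[int(fraction * 25)]


print(coverage_letter(1.0))   # z
print(coverage_letter(0.01))  # a
```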

hdf2nc always produces a 2D surface plot of density potential temperature perturbation from base state (proportional to buoyancy) and surface (calculated) radar reflectivity. These fields are used so much that they are always written to the netCDF files whether they are requested or not.

Regarding the mention of cached data, the LOFS read routines will look for existing cache files before going out and getting all the metadata from hdf5 files, which, for large amounts of data, is very expensive. Since the data layout never changes (unless you change it), the cached files speed things up quite a bit. If you ever change your LOFS data (say, by adding new time directories), you must remove the cache files (or force regeneration with --recache) so that the regenerated files contain the new information. All cache files are prefixed with .cm1hdf5_ and can always be removed, as they will always be regenerated.
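
Clearing the cache by hand amounts to deleting every file whose name starts with .cm1hdf5_. A minimal sketch follows; the directory in which the cache files live is an assumption you must supply yourself, and the helper name is hypothetical.

```python
from pathlib import Path


def remove_cache_files(directory):
    """Delete files named .cm1hdf5_* in directory; return the names removed."""
    removed = []
    for path in sorted(Path(directory).glob(".cm1hdf5_*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling remove_cache_files on the directory that holds the cache files leaves all other files untouched, and the next LOFS read rebuilds the cache.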

In this example, the output file name is frob.07101.000000.nc, indicating data that was retrieved at t=07101.000000 seconds. Note that LOFS allows for the saving and retrieval of data saved in intervals of less than one second, as time is represented as a floating point variable.
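
The file name pattern seen in this example can be reproduced with a zero-padded, fixed-precision time field. The format below is inferred from this single example (five integer digits, six decimals), so treat the widths as an assumption; the function name is hypothetical.

```python
def output_filename(base, time_seconds):
    """Build a name like frob.07101.000000.nc from a base name and a time."""
    # 12 characters total: zero-padded integer part, '.', six decimals.
    return f"{base}.{time_seconds:012.6f}.nc"


print(output_filename("frob", 7101.0))  # frob.07101.000000.nc
```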

Rationale

LOFS splits the model domain and times into files spread across hundreds of directories in large simulations. Often you will want to work with a subset of the full model domain at a given time, perhaps to make plots or to feed into visualization software that understands netCDF, one of the most commonly used data formats in atmospheric science.