CMIP5 product detection
drslib includes an algorithm, developed by Martin Juckes, for detecting the required CMIP5 product from filenames generated by CMOR2. The module drslib.p_cmip5 provides an API to the algorithm, and drs_tool uses it to allocate a product for incoming data when invoked with the --detect-product option.
A detailed discussion of the algorithm used is available as the document Identifying the product [PDF].
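Because detection starts from the filename, it can help to see which components a CMOR2 filename carries. The following is a minimal sketch and not the drslib.p_cmip5 implementation: it assumes the usual CMIP5 layout `<variable>_<table>_<model>_<experiment>_<ensemble>[_<start>-<end>].nc`, and the names `FILENAME_RE` and `parse_cmor_filename` are hypothetical.

```python
import re

# Assumed CMOR2/CMIP5 filename layout (not taken from drslib itself):
#   <variable>_<table>_<model>_<experiment>_<ensemble>[_<start>-<end>].nc
FILENAME_RE = re.compile(
    r"^(?P<variable>[^_]+)_(?P<table>[^_]+)_(?P<model>[^_]+)_"
    r"(?P<experiment>[^_]+)_(?P<ensemble>r\d+i\d+p\d+)"
    r"(?:_(?P<start>\d+)-(?P<end>\d+))?\.nc$"
)


def parse_cmor_filename(filename):
    """Split a CMOR2-style filename into its named components.

    The time range is optional, so 'start' and 'end' are None when absent.
    """
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError("not a recognised CMOR2 filename: %r" % filename)
    return match.groupdict()


if __name__ == "__main__":
    name = "psl_6hrPlev_HadGEM2-ES_historical_r1i1p1_194912010600-195012010000.nc"
    for key, value in parse_cmor_filename(name).items():
        print(key, value)
```

The table, experiment and time range are the parts most relevant to the configuration options described later on this page.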
drs_tool configuration
Product detection requires extra information to be configured about the CMIP5 experiment and the data being processed. Before first using the product detection feature you should initialise the CMIP5 experiment description data using drs_tool init:
$ drs_tool init --shelve-dir=DIR
where DIR is a directory suitable for containing the data files. On an ESG datanode a suitable command might be:
$ drs_tool init --shelve-dir=/usr/local/share/p_cmip5/data
In addition to the shelve directory, p_cmip5 requires information about the model output being processed, supplied in an external configuration file (see "The configuration file" below).
Both of these parameters can be set using metaconfig as follows:
[metaconfig]
configs = drslib
[drslib:p_cmip5]
shelve-dir = /usr/local/share/p_cmip5/data
config = /usr/local/share/p_cmip5/model.ini
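To check that a metaconfig file carries both settings before running drs_tool, the section can be read with the standard library. This is only an illustrative sketch: it uses Python's configparser rather than metaconfig, and the function name `read_p_cmip5_settings` is hypothetical.

```python
import configparser

# The same settings as shown above, inlined so the sketch is self-contained.
METACONFIG_TEXT = """
[metaconfig]
configs = drslib

[drslib:p_cmip5]
shelve-dir = /usr/local/share/p_cmip5/data
config = /usr/local/share/p_cmip5/model.ini
"""


def read_p_cmip5_settings(text):
    """Return (shelve_dir, model_config_path) from metaconfig-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["drslib:p_cmip5"]
    return section["shelve-dir"], section["config"]


if __name__ == "__main__":
    shelve_dir, model_config = read_p_cmip5_settings(METACONFIG_TEXT)
    print("shelve directory:", shelve_dir)
    print("model configuration:", model_config)
```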
The configuration file
The p_cmip5 module, which assigns data to DRS product=output1 or output2, requires a configuration file when processing piControl data. The file may optionally contain additional information that ensures consistent assignment for some other datasets (details below).
The configuration file is in standard ini-file format and should contain one section for each model name you want to process. Within each section, a set of option name/value pairs specifies the relevant properties of the model.
An Example
For instance, the definitions for two models, HADCM3 and HIGEM1-2, could look as follows:
[HADCM3]
category=centennial
branch_year_piControl_to_historical=1820
base_year_historical=1850
branch_year_esmControl_to_esmHistorical=1850
base_year_esmHistorical=1850
[HIGEM1-2]
category=other
branch_year_piControl_to_historical=1820
base_year_historical=1850
branch_year_esmControl_to_esmHistorical=1850
base_year_esmHistorical=1850
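A file in this format can be loaded with Python's configparser, as in the sketch below. It is an illustration only and does not show how p_cmip5 reads the file; the function name `load_model_options` is hypothetical. Note that configparser lower-cases option names by default, so the sketch overrides `optionxform` to keep mixed-case names such as branch_year_piControl_to_historical intact.

```python
import configparser

# The example configuration from above, inlined so the sketch is self-contained.
MODEL_CONFIG_TEXT = """
[HADCM3]
category=centennial
branch_year_piControl_to_historical=1820
base_year_historical=1850
branch_year_esmControl_to_esmHistorical=1850
base_year_esmHistorical=1850

[HIGEM1-2]
category=other
branch_year_piControl_to_historical=1820
base_year_historical=1850
branch_year_esmControl_to_esmHistorical=1850
base_year_esmHistorical=1850
"""


def load_model_options(text):
    """Return {model_name: {option_name: value_string}} for each section."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive (the default would lower-case them).
    parser.optionxform = str
    parser.read_string(text)
    return {model: dict(parser[model]) for model in parser.sections()}


if __name__ == "__main__":
    models = load_model_options(MODEL_CONFIG_TEXT)
    for model, options in models.items():
        print(model, options["category"], int(options["base_year_historical"]))
```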
Option names
1. category
Value: | either ‘centennial’ or ‘other’ |
---|---|
Description: | The category specifies which suite of experiments the model is being used for. This information is used to determine which data should be prioritised for quality control and DOI assignment. The aim is to ensure consistency between experiments and between modelling groups. If a model is used for both centennial and decadal experiments, specify ‘centennial’. |
2. branch_year_piControl_to_historical
Value: | integer |
---|---|
Description: | The year of the piControl data used to initiate the historical run. Required if piControl data for tables aero, day or 6hrPlev is archived. This information can be determined from the global attribute “branch” in the historical data files and the base year from the time units of the piControl experiment. |
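The arithmetic behind this option can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by p_cmip5: it assumes the branch attribute is an offset in days expressed in the piControl time units, that those units have the form "days since YYYY-MM-DD" with a 1 January reference date, and that every model year has the same number of days. The function names and the numbers in the usage example are hypothetical.

```python
import re


def base_year_from_units(units):
    """Extract the reference year from a units string like 'days since 1810-01-01'."""
    match = re.search(r"since\s+(\d{4})-", units)
    if match is None:
        raise ValueError("cannot find a reference year in %r" % units)
    return int(match.group(1))


def branch_year(branch_days, picontrol_units, days_per_year=360):
    """Estimate the piControl year at which the historical run branched.

    Assumes 'branch_days' is a day offset in the piControl time units and
    that the reference date is 1 January, so only whole years are counted.
    """
    whole_years = int(branch_days // days_per_year)
    return base_year_from_units(picontrol_units) + whole_years


if __name__ == "__main__":
    # Made-up values: an offset of 3600 days on a 360-day calendar.
    print(branch_year(3600.0, "days since 1810-01-01"))
```

Pass a different `days_per_year` for models that do not use a 360-day calendar, and check the result against the model's own documentation before putting it in the configuration file.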
3. base_year_historical
Value: | integer |
---|---|
Description: | The year of the start of the historical run. It must be configured because it is needed when processing piControl data and so cannot generally be obtained from the data files being processed. |
4. branch_year_esmControl_to_esmHistorical
Value: | integer |
---|---|
Description: | The year of the esmControl data used to initiate the esmHistorical run. See notes on 2. branch_year_piControl_to_historical |
5. base_year_esmHistorical
Value: | integer |
---|---|
Description: | The year of the start of the esmHistorical experiment. See notes on 3. base_year_historical |
6. base_year_abrupt4xCO2 [optional]
Value: | integer |
---|---|
Description: | The year of the start of the abrupt4xCO2 run. Used when processing abrupt4xCO2 data; only needed if the base year specified by the time units does not correspond to the start of the experiment. |
7. base_year_piControl [optional]
Value: | integer |
---|---|
Description: | The year of the start of the piControl run. Only needed if the base year specified by the time units does not correspond to the start of the experiment. Used to determine which years of aero-table data from the piControl experiment in the decadal suite are replicated. |
8. base_year_1pctCO2 [optional]
Value: | integer |
---|---|
Description: | The year of the start of the 1pctCO2 run. Only needed if the base year specified by the time units does not correspond to the start of the experiment. |
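The value rules listed above can be checked mechanically before a configuration file is used. The sketch below is a hypothetical helper, not part of drslib: it checks only that category is one of the two allowed values, that the year options are integers, and that no unrecognised option names appear. It does not enforce which options are required, since that depends on the data being archived.

```python
# Option names documented above whose values must be integers.
YEAR_OPTIONS = {
    "branch_year_piControl_to_historical",
    "base_year_historical",
    "branch_year_esmControl_to_esmHistorical",
    "base_year_esmHistorical",
    "base_year_abrupt4xCO2",
    "base_year_piControl",
    "base_year_1pctCO2",
}
CATEGORIES = {"centennial", "other"}


def check_model_section(options):
    """Return a list of problems found in one model's option dictionary."""
    problems = []
    for name, value in options.items():
        if name == "category":
            if value not in CATEGORIES:
                problems.append("category must be 'centennial' or 'other'")
        elif name in YEAR_OPTIONS:
            try:
                int(value)
            except ValueError:
                problems.append("%s must be an integer" % name)
        else:
            problems.append("unknown option: %s" % name)
    return problems


if __name__ == "__main__":
    section = {"category": "centennial", "base_year_historical": "1850"}
    print(check_model_section(section))
```

An empty list means no problems were found. Option names are compared case-sensitively, so whatever reads the file must preserve their case.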
Invoking drs_tool with product detection
drs_tool will allocate a product to files in the incoming directory when invoked with the --detect-product option.
drs_tool list will show the product deduced for each dataset_id and will list datasets for which the product could not be determined as incomplete. drs_tool todo and drs_tool upgrade will only operate on datasets for which the product could be determined.
In the following example, data from the UK Met Office Hadley Centre is processed into the DRS hierarchy.
$ drs_tool list -I ./mohc_holding/ -R ./cmip5 --detect-product
...
[INFO] drslib.p_cmip5: Product deduced as output1, selected years [112/56] assigned to output1
[INFO] drslib.p_cmip5: Deducing product for <DRS cmip5.%.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1.%.va.2001120106-2002120100>
[INFO] drslib.p_cmip5: Product deduced as output1, selected years [112/56] assigned to output1
[INFO] drslib.p_cmip5: Deducing product for <DRS cmip5.%.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1.%.va.2002120106-2003120100>
[INFO] drslib.p_cmip5: Product deduced as output1, selected years [112/56] assigned to output1
[INFO] drslib.p_cmip5: Deducing product for <DRS cmip5.%.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1.%.va.2003120106-2004120100>
[INFO] drslib.p_cmip5: Product deduced as output1, selected years [112/56] assigned to output1
[INFO] drslib.p_cmip5: Deducing product for <DRS cmip5.%.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1.%.va.2004120106-2005120100>
[INFO] drslib.p_cmip5: Product deduced as output1, selected years [112/56] assigned to output1
==============================================================================
DRS Tree at .
------------------------------------------------------------------------------
cmip5.output1.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrLev.r1i1p1 0:0 1120:1371721676416
cmip5.output1.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1 0:0 224:116004074368
------------------------------------------------------------------------------
2 datasets awaiting upgrade
==============================================================================
# Select the 6hrPlev dataset for publishing and check commands to be issued
$ drs_tool todo -I ./mohc_holding/ -R . --detect-product cmip5.%.%.%.%.%.%.6hrPlev
==============================================================================
DRS Tree at .
------------------------------------------------------------------------------
Publisher Tree cmip5.output1.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1 todo for version 20101006
mv ./mohc_holding/psl_6hrPlev_HadGEM2-ES_historical_r1i1p1_194912010600-195012010000.nc ./cmip5/output1/MOHC/HadGEM2-ES/historical/6hr/atmos/6hrPlev/r1i1p1/files/psl_20101006/psl_6hrPlev_HadGEM2-ES_historical_r1i1p1_194912010600-195012010000.nc
ln -s ./cmip5/output1/MOHC/HadGEM2-ES/historical/6hr/atmos/6hrPlev/r1i1p1/files/psl_20101006/psl_6hrPlev_HadGEM2-ES_historical_r1i1p1_194912010600-195012010000.nc ./cmip5/output1/MOHC/HadGEM2-ES/historical/6hr/atmos/6hrPlev/r1i1p1/v20101006/psl/psl_6hrPlev_HadGEM2-ES_historical_r1i1p1_194912010600-195012010000.nc
...
mv ./mohc_holding/va_6hrPlev_HadGEM2-ES_historical_r1i1p1_200412010600-200512010000.nc ./cmip5/output1/MOHC/HadGEM2-ES/historical/6hr/atmos/6hrPlev/r1i1p1/files/va_20101006/va_6hrPlev_HadGEM2-ES_historical_r1i1p1_200412010600-200512010000.nc
ln -s ./cmip5/output1/MOHC/HadGEM2-ES/historical/6hr/atmos/6hrPlev/r1i1p1/files/va_20101006/va_6hrPlev_HadGEM2-ES_historical_r1i1p1_200412010600-200512010000.nc ./cmip5/output1/MOHC/HadGEM2-ES/historical/6hr/atmos/6hrPlev/r1i1p1/v20101006/va/va_6hrPlev_HadGEM2-ES_historical_r1i1p1_200412010600-200512010000.nc
==============================================================================
# Do the upgrade
$ drs_tool upgrade -I ./mohc_holding/ -R . --detect-product cmip5.%.%.%.%.%.%.6hrPlev
...
# List the results.
# Not including --detect-product lists datasets that are incompletely specified
$ drs_tool list -I ./mohc_holding -R mohc_dryrun
==============================================================================
DRS Tree at mohc_dryrun
------------------------------------------------------------------------------
cmip5.output1.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1.v20101005 224:116004074368
------------------------------------------------------------------------------
Incompletely specified incoming datasets
------------------------------------------------------------------------------
cmip5.%.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrLev.r1i1p1
==============================================================================
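The dataset identifiers in these listings are dot-separated DRS components, with % standing in for a component that has not been determined. The sketch below shows one way to split an identifier and detect an undetermined product. It is an illustration only, not the drslib API: the component names follow the CMIP5 DRS ordering, and `parse_dataset_id` and `is_incomplete` are hypothetical helpers.

```python
# Assumed CMIP5 DRS component order for a dataset identifier.
DRS_COMPONENTS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "table", "ensemble",
)


def parse_dataset_id(dataset_id):
    """Split a dataset_id into named components plus an optional version."""
    parts = dataset_id.split(".")
    if len(parts) not in (len(DRS_COMPONENTS), len(DRS_COMPONENTS) + 1):
        raise ValueError("unexpected number of components in %r" % dataset_id)
    drs = dict(zip(DRS_COMPONENTS, parts))
    # A tenth component, when present, is the version (for example v20101005).
    drs["version"] = parts[9] if len(parts) == 10 else None
    return drs


def is_incomplete(dataset_id):
    """True if any component is still the '%' wildcard."""
    return "%" in parse_dataset_id(dataset_id).values()


if __name__ == "__main__":
    for dataset_id in (
        "cmip5.output1.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrPlev.r1i1p1.v20101005",
        "cmip5.%.MOHC.HadGEM2-ES.historical.6hr.atmos.6hrLev.r1i1p1",
    ):
        print(dataset_id, is_incomplete(dataset_id))
```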