Module Reference

drslib.drs – DRS objects and utilities

The drs module contains a minimal model class for DRS information and some utility functions for converting filesystem paths to and from DRS objects.

More sophisticated conversions can be done with the drslib.translate and drslib.cmip5 modules.

class drslib.drs.BaseDRS(*argv, **kwargs)

Base class of classes representing DRS entries.

This class provides an interface to:

  1. Define and expose the components of the DRS and their order
  2. Convert components in and out of serialised form
  3. Determine whether a DRS entry is complete
  4. Define the publishing level of datasets represented by this DRS

This class provides default implementations of:

  1. Serialisation to dataset-id with or without version

Subclasses decide what components make up the DRS.

Variables:
  • DRS_ATTRS – a sequence of component names in the order they appear in the DRS identifier.
  • PUBLISH_LEVEL – the last component name which is part of the published dataset-id.
classmethod from_dataset_id(klass, dataset_id, **components)

Return a DRS object from an ESG Publisher dataset_id.

If the dataset_id contains fewer than 10 components, all trailing components are set to None. Any component with the value ‘%’ is also set to None.

E.g.

>>> drs = DRS.from_dataset_id('cmip5.output.MOHC.%.rpc45')
>>> drs.institute, drs.model, drs.experiment, drs.realm
('MOHC', None, 'rpc45', None)
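
The padding and wildcard rules above can be sketched without drslib. This is a minimal illustration, not the library's implementation: parse_dataset_id is a hypothetical helper, the component order in DRS_ATTRS is an assumption (the real order comes from the DRS subclass), and every value is left as a plain string.

```python
# Assumed component order, for illustration only; the real order is whatever
# the DRS subclass lists in its DRS_ATTRS.
DRS_ATTRS = ['activity', 'product', 'institute', 'model', 'experiment',
             'frequency', 'realm', 'table', 'ensemble', 'version']


def parse_dataset_id(dataset_id, attrs=DRS_ATTRS):
    """Map a dotted dataset_id onto component names (hypothetical helper)."""
    parts = dataset_id.split('.')
    if len(parts) > len(attrs):
        raise ValueError('too many components in %r' % dataset_id)
    # Pad the tail with None so that every component name gets a value.
    parts += [None] * (len(attrs) - len(parts))
    # '%' marks an unspecified component and also becomes None.
    return {name: (None if part == '%' else part)
            for name, part in zip(attrs, parts)}


components = parse_dataset_id('cmip5.output.MOHC.%.rpc45')
print(components['institute'], components['model'],
      components['experiment'], components['realm'])
```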

classmethod from_json(klass, json_obj, **components)

Create a DRS object from a ceda-cc compatible json object.

ceda-cc may use different keys than the DRS terms used internally, so these should be mapped here.

Parameters:json_obj – A dictionary containing a ceda-cc representation of the DRS terms as exported in JSON.
is_complete()

Returns a boolean indicating whether all components are specified.

Returns True if all components excluding those in self.OPTIONAL_ATTRS have a value.

is_publish_level()

Returns a boolean indicating whether all publish-level components are specified.
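
The two checks can be illustrated with a small stand-in class. SketchDRS, its shortened component list and its publish level are all hypothetical, and the reading of "publish-level components" as "everything up to and including PUBLISH_LEVEL" is an assumption based on the descriptions above.

```python
class SketchDRS(dict):
    """Hypothetical stand-in for a BaseDRS subclass (not drslib code)."""

    DRS_ATTRS = ['activity', 'product', 'institute', 'model', 'version']
    PUBLISH_LEVEL = 'model'
    OPTIONAL_ATTRS = ['version']

    def is_complete(self):
        # Every non-optional component must have a value.
        return all(self.get(name) is not None
                   for name in self.DRS_ATTRS
                   if name not in self.OPTIONAL_ATTRS)

    def is_publish_level(self):
        # Every component up to and including PUBLISH_LEVEL must have a value.
        last = self.DRS_ATTRS.index(self.PUBLISH_LEVEL)
        return all(self.get(name) is not None
                   for name in self.DRS_ATTRS[:last + 1])


partial = SketchDRS(activity='cmip5', product='output', institute='MOHC')
full = SketchDRS(partial, model='HadGEM2-ES')
print(partial.is_complete(), full.is_complete(), full.is_publish_level())
```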

to_dataset_id(with_version=False)

Return the esgpublish dataset_id for this drs object.

If version is not None and with_version=True the version is included.
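
As a rough sketch of this serialisation, the function below joins the components up to the publish level and appends the version only when asked. The helper name, the use of ‘%’ for unset components and the ".v<number>" version suffix are assumptions made for illustration, not confirmed drslib behaviour.

```python
def sketch_to_dataset_id(components, attrs, publish_level, with_version=False):
    """Serialise components to a dotted id (hypothetical helper)."""
    last = attrs.index(publish_level)
    # Assumption: unset components are written as the '%' wildcard.
    parts = ['%' if components.get(name) is None else str(components[name])
             for name in attrs[:last + 1]]
    dataset_id = '.'.join(parts)
    version = components.get('version')
    # Assumption: the version is appended as a '.v<number>' suffix.
    if with_version and version is not None:
        dataset_id += '.v%d' % version
    return dataset_id


attrs = ['activity', 'product', 'institute', 'model', 'experiment']
components = {'activity': 'cmip5', 'product': 'output', 'institute': 'MOHC',
              'model': None, 'experiment': 'rcp45', 'version': 2}
plain_id = sketch_to_dataset_id(components, attrs, 'experiment')
versioned_id = sketch_to_dataset_id(components, attrs, 'experiment',
                                    with_version=True)
print(plain_id)
print(versioned_id)
```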

class drslib.drs.CmipDRS(*argv, **kwargs)

Represents a DRS entry. DRS objects are dictionaries where DRS components are also exposed as attributes. Therefore you can get/set DRS components using dictionary or attribute notation.

In combination with the translator machinery, this class maintains consistency between the path and filename portion of the DRS.

Variables:
  • activity – string
  • product – string
  • institute – string
  • model – string
  • experiment – string
  • frequency – string
  • realm – string
  • variable – string
  • table – string or None
  • ensemble – (r, i, p)
  • version – integer
  • subset – (N1, N2, clim) where N1 and N2 are (y, m, d, h, mn, sec) and clim is boolean
  • extended – A string containing miscellaneous stuff. Useful for representing irregular CMIP3 files
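
The "dictionary with attribute access" behaviour described for CmipDRS can be sketched as follows. AttrDict is a hypothetical class written for illustration; it performs none of the validation or conversion that the real class may do.

```python
class AttrDict(dict):
    """Dictionary whose keys are also readable and writable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store attribute assignments as dictionary items.
        self[name] = value


drs = AttrDict(model='HadGEM2-ES')
drs.experiment = 'rcp45'        # attribute notation
drs['ensemble'] = (1, 1, 1)     # dictionary notation
print(drs['experiment'], drs.ensemble)
```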
class drslib.drs.DRSFileSystem(drs_root)

Represents the mapping scheme between DRS objects and a filesystem.

Instances of this class deal with how DRS objects are partitioned into publication-level datasets and how files within a DRS are mapped to the filesystem.

Variables:
  • drs_cls – The subclass of BaseDRS used in this filesystem.
  • publish_level – the last component name which is part of the published dataset-id.
  • drs_root – The path to the root directory of a DRS filesystem. This path represents the activity level of the DRS.
drs_to_linkpath(drs, version=None)

Return the full path of the symbolic link for this drs

drs_to_publication_path(drs)

Returns a directory path from a DRS object. Any DRS component that is set to None will result in a wildcard ‘*’ element in the path.

This function does not take MIP tables or filenames into account.

Parameters:drs – The DRS object from which to generate the path
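
The wildcard rule can be sketched as below. components_to_path and PATH_ATTRS are hypothetical; the directory order is an assumption, chosen so that the root stands for the activity level as described for drs_root.

```python
import posixpath

# Assumed directory levels below the root, for illustration only.
PATH_ATTRS = ['product', 'institute', 'model', 'experiment']


def components_to_path(drs_root, components, attrs=PATH_ATTRS):
    """Build a directory pattern; unset components become '*'."""
    parts = ['*' if components.get(name) is None else str(components[name])
             for name in attrs]
    return posixpath.join(drs_root, *parts)


pattern = components_to_path('/data/cmip5', {'product': 'output',
                                             'institute': 'MOHC',
                                             'experiment': 'rcp45'})
print(pattern)
```

A pattern of this form can be passed to glob.glob to list the matching directories.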
drs_to_realpath(drs)

Return the full path to the real file for drs (as opposed to the symbolic link).

drs_to_storage(drs)

Return the subpath within the files directory for this DRS instance.

filename_to_drs(filename)

Return a DRS instance deduced from a filename.

filepath_to_drs(filepath)

Return a DRS instance deduced from a full path.

Iterate over files of a particular version, also returning each file’s respective link into the latest version.

Parameters:
  • version – iterate over a specific version or all versions if None
  • into_version – the version into which symbolic links will be made, if None same as version, if both are None same as self.latest
Yield:filepath, linkpath

publication_path_to_drs(path, activity=None)

Create a DRS object from a filesystem path.

This function is more lightweight than using drslib.translate but only works for the parts of the DRS explicitly represented in a path.

Parameters:path – The path to convert. This is either an absolute path or is relative to the current working directory.
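
The reverse mapping, from a directory path back to components, might look like the sketch below. path_to_components and the directory order are hypothetical, only absolute paths are handled, and path elements equal to ‘*’ are assumed to mean "unset".

```python
import posixpath

# Assumed directory levels below the root, for illustration only.
PATH_ATTRS = ['product', 'institute', 'model', 'experiment']


def path_to_components(path, drs_root, attrs=PATH_ATTRS):
    """Recover the components that are explicit in a directory path."""
    rel = posixpath.relpath(path, drs_root)
    parts = [] if rel == '.' else rel.split('/')
    if parts and parts[0] == '..':
        raise ValueError('%r is not below %r' % (path, drs_root))
    # Ignore anything deeper than the known levels, pad missing levels.
    parts = parts[:len(attrs)]
    parts += [None] * (len(attrs) - len(parts))
    return {name: (None if part == '*' else part)
            for name, part in zip(attrs, parts)}


found = path_to_components('/data/cmip5/output/MOHC', '/data/cmip5')
print(found)
```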

storage_to_drs(subpath)

Return a DRS instance representing the DRS components deducible from its subpath within the files directory.

drslib.translate – Translate DRS filepaths

drslib.cmip5 – DRS objects and utilities

drslib.mip_table – Loading CMOR MIP Tables

Simple parser for MIP tables.

My interpretation of the format from reading the CMIP5 tables.

class drslib.mip_table.MIPTable(filename)

Hold information from a MIP table.

This information is used to enforce DRS vocabularies.

Property name:The name of the MIP table as used in DRS filenames.
Property variables:
 A list of variables in this table.
Property experiments:
 A list of valid experiment ids for this table.
get_variable_attr(variable, attr)

Retrieve an attribute of variable.

If the attribute isn’t in the variable entry, the global value is returned.
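
The fallback from variable entry to global value can be sketched with plain dictionaries. The table layout used here ('globals' and 'variables' keys) and the sample values are hypothetical and say nothing about how MIPTable stores its data.

```python
# Hypothetical in-memory layout of a parsed table, for illustration only.
table = {
    'globals': {'frequency': 'mon', 'units': '1'},
    'variables': {
        'tas': {'units': 'K'},
    },
}


def lookup_variable_attr(table, variable, attr):
    """Return the variable's own value, else the table-wide value."""
    entry = table['variables'][variable]
    if attr in entry:
        return entry[attr]
    return table['globals'][attr]


print(lookup_variable_attr(table, 'tas', 'units'))      # set on the variable
print(lookup_variable_attr(table, 'tas', 'frequency'))  # falls back to global
```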

class drslib.mip_table.MIPTableStore(table_glob)

Holds a collection of mip tables.

Property tables:
 A mapping of table names to MIPTable instances
add_table(filename)

Read filename as a MIP table and add it to the store.

Returns:The added MIPTable instance.
get_global_attr(table, attr)

Return global table attribute.

get_global_attr_mv(table, attr)

Return the value of a variable’s attribute in a given table.

get_variable_attr(table, variable, attr)

Return the value of a variable’s attribute in a given table.

get_variable_attr_mv(table, variable, attr)

Return the value of a variable’s attribute in a given table.

drslib.mip_table.iter_entries(fh)

Generate events (entry_name, value_dict) by reading a MIP table from a file object.

drslib.mip_table.iter_table(fh)

Generates events (entry, value, comment) by reading a MIP table from a file object.
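
An event generator of this kind might be sketched as follows. It assumes a simple "entry: value ! comment" line layout and ignores quoting, so it is only an approximation of the format; iter_line_events and the sample content are hypothetical.

```python
import io


def iter_line_events(fh):
    """Yield (entry, value, comment) for each 'entry: value ! comment' line."""
    for line in fh:
        # Naive split: everything after the first '!' is the comment.
        body, _, comment = line.partition('!')
        body = body.strip()
        if not body:
            continue  # blank or comment-only line
        entry, sep, value = body.partition(':')
        if not sep:
            continue  # not an 'entry: value' line
        yield entry.strip(), value.strip(), comment.strip()


sample = io.StringIO(u"""\
table_id: Table Amon   ! hypothetical sample content
frequency: mon

variable_entry: tas
units: K               ! kelvin
""")
events = list(iter_line_events(sample))
for event in events:
    print(event)
```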

drslib.mip_table.read_model_table(table_csv)

Read Karl’s CMIP5_models.xls file in CSV export format and return a map of institute to model name.

This function is invoked internally to load CMIP5_models.xls from inside drslib.

drslib.mip_table.split_comment(line)

Detect comment.

Quoted ‘!’ characters are detected.
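
One way to handle quoting is a single pass that tracks whether the scanner is inside a quoted string. The sketch assumes the statement above means that a ‘!’ inside quotes does not start a comment; split_comment_sketch is a hypothetical function, and an unpaired apostrophe in the text would confuse it.

```python
def split_comment_sketch(line, marker='!'):
    """Split a line into (content, comment); comment is None if absent."""
    quote = None
    for i, ch in enumerate(line):
        if quote:
            # Inside a quoted string: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch == marker:
            return line[:i].rstrip(), line[i + 1:].strip()
    return line.rstrip(), None


with_comment = split_comment_sketch('comment: "Wow!" ! real comment')
without_comment = split_comment_sketch('units: K')
print(with_comment)
print(without_comment)
```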

drslib.drs_tree – Managing DRS directory structure versioning

drslib.p_cmip5 – CMIP5 product detection

The p_cmip5 module will decide whether data should be assigned to DRS product=output1 or output2. Background and the algorithm steps are given in requested_subset_decision_tree_v0_5.pdf.

Set-up

There is a set-up step performed by the code in p_cmip5/init.py, using the “init(shelve_dir)” function. This function takes information from the spreadsheets CMIP5_archive_size_template.xls and standard_output_17Sep2010_mod.xls and stores it in Python shelves used by the main code. The shelves are placed in “shelve_dir”.
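
For readers unfamiliar with Python shelves, the sketch below shows the general store-then-read pattern using the standard shelve module. It does not reproduce what init() does: store_rows, load_row, the shelf name and the stored values are all hypothetical.

```python
import os
import shelve
import shutil
import tempfile


def store_rows(shelve_dir, name, rows):
    """Write a dictionary of rows into a shelf inside shelve_dir."""
    path = os.path.join(shelve_dir, name)
    db = shelve.open(path)
    try:
        for key, value in rows.items():
            db[key] = value
    finally:
        db.close()
    return path


def load_row(path, key):
    """Read one entry back from an existing shelf."""
    db = shelve.open(path, flag='r')
    try:
        return db[key]
    finally:
        db.close()


shelve_dir = tempfile.mkdtemp()
try:
    path = store_rows(shelve_dir, 'template', {'rcp45': 'hypothetical label'})
    label = load_row(path, 'rcp45')
finally:
    shutil.rmtree(shelve_dir)
print(label)
```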

The cmip5_product class

The drslib.p_cmip5.product module provides a cmip5_product class with a “find_product” method. At instantiation, the class picks up configuration tables. The configuration file (samples in ini/sample_1.ini and ini/sample_2.ini) needs to contain information about each model. This information is used for a small number of cases and the format is not yet stable:

class cmip5_product:

  def __init__(self,mip_table_shelve='sh/standard_output_mip',
                    template='sh/template',
                    stdo='sh/standard_output',
                    config='ini/sample_1.ini',
                    override_product_change_warning=False,
                    policy_opt1='all_rel',not_ok_excpt=False):

Optional arguments

mip_table_shelve:
 shelve containing information about MIP tables;
template:shelve containing information mapping experiment names to labels used in the standard_output spreadsheet;
stdo:shelve containing information from the standard_output spreadsheet;
config:configuration file;
override_product_change_warning:
 in some cases it is possible that adding new data to previously published data can change the product designation of the previously published data. The default behaviour is to give an error return at this point. This can be overridden; the code will then provide the product of the file to be added and lists of changes that need to be made to previously published data. It is not expected, however, that the ESG publisher will support such updates: the user should instead compile a new set of files to submit using all the previously published files and the new files.
policy_opt1:this controls two options for treatment of data blocks in which time slices are requested. The default is that, if the time slices are specified using relative dates (e.g. relative to start of experiment) and the number of years submitted is less than the number of years requested, all years submitted will be assigned to output1 without examining the dates. There is an option (deprecated) to extend this catch-all approach to time slices specified with absolute dates.
not_ok_excpt:if True, raise an exception if product can not be designated as output1 or output2.

The find_product method

The principal interface to the cmip5_product class is the find_product method:

def find_product(self,var,table,expt,model,path,startyear=None,endyear=None,verbose=False,
                path_output1=None, path_output2=None,selective_ads_scan=True):

Required arguments

var:DRS variable name
table:MIP table
expt:DRS experiment name
model:Model name
path:Path to directory containing all the files of one atomic dataset.

Optional arguments

startyear:first year of the file to be assessed
endyear:last year of file to be assessed (not currently used)
verbose:if True, provide additional comments to logger
path_output1:path to last published output1 data for this atomic dataset, if new data is to be considered as an addition;
path_output2:path to last published output2 data for this atomic dataset, if new data is to be considered as an addition;
selective_ads_scan:
 when scanning the atomic dataset directory, look only at files matching the variable, table, experiment and model. The False option is provided to facilitate testing using dummy data files, and should not otherwise be used.
return:if “not_ok_excpt=True”, the method will return True if it can assign output1 or output2; otherwise an exception will be raised. If “not_ok_excpt=False”, the method will return False when it cannot assign output1 or output2, with a message in pc.reason (where pc is an instance of the cmip5_product class).

Attributes containing results

After successful completion of the find_product method, the following attributes of the instance contain information:

product:the product to be assigned to the file;
reason:a short summary of the reasons for the assignment;
rc:a return code, ‘OKnnn’ if successful, ‘ERRnnn’ if not.
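
A caller that wants to branch on the return code could test its prefix, as sketched below. rc_succeeded is a hypothetical helper, the example codes are invented for illustration, and only the ‘OK’/‘ERR’ prefixes described above are relied on.

```python
def rc_succeeded(rc):
    """Interpret a return code by its 'OK' or 'ERR' prefix."""
    if rc.startswith('OK'):
        return True
    if rc.startswith('ERR'):
        return False
    raise ValueError('unrecognised return code: %r' % rc)


# Invented example codes, used only to exercise the helper.
print(rc_succeeded('OK001'), rc_succeeded('ERR002'))
```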

Usage

The following code fragment illustrates usage of the module:

## import module
import p_cmip5_v5 as p

## create an instance of the cmip5_product class, specifying a configuration file
##
pc2 = p.cmip5_product( config='ini/sample_2.ini')

## test a file: using the variable, mip table, experiment id, model and specifying the path of the atomic dataset directory containing all
## the submitted files.
## In some cases the decision as to which product the file belongs in will depend on the contents of this directory.
## verbose=True results in additional messages being printed to standard out.

if pc2.find_product( var, mip, expt,model,path,startyear=startyear, verbose=verbose):
##  A True return means the method has identified the product
  print('product is: %s' % pc2.product)
else:
##  A False return means the product could not be identified
  print('Dont know what to do with this data:: %s' % pc2.reason)

Testing the p_cmip5 module: test_p_cmip5.py

The test_p_cmip5 module can be used to test the p_cmip5 module. E.g. run the following from the directory containing the “test” subdirectory:

$ nosetests --tests=test/test_p_cmip5.py