Working with data and PMAT

This chapter describes the data files that PMAT uses and produces. The first section covers the formatting requirements for the input files, and the second describes the output files generated by the software suite.

Input Data Formatting

This section covers two input files: a YAML configuration file and a comma-separated values (CSV) data file.

Configuration Input

The configuration file stores the parameters that PMAT needs: sensor information, analysis parameters, logging options, and the site identifiers for the University of Wyoming’s Upper-Air database and the University of Utah’s MesoWest database.

Sensor fields

sensor.name (string):

Detail: The name of the sensor.
Note: If more than one sensor has the same name, append the suffix _N, where N is the index of the sensor.
Example: Sensor 09, Sensor 10_1, Sensor 10_2
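The naming convention can be applied mechanically. The following is a minimal Python sketch, assuming a list of base sensor names in configuration order; the helper `index_duplicate_names` is hypothetical and not part of PMAT. Only names that occur more than once receive the `_N` suffix, with N counting from 1 as in the example above.

```python
from collections import Counter


def index_duplicate_names(names):
    """Append _N to sensor names that appear more than once (hypothetical helper)."""
    totals = Counter(names)  # how many times each base name occurs
    seen = Counter()  # running index per base name
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
        else:
            result.append(name)  # unique names are left unchanged
    return result


print(index_duplicate_names(["Sensor 09", "Sensor 10", "Sensor 10"]))
# ['Sensor 09', 'Sensor 10_1', 'Sensor 10_2']
```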

sensor.error (float):

OPTIONAL
Detail: The manufacturer-reported error of the sensor.
Example: 2.5, 5.0

sensor.color (string):

Detail: A hexadecimal color code that will be used to identify the sensor on visualizations.
Example: FF0000, 0000FF
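The examples are six-digit codes without a leading `#`, while plotting libraries such as matplotlib generally expect the `#` prefix, so the value may need to be normalized before use. Codes made up only of digits, such as "000000", should be quoted in the YAML file, because an unquoted all-digit value is read as a number rather than a string. The sketch below assumes the six-digit form shown above; `to_plot_color` and `hex_to_rgb` are hypothetical helpers, not PMAT functions.

```python
import re

# Six hex digits, with an optional leading '#'
HEX_COLOR = re.compile(r"#?([0-9A-Fa-f]{6})")


def to_plot_color(code):
    """Validate a six-digit hex code and return it with a leading '#'."""
    match = HEX_COLOR.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a six-digit hex color: {code!r}")
    return "#" + match.group(1).upper()


def hex_to_rgb(code):
    """Convert a hex code such as 'FF0000' to an (R, G, B) tuple of 0-255 ints."""
    digits = to_plot_color(code)[1:]
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(to_plot_color("ff0000"))  # #FF0000
print(hex_to_rgb("0000FF"))  # (0, 0, 255)
```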

sensor.ratio (string):

OPTIONAL
Detail: The distance-to-spot ratio reported by the manufacturer.
Example: 12 to 1, 21 to 1
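A distance-to-spot ratio of 12 to 1 means that at a distance of 12 units the measured spot is 1 unit in diameter, so the spot diameter at distance d is d divided by the ratio. A minimal sketch, assuming the "N to 1" string format shown in the examples; `parse_ratio` and `spot_diameter` are hypothetical helpers.

```python
import re


def parse_ratio(text):
    """Parse a distance-to-spot string such as '12 to 1' into a float (12.0)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*to\s*(\d+(?:\.\d+)?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized ratio: {text!r}")
    distance, spot = (float(g) for g in match.groups())
    return distance / spot


def spot_diameter(distance, ratio_text):
    """Diameter of the measured spot at the given distance (same units as distance)."""
    return distance / parse_ratio(ratio_text)


print(spot_diameter(120.0, "12 to 1"))  # 10.0
print(spot_diameter(42.0, "21 to 1"))  # 2.0
```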

sensor.emissivity (float):

OPTIONAL
Detail: The emissivity of the sensor as reported by the manufacturer.
Example: 0.95

sensor.poster (boolean):

Detail: Whether the sensor is shown in the poster-specific plots.
Example: true, false

sensor.active (boolean):

Detail: Whether the sensor is used in the analysis.
Example: true, false
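The sensor fields above fall into required and optional groups. The sketch below checks one sensor entry after the YAML file has been loaded, assuming each sensor is represented as a dictionary keyed by the field names without the `sensor.` prefix; that layout and the helper names are assumptions, not PMAT's documented structure. It then selects the sensors flagged for the analysis and for the poster plots.

```python
REQUIRED = {"name": str, "color": str, "poster": bool, "active": bool}
OPTIONAL = {"error": float, "ratio": str, "emissivity": float}


def _has_type(value, kind):
    """Type check that accepts ints for float fields but never bools."""
    if kind is float:
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, kind)


def validate_sensor(entry):
    """Return a list of problems found in one sensor dictionary (empty if valid)."""
    problems = []
    for field, kind in REQUIRED.items():
        if field not in entry:
            problems.append(f"missing required field '{field}'")
        elif not _has_type(entry[field], kind):
            problems.append(f"'{field}' should be {kind.__name__}")
    for field, kind in OPTIONAL.items():
        # Optional fields are only checked when present and non-empty
        if entry.get(field) is not None and not _has_type(entry[field], kind):
            problems.append(f"'{field}' should be {kind.__name__}")
    return problems


sensors = [  # hypothetical entries following the field list above
    {"name": "Sensor 09", "color": "FF0000", "poster": True, "active": True, "error": 2.5},
    {"name": "Sensor 10_1", "color": "0000FF", "poster": False, "active": True, "ratio": "12 to 1"},
    {"name": "Sensor 10_2", "color": "00FF00", "poster": True, "active": False},
]

for sensor in sensors:
    print(sensor["name"], validate_sensor(sensor))  # each prints an empty list

analysis_sensors = [s["name"] for s in sensors if s["active"]]
poster_sensors = [s["name"] for s in sensors if s["poster"]]
print(analysis_sensors)  # ['Sensor 09', 'Sensor 10_1']
print(poster_sensors)  # ['Sensor 09', 'Sensor 10_2']
```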

Analysis

train_fraction (float):

Detail: The fraction of the data used to create the training set.
Note: Must be a value between 0 and 1.
Example: 0.8
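A minimal sketch of how a training fraction is commonly applied: shuffle the rows and take the first `train_fraction` share as the training set, leaving the rest for testing. This is a generic random split written for illustration; whether PMAT shuffles, uses a fixed seed, or splits chronologically is not stated here, and `split_rows` is a hypothetical name.

```python
import random


def split_rows(rows, train_fraction, seed=None):
    """Randomly split rows into (train, test) according to train_fraction."""
    # Strict bounds so that neither set is empty
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's sequence is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(range(100), 0.8, seed=1)
print(len(train), len(test))  # 80 20
```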

rel_difference (float):

Detail
Example: 2

iteration.step (integer):

Detail: The number of steps the analysis runs.
Example: 1, 100, 1000

Logging

verbose (string):

Detail: The logging level.
Example: DEBUG, WARN, ERROR, INFO
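If the `verbose` value is handed to Python's standard `logging` module (an assumption; this section does not say how PMAT consumes it), each example name corresponds to a level constant on that module. A minimal sketch; `configure_logging` and the logger name `pmat` are hypothetical.

```python
import logging


def configure_logging(verbose, logger_name="pmat"):
    """Set a logger's level from a name such as 'DEBUG', 'INFO', 'WARN' or 'ERROR'."""
    level = getattr(logging, verbose.strip().upper(), None)
    # Reject names that are not level constants (missing or non-integer attributes)
    if not isinstance(level, int):
        raise ValueError(f"unknown logging level: {verbose!r}")
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    return logger


log = configure_logging("WARN")
print(log.getEffectiveLevel() == logging.WARNING)  # True (WARN is an alias of WARNING)
log.info("suppressed: INFO is below WARN")
```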

Import

For information on using external files for precipitable water vapor (PWV) or relative humidity (RH) measurements, refer to …

mesowest.id (string):

Detail: The measurement site identifier for the MesoWest database.
Example: KONM, KRAP

wyoming.id (string):

Detail: The measurement site identifier for the Wyoming Upper-Air database.
Example: ABQ, EPZ

wyoming.weight (string):

Detail: The weighting used on the PWV measurements for analysis.
Note: If there are multiple sites, these values should sum to 0.5.
Example: 0.4, 0.2, 0.5
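The note above can be checked when the configuration is loaded. The sketch below assumes the weights have already been converted to numbers and that each Wyoming site contributes weight × PWV; how PMAT uses the remaining share of the total is not described in this section. `check_weights` and `weighted_pwv` are hypothetical helpers, and the PWV values in the usage line are illustrative only.

```python
import math


def check_weights(weights, expected_total=0.5):
    """Raise if the Wyoming site weights do not sum to the expected total."""
    total = sum(weights.values())
    if not math.isclose(total, expected_total, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected {expected_total}")


def weighted_pwv(pwv_by_site, weights):
    """Weighted sum of PWV readings, one weight per site identifier (assumed scheme)."""
    check_weights(weights)
    return sum(weights[site] * pwv_by_site[site] for site in weights)


weights = {"ABQ": 0.3, "EPZ": 0.2}  # two sites, summing to 0.5
print(weighted_pwv({"ABQ": 10.0, "EPZ": 20.0}, weights))
```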

Raw Data File

The raw data file is parsed through pattern identification, which allows a flexible format with few strict requirements. One requirement is that the sky and ground temperature columns be separated into groups, each ordered the same way as the sensors in the configuration file. Three example data files are shown below:

Dataset Example 1

Dataset Example 2

Dataset Example 3

The columns do not have to appear in any particular order, because the model pulls data from columns whose headers contain specific words or phrases. The one caveat concerns the ground and sky temperature readings: these measurements must appear in consecutive order by sensor, following the sensor order in _pmat.yml.

For example, if the sensors in _pmat.yml are listed in the order 1610 TE, FLIR i3, AMES 1, then the ground and sky temperature measurements in the dataset should also follow the order 1610 TE, FLIR i3, AMES 1 (as shown in Dataset Example 2).
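The ordering rule can be checked before a run. The sketch below reads only the header row, picks out columns named `Sensor Name (Sky)` and `Sensor Name (Ground)`, and compares the order within each group with the sensor order from _pmat.yml. The regular expression and the `check_column_order` helper are assumptions for illustration, not PMAT's own pattern identification.

```python
import re

# Matches headers such as "FLIR i3 (Sky)" or "AMES 1 (Ground)"
TEMP_HEADER = re.compile(r"(?P<sensor>.+?)\s*\((?P<kind>Sky|Ground)\)")


def check_column_order(header, sensor_order):
    """Return a list of problems with the Sky/Ground column order (empty if OK)."""
    found = {"Sky": [], "Ground": []}
    for column in header:
        match = TEMP_HEADER.fullmatch(column.strip())
        if match:
            found[match["kind"]].append(match["sensor"])
    problems = []
    for kind, sensors in found.items():
        if sensors != list(sensor_order):
            problems.append(f"{kind} columns {sensors} do not match {list(sensor_order)}")
    return problems


order = ["1610 TE", "FLIR i3", "AMES 1"]
header = ["Date", "Time", "1610 TE (Sky)", "FLIR i3 (Sky)", "AMES 1 (Sky)",
          "1610 TE (Ground)", "FLIR i3 (Ground)", "AMES 1 (Ground)", "Comments"]
print(check_column_order(header, order))  # []
```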

Date (datetime, ``YYYY-MM-DD``):

Detail: The date of the measurements.

Time (datetime, ``HH:MM``):

Detail: The local time of the measurements.
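The two columns can be combined into a single timestamp with the standard library. The format strings below follow the ``YYYY-MM-DD`` and ``HH:MM`` layouts listed above; the result is a naive datetime in local time, since this section does not name a time zone. `parse_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def parse_timestamp(date_text, time_text):
    """Combine 'YYYY-MM-DD' and 'HH:MM' strings into one (naive, local) datetime."""
    return datetime.strptime(f"{date_text.strip()} {time_text.strip()}", "%Y-%m-%d %H:%M")


stamp = parse_timestamp("2021-03-15", "13:05")
print(stamp.isoformat())  # 2021-03-15T13:05:00
```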

Sky temperature (float):

Detail: The sky temperature measurements. The header of this column should be Sensor Name (Sky), where Sensor Name is the name of the sensor used in the configuration file.

Ground temperature (float):

Detail: The ground temperature measurements. The header of this column should be Sensor Name (Ground), where Sensor Name is the name of the sensor in the configuration file.
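Putting the header convention to use, the sketch below reads a raw data file with Python's `csv` module and collects the sky and ground readings for each sensor. The two-row sample is made up for illustration and blank cells are treated as missing; both the sample and the `read_temperatures` helper are assumptions, not PMAT's parser.

```python
import csv
import io
import re

TEMP_HEADER = re.compile(r"(?P<sensor>.+?)\s*\((?P<kind>Sky|Ground)\)")

# Illustrative input only; real files may contain more columns
SAMPLE = """Date,Time,FLIR i3 (Sky),AMES 1 (Sky),FLIR i3 (Ground),AMES 1 (Ground)
2021-03-15,13:05,-20.5,-18.0,25.1,24.8
2021-03-16,13:10,-22.0,,26.3,25.9
"""


def read_temperatures(handle):
    """Map each sensor to {'Sky': [...], 'Ground': [...]}, using None for blank cells."""
    readings = {}
    for row in csv.DictReader(handle):
        for column, value in row.items():
            match = TEMP_HEADER.fullmatch(column.strip()) if column else None
            if match is None:
                continue  # Date, Time and any other non-temperature columns
            series = readings.setdefault(match["sensor"], {"Sky": [], "Ground": []})
            text = (value or "").strip()
            series[match["kind"]].append(float(text) if text else None)
    return readings


temps = read_temperatures(io.StringIO(SAMPLE))
print(temps["AMES 1"]["Sky"])  # [-18.0, None]
```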

Output Data Formatting

The software suite generates a variety of data files. These are stored as CSV files, with each row representing a single day.

General data files

The primary data file, master_data.csv, contains the full dataset, which includes:

Date
Time
Sky condition (clear sky/overcast)
Ground temperature
Sky temperature
Radiosonde PWV
Relative Humidity
Dewpoint
User comments

Machine learning

The machine learning data file includes five columns:

Date
Average brightness temperature
Average PWV
Relative Humidity
Sky Condition

This dataset supports classifying the data by the sky condition label.
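For classification, each row splits into numeric features and the sky condition label. The sketch below assumes the column headers match the names in the list above exactly and that the labels are the strings "clear sky" and "overcast", as described for master_data.csv; the actual headers and label spellings in PMAT's output may differ, and the sample rows are invented.

```python
import csv
import io

FEATURES = ["Average brightness temperature", "Average PWV", "Relative Humidity"]
LABEL = "Sky Condition"

# Illustrative rows only
SAMPLE = """Date,Average brightness temperature,Average PWV,Relative Humidity,Sky Condition
2021-03-15,-19.2,8.4,22.0,clear sky
2021-03-16,-5.1,15.7,64.0,overcast
"""


def load_features_and_labels(handle):
    """Return (X, y): feature rows as lists of floats and the sky condition labels."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in FEATURES])
        y.append(row[LABEL].strip())
    return X, y


X, y = load_features_and_labels(io.StringIO(SAMPLE))
print(y)  # ['clear sky', 'overcast']
```

X and y in this form can be passed to most classifier interfaces, for example the `fit(X, y)` method of scikit-learn estimators.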

Analytic results

The main analytical results are stored as YAML files. The results of each step in the iterative analysis process are saved to a file named _output.yml. An example of this file is presented below. [sample of _output.yml] [table of the fields in _output.yml]

The results averaged over all steps are stored in a second YAML file, _results.yml. [sample of _results.yml] [table of the fields in _results.yml]
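The relationship between the two files can be sketched as a field-by-field mean over the per-step results. The field names below are placeholders, because the fields of _output.yml and _results.yml are not listed in this section, and a plain arithmetic mean is an assumption about how the averaging is done. The sketch works on already-loaded dictionaries, so it needs no YAML library.

```python
from statistics import mean


def average_steps(step_results):
    """Average each numeric field across a list of per-step result dictionaries."""
    if not step_results:
        raise ValueError("no step results to average")
    fields = step_results[0].keys()
    return {field: mean(step[field] for step in step_results) for field in fields}


steps = [  # hypothetical per-step results with placeholder field names
    {"slope": 0.5, "intercept": 1.0},
    {"slope": 1.5, "intercept": 3.0},
]
print(average_steps(steps))  # {'slope': 1.0, 'intercept': 2.0}
```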