Working with data and PMAT

This chapter describes the data files associated with PMAT. The first section details the formatting guidelines for the input files, and the second covers the output files generated by the software suite.

Input Data Formatting

The two input files discussed in this section are a YAML configuration file and a comma-separated values (CSV) data file.

Configuration Input

The configuration file stores sensor information, analytic parameters, logging options, and the site identifiers for the University of Wyoming’s Upper-Air database and the University of Utah’s MesoWest database.

See also

Chapter 5.1.1 shows the structure of the data fields. The filename of this file must be _pmat.yml.

Sensor fields

sensor.name (string):
Detail

The name of the sensor.

Note

If several sensors of the same model are used, append _N to the name, where N is the index of the sensor.

Example

Sensor 09, Sensor 10_1, Sensor 10_2
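The _N naming rule can be sketched in a few lines of Python. This is only an illustration: the helper label_sensors is hypothetical and is not part of PMAT, which expects the names to be written by hand in _pmat.yml.

```python
from collections import Counter


def label_sensors(models):
    """Append _N to every model name that occurs more than once.

    Hypothetical helper for illustration; not a PMAT function.
    """
    totals = Counter(models)   # how often each model appears overall
    seen = Counter()           # how many of each model we have labelled so far
    labels = []
    for model in models:
        if totals[model] > 1:
            seen[model] += 1
            labels.append(f"{model}_{seen[model]}")
        else:
            labels.append(model)
    return labels


print(label_sensors(["Sensor 09", "Sensor 10", "Sensor 10"]))
# ['Sensor 09', 'Sensor 10_1', 'Sensor 10_2']
```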

sensor.error (float):
OPTIONAL

Detail

The manufacturer-reported error of the sensor.

Example

2.5, 5.0

sensor.color (string):
Detail

A hexadecimal color code that will be used to identify the sensor on visualizations.

Example

FF0000, 0000FF

sensor.ratio (string):
OPTIONAL

Detail

The distance to spot ratio that is reported by the manufacturer.

Example

12 to 1, 21 to 1
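Because the ratio is stored as a string, any numeric use of it needs a conversion step. The sketch below shows one way to do that in Python. The function parse_ratio is hypothetical, and it assumes the "D to S" wording used in the examples above.

```python
import re


def parse_ratio(text):
    """Convert a string such as '12 to 1' into the float 12.0.

    Hypothetical helper; assumes the 'D to S' wording from the examples.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised ratio: {text!r}")
    distance, spot = (float(group) for group in match.groups())
    if spot == 0:
        raise ValueError("spot size must be non-zero")
    return distance / spot


print(parse_ratio("12 to 1"))  # 12.0
```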

sensor.emissivity (float):
OPTIONAL

Detail

The emissivity of the sensor as reported by the manufacturer.

Example

0.95

sensor.poster (boolean):
Detail

A boolean that decides whether the sensor will be shown in the poster-specific plots.

Example

true, false

sensor.active (boolean):
Detail

A boolean that will decide whether the sensor will be used in the analysis.

Example

true, false
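As a rough sketch of how the sensor fields fit together, the Python below checks one sensor entry after it has been loaded into a dictionary. The function check_sensor and the flat dictionary layout are assumptions made for illustration. PMAT's own validation, if any, may differ, and Chapter 5.1.1 remains the reference for the real file structure.

```python
import re


def _is_number(value):
    # A YAML loader may hand back 5 as an int; accept it, but not booleans.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_text(value):
    return isinstance(value, str)


def _is_flag(value):
    return isinstance(value, bool)


# Fields without the OPTIONAL marker are treated as required.
REQUIRED = {"name": _is_text, "color": _is_text,
            "poster": _is_flag, "active": _is_flag}
OPTIONAL = {"error": _is_number, "ratio": _is_text, "emissivity": _is_number}


def check_sensor(entry):
    """Return a list of problems found in one sensor entry (empty if none)."""
    problems = []
    for key, is_valid in REQUIRED.items():
        if key not in entry:
            problems.append(f"missing required field: {key}")
        elif not is_valid(entry[key]):
            problems.append(f"wrong type for field: {key}")
    for key, is_valid in OPTIONAL.items():
        if key in entry and not is_valid(entry[key]):
            problems.append(f"wrong type for field: {key}")
    color = entry.get("color")
    if isinstance(color, str) and not re.fullmatch(r"[0-9A-Fa-f]{6}", color):
        problems.append("color should be a six-digit hexadecimal code")
    return problems


# Illustrative entry built from the example values listed above.
sensor = {"name": "Sensor 09", "error": 2.5, "color": "FF0000",
          "ratio": "12 to 1", "emissivity": 0.95,
          "poster": True, "active": True}
print(check_sensor(sensor))  # []
```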

Analysis

train_fraction (float):
Detail

The fraction of data being used to create the training set.

Note

A value between 0 and 1.

Example

0.8
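To make the role of train_fraction concrete, here is a minimal Python sketch of a split driven by that value. It assumes a random shuffle before the cut, and the name split_rows is hypothetical. The way PMAT actually selects its training set may differ.

```python
import random


def split_rows(rows, train_fraction, seed=None):
    """Shuffle rows and split them into (training, testing) lists.

    Hypothetical helper; a random shuffle is an assumption, not PMAT's
    documented behaviour.
    """
    if not 0 <= train_fraction <= 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(range(10), 0.8, seed=1)
print(len(train), len(test))  # 8 2
```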

rel_difference (float):
Detail

Example

2

iteration.step (integer):
Detail

The number of steps the analysis will run.

Example

1, 100, 1000

Logging

verbose (string):
Detail

An identifier for the level of logging.

Example

DEBUG, WARN, ERROR, INFO
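For readers who want to mirror these identifiers in their own scripts, the sketch below maps them onto the levels of Python's standard logging module. This is an analogy only: the mapping, the function configure_logging, and the logger name pmat_example are assumptions, and PMAT's own logger may treat the levels differently.

```python
import logging

# Assumed mapping from the identifiers above to Python's logging levels.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(verbose):
    """Return a logger whose threshold matches the verbose identifier."""
    level = LEVELS.get(verbose.upper())
    if level is None:
        raise ValueError(f"unknown logging level: {verbose!r}")
    logger = logging.getLogger("pmat_example")  # hypothetical logger name
    logger.setLevel(level)
    return logger


logger = configure_logging("WARN")
logger.info("not shown: INFO is below the WARN threshold")
logger.error("shown: ERROR is at or above the WARN threshold")
```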

Import

For information regarding the usage of external files for PWV or RH measurements, refer to …

mesowest.id (string):
Detail

The measurement site identifier for the MesoWest database.

Example

KONM, KRAP

wyoming.id (string):
Detail

The measurement site identifier for the Wyoming Upper-Air database.

Example

ABQ, EPZ

wyoming.weight (string):
Detail

The weighting used on the PWV measurements for analysis.

Note

If there are multiple sites, these values should add to 1.

Example

0.4, 0.2, 0.5
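One common reading of a per-site weighting is a weighted mean of the PWV measurements. The Python sketch below illustrates that reading under stated assumptions: the function weighted_pwv is hypothetical, the PWV values are made up, and the result is divided by the sum of the weights so that the example does not depend on what total the weights add to. How PMAT combines sites internally is not described in this section.

```python
def weighted_pwv(measurements, weights):
    """Return the weighted mean of per-site PWV measurements.

    Hypothetical helper; normalises by the sum of the weights.
    """
    if len(measurements) != len(weights):
        raise ValueError("need exactly one weight per measurement")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(m * w for m, w in zip(measurements, weights)) / total


# Two sites with made-up PWV values, weighted equally.
print(weighted_pwv([10.0, 20.0], [0.5, 0.5]))  # 15.0
```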

Raw Data File

The raw data file is processed through pattern identification, which allows a flexible format with few strict requirements. One of these requirements is that the sky and ground temperature columns be separated into groups and ordered the same way as the sensors in the configuration file. Here are three examples of data files:

The columns do not have to be in any set order, because the model pulls the data from columns whose headers contain specific words or phrases. The one caveat concerns the ground and sky temperature readings: these measurements must appear in consecutive order by sensor, as determined by _pmat.yml.

For example, if the order of the sensors in _pmat.yml is 1610 TE, FLIR i3, and then AMES 1, then the ground and sky temperature measurements in the dataset should follow the same order: 1610 TE, FLIR i3, and then AMES 1 (as seen in Dataset 2).

Date (datetime, YYYY-MM-DD):
Detail

The date of the measurements.

Time (datetime, HH:MM):
Detail

The local time of the measurements.
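The two formats above can be combined into a single timestamp. The Python sketch below shows one way to do so with the standard library. The function parse_timestamp and the sample values are hypothetical, and the result carries no time zone, matching the local-time convention described above.

```python
from datetime import datetime


def parse_timestamp(date_text, time_text):
    """Combine a YYYY-MM-DD date and an HH:MM time into one datetime.

    Hypothetical helper; the result is naive (no time zone attached).
    """
    return datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M")


print(parse_timestamp("2020-01-31", "14:05"))  # 2020-01-31 14:05:00
```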

Sky temperature (float):
Detail

The sky temperature measurements. The header of this column should be Sensor Name (Sky), where Sensor Name is the name of the sensor used in the configuration file.

Ground temperature (float):
Detail

The ground temperature measurements. The header of this column should be Sensor Name (Ground), where Sensor Name is the name of the sensor in the configuration file.
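The header convention and the ordering rule can be checked before running an analysis. The Python sketch below is a minimal illustration under stated assumptions: the functions temperature_columns and order_matches are hypothetical, the regular expression is only one possible reading of the "Sensor Name (Sky)" and "Sensor Name (Ground)" pattern, and the sample header row is made up using the sensor names from the ordering example. PMAT's own pattern identification may be more permissive.

```python
import csv
import io
import re

# Matches headers such as "FLIR i3 (Sky)" or "AMES 1 (Ground)".
HEADER = re.compile(r"^(?P<sensor>.+?)\s*\((?P<kind>Sky|Ground)\)$")


def temperature_columns(header_row):
    """Map 'Sky' and 'Ground' to the sensor names, in column order."""
    found = {"Sky": [], "Ground": []}
    for header in header_row:
        match = HEADER.match(header.strip())
        if match:
            found[match["kind"]].append(match["sensor"])
    return found


def order_matches(header_row, configured):
    """True if both temperature groups follow the configured sensor order."""
    found = temperature_columns(header_row)
    expected = list(configured)
    return found["Sky"] == expected and found["Ground"] == expected


# Made-up header row using the sensor order from the example above.
sample = ("Date,Time,1610 TE (Sky),FLIR i3 (Sky),AMES 1 (Sky),"
          "1610 TE (Ground),FLIR i3 (Ground),AMES 1 (Ground)\n")
header = next(csv.reader(io.StringIO(sample)))
print(order_matches(header, ["1610 TE", "FLIR i3", "AMES 1"]))  # True
```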

Output Data Formatting

The software suite generates a variety of data files. These are stored as CSV files, with each row presenting data for a single day.

General data files

The primary data file generated (master_data.csv) is the full dataset, which includes:

  • Date

  • Time

  • Sky condition (clear sky/overcast)

  • Ground temperature

  • Sky temperature

  • Radiosonde PWV

  • Relative Humidity

  • Dewpoint

  • User comments

Machine learning

The machine learning data file includes five columns:

  • Date

  • Average brightness temperature

  • Average PWV

  • Relative Humidity

  • Sky Condition

This data set supports the classification of data by the sky condition label.

Analytic results

The main analytical results are stored as YAML files. The results of each step in the iterative analysis are saved to a file named _output.yml. An example of this file is presented below. [sample of _output.yml] [table of the fields in _output.yml]

The averaged results of the steps are also stored in a YAML file. [sample of _results.yml] [table of the fields in _results.yml]