Working with data and PMAT
This chapter describes the data components associated with PMAT. The first section details the formatting guidelines for the input files, and the second describes the output files generated by the software suite.
Input Data Formatting
The two input files discussed in this section are a YAML configuration file and a comma-separated values (CSV) data file.
Configuration Input
The configuration file stores a series of parameters: sensor information, analysis parameters, logging options, and the site identifiers for the University of Wyoming’s Upper-Air database and the University of Utah’s MesoWest database.
See also
Chapter 5.1.1 shows the structure of the data fields. The filename of this file must be _pmat.yml.
Sensor fields
- sensor.name (string):
- Detail
The name of the sensor.
- Note
If there are multiple units of the same sensor, append _N to the name, with N being the index of the sensor.
- Example
Sensor 09, Sensor 10_1, Sensor 10_2
- sensor.error (float):
- OPTIONAL
- Detail
The manufacturer-reported error of the sensor.
- Example
2.5, 5.0
- sensor.color (string):
- Detail
A hexadecimal color code that will be used to identify the sensor on visualizations.
- Example
FF0000, 0000FF
- sensor.ratio (string):
- OPTIONAL
- Detail
The distance-to-spot ratio reported by the manufacturer.
- Example
12 to 1, 21 to 1
- sensor.emissivity (float):
- OPTIONAL
- Detail
The emissivity of the sensor as reported by the manufacturer.
- Example
0.95
- sensor.poster (boolean):
- Detail
A boolean that decides whether the sensor will be shown in the poster-specific plots.
- Example
true, false
- sensor.active (boolean):
- Detail
A boolean that will decide whether the sensor will be used in the analysis.
- Example
true, false
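The sensor fields above can be checked before an analysis is run. The following is a minimal sketch, not part of PMAT: it assumes each sensor entry has already been loaded into a Python dictionary keyed by the field names without the sensor. prefix (a hypothetical layout), and the helper names parse_ratio and check_sensor are illustrative only. The emissivity range of 0 to 1 is a physical bound rather than something this chapter states.

```python
import re

# Six hexadecimal digits, as in the examples FF0000 and 0000FF.
HEX_COLOR = re.compile(r"^[0-9A-Fa-f]{6}$")
# Strings such as "12 to 1" or "21 to 1".
RATIO = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)\s*$")


def parse_ratio(text):
    """Turn a distance-to-spot string such as '12 to 1' into a float."""
    match = RATIO.match(text)
    if match is None:
        raise ValueError(f"unrecognized ratio: {text!r}")
    return float(match.group(1)) / float(match.group(2))


def check_sensor(sensor):
    """Return a list of problems found in one sensor entry (hypothetical dict layout)."""
    problems = []
    if not isinstance(sensor.get("name"), str) or not sensor["name"]:
        problems.append("name must be a non-empty string")
    if not HEX_COLOR.match(str(sensor.get("color", ""))):
        problems.append("color must be a six-digit hexadecimal code")
    for flag in ("poster", "active"):
        if not isinstance(sensor.get(flag), bool):
            problems.append(f"{flag} must be true or false")
    # Optional fields are only checked when they are present.
    if "error" in sensor and not isinstance(sensor["error"], (int, float)):
        problems.append("error must be a number")
    if "emissivity" in sensor and not 0 <= sensor["emissivity"] <= 1:
        problems.append("emissivity must lie between 0 and 1")
    if "ratio" in sensor:
        try:
            parse_ratio(sensor["ratio"])
        except ValueError as exc:
            problems.append(str(exc))
    return problems


example = {
    "name": "Sensor 10_1",
    "error": 2.5,
    "color": "FF0000",
    "ratio": "12 to 1",
    "emissivity": 0.95,
    "poster": True,
    "active": False,
}
problems = check_sensor(example)
```

An empty list means the entry passed every check; each string in a non-empty list names one field that needs attention.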
Analysis
- train_fraction (float):
- Detail
The fraction of data being used to create the training set.
- Note
A value between 0 and 1.
- Example
0.8
- rel_difference (float):
- Detail
- Example
2
- iteration.step (integer):
- Detail
The number of steps the analysis will run.
- Example
1, 100, 1000
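To illustrate how train_fraction divides a dataset, the sketch below shuffles a list of rows and cuts it at the requested fraction. It is an assumption-laden example rather than PMAT's own procedure: the function name split_rows, the seed argument, and the use of a random shuffle are all hypothetical, since this chapter does not describe how the training set is drawn.

```python
import random


def split_rows(rows, train_fraction, seed=0):
    """Split rows into (training, testing) lists using a seeded shuffle."""
    if not 0 <= train_fraction <= 1:
        raise ValueError("train_fraction must be a value between 0 and 1")
    shuffled = list(rows)
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows split with the example value 0.8.
training, testing = split_rows(range(10), 0.8)
```

With ten rows and a fraction of 0.8, eight rows go to the training set and the remaining two to the testing set.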
Logging
- verbose (string):
- Detail
An identifier for the level of logging.
- Example
DEBUG, WARN, ERROR, INFO
Import
For information regarding the usage of external files for PWV or RH measurements, refer to …
- mesowest.id (string):
- Detail
The measurement site identifier for the MesoWest database.
- Example
KONM, KRAP
- wyoming.id (string):
- Detail
The measurement site identifier for the Wyoming Upper-Air database.
- Example
ABQ, EPZ
- wyoming.weight (string):
- Detail
The weighting used on the PWV measurements for analysis.
- Note
If there are multiple sites, these values should add to 1.
- Example
0.4, 0.2, 0.5
Raw Data File
The raw data file is parsed through pattern identification, which allows a flexible format with few strict requirements. One of these requirements is that the sky and ground temperature columns be separated into groups and ordered the same way as the sensors in the configuration file. Here are three examples of data files:
The columns do not have to be in any set order, because the model pulls the data from columns whose headers contain specific words or phrases. The one caveat concerns the ground and sky temperature readings: these measurements must appear in consecutive order by sensor, as determined by _pmat.yml.
For example, if the order of the sensors in _pmat.yml is 1610 TE, FLIR i3, and then AMES 1, then the ground and sky temperature measurements in the dataset should appear in the same order: 1610 TE, FLIR i3, and then AMES 1 (as seen in Dataset 2).
- Date (datetime, YYYY-MM-DD):
- Detail
The date of the measurements.
- Time (datetime, HH:MM):
- Detail
The local time of the measurements.
- Sky temperature (float):
- Detail
The sky temperature measurements. The header of this column should be Sensor Name (Sky), where Sensor Name is the name of the sensor used in the configuration file.
- Ground temperature (float):
- Detail
The ground temperature measurements. The header of this column should be Sensor Name (Ground), where Sensor Name is the name of the sensor in the configuration file.
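The header convention and the ordering requirement described above can be checked with a short script. The sketch below is an illustration under stated assumptions, not PMAT's own parser: the helper names sensor_order and check_order are hypothetical, and the sample header row is invented for this example using the sensor names mentioned earlier.

```python
import csv
import io
import re

# Matches headers of the form "Sensor Name (Sky)" or "Sensor Name (Ground)".
HEADER = re.compile(r"^(?P<sensor>.+?)\s*\((?P<kind>Sky|Ground)\)$")


def sensor_order(header_row):
    """Collect sensor names in the order their Sky and Ground columns appear."""
    order = {"Sky": [], "Ground": []}
    for header in header_row:
        match = HEADER.match(header.strip())
        if match:
            order[match.group("kind")].append(match.group("sensor"))
    return order


def check_order(header_row, configured_sensors):
    """True when both temperature groups follow the configuration order."""
    order = sensor_order(header_row)
    expected = list(configured_sensors)
    return order["Sky"] == expected and order["Ground"] == expected


# Invented header row; only the first line of a data file is needed here.
sample = (
    "Date,Time,1610 TE (Sky),FLIR i3 (Sky),AMES 1 (Sky),"
    "1610 TE (Ground),FLIR i3 (Ground),AMES 1 (Ground)\n"
)
header_row = next(csv.reader(io.StringIO(sample)))
matches = check_order(header_row, ["1610 TE", "FLIR i3", "AMES 1"])
```

Columns such as Date and Time are ignored by the pattern, so their position in the file does not affect the check.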
Output Data Formatting
There are a variety of data files generated by the software suite. The data files are stored as CSV files, with each row presenting data for a single day.
General data files
The primary data file generated [master_data.csv] is the full dataset, which includes:
Date
Time
Sky condition (clear sky/overcast)
Ground temperature
Sky temperature
Radiosonde PWV
Relative Humidity
Dewpoint
User comments
Machine learning
The machine learning data file includes five columns:
Date
Average brightness temperature
Average PWV
Relative Humidity
Sky Condition
This data set supports the classification of data by the sky condition label.
Analytic results
The main analytical results are stored as YAML files. The results of each step in the iterative analysis process are saved to a file with the name _output.yml. An example of this file is presented below.
[sample of _output.yml]
[table of the fields in _output.yml]
The averaged results of the steps are also stored in a YAML file.
[sample of _results.yml]
[table of the fields in _results.yml]