# Module: Introduction to Statistics for Climatology - UCAR COMET

## Summary

The effective use of climate data and products requires an understanding of what the statistical parameters mean and which parameters best summarize the data for particular climate variables.

This module addresses both concerns, taking a two-pronged approach:

1) focusing on the statistical parameters (mean, median, mode, extreme values, percent frequency of occurrence and time, range, standard deviation, and data anomalies), defining what they mean and how they are calculated using climate data as examples, and

2) focusing on weather and climate variables, identifying the statistical parameters that best represent each one. The module concludes with a discussion of data quality and its impact on weather and climate products.


## Learning Goals

The module is intended for forecasters and others interested in improving their understanding of the basic statistics used in climate products so they can make better use of climatology products for planning and operational purposes. Basic knowledge of meteorology is beneficial although not required. This module is part of COMET's Climatology for Forecasters series.

Objectives:

1. Define mean, mode, frequency of occurrence and time, extreme value, range, standard deviation, and data anomalies.

2. Using climate data, calculate each statistical parameter (other than standard deviation).

3. Understand which statistical parameters best describe various climate variables.

4. Describe the impacts of data quality on climatology products.

## Context for Use

### STATISTICAL PARAMETERS

Here's a quick summary of the statistical parameters covered in this module.

Mean
Definition: Describes the middle of a dataset (when no outliers are present)
Calculation: Add the individual values and divide by the number of values
Advantages: Is a frequently used measure; is unique (a dataset has only one mean); is useful when comparing datasets
Disadvantage: Is affected by extreme values
Mnemonic (memory aid) for distinguishing between the mean, median, and mode:

The median is the middle like the one in the road,
while the mean is the average and the mode is the most,
frequent that is.

Median
Definition: The middle value of a dataset when the values are placed in order from lowest to highest
Calculation: Place the values in order from lowest to highest, then find the middle value (with an even number of values, average the two middle values)
Advantages: Is not as affected by extreme values as the mean; is useful when comparing datasets; is unique (a dataset has only one median)
Disadvantage: Not used as often as the mean

Mode
Definition: The most frequently occurring item(s)
Calculation: Find the value(s) that occur(s) most often
Advantages: Not affected by extreme values
Disadvantages: Not used as often as the mean and median; not necessarily unique (there may be more than one mode); when no values are repeated, every value is a mode, so the mode is uninformative; when there is more than one mode, it is difficult to interpret and/or compare
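
The three measures of the middle described above can be compared on a small sample. The following is a minimal sketch using Python's standard `statistics` module; the temperature values are hypothetical and chosen only so that one outlier is present.

```python
from statistics import mean, median, multimode

# Hypothetical daily high temperatures (°F) for one week; 95 is an outlier.
highs_f = [70, 72, 72, 74, 75, 76, 95]

mean_high = mean(highs_f)        # add the values, divide by the count
median_high = median(highs_f)    # middle value of the sorted data
modes_high = multimode(highs_f)  # list of the most frequent value(s)

# Dropping the outlier shifts the mean by about 3 degrees
# but the median by only 1 degree.
typical = [t for t in highs_f if t != 95]
mean_typical = mean(typical)
median_typical = median(typical)

print(f"mean={mean_high:.1f}, median={median_high}, mode(s)={modes_high}")
print(f"without outlier: mean={mean_typical:.1f}, median={median_typical}")
```

`multimode` returns a list because a dataset can have more than one mode; when no value repeats, it returns every value, which illustrates why the mode is uninformative in that case.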

Extreme Values
Definition: The maximum and minimum value of a dataset
Calculation: Find the maximum and minimum values
Advantages: Are not affected by repeating values; are unique
Disadvantages: Provide information about only two values; can unduly influence the mean

Range
Definition: The difference between the maximum and minimum
Calculation: Find the difference between maximum and minimum values
Advantages: Is not affected by repeating values; is unique
Disadvantages: Provides information about only two values; is determined entirely by the extreme values, so a single outlier can inflate it
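
As an illustration of extreme values and range, here is a short sketch using Python built-ins. The monthly temperatures are hypothetical.

```python
# Hypothetical monthly mean temperatures (°C) for one year.
monthly_temps_c = [-5, -2, 3, 9, 15, 21, 30, 24, 18, 10, 3, -2]

t_max = max(monthly_temps_c)  # extreme value: maximum
t_min = min(monthly_temps_c)  # extreme value: minimum
t_range = t_max - t_min       # range: maximum minus minimum

# Repeated values (3 and -2 each appear twice) have no effect on the result.
print(f"max={t_max}, min={t_min}, range={t_range}")
```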

Frequency of Occurrence
Definition: Describes the proportion of a dataset that occurs at a certain value or within a range of values
Calculation: Find the number of occurrences of a particular value, divide by the total number of values, and multiply by 100
Advantage: Consolidates and provides information across the entire dataset
Disadvantage: Is highly influenced by the range selected
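
The calculation above can be sketched as a small helper. The function name `percent_frequency` and the temperature values are hypothetical, introduced only for illustration.

```python
def percent_frequency(values, condition):
    """Percent of values for which condition(value) is true."""
    matches = sum(1 for v in values if condition(v))
    return 100.0 * matches / len(values)

# Hypothetical daily high temperatures (°F) over ten days.
highs_f = [85, 88, 91, 93, 90, 95, 89, 92, 87, 94]

pct_above_90 = percent_frequency(highs_f, lambda t: t > 90)
pct_90_or_more = percent_frequency(highs_f, lambda t: t >= 90)

print(f"> 90°F: {pct_above_90:.0f}% of days")
print(f">= 90°F: {pct_90_or_more:.0f}% of days")
```

Moving the threshold from "above 90" to "90 or above" changes the result from 50% to 60% in this sample, which shows how strongly the chosen range influences the statistic.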

Standard Deviation
Calculation: Subtract the mean from each value and square each difference; add the squared differences; divide the sum by the number of values minus 1; then take the square root
Advantages: Provides a measure of how individual values compare to the mean; if the dataset is normally distributed, it indicates how much of the dataset is clustered about the mean (about 68% of values are within +/- 1 STD; 95% are within +/- 2 STD; 99.7% are within +/- 3 STD)
Disadvantage: Is affected by extreme values
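
The standard deviation calculation can be written out step by step. This sketch uses hypothetical temperatures and checks the manual result against `statistics.stdev`, which also divides by the number of values minus 1.

```python
import math
from statistics import mean, stdev

# Hypothetical daily mean temperatures (°C).
temps_c = [10, 12, 14, 16, 18]

m = mean(temps_c)
squared_diffs = [(t - m) ** 2 for t in temps_c]    # (value - mean)^2
variance = sum(squared_diffs) / (len(temps_c) - 1) # divide by n - 1
sample_std = math.sqrt(variance)                   # square root

print(f"mean={m}, standard deviation={sample_std:.2f}")
print(f"library result matches: {math.isclose(sample_std, stdev(temps_c))}")
```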

Anomalies
Definition: The difference between an observed value and a long-term mean value; can be positive or negative
Calculation: Choose a base period of sufficient length for the parameter of interest; calculate the average of the parameter for that period; then subtract that long-term average from the observation
Advantage: Indicates differences from long-term averages
Disadvantage: Any significant deviation from the baseline is indicated as an anomaly
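
The anomaly calculation can be sketched as follows. The base period here is shortened to five hypothetical values for readability, and the sign convention (observation minus long-term mean, so warmer than average is positive) is an assumption of this example.

```python
from statistics import mean

# Hypothetical July mean temperatures (°C) for a shortened base period.
# A real base period would span many more years.
base_period_c = [21.0, 22.5, 20.5, 22.0, 21.5]
baseline_c = mean(base_period_c)

def anomaly(observed, baseline):
    """Observed value minus the long-term mean; the sign shows the direction."""
    return observed - baseline

warm_july = anomaly(23.0, baseline_c)  # positive: warmer than the baseline
cool_july = anomaly(20.0, baseline_c)  # negative: cooler than the baseline

print(f"baseline={baseline_c}, anomalies: {warm_july:+.1f}, {cool_july:+.1f}")
```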

### CLIMATE AND WEATHER PARAMETERS

Here are some guidelines to follow when determining which statistical parameter to use with various climate variables. Note that the list is not exhaustive since this module did not examine the full range of climate and statistical parameters.

Temperature

• For average lows and highs, use the mean over days/months/etc.
• For information on >90°F (32°C) or
• For annual max/min, use range

Wind

• For wind direction, use mode when the wind is blowing
• For wind speed, use the mean of the prevailing wind direction when the wind is blowing
• To determine how long a certain wind occurred, use percent frequency of time (a wind rose)
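
The wind guidelines above can be illustrated with a small set of observations. This is a sketch under stated assumptions: the observations are hypothetical, each one is treated as covering an equal span of time, and a speed of 0 is treated as calm.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical hourly observations: (direction, speed in knots).
# A speed of 0 means calm, so no direction is recorded.
obs = [("N", 10), ("N", 12), ("NE", 8), ("N", 14),
       (None, 0), ("S", 6), ("NE", 10), (None, 0)]

# Keep only the hours when the wind is blowing.
blowing = [(d, s) for d, s in obs if s > 0]

# Prevailing direction: the mode of direction when the wind is blowing.
prevailing = mode([d for d, _ in blowing])

# Mean speed for the prevailing direction only.
prevailing_speed = mean(s for d, s in blowing if d == prevailing)

# Percent frequency of time for each direction (the numbers behind a wind rose).
counts = Counter(d if s > 0 else "calm" for d, s in obs)
pct_time = {k: 100.0 * n / len(obs) for k, n in counts.items()}

print(f"prevailing={prevailing}, mean speed={prevailing_speed} kt")
print(pct_time)
```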

Weather events

• For rain, use mean in combination with range and extreme values
• For fog, use percent frequency of time
• For thunderstorms, use the number of days and frequency of occurrence with a time period of hours
• For El Niño/La Niña events, use anomalies and climate probabilities

Data quality

• Determine if the data are normally distributed; if so, 99.7% of observations are within +/- 3 standard deviations
• Consider if data make sense meteorologically and with other values
• Determine if the instruments are working
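
One way to apply the +/- 3 standard deviation guideline is to flag new observations that fall far outside a reference record. The sketch below assumes the reference data are roughly normally distributed; the values and the helper name `is_suspect` are hypothetical. A flagged value is only a candidate for review: it still has to be checked against meteorological sense and instrument status.

```python
from statistics import mean, stdev

# Hypothetical reference record of daily mean temperatures (°C).
reference_c = [20, 21, 19, 22, 20, 21, 18, 20, 19, 20]
ref_mean = mean(reference_c)
ref_std = stdev(reference_c)  # sample standard deviation (n - 1)

def is_suspect(value, center, spread, n_std=3):
    """True when value lies more than n_std standard deviations from center."""
    return abs(value - center) > n_std * spread

# Hypothetical new observations to screen against the reference record.
new_obs_c = [21.0, 19.5, 35.0, 20.0]
suspects = [v for v in new_obs_c if is_suspect(v, ref_mean, ref_std)]

print(f"mean={ref_mean}, std={ref_std:.2f}, suspect values={suspects}")
```

The mean and standard deviation come from the reference record rather than from the data being screened, because a large outlier inflates the standard deviation of its own sample and can hide itself.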

## Teaching Notes and Tips

The module is intended for forecasters and students interested in improving their understanding of the basic statistics used in climate products so they can make better use of climatology products for planning and operational purposes. Basic knowledge of meteorology is beneficial although not required.

For background information on climatology (what it is, the factors that create an area's climate, and the sources and uses of climate information), access the module Introduction to Climatology in COMET's Climatology for Forecasters series.

## Assessment

#### Quiz Disclaimer

Note: At the request of COMET sponsors, this quiz is designed to demonstrate your successful completion of the module. Because the quiz is being administered online, upon completion you will be given only feedback on which questions you answered correctly and incorrectly. No additional instructional feedback is provided. However, should you need to, you can use this information to revisit the module to determine how to answer correctly and then attempt the quiz again.

## References and Resources

The COMET® Program is sponsored by NOAA National Weather Service (NWS), with additional funding by:

Air Force Weather Agency (AFWA)
Australian Bureau of Meteorology (BoM)
European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT)