Precipitation Station Data

This file describes the steps and programs involved in processing precipitation station data starting with data format conversions, and ending with interpolated data fields for use with the VIC model.



PROCESSING INSTRUCTIONS:

Station Selection

In order to reduce the work involved in selecting stations for large areas, two programs were developed to read through files containing station locations and information about period of record.

Data Retrieval

Once the IDs of the desired stations have been determined, the data files must be extracted from the CD-ROMs. Retrieval is not automated: you will have to retrieve the stations one at a time (not as bad as it sounds), or write a batch file or program yourself. Station data on the Wallis, Lettenmaier, Wood CD-ROM can be accessed directly from any PC CD-ROM drive, or by mounting the CD on the UNIX systems. The EarthInfo Summary of the Day CD-ROM must be accessed using the EarthInfo software (program sd, mounted on the PCs in Wilcox 159). This software makes extracting single stations fairly easy, but it becomes cumbersome when retrieving large numbers of stations. For multiple stations it is easier to use the CD-ROM software to select all of the stations and put them into a single file. reform.c will then extract individual station files from the larger data file.

The data you retrieve from the EarthInfo CDs must be daily values in NCDC format (you can specify these options when you retrieve the data files from the CDs). Because the programs use the precipitation station filenames to decide what actions to take, you will have to follow some conventions in naming the files, or adjust the code in some of the programs (these naming conventions are followed automatically by NCDC.split.f when it extracts individual station files). The naming conventions I followed are:

State Numbers

NCDC  State Name        W-L-W
01    ALABAMA           01
02    ARIZONA           04
03    ARKANSAS          05
04    CALIFORNIA        06
05    COLORADO          08
06    CONNECTICUT       09
07    DELAWARE          10
08    FLORIDA           12
09    GEORGIA           13
10    IDAHO             16
11    ILLINOIS          17
12    INDIANA           18
13    IOWA              19
14    KANSAS            20
15    KENTUCKY          21
16    LOUISIANA         22
17    MAINE             23
18    MARYLAND          24
19    MASSACHUSETTS     25
20    MICHIGAN          26
21    MINNESOTA         27
22    MISSISSIPPI       28
23    MISSOURI          29
24    MONTANA           30
25    NEBRASKA          31
26    NEVADA            32
27    NEW HAMPSHIRE     33
28    NEW JERSEY        34
29    NEW MEXICO        35
30    NEW YORK          36
31    NORTH CAROLINA    37
32    NORTH DAKOTA      38
33    OHIO              39
34    OKLAHOMA          40
35    OREGON            41
36    PENNSYLVANIA      42
37    RHODE ISLAND      44
38    SOUTH CAROLINA    45
39    SOUTH DAKOTA      46
40    TENNESSEE         47
41    TEXAS             48
42    UTAH              49
43    VERMONT           50
44    VIRGINIA          51
45    WASHINGTON        53
46    WEST VIRGINIA     54
47    WISCONSIN         55
48    WYOMING           56
49    unassigned
50    ALASKA            02
51    HAWAII            15
66    PUERTO RICO       72
67    VIRGIN ISLANDS    78
91    PACIFIC ISLANDS
      DISTRICT OF COL.  11
      AMERICAN SAMOA    60
      GUAM DISTRICT     66

Format Conversions

Data extracted from the Wallis, Lettenmaier, Wood CDs is already in the proper format for processing. Data extracted from the EarthInfo CDs must be converted from NCDC format to the standard format. There are three possible steps involved with converting the NCDC format:

  1. Look at the raw data file using more. If the data has been written to the file as a single line, run the file through NCDC.filter to add carriage returns to the end of each record:
    cat filename | NCDC.filter > outfile

    This filter was designed for data retrieved with the newest EarthInfo software.
  2. Next, run the data through reorder.c. This reorders the data and throws out stations that do not have TMAX, TMIN, and PRCP data (it also strips out other data types, like SNOW):
    reorder inputfile outputfile

    This program sorts the data file so that records are ordered TMAX, TMIN, PRCP (the Hydroclimate software puts PRCP first, which causes reform to fail).
  3. Run reform.c on the filtered data file:
    reform begin_year inputfile outputpath

Estimating Missing Data

The program Estimate.Precip estimates missing data records by using measurements from the closest of the 10 nearest stations that has data.

Gridding Data Files

The program Interp.Precip will create gridded data files that contain the daily precipitation, maximum and minimum temperature, and Hamon evaporation.

Interp.Precip sitefile datadir outdir startdate enddate ullat ullng lrlat lrlng resolution

PREPROCESSING FORMAT CONVERSIONS:

All data processing requires that the data files be converted to a standard format. For a description of formats, see below.

NCDC CDROM Raw Data File PreFilter

Summary of the Day files in NCDC format, as produced from the NCDC CD-ROM, have no line breaks between records. For reform.c to work, carriage returns must be inserted to split the records. The program NCDC.filter reads a raw NCDC file from standard input and writes a corrected file to standard output. To use this program, type:

cat filename | NCDC.filter > outfile
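
The effect of the filter can be sketched in a few lines of Python. This is an illustration only, not the NCDC.filter source: it assumes every record is exactly 402 characters long (the total given in the NCDC Format description), and the function name split_records is hypothetical.

```python
RECORD_LENGTH = 402  # assumed: 30-character header plus 31 daily fields of 12 characters


def split_records(raw, record_length=RECORD_LENGTH):
    """Split an unbroken NCDC string into a list with one record per entry."""
    # Tolerate input that already contains some line breaks.
    raw = raw.replace("\r", "").replace("\n", "")
    if len(raw) % record_length != 0:
        raise ValueError("input length is not a multiple of the record length")
    return [raw[i:i + record_length] for i in range(0, len(raw), record_length)]
```

Writing the returned records out one per line gives a file with the line breaks that reform expects.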

NCDC format to Standard Format

The program reform converts an individual data file from NCDC format to the standard format. Inputs are a start year, a file in NCDC format, and a path to which the standard format files are written. The starting year ensures that all files begin at the same time, and that records missing from the start of the files can be fixed. This program will extract multiple station files that contain TMAX, TMIN and PRCP records, and create files using the Wallis, Lettenmaier, Wood naming conventions.

The script NCDC.script will read through a directory of data files, and convert all of them.


PROCESSING PRECIPITATION STATION DATA

Fixing Missing Data:

Data files often include missing data, either as individual daily measurements or as complete months. The program Estimate.Precip reads standard format data files, and uses the 10 closest stations to estimate missing data. Inputs to this program include a station location file, a directory containing the site data files, and the start and end dates.

This program will stop running if it cannot find 5 of the 10 possible nearest neighbor files. To prevent this from happening, it is best to filter the station file, so that it contains only the stations used.

Output from this program is a set of filled station logs from start to end date.
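
The filling rule can be sketched as follows. This is not the Estimate.Precip source: it assumes that a missing day is simply copied from the closest of the 10 nearest stations that has a value for that day, that distance is measured as a straight line in degrees, and that missing values are marked with -9999 as in the standard format. The function name fill_missing and its arguments are hypothetical.

```python
import math

MISSING = -9999  # missing-value marker used by the standard format


def fill_missing(target_xy, target_values, neighbors, max_neighbors=10):
    """Fill MISSING entries in target_values from the closest neighbor with data.

    neighbors is a list of ((lat, lon), values) pairs; every values list has
    the same length and day ordering as target_values.
    """
    # Rank neighbors by distance (in degrees, an assumption) and keep the nearest ten.
    ranked = sorted(neighbors, key=lambda n: math.dist(target_xy, n[0]))[:max_neighbors]
    filled = list(target_values)
    for day, value in enumerate(filled):
        if value != MISSING:
            continue
        for _, values in ranked:
            if values[day] != MISSING:
                filled[day] = values[day]
                break  # the closest station with data wins
    return filled
```

A day stays at -9999 when none of the ranked neighbors has data for it.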

Extending Precipitation Records:

The program Cat.Precip will tack a more recent precipitation file onto the end of an older data file. Where the two files overlap, the program will compare the new and old values. It will then print differences to stderr, and average the two values (unless the data flags indicate one value is better than the other). Both files must be in standard format.
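
The overlap handling can be sketched as below. This is a simplified illustration, not the Cat.Precip source: the only quality rule it applies is that a missing value (-9999) loses to a present one, whereas the real program also consults the data flags. The function name merge_overlap and the use of dictionaries keyed by date are assumptions.

```python
import sys

MISSING = -9999  # missing-value marker used by the standard format


def merge_overlap(old, new, report=sys.stderr):
    """Merge two {date: value} records, averaging where both have data."""
    merged = dict(old)
    for date, new_value in new.items():
        old_value = merged.get(date, MISSING)
        if old_value == MISSING:
            merged[date] = new_value  # only the new file has data
        elif new_value != MISSING:
            if new_value != old_value:
                # Report disagreements, as the text describes.
                print(f"{date}: old={old_value} new={new_value}", file=report)
            merged[date] = (old_value + new_value) / 2
    return merged
```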

The script Cat.script will process all files from two directories containing the new and old data files, into new files in a third directory.

Interpolating Precipitation Data:

The program Interp.Precip creates gridded output files from a set of precipitation stations. Station positions are listed in a station file, which must be filtered so that it contains only the stations whose data files are available.

At this time the program does not accept a mask file, so output files are generated for every grid box. For the format of the output data see the Gridded Data Format below. Grids are defined by their upper left and lower right corners, and a resolution (all values in degrees).
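
As an illustration of how such a grid definition maps to grid boxes, the sketch below lists the box centers for a given pair of corners and resolution. It assumes that the corners are the outer edges of the grid, that west longitudes are negative, and that boxes are visited row by row from the top; none of this is confirmed by the Interp.Precip source. The function name grid_centers is hypothetical.

```python
def grid_centers(ullat, ullng, lrlat, lrlng, resolution):
    """Return the (lat, lon) center of every grid box, row by row from the top."""
    n_rows = round((ullat - lrlat) / resolution)
    n_cols = round((lrlng - ullng) / resolution)
    centers = []
    for row in range(n_rows):
        lat = ullat - (row + 0.5) * resolution  # step south from the upper edge
        for col in range(n_cols):
            lon = ullng + (col + 0.5) * resolution  # step east from the left edge
            centers.append((lat, lon))
    return centers
```

For example, a 1 degree by 1 degree region at 0.5 degree resolution yields four boxes.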

The interpolation is a running mean, which uses 4 nearest neighbors. To provide a better spatial distribution of the nearest neighbors, a quadrant search was implemented.
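
A quadrant search of this kind can be sketched as follows. The sketch takes the nearest station in each of the four quadrants around the grid box center and returns their unweighted mean. The weighting, the distance measure (degrees) and the handling of empty quadrants are assumptions, since the text above does not specify them, and the function name quadrant_mean is hypothetical.

```python
import math


def quadrant_mean(cell_lat, cell_lon, stations):
    """Average the nearest station value from each quadrant around a grid cell.

    stations is a list of (lat, lon, value) tuples. Quadrants that contain no
    station are skipped, so fewer than four values may be averaged.
    """
    nearest = {}  # quadrant key -> (distance, value)
    for lat, lon, value in stations:
        # (north?, east?) identifies the quadrant relative to the cell center.
        quadrant = (lat >= cell_lat, lon >= cell_lon)
        distance = math.hypot(lat - cell_lat, lon - cell_lon)
        if quadrant not in nearest or distance < nearest[quadrant][0]:
            nearest[quadrant] = (distance, value)
    if not nearest:
        raise ValueError("no stations supplied")
    return sum(value for _, value in nearest.values()) / len(nearest)
```

Compared with taking the four closest stations outright, this keeps a cluster of stations on one side of the grid box from dominating the estimate.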


UTILITY PROGRAMS

Identifying Stations in a Basin:

In order to simplify determining which stations fall within a specific region, the programs gstation.c and select_stations.c were developed by Bart Nijssen. These programs read through data files that contain the location and duration of record for sets of precipitation data, and weed out the stations that do not fall within a user-provided mask file. The program gstation.c works with stations from the Wallis, Lettenmaier, Wood dataset, while select_stations.c reads through the entire NCDC dataset.

Both programs require the mask file name, output file name, and grid resolution as command line arguments. gstation.c must be run where it can find /nfs/hydro4/usr4/nijssen/station_lists/ptall.huc (this file is also located in /nfs/hydro4/usr2/lettenmaier/bart/stations). select_stations.c must be run where it can locate the file /nfs/hydro4/usr4/nijssen/station_lists/data (also found in /nfs/hydro4/usr2/lettenmaier/bart/stations).

The output files from gstation.c will be in the following format:

Center of grid box:    44.75    93.75
Stations: 
"MAPLE PLAIN                  21 " 27  5136  0 MN 45.00  -93.65   970.   7000000
Center of grid box:    44.75    93.25
Stations: 
"FARMINGTON 3NW               21 " 27  2737  0 MN 44.67  -93.18   980.   7000000
"MINNEAPOLIS WSFO AP          21 " 27  5435  0 MN 44.88  -93.22   834.   7000000
where

Station Name            State Code  Station ID  ??  State  Lat    Long    ??    ??
MINNEAPOLIS WSFO AP 21  27          5435        0   MN     44.88  -93.22  834.  7000000

The output files from select_stations.c will be in the following format:

46.25  91.75: 4755251 MINONG 5 WSW            46.04 091.52 01080  329 6407 9999
46.25  91.75: 4778921 SOLON SPRINGS           46.21 091.49 01080  329 4801 9999
46.25  90.25: 2041041 IRONWOOD DAILY GLOBE    46.28 090.11 01430  436 4801 9999
45.75  95.75: 2124764 ELBOW LAKE              45.59 095.58 01210  369 4808 8901
45.75  95.75: 2156384 MORRIS WC EXP STN       45.35 095.53 01140  347 4801 9999
45.75  95.25: 2101124 ALEXANDRIA FAA AIRPORT  45.52 095.23 01420  433 4801 9999
where

Grid Lat  Grid Long  Station ID  Station Name            Lat    Long    ??     ??   Start Year and Month  ??
45.75     95.25      2101124     ALEXANDRIA FAA AIRPORT  45.52  095.23  01420  433  4801                  9999

NOTE: Latitude and longitude output from select_stations.c are in dd.mmsss format, not fractional degrees (dd.ddddd).
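
A conversion to fractional degrees might look like the sketch below. It works on the text of the field rather than on a floating-point number, so that "46.04" is read as 46 degrees 4 minutes. The sample output above shows only two digits after the decimal point; the treatment of any further digits as seconds (two whole digits, then a decimal part) is an assumption, and the function name dm_to_decimal is hypothetical.

```python
def dm_to_decimal(text):
    """Convert a dd.mm (or dd.mmss...) string to fractional degrees."""
    text = text.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    whole, _, fraction = text.lstrip("+-").partition(".")
    fraction = fraction.ljust(2, "0")  # "45.3" is read as 30 minutes
    minutes = int(fraction[:2])
    second_digits = fraction[2:]
    seconds = 0.0
    if second_digits:
        # Assumed: the first two digits are whole seconds, the rest a decimal part.
        seconds = float(second_digits[:2].ljust(2, "0") + "." + (second_digits[2:] or "0"))
    return sign * (int(whole) + minutes / 60.0 + seconds / 3600.0)
```

With this reading, the MINONG 5 WSW entry at 46.04, 091.52 lies at about 46.07 and 91.87 in fractional degrees.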

These files must then be extracted from the CD-ROMs as described above.

Filtering Station Files:

The program filter.sites will filter out stations which are either outside a region defined by the coordinates of the upper left and lower right corners, or not on a list of available files. The resulting station file can be used in all processing.

Statistics Data Check:

The program Stats.Precip can be run on the fixed precipitation files, and will compute the mean, min and max values for PRCP, TMAX, and TMIN. It is useful for checking data files for bad values. Output from this program is in the form of yearly tables, with statistics computed on a monthly basis.

The script Stats.script can be used to run the statistics program over an entire directory of data files. Separate statistics files are computed for each data file.


DATA FORMATS

NCDC Format:

Each record contains a month's data and is composed of the following variables:

Variable type  Record Name    Offset  Length  Comments
char           Record Type    0       3       unused
char           Station ID     3       8       Used in File Names
char           Element Type   11      4       "PRCP", "TMAX", and "TMIN"
char           Element Unit   15      2       unused
int            Year           17      4
int            Month          21      2
char           Filler         23      4       unused
int            Num of Values  27      3       Always 31
Total Length                          30

The following structure repeats 31 times per line:

Variable type  Record Name    Offset  Length  Comments
int            Day            0       2
int            Hour           2       2       Always "98"
char           Sign           4       1
int            Value          5       5
char           Flag 1         10      1       "M" for Missing, "A" for accumulation
int            Flag 2         11      1       "1" for data
Total Length                          12
Total Record (One Line)               402

Missing data is marked by an "M" in Flag 1.
Months with no data do not appear in the file.
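
The layout above translates directly into fixed-width slicing. The sketch below is an illustration, not part of reform.c: it assumes that all 31 daily fields carry a numeric day and value, and the function name parse_ncdc_record and the dictionary keys are hypothetical.

```python
def parse_ncdc_record(record):
    """Split one 402-character NCDC record into a header and 31 daily entries."""
    header = {
        "station_id": record[3:11],
        "element": record[11:15],
        "year": int(record[17:21]),
        "month": int(record[21:23]),
    }
    days = []
    for i in range(31):
        # Each daily field is 12 characters, starting after the 30-character header.
        field = record[30 + 12 * i: 30 + 12 * (i + 1)]
        days.append({
            "day": int(field[0:2]),
            "value": int(field[4] + field[5:10]),  # sign character plus five digits
            "missing": field[10] == "M",           # Flag 1
        })
    return header, days
```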

Standard Format:

Each record is composed of 4 lines. The first three lines each contain 8 data structures; the number on the 4th line varies with the days in the month (4 for a normal February, 7 for months with 31 days). Each line contains the following variables:

Variable type  Record Name   Offset  Length
char           Station ID    0       10
char           Element Type  10      5
int            Year          15      5
int            Month         20      2
int            8 day week    22      2
Total Length                         24

The following structure repeats up to 8 times per line:

Variable type  Record Name   Offset  Length
int            Flag          0       2
int            Value         2       5
Total Length                         7

Maximum Line                         80
Minimum Line                         52
Missing data is marked by a value of -9999, and a flag of 9.
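
A reader for one line of this format can be sketched as below, using the offsets listed above. It is an illustration only: the function name parse_standard_line is hypothetical, and returning None for missing values is a choice made for the example rather than anything the processing programs do.

```python
def parse_standard_line(line):
    """Parse one standard-format line into a header dict and a list of values."""
    line = line.rstrip("\n")
    header = {
        "station_id": line[0:10].strip(),
        "element": line[10:15].strip(),
        "year": int(line[15:20]),
        "month": int(line[20:22]),
        "week": int(line[22:24]),  # the "8 day week" counter
    }
    values = []
    for start in range(24, len(line), 7):
        chunk = line[start:start + 7]
        if len(chunk) < 7:
            break  # ignore a trailing partial field
        flag, value = int(chunk[0:2]), int(chunk[2:7])
        # Missing data: value -9999 with flag 9.
        values.append(None if (value == -9999 and flag == 9) else value)
    return header, values
```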

Gridded Daily File Format:

These files contain data for each grid point. The file name gives its position within the grid (e.g. data_36.500_-100.500.dat is for the grid box centered at 36.5 N and 100.5 W). Each line in the data file represents a new day's data and contains the following information:
Variable Type	Record Name
double		PRCP (in mm)
double		TMAX (in C)
double		TMIN (in C)
double		potential evaporation	Calculated using the Hamon equation
					This was kept in the new files to
					match Fayez's data files, however
					I believe that these values are unused
					in the model.
Each data value is separated by a " ".
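
For illustration, the file naming and line layout can be handled as in the sketch below. It assumes three decimal places in the file name, as in the example above, and exactly four space-separated values per line. The function names grid_file_name and parse_grid_line are hypothetical.

```python
def grid_file_name(lat, lon):
    """Build a gridded-data file name such as data_36.500_-100.500.dat."""
    return f"data_{lat:.3f}_{lon:.3f}.dat"


def parse_grid_line(line):
    """Split one day's line into PRCP (mm), TMAX (C), TMIN (C) and potential evaporation."""
    prcp, tmax, tmin, pet = (float(x) for x in line.split())
    return {"prcp": prcp, "tmax": tmax, "tmin": tmin, "pet": pet}
```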

Station File Information:

These files contain information about precipitation site names and locations. Each line is a different station, and is in the following format:
Variable Type  Record Name                            Offset  Length
double         Grid Latitude                          0       5(.2)
double         Grid Longitude                         5       7(.2)
char           ":"                                    12      1
char           Station ID                             13      8
char           Station Name                           21      24
double         Site Latitude                          45      6(.2)
double         Site Longitude                         51      7(.2)
int            ?                                      58      6
int            Elevation                              64      4
int            Start of Record Year and Month (YYMM)  68      5
int            Percent Coverage (*100)                73      5
char           flag?                                  78      1
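
The offsets and lengths listed above can be applied by slicing each line, as in the sketch below. It is an illustration rather than code from any of the processing programs: the function name parse_station_line and the dictionary keys are hypothetical, and the fields whose meaning is unknown are skipped.

```python
def parse_station_line(line):
    """Slice one station-file line using the documented offsets and lengths."""
    return {
        "grid_lat": float(line[0:5]),
        "grid_lon": float(line[5:12]),
        "station_id": line[13:21].strip(),
        "name": line[21:45].strip(),
        "site_lat": float(line[45:51]),
        "site_lon": float(line[51:58]),
        "elevation": int(line[64:68]),
        "start_yymm": line[68:73].strip(),  # kept as text to preserve leading zeros
    }


# The MINONG 5 WSW line from the select_stations.c sample output, assembled
# field by field so that each field has its documented width.
sample = ("46.25" + "  91.75" + ":" + " 4755251" + " MINONG 5 WSW".ljust(24)
          + " 46.04" + " 091.52" + " 01080" + " 329" + " 6407" + " 9999")
station = parse_station_line(sample)
```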

Mask File Information:

The basin mask file used by gstation.c and select_stations.c must be in the following format:

An example of the mask file, taken from the Red River at 1 Degree by 1 Degree Resolution:

40
107
9
17
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 
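
A reader for a mask laid out like this example can be sketched as follows. The interpretation is inferred from the example alone: the third and fourth header values (9 and 17) match the number of rows and columns of flags, so they are treated as such, while the first two values (40 and 107) are returned unchanged because their meaning is not stated here. The function name read_mask is hypothetical.

```python
def read_mask(text):
    """Parse mask text: four header integers, then rows of 0/1 flags."""
    tokens = text.split()
    first, second, n_rows, n_cols = (int(t) for t in tokens[:4])
    cells = [int(t) for t in tokens[4:]]
    if len(cells) != n_rows * n_cols:
        raise ValueError("mask body does not match the row and column counts")
    # Cut the flat list of flags into n_rows rows of n_cols values each.
    rows = [cells[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    return (first, second), rows
```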

Hydrology Homepage / University of Washington / hydro@hydro.washington.edu