Precipitation Station Data
This file describes the steps and programs involved in processing
precipitation station data, starting with data format conversions and
ending with interpolated data fields for use with the VIC model.
Unless otherwise noted:
- source code and makefiles are in /nfs/meter/home/cherkaue/source/precip
- executables are in /nfs/meter/home/cherkaue/bin/plane
- scripts are in /nfs/meter/home/cherkaue/source/precip, and should be
edited for each processing task
File Contents
In order to reduce the work involved in selecting stations for large areas, two
programs were developed to read through files containing station
locations and information about the period of record.
Once the IDs of the desired stations have been determined, the data files must
be extracted from the CD-ROMs. Actual retrieval is not automated: you will have
to retrieve the stations one at a time (not as bad as it sounds), or write a batch
file or program yourself. Station data on the Wallis, Lettenmaier, Wood
CD-ROM can be accessed directly from any PC CD-ROM drive, or by mounting
the CD on the UNIX systems. The EarthInfo Summary of the Day CD-ROM must be
accessed using the EarthInfo software (program sd, mounted on the PCs in Wilcox 159).
This software makes extracting single stations fairly easy, but can be a
problem when retrieving large numbers of stations. For multiple stations
it is easier to use the CD-ROM software to select all of the
stations and put them into a single file. reform.c will then extract individual
station files from the larger data file.
The data you retrieve from the EarthInfo CDs have to be daily values in NCDC
format (you can specify these options when you retrieve the data files
from the CDs). Because the programs use the file name of each
precipitation station file to decide which actions to take,
you will have to follow some conventions in naming the files, or you will
have to adjust the code in some of the programs (these naming conventions
are followed automatically by NCDC.split.f when it extracts individual station
files). The naming conventions I followed are:
- if the station is from the Wallis, Lettenmaier, and Wood dataset,
the name looks as follows:
where s indicates that the station is already complete, AA is the
state number, and BBBB is the station ID.
- if the station is from the summary of the day dataset, then the name
looks as follows:
where n indicates that the station is not yet complete, and AA and
BBBB mean the same as before
- The length of the filename should be 12 characters (including the .)
for US stations, and 13 characters for Canadian stations. This is
because the programs use the filename length to determine whether the data are
in degrees Fahrenheit and hundredths of inches or in degrees Celsius
and tenths of mm
State Numbers
| NCDC | State Name | W-L-W |
| 01 | ALABAMA | 01 |
| 02 | ARIZONA | 04 |
| 03 | ARKANSAS | 05 |
| 04 | CALIFORNIA | 06 |
| 05 | COLORADO | 08 |
| 06 | CONNECTICUT | 09 |
| 07 | DELAWARE | 10 |
| 08 | FLORIDA | 12 |
| 09 | GEORGIA | 13 |
| 10 | IDAHO | 16 |
| 11 | ILLINOIS | 17 |
| 12 | INDIANA | 18 |
| 13 | IOWA | 19 |
| 14 | KANSAS | 20 |
| 15 | KENTUCKY | 21 |
| 16 | LOUISIANA | 22 |
| 17 | MAINE | 23 |
| 18 | MARYLAND | 24 |
| 19 | MASSACHUSETTS | 25 |
| 20 | MICHIGAN | 26 |
| 21 | MINNESOTA | 27 |
| 22 | MISSISSIPPI | 28 |
| 23 | MISSOURI | 29 |
| 24 | MONTANA | 30 |
| 25 | NEBRASKA | 31 |
| 26 | NEVADA | 32 |
| 27 | NEW HAMPSHIRE | 33 |
| 28 | NEW JERSEY | 34 |
| 29 | NEW MEXICO | 35 |
| 30 | NEW YORK | 36 |
| 31 | NORTH CAROLINA | 37 |
| 32 | NORTH DAKOTA | 38 |
| 33 | OHIO | 39 |
| 34 | OKLAHOMA | 40 |
| 35 | OREGON | 41 |
| 36 | PENNSYLVANIA | 42 |
| 37 | RHODE ISLAND | 44 |
| 38 | SOUTH CAROLINA | 45 |
| 39 | SOUTH DAKOTA | 46 |
| 40 | TENNESSEE | 47 |
| 41 | TEXAS | 48 |
| 42 | UTAH | 49 |
| 43 | VERMONT | 50 |
| 44 | VIRGINIA | 51 |
| 45 | WASHINGTON | 53 |
| 46 | WEST VIRGINIA | 54 |
| 47 | WISCONSIN | 55 |
| 48 | WYOMING | 56 |
| 49 | unassigned | |
| 50 | ALASKA | 02 |
| 51 | HAWAII | 15 |
| 66 | PUERTO RICO | 72 |
| 67 | VIRGIN ISLANDS | 78 |
| 91 | PACIFIC ISLANDS | |
| | DISTRICT OF COL. | 11 |
| | AMERICAN SAMOA | 60 |
| | GUAM DISTRICT | 66 |
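Because the two numbering schemes differ (the W-L-W column appears to be the standard FIPS state codes), a lookup is needed whenever NCDC-derived files are renamed to the W-L-W convention. A minimal C sketch that simply encodes the table above; the function name ncdc_to_wlw is hypothetical and not part of the existing programs:

```c
/* Return the W-L-W state code for an NCDC state code, using the
   State Numbers table. Returns -1 for codes with no W-L-W entry
   (49 "unassigned", 91 PACIFIC ISLANDS) or codes not in the table. */
int ncdc_to_wlw(int ncdc)
{
    /* Index = NCDC code; entries 1..48 copied from the table. */
    static const int wlw[49] = {
        -1,                                      /* 0: unused */
         1,  4,  5,  6,  8,  9, 10, 12, 13, 16,  /* NCDC 01-10 */
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26,  /* NCDC 11-20 */
        27, 28, 29, 30, 31, 32, 33, 34, 35, 36,  /* NCDC 21-30 */
        37, 38, 39, 40, 41, 42, 44, 45, 46, 47,  /* NCDC 31-40 */
        48, 49, 50, 51, 53, 54, 55, 56           /* NCDC 41-48 */
    };

    if (ncdc >= 1 && ncdc <= 48)
        return wlw[ncdc];
    switch (ncdc) {
    case 50: return 2;   /* ALASKA */
    case 51: return 15;  /* HAWAII */
    case 66: return 72;  /* PUERTO RICO */
    case 67: return 78;  /* VIRGIN ISLANDS */
    default: return -1;  /* no W-L-W equivalent */
    }
}
```

For example, Minnesota is 21 in NCDC numbering but 27 in W-L-W numbering, which is why the gstation.c output further down lists MN stations under state code 27.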
Data extracted from the Wallis, Lettenmaier, Wood CDs is already in the
proper format for processing. Data extracted from the EarthInfo CDs must
be converted from NCDC format to the standard format. There are three
possible steps involved in converting the NCDC format:
- Look at the raw data file using more. If the data have been written
to the file as a single line, run the file through NCDC.filter to add
a line break to the end of each record:
cat filename | NCDC.filter > outfile
This filter was designed for data retrieved with the newest EarthInfo software.
- Next, run the data through reorder.c. This will reorder the data and
throw out stations that do not have TMAX, TMIN, and PRCP data (it will also
strip out other data types, such as SNOW):
reorder inputfile outputfile
This program sorts the data file so that records are ordered TMAX, TMIN, PRCP (Hydroclimate software puts PRCP first, which causes reform to fail).
- Run reform.c on the filtered data file:
reform begin_year inputfile outputpath
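The ordering done by reorder can be illustrated as a qsort comparator over raw NCDC lines, using the Station ID (offset 3), Year and Month (offset 17) and Element Type (offset 11) columns from the NCDC format table further down. This is a sketch under assumptions, not the reorder.c source: element_rank and compare_lines are hypothetical names, and the step that drops stations lacking one of the three elements is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Sort position of a line's element type (offset 11, length 4):
   TMAX first, then TMIN, then PRCP. Returns -1 for anything else
   (e.g. SNOW); such lines would be discarded before sorting. */
int element_rank(const char *line)
{
    static const char *order[] = {"TMAX", "TMIN", "PRCP"};
    for (int i = 0; i < 3; i++)
        if (strncmp(line + 11, order[i], 4) == 0)
            return i;
    return -1;
}

/* qsort comparator for an array of NCDC lines (const char *):
   by station ID (offset 3, 8 chars), then year and month
   (offset 17, 6 fixed-width digits), then element rank. */
int compare_lines(const void *a, const void *b)
{
    const char *la = *(const char *const *)a;
    const char *lb = *(const char *const *)b;
    int c = strncmp(la + 3, lb + 3, 8);
    if (c == 0)
        c = strncmp(la + 17, lb + 17, 6);
    if (c == 0)
        c = element_rank(la) - element_rank(lb);
    return c;
}
```

Usage: collect the lines of one file into an array and call qsort(lines, n, sizeof lines[0], compare_lines).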
The program Estimate.Precip will estimate missing data records by using
measurements from the closest of the nearest 10 stations that has data.
The program Interp.Precip will create gridded data files that contain the
daily precipitation, maximum and minimum temperature, and Hamon potential evaporation.
Interp.Precip sitefile datadir outdir startdate enddate ullat ullng lrlat lrlng resolution
All data processing requires that the data files be converted to
a standard format. For a description of formats, see below.
The NCDC-format Summary of the Day files produced from the NCDC CD-ROM have no
line breaks between records. For reform.c to work, line breaks
must be inserted to split the records. The program
NCDC.filter reads a raw NCDC file
from standard input and writes a corrected file to standard output. To
use this program, type:
cat filename | NCDC.filter > outfile
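A filter of this kind can be written in a few lines. The following is a minimal sketch, not the NCDC.filter source, assuming every record is exactly 402 characters long (the total record length given in the NCDC format table); split_records is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

#define NCDC_RECORD_LEN 402  /* total record length from the NCDC format table */

/* Copy `in` to `out`, inserting a newline after every `reclen`
   characters. Existing line breaks are dropped, so a file whose
   lines are already exactly `reclen` long passes through unchanged.
   Returns the number of records written. */
long split_records(FILE *in, FILE *out, int reclen)
{
    long records = 0;
    int col = 0;
    int c;

    assert(reclen > 0);
    while ((c = getc(in)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;                /* ignore existing line breaks */
        putc(c, out);
        if (++col == reclen) {       /* end of one fixed-length record */
            putc('\n', out);
            col = 0;
            records++;
        }
    }
    if (col > 0) {                   /* trailing partial record */
        putc('\n', out);
        records++;
    }
    return records;
}
```

Wrapped in a main() that calls split_records(stdin, stdout, NCDC_RECORD_LEN), it would be used the same way as NCDC.filter in the command above.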
The program reform converts an
individual data file from NCDC format
to the standard format. Inputs are a start year,
a file in NCDC format, and a path to which the standard format files are written.
The starting year ensures that all files begin at the
same time, and that records missing from the start of the files can be fixed.
This program will extract multiple station files that contain TMAX, TMIN,
and PRCP records, and create files using the Wallis,
Lettenmaier, Wood naming conventions.
The script NCDC.script will read through a directory of data files, and
convert all of them.
Data files often include missing data, either as individual daily measurements or as
complete months. The program Estimate.Precip
reads standard format data files and uses the 10 closest stations to
estimate missing data. Inputs to this program include a
station location file, a directory containing the site data files,
and the start and end dates.
This program will stop running if it cannot find 5 of the 10 possible
nearest-neighbor files. To prevent this from happening, it is best
to filter the station file so that it contains only
the stations used.
Output from this program is a set of filled station logs from the start to the
end date.
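The fill rule ("the closest of the nearest 10 stations which has data") can be sketched as follows. This is an illustration under assumptions, not Estimate.Precip itself: distance is a plain squared difference in degrees (the real program's distance measure is not documented here), missing values use the -9999 code of the standard format, and estimate_missing is a hypothetical name.

```c
#include <assert.h>

#define MISSING (-9999.0)   /* missing-value code of the standard format */
#define NNEIGH 10           /* number of nearest stations searched */

/* Squared distance in degrees; adequate for ranking nearby stations,
   but it ignores the shrinking of longitude degrees with latitude. */
static double dist2(double lat1, double lng1, double lat2, double lng2)
{
    double dlat = lat1 - lat2, dlng = lng1 - lng2;
    return dlat * dlat + dlng * dlng;
}

/* Fill value for station `target` on one day: the value of the closest
   of its NNEIGH nearest stations that has data, or MISSING if none has. */
double estimate_missing(int n, const double lat[], const double lng[],
                        const double value[], int target)
{
    int nearest[NNEIGH];   /* station indices, sorted by distance */
    double d[NNEIGH];
    int k = 0;             /* neighbours collected so far */

    assert(target >= 0 && target < n);
    for (int i = 0; i < n; i++) {
        if (i == target)
            continue;
        double di = dist2(lat[target], lng[target], lat[i], lng[i]);
        int j;
        if (k < NNEIGH)
            j = k++;                   /* free slot at the end */
        else if (di < d[NNEIGH - 1])
            j = NNEIGH - 1;            /* replaces the current farthest */
        else
            continue;                  /* farther than all 10 neighbours */
        while (j > 0 && d[j - 1] > di) {   /* insertion-sort step */
            d[j] = d[j - 1];
            nearest[j] = nearest[j - 1];
            j--;
        }
        d[j] = di;
        nearest[j] = i;
    }
    for (int j = 0; j < k; j++)        /* closest neighbour with data wins */
        if (value[nearest[j]] != MISSING)
            return value[nearest[j]];
    return MISSING;
}
```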
The program Cat.Precip will append a more
recent precipitation file to the end of an older data file. Where
the two files overlap, the program compares the new and old values,
prints any differences to stderr, and averages the two values
(unless the data flags indicate that one value is better than the other).
Both files must be in standard format.
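The overlap rule can be sketched for a single day. The text does not say which flags mark one value as better, so this sketch applies only one assumed rule, that a missing value (value -9999 or flag 9, as in the standard format) loses to a present one. merge_overlap is a hypothetical name, and integer averaging truncates toward zero.

```c
#include <stdio.h>

#define MISSING_VALUE (-9999)  /* standard-format missing value */
#define MISSING_FLAG 9         /* standard-format missing flag */

/* Merge one overlapping day from the old and new files (values in the
   file's integer units). Assumed flag rule: missing loses to present.
   Differences between two present values are reported on stderr. */
int merge_overlap(int old_val, int old_flag, int new_val, int new_flag,
                  int year, int month, int day)
{
    int old_missing = (old_flag == MISSING_FLAG || old_val == MISSING_VALUE);
    int new_missing = (new_flag == MISSING_FLAG || new_val == MISSING_VALUE);

    if (old_missing && new_missing)
        return MISSING_VALUE;
    if (old_missing)
        return new_val;
    if (new_missing)
        return old_val;
    if (old_val != new_val)
        fprintf(stderr, "%04d-%02d-%02d: old %d new %d\n",
                year, month, day, old_val, new_val);
    return (old_val + new_val) / 2;   /* integer average, truncated toward 0 */
}
```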
The script Cat.script will process all files from two directories
containing the new and old data files into new files in a third
directory.
The program Interp.Precip creates
gridded output files from a set of precipitation
stations. Station positions are listed in a
station file, which must be filtered so that it lists only the stations
whose data files are available.
At this time the program does not accept a mask file, so output files
are generated for every grid box. For the format of the output data, see
the Gridded Data Format below. Grids are defined
by their upper left and lower right corners and a resolution (all
values in degrees).
The interpolation is a running mean that uses the 4 nearest neighbors.
To provide a better spatial distribution of the nearest neighbors, a
quadrant search was implemented.
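A quadrant search picks, for each grid-cell centre, the nearest station with data in each of the four quadrants (NE, NW, SW, SE), so that the four neighbours surround the cell. The text does not say how the four values are weighted; the sketch below assumes a plain unweighted mean and squared distance in degrees. quadrant and quadrant_mean are hypothetical names, not Interp.Precip functions.

```c
#define MISSING (-9999.0)

/* Quadrant of station (slat, slng) relative to the grid point:
   0 = NE, 1 = NW, 2 = SW, 3 = SE. Points on an axis go north/east. */
static int quadrant(double glat, double glng, double slat, double slng)
{
    int north = slat >= glat;
    int east = slng >= glng;
    if (north)
        return east ? 0 : 1;
    return east ? 3 : 2;
}

/* Mean of the nearest station with data in each quadrant around
   (glat, glng). Quadrants without a station are skipped; returns
   MISSING if no station has data. */
double quadrant_mean(double glat, double glng, int n, const double lat[],
                     const double lng[], const double value[])
{
    int best[4] = {-1, -1, -1, -1};  /* nearest station per quadrant */
    double bestd[4] = {0};
    double sum = 0.0;
    int used = 0;

    for (int i = 0; i < n; i++) {
        if (value[i] == MISSING)
            continue;                /* only stations with data */
        int q = quadrant(glat, glng, lat[i], lng[i]);
        double dlat = lat[i] - glat, dlng = lng[i] - glng;
        double d = dlat * dlat + dlng * dlng;
        if (best[q] < 0 || d < bestd[q]) {
            best[q] = i;
            bestd[q] = d;
        }
    }
    for (int q = 0; q < 4; q++)
        if (best[q] >= 0) {
            sum += value[best[q]];
            used++;
        }
    return used > 0 ? sum / used : MISSING;
}
```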
In order to simplify determining which stations fall within a specific region,
the programs gstation.c and
select_stations.c were developed by
Bart Nijssen.
These programs read through data files that contain the location and period of
record for sets of precipitation data, and weed out those that do not fall
within a user-provided mask file. The program gstation.c works with stations
from the Wallis, Lettenmaier, Wood dataset, while select_stations.c reads
through the entire NCDC dataset.
Both programs require the mask file name, output file name, and grid resolution
as command line arguments. gstation.c must be run where it can find /nfs/hydro4/usr4/nijssen/station_lists/ptall.huc (this file is also located in
/nfs/hydro4/usr2/lettenmaier/bart/stations).
select_stations.c must be run so that it can locate the file
/nfs/hydro4/usr4/nijssen/station_lists/data (also found in
/nfs/hydro4/usr2/lettenmaier/bart/stations).
The output files from gstation.c will be in the following format:
Center of grid box: 44.75 93.75
Stations:
"MAPLE PLAIN 21 " 27 5136 0 MN 45.00 -93.65 970. 7000000
Center of grid box: 44.75 93.25
Stations:
"FARMINGTON 3NW 21 " 27 2737 0 MN 44.67 -93.18 980. 7000000
"MINNEAPOLIS WSFO AP 21 " 27 5435 0 MN 44.88 -93.22 834. 7000000
where
| Station Name | State Code | Station ID | ?? | State | Lat | Long | ?? | ?? |
| MINNEAPOLIS WSFO AP 21 | 27 | 5435 | 0 | MN | 44.88 | -93.22 | 834. | 7000000 |
The output files from select_stations.c will be in the following format:
46.25 91.75: 4755251 MINONG 5 WSW 46.04 091.52 01080 329 6407 9999
46.25 91.75: 4778921 SOLON SPRINGS 46.21 091.49 01080 329 4801 9999
46.25 90.25: 2041041 IRONWOOD DAILY GLOBE 46.28 090.11 01430 436 4801 9999
45.75 95.75: 2124764 ELBOW LAKE 45.59 095.58 01210 369 4808 8901
45.75 95.75: 2156384 MORRIS WC EXP STN 45.35 095.53 01140 347 4801 9999
45.75 95.25: 2101124 ALEXANDRIA FAA AIRPORT 45.52 095.23 01420 433 4801 9999
where
| Grid Lat | Grid Long | Station ID | Station Name | Lat | Long | ?? | ?? | Start Year and Month | ?? |
| 45.75 | 95.25 | 2101124 | ALEXANDRIA FAA AIRPORT | 45.52 | 095.23 | 01420 | 433 | 4801 | 9999 |
NOTE: Latitude and Longitude output from select_stations.c is in dd.mmsss format, not fractional degrees, dd.ddddd.
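Converting these packed values to fractional degrees is a common stumbling block. A minimal sketch, assuming the first two digits after the decimal point are minutes and the next two (if present) are seconds, so that 45.52 means 45 degrees 52 minutes; ddmm_to_degrees is a hypothetical name.

```c
#include <assert.h>

/* Convert a packed dd.mmss coordinate (e.g. 45.52 = 45 deg 52 min)
   to fractional degrees. Input must be valid dd.mmss (minutes and
   seconds below 60); the sign is preserved. */
double ddmm_to_degrees(double packed)
{
    int negative = packed < 0.0;
    double a = negative ? -packed : packed;
    long total = (long)(a * 10000.0 + 0.5);  /* ddmmss as an integer, rounded */
    long deg = total / 10000;
    long min = (total / 100) % 100;
    long sec = total % 100;

    assert(min < 60 && sec < 60);
    double result = deg + min / 60.0 + sec / 3600.0;
    return negative ? -result : result;
}
```

Applied to the ALEXANDRIA FAA AIRPORT line above, 45.52 becomes about 45.867 and 095.23 about 95.383 degrees (west).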
The data files for the selected stations must then be extracted from the CD-ROMs as described above.
The program filter.sites will filter out
stations that are either outside a region defined by the coordinates
of the upper left and lower right corners, or not on a list of available files.
The resulting station file can be used in all processing steps.
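The two tests applied by filter.sites can be sketched as one predicate. This is an illustration, not the filter.sites source: keep_station is a hypothetical name, and longitudes are assumed to be negative west (as in the Interp.Precip output file names), so the upper-left longitude is the smaller one.

```c
#include <string.h>

/* Keep a station only if it lies inside the box given by the upper-left
   (ullat, ullng) and lower-right (lrlat, lrlng) corners AND its data
   file name appears in `available`. Longitudes assumed negative west. */
int keep_station(double lat, double lng, const char *file,
                 double ullat, double ullng, double lrlat, double lrlng,
                 const char *available[], int navail)
{
    if (lat > ullat || lat < lrlat || lng < ullng || lng > lrlng)
        return 0;                    /* outside the region */
    for (int i = 0; i < navail; i++)
        if (strcmp(file, available[i]) == 0)
            return 1;                /* in region and data file available */
    return 0;
}
```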
The program Stats.Precip can be run on the
fixed precipitation files, and will compute the
mean, min and max values for PRCP, TMAX, and TMIN. It is useful for
checking data files for bad values. Output from this program is in
the form of yearly tables, with statistics computed on a
monthly basis.
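The per-month statistics can be sketched as below. This is not the Stats.Precip source; it assumes that any day still carrying the -9999 missing code should be skipped rather than counted, and MonthStats and month_stats are hypothetical names.

```c
#include <assert.h>

#define MISSING (-9999.0)

typedef struct {
    double mean, min, max;
    int count;   /* number of non-missing days used */
} MonthStats;

/* Mean, min and max of one month of daily values, skipping MISSING days.
   All three statistics are MISSING when no day has data. */
MonthStats month_stats(const double *values, int ndays)
{
    MonthStats s = {MISSING, MISSING, MISSING, 0};
    double sum = 0.0;

    assert(ndays >= 0 && ndays <= 31);
    for (int i = 0; i < ndays; i++) {
        double v = values[i];
        if (v == MISSING)
            continue;
        if (s.count == 0 || v < s.min)
            s.min = v;
        if (s.count == 0 || v > s.max)
            s.max = v;
        sum += v;
        s.count++;
    }
    if (s.count > 0)
        s.mean = sum / s.count;
    return s;
}
```

Values far outside the expected range in the resulting min and max columns are the "bad values" this check is meant to catch.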
The script Stats.script can be used to run the statistics program over
an entire directory of data files. Separate statistics files are computed
for each data file.
Each record contains a month's data and is composed of the following variables:
| Variable type | Record Name | Offset | Length | Comments |
| char | Record Type | 0 | 3 | unused |
| char | Station ID | 3 | 8 | Used in File Names |
| char | Element Type | 11 | 4 | "PRCP", "TMAX", and "TMIN" |
| char | Element Unit | 15 | 2 | unused |
| int | Year | 17 | 4 | |
| int | Month | 21 | 2 | |
| char | Filler | 23 | 4 | unused |
| int | Num of Values | 27 | 3 | Always 31 |
| Total Length | 30 |
The following structure repeats 31 times per line
| Variable type | Record Name | Offset | Length | Comments |
| int | Day | 0 | 2 | |
| int | Hour | 2 | 2 | Always "98" |
| char | Sign | 4 | 1 | |
| int | Value | 5 | 5 | |
| char | Flag 1 | 10 | 1 | "M" for Missing, "A" for accumulation |
| int | Flag 2 | 11 | 1 | "1" for data |
| Total Length | 12 |
| Total Record (One Line) | 402 |
Missing data is marked by an "M" in Flag 1.
Months with no data do not appear in the file.
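To make the NCDC column layout concrete, here is a minimal C sketch that parses one 402-character line using the offsets in the two tables above. The struct and function names are hypothetical, and treating a '-' in the Sign column as a negative value is an assumption, since the table does not define the sign characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NCDC_HEADER_LEN 30
#define NCDC_DAY_LEN 12
#define NCDC_NDAYS 31

typedef struct {
    char station[9];          /* Station ID, offset 3, length 8 */
    char element[5];          /* "PRCP", "TMAX" or "TMIN" */
    int year, month;
    int value[NCDC_NDAYS];    /* signed value in file units */
    char flag1[NCDC_NDAYS];   /* 'M' missing, 'A' accumulation */
} NcdcRecord;

/* Convert the `len` characters at `off` to an int. */
static int field_int(const char *line, int off, int len)
{
    char buf[16];
    assert(len < (int)sizeof buf);
    memcpy(buf, line + off, (size_t)len);
    buf[len] = '\0';
    return atoi(buf);
}

/* Parse one NCDC line. Returns 0 on success, -1 if the line is short. */
int parse_ncdc(const char *line, NcdcRecord *r)
{
    if (strlen(line) < NCDC_HEADER_LEN + NCDC_NDAYS * NCDC_DAY_LEN)
        return -1;
    memcpy(r->station, line + 3, 8);
    r->station[8] = '\0';
    memcpy(r->element, line + 11, 4);
    r->element[4] = '\0';
    r->year = field_int(line, 17, 4);
    r->month = field_int(line, 21, 2);
    for (int d = 0; d < NCDC_NDAYS; d++) {
        const char *p = line + NCDC_HEADER_LEN + d * NCDC_DAY_LEN;
        int v = field_int(p, 5, 5);               /* Value, offset 5 */
        r->value[d] = (p[4] == '-') ? -v : v;     /* Sign, offset 4 */
        r->flag1[d] = p[10];                      /* Flag 1, offset 10 */
    }
    return 0;
}
```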
Each record is composed of 4 lines. The first three lines contain 8 data
structures each; the 4th line holds the remaining days of the month (4 for a
non-leap-year February, 7 for months with 31 days). Each line contains the following variables:
| Variable type | Record Name | Offset | Length |
| char | Station ID | 0 | 10 |
| char | Element Type | 10 | 5 |
| int | Year | 15 | 5 |
| int | Month | 20 | 2 |
| int | 8 day week | 22 | 2 |
| Total Length | 24 |
The following structure repeats up to 8 times per line
| Variable type | Record Name | Offset | Length |
| int | Flag | 0 | 2 |
| int | Value | 2 | 5 |
| Total Length | 7 |
Maximum line length: 80 characters
Minimum line length: 52 characters
Missing data is marked by a value of -9999, and a flag of 9.
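A minimal sketch of reading one standard-format line with the offsets above: a 24-character header followed by up to 8 seven-character (flag, value) fields, so an 80-character line holds 8 days and a 52-character line holds 4. StdLine and parse_std_line are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STD_HEADER_LEN 24
#define STD_FIELD_LEN 7
#define STD_MAX_VALUES 8

typedef struct {
    int year, month, week;          /* week = index of the 8-day block */
    int nvalues;                    /* 4 to 8 depending on the line */
    int flag[STD_MAX_VALUES];
    int value[STD_MAX_VALUES];      /* -9999 with flag 9 means missing */
} StdLine;

/* Convert the `len` characters at `off` to an int. */
static int field_int(const char *line, int off, int len)
{
    char buf[16];
    assert(len < (int)sizeof buf);
    memcpy(buf, line + off, (size_t)len);
    buf[len] = '\0';
    return atoi(buf);
}

/* Parse one line; returns 0 on success, -1 if shorter than the header. */
int parse_std_line(const char *line, StdLine *s)
{
    size_t len = strcspn(line, "\r\n");   /* ignore the line terminator */
    if (len < STD_HEADER_LEN)
        return -1;
    s->year = field_int(line, 15, 5);
    s->month = field_int(line, 20, 2);
    s->week = field_int(line, 22, 2);
    s->nvalues = (int)((len - STD_HEADER_LEN) / STD_FIELD_LEN);
    if (s->nvalues > STD_MAX_VALUES)
        s->nvalues = STD_MAX_VALUES;
    for (int i = 0; i < s->nvalues; i++) {
        const char *p = line + STD_HEADER_LEN + i * STD_FIELD_LEN;
        s->flag[i] = field_int(p, 0, 2);    /* Flag, offset 0, length 2 */
        s->value[i] = field_int(p, 2, 5);   /* Value, offset 2, length 5 */
    }
    return 0;
}
```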
These files contain data for each grid point. The file name gives its
position within the grid (e.g. data_36.500_-100.500.dat is for the grid
box centered at 36.5 N and 100.5 W). Each line in the data file represents
a new day's data and contains the following information:
| Variable Type | Record Name | Comments |
| double | PRCP (in mm) | |
| double | TMAX (in C) | |
| double | TMIN (in C) | |
| double | potential evaporation | Calculated using the Hamon equation. This was kept in the new files to match Fayez's data files; however, I believe that these values are unused in the model. |
Each data value is separated by a " ".
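The file names follow directly from the grid definition given to Interp.Precip. A sketch, assuming the upper-left and lower-right corners are cell edges (so each cell centre lies half a resolution step inside them) and that longitudes are negative west, as in the example file name; cell_filename and cell_count are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Write the output file name of the cell in row `row`, column `col`
   of a grid with upper-left corner (ullat, ullng) and resolution `res`
   (degrees). Returns the snprintf result (name length). */
int cell_filename(char *buf, size_t size, double ullat, double ullng,
                  double res, int row, int col)
{
    double lat = ullat - (row + 0.5) * res;   /* rows run north to south */
    double lng = ullng + (col + 0.5) * res;   /* columns run west to east */
    return snprintf(buf, size, "data_%.3f_%.3f.dat", lat, lng);
}

/* Number of cells spanning [lo, hi] at resolution res, rounded. */
int cell_count(double lo, double hi, double res)
{
    assert(res > 0.0 && hi >= lo);
    return (int)((hi - lo) / res + 0.5);
}
```

Usage: loop rows over cell_count(lrlat, ullat, res) and columns over cell_count(ullng, lrlng, res). With an upper-left corner at 37 N, 101 W and 1 degree resolution, the first cell is data_36.500_-100.500.dat, the example file above.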
These files contain information about precipitation site names and locations.
Each line is a different station, and is in the following format:
| Variable Type | Record Name | Offset | Length |
| double | Grid Latitude | 0 | 5(.2) |
| double | Grid Longitude | 5 | 7(.2) |
| char | ":" | 12 | 1 |
| char | Station ID | 13 | 8 |
| char | Station Name | 21 | 24 |
| double | Site Latitude | 45 | 6(.2) |
| double | Site Longitude | 51 | 7(.2) |
| int | ? | 58 | 6 |
| int | Elevation | 64 | 4 |
| int | Start of Record Year and Month (YYMM) | 68 | 5 |
| int | Percent Coverage (*100) | 73 | 5 |
| char | flag? | 78 | 1 |
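A minimal sketch of reading the fields of this layout that are needed for station selection (grid cell, ID, name, and site location), using the offsets in the table. The names are hypothetical, and the site coordinates are returned as stored; since the layout matches the select_stations.c output, they are presumably in the same packed dd.mm form noted earlier.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    double grid_lat, grid_lng;   /* offsets 0 and 5 */
    char id[9];                  /* offset 13, length 8 */
    char name[25];               /* offset 21, length 24 */
    double site_lat, site_lng;   /* offsets 45 and 51, as stored */
} StationEntry;

/* Copy a fixed-width field into dst, stripping leading/trailing blanks. */
static void field_str(char *dst, const char *line, int off, int len)
{
    int start = off, end = off + len;
    while (start < end && line[start] == ' ')
        start++;
    while (end > start && line[end - 1] == ' ')
        end--;
    memcpy(dst, line + start, (size_t)(end - start));
    dst[end - start] = '\0';
}

static double field_double(const char *line, int off, int len)
{
    char buf[16];
    field_str(buf, line, off, len);
    return atof(buf);
}

/* Parse one station line; returns 0 on success, -1 on a malformed line. */
int parse_station(const char *line, StationEntry *s)
{
    if (strlen(line) < 58 || line[12] != ':')
        return -1;
    s->grid_lat = field_double(line, 0, 5);
    s->grid_lng = field_double(line, 5, 7);
    field_str(s->id, line, 13, 8);
    field_str(s->name, line, 21, 24);
    s->site_lat = field_double(line, 45, 6);
    s->site_lng = field_double(line, 51, 7);
    return 0;
}
```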
The basin mask file used by gstation.c and select_stations.c must be in the
following format:
- Northwest Latitude (degrees)
- Northwest Longitude (degrees)
- Number of Rows
- Number of Columns
- Mask Grid
- 0 - Not in Region
- 1 - In Region
An example of the mask file, taken from the Red River at 1 degree by 1 degree
resolution:
40
107
9
17
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
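A sketch of reading this mask and testing whether a point falls in the region. It assumes longitudes are positive degrees west (as in the header value 107 above and the gstation.c grid centres such as 93.75), and that the cell size is the resolution passed on the command line, since the file itself does not store it. Mask, read_mask and in_region are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    double nwlat, nwlng;   /* north-west corner, degrees (lng positive west) */
    int nrows, ncols;
    int *cell;             /* nrows * ncols values, 1 = in region */
} Mask;

/* Read a mask file in the format above. Returns 0 on success;
   the caller frees m->cell. */
int read_mask(FILE *fp, Mask *m)
{
    m->cell = NULL;
    if (fscanf(fp, "%lf %lf %d %d", &m->nwlat, &m->nwlng,
               &m->nrows, &m->ncols) != 4 || m->nrows <= 0 || m->ncols <= 0)
        return -1;
    m->cell = malloc(sizeof *m->cell * (size_t)m->nrows * (size_t)m->ncols);
    if (m->cell == NULL)
        return -1;
    for (int i = 0; i < m->nrows * m->ncols; i++)
        if (fscanf(fp, "%d", &m->cell[i]) != 1) {
            free(m->cell);
            m->cell = NULL;
            return -1;
        }
    return 0;
}

/* 1 if (lat, lngw) lies in a cell marked 1; lngw is degrees west. */
int in_region(const Mask *m, double res, double lat, double lngw)
{
    double r = (m->nwlat - lat) / res;    /* rows count southward */
    double c = (m->nwlng - lngw) / res;   /* columns count eastward */
    if (r < 0.0 || c < 0.0)
        return 0;
    int row = (int)r, col = (int)c;       /* truncation = floor here */
    if (row >= m->nrows || col >= m->ncols)
        return 0;
    return m->cell[row * m->ncols + col] == 1;
}
```

With the Red River example and a resolution of 1, a point at 39.5 N, 106.5 W falls in the first (masked-in) cell, while 39.5 N, 105.5 W falls in the second, which is outside the region.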
Hydrology Homepage /
University of Washington /
hydro@hydro.washington.edu