FORTRAN Short Course Week 4. Kate Thayer-Calder March 10, 2009. Topics for this week. Searching in Unix Grep, Regular Expressions Multi-Dimensional Arrays User Defined Datatypes Missing data Reading and writing scientific data. Unix Wildcards.
FORTRAN Short Course Week 4 • Kate Thayer-Calder • March 10, 2009
Topics for this week • Searching in Unix • Grep, Regular Expressions • Multi-Dimensional Arrays • User Defined Datatypes • Missing data • Reading and writing scientific data
Unix Wildcards • * - matches zero or more of any characters • ls *a lists all files ending in 'a' • ls a* lists all files starting with 'a' • ? - matches exactly one character • ls ?ouse would list house and mouse, but not grouse.
grep • Searches through a file for a specific string or pattern and returns the lines where it occurs • grep -i 'alien' ufo.txt (case insensitive) • grep -w 'abduct' ufo.txt (whole word only) • grep -riw 'saucer' * (recursively through subdirectories) • or the lines where it does not occur: • grep -v 'censor' ufo.txt • Can grep multiple files, just add them to the list on the line: grep -i 'parameterization' *.txt • Unix lexicon: "Can't grep dead trees."
Regular Expressions • A regular expression (aka RegEx) is a special string that describes a pattern of text • RegExs can be used with grep or other Unix commands and programs for sifting through text • They can get really huge, confusing, and powerful, so we'll just look at a few simple options. • For more: http://www.regular-expressions.info or just man regex
grep combinations • Just some ideas for using grep with other Unix commands: • ls -al | grep 'Jan' • ps -ef | grep '501' • man ftp | grep -i 'directory' • head -30 'mydata.txt' | grep 'temperature'
But... • Searching has become much easier than it once was. Usually your desktop search engine will filter through files looking for your keywords • So, let's talk about more Fortran!
Multi Dimensional Arrays • type, dimension(dim1,dim2,...) :: name • REAL, dimension(lon,lat,height,time) :: temp • Higher dimensional arrays are usually stored contiguously in memory or binary files, in COLUMN MAJOR order • See example Multiarrays.f90
Column Major • Fortran stores arrays with the first (leftmost) index varying fastest • So for an array indexed (i,j,k), i fills first, then j, then k • But in nested do-loops the innermost loop varies fastest • Loops nested in index order (i outermost) would step through k fastest, then j, then i • To fix this, nest your do-loops with the last index outermost and the first index innermost. • Do time=1,days • Do lon=1,360 • Do lat=1,180 • Read (10,fmt) Data(lat,lon,time) • enddo • enddo • enddo
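A minimal sketch of the loop ordering described on this slide. The program name, the small array sizes, and the counter variable are assumptions for illustration; the course's own example is Multiarrays.f90, which is not reproduced here.

```fortran
program column_major_demo
  implicit none
  ! Small, hypothetical sizes so the output is easy to read.
  integer, parameter :: nlat = 2, nlon = 3, ntime = 2
  integer :: grid(nlat, nlon, ntime)
  integer :: flat(nlat*nlon*ntime)
  integer :: lat, lon, time, counter, n

  ! The leftmost index (lat) is the innermost loop, so elements are
  ! visited in the same order in which they sit in memory.
  counter = 0
  do time = 1, ntime
    do lon = 1, nlon
      do lat = 1, nlat
        counter = counter + 1
        grid(lat, lon, time) = counter
      end do
    end do
  end do

  ! RESHAPE to rank 1 walks the array in storage (column-major) order,
  ! so the flattened values should come out as 1, 2, 3, ...
  flat = reshape(grid, (/ nlat*nlon*ntime /))
  print *, flat

  if (any(flat /= (/ (n, n = 1, nlat*nlon*ntime) /))) then
    error stop 'unexpected storage order'
  end if
end program column_major_demo
```

Swapping the loop nesting (lat outermost) would still fill the array correctly, but it would jump around in memory, which tends to be slower for large arrays.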
Array Transformation • The RESHAPE function is pretty cool • Matrix = RESHAPE( Source, Shape ) • A = RESHAPE( B, (/3,2/) ) • Another way to index your array elements uses MOD and integer division on a linear index i that counts from 0 (use i-1 in place of i if it counts from 1) • lat = array(MOD(i,num_lats)+1) • lon = array(i/num_lats + 1)
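The following sketch shows RESHAPE together with the MOD and integer-division indexing. It assumes the linear index i counts from 0 and that latitude is the first (fastest-varying) dimension; the names a, b, ilat, and ilon are hypothetical.

```fortran
program reshape_demo
  implicit none
  integer, parameter :: num_lats = 3, num_lons = 2
  integer :: b(num_lats*num_lons)
  integer :: a(num_lats, num_lons)
  integer :: i, ilat, ilon

  ! b holds 1..6; RESHAPE pours it into a 3x2 array column by column.
  b = (/ (i, i = 1, num_lats*num_lons) /)
  a = reshape(b, (/ num_lats, num_lons /))

  ! Recover the (lat, lon) position of each element from a
  ! zero-based linear index i.
  do i = 0, num_lats*num_lons - 1
    ilat = mod(i, num_lats) + 1
    ilon = i / num_lats + 1
    if (a(ilat, ilon) /= b(i + 1)) error stop 'index mapping mismatch'
  end do

  print *, 'a(:,1) =', a(:, 1)
  print *, 'a(:,2) =', a(:, 2)
end program reshape_demo
```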
Allocatable Arrays • Sometimes, you don't know how large you want your array to be until runtime. • Fortran 90 has "allocatable arrays" that can be declared without fixed dimensions and given their size with ALLOCATE while the program is running. • The sizes can come from stdin, or a variable in a file, or a calculation based on previous work, or any other run-time value. • See example Multiarrays2.f90
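A short sketch of an allocatable array, not the course's Multiarrays2.f90. The sizes are hard-coded here as stand-ins for values that would normally arrive at run time; the variable names are assumptions.

```fortran
program allocatable_demo
  implicit none
  real, allocatable :: temp(:, :)   ! rank is fixed, extents are not
  integer :: nlon, nlat, status

  ! Hypothetical sizes; in practice these could be read from stdin
  ! or from a header in a data file.
  nlon = 4
  nlat = 3

  allocate(temp(nlon, nlat), stat=status)
  if (status /= 0) error stop 'allocation failed'

  temp = 273.15
  print *, 'shape =', shape(temp), ' size =', size(temp)

  ! Release the memory once the array is no longer needed.
  deallocate(temp)
end program allocatable_demo
```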
WHERE statements • An easy way to initialize or set sections of arrays • WHERE (array expression) • array assignment block • ELSEWHERE • array assignment block 2 • END WHERE • This is called “masking”
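As a sketch of masking with WHERE, the program below replaces a missing-value flag with zero. The array names and the use of -9999.0 as the flag are assumptions for illustration.

```fortran
program where_demo
  implicit none
  real :: precip(5), cleaned(5)

  ! Two entries carry a hypothetical missing-data flag of -9999.0.
  precip = (/ 1.2, -9999.0, 0.0, 3.4, -9999.0 /)

  ! The mask (precip < 0.0) is evaluated element by element; each
  ! element is assigned by exactly one of the two blocks.
  where (precip < 0.0)
    cleaned = 0.0
  elsewhere
    cleaned = precip
  end where

  print *, cleaned
  if (any(cleaned < 0.0)) error stop 'mask was not applied'
end program where_demo
```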
FORALL Construct • This statement indicates to the compiler that the operations can be performed in parallel (the assignment to one element does not depend on the result of the assignment to any other element) • FORALL (triplet) variable = expression
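A minimal sketch of both the one-line FORALL statement and the block construct with a mask. The array a and its size are hypothetical. Note that the Fortran 2018 standard marks FORALL as obsolescent, and DO CONCURRENT is the usual replacement in newer code.

```fortran
program forall_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n, n)
  integer :: i, j

  a = 0.0

  ! Statement form: set the diagonal. Each assignment is independent.
  forall (i = 1:n) a(i, i) = 1.0

  ! Construct form with a mask: fill only the strict upper triangle.
  forall (i = 1:n, j = 1:n, j > i)
    a(i, j) = real(i + j)
  end forall

  do i = 1, n
    print '(4f6.1)', a(i, :)
  end do

  if (a(2, 2) /= 1.0 .or. a(1, 3) /= 4.0 .or. a(3, 1) /= 0.0) then
    error stop 'unexpected result'
  end if
end program forall_demo
```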
Atmospheric Data • You’ll see data stored in arrays in many ways: • MyData(pressure, temp, mixingratio, height) • MyPressure(height), MyTemp(height), MyMixingRatio(height) • Pressure(lat,lon,height,time), Temperature(lat,lon,height,time)
The Perils of Parallel Arrays • It is common in our science to see people using multiple arrays of data that are all the same shape but hold different variables (Temperature, Pressure, u wind, v wind, ...) • This is considered bad form in computer science. It would be better to have one array with multiple values possible at each point. Why? • This gets confusing if you implement it as a 5-D array, however.
User Defined Data Types • Fortran gives us a nice way to describe more complex data structures by creating new data types. • Instead of 4 arrays with different variables in each, we can have one array with four values at each point. • TYPE name • DataType :: Component_name • .... • END TYPE name • We can create variables with this type or arrays of variables of this type • TYPE (name) :: VariableName • TYPE (name), Dimension(d1,d2,d3,d4) :: ArrayName • Example: StdAtmos.f90
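The sketch below follows the TYPE pattern above with one point type and a small array of it. This is not the course's StdAtmos.f90; the type name atmos_point, the component names, and every numeric value are illustrative assumptions only.

```fortran
program derived_type_demo
  implicit none

  ! One record holds all four variables for a single point.
  type :: atmos_point
    real :: pressure       ! hPa (illustrative)
    real :: temperature    ! K
    real :: mixing_ratio   ! g/kg
    real :: height         ! m
  end type atmos_point

  type(atmos_point) :: surface
  type(atmos_point), dimension(3) :: profile
  integer :: k

  ! The structure constructor takes components in declaration order.
  surface = atmos_point(1000.0, 288.0, 10.0, 0.0)

  ! Components are reached with the % operator.
  do k = 1, 3
    profile(k) = surface
    profile(k)%height = 1000.0 * real(k - 1)
    profile(k)%temperature = surface%temperature - 6.5 * real(k - 1)
  end do

  ! profile%height picks one component out of every array element.
  print *, 'heights:      ', profile%height
  print *, 'temperatures: ', profile%temperature

  if (abs(profile(3)%temperature - 275.0) > 1.0e-4) then
    error stop 'unexpected temperature'
  end if
end program derived_type_demo
```

Keeping the four values in one record means they cannot drift out of step with each other, which is the main risk with parallel arrays.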
INF and NaN • INF is the value an IEEE Real takes when a result overflows the limits of the type (or on division of a nonzero value by zero). • There is both a +INF and a -INF • NaN (Not a Number) is produced as the result of an improper floating point calculation, such as 0.0/0.0. • NaN is not equal to either INF. In fact, in the IEEE standard, NaN is not even equal to itself. • INF or NaN are occasionally used as placeholders for missing data. • See Example: WriteExample2.f90
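One way to produce and test these values is the intrinsic ieee_arithmetic module from Fortran 2003. This sketch assumes a compiler and platform with IEEE support and is not the course's WriteExample2.f90; the variable names are hypothetical.

```fortran
program inf_nan_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: pos_inf, neg_inf, not_a_number

  ! ieee_value returns the requested special value in the kind of
  ! its first argument.
  pos_inf = ieee_value(pos_inf, ieee_positive_inf)
  neg_inf = ieee_value(neg_inf, ieee_negative_inf)
  not_a_number = ieee_value(not_a_number, ieee_quiet_nan)

  print *, pos_inf, neg_inf, not_a_number

  ! Under IEEE rules a NaN compares unequal to everything,
  ! including itself, so use ieee_is_nan to test for it.
  print *, 'nan == nan?        ', not_a_number == not_a_number
  print *, 'ieee_is_nan(nan)?  ', ieee_is_nan(not_a_number)
  print *, 'is +INF finite?    ', ieee_is_finite(pos_inf)

  if (.not. ieee_is_nan(not_a_number)) error stop 'expected a NaN'
  if (ieee_is_finite(neg_inf)) error stop 'expected an infinity'
end program inf_nan_demo
```

Aggressive optimization flags that relax IEEE semantics can change how the direct comparison behaves, which is another reason to prefer ieee_is_nan.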
Missing Data • Any observational dataset is going to have holes. • If missing data is not given as an "outside the bounds" value (-9999 or 9999.0), it is often replaced with INF or NaN. • Most Fortran implementations will read INF or NaN in as a Real value (it is a real Real), so we need to check for it before doing calculations. Otherwise the bad value propagates silently into every result it touches, or raises a floating-point exception if trapping is enabled. • See Example: ReadBadData.f90
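A sketch of screening out missing values before averaging, assuming a flag value of -9999.0 and the intrinsic ieee_arithmetic module. This is not the course's ReadBadData.f90, and the observation values are made up for illustration.

```fortran
program missing_data_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, parameter :: missing = -9999.0   ! hypothetical flag value
  real :: obs(5), mean
  logical :: valid(5)

  obs = (/ 10.0, missing, 14.0, 12.0, 0.0 /)
  ! Plant a NaN in the last slot to mimic a bad reading.
  obs(5) = ieee_value(obs(5), ieee_quiet_nan)

  ! A value is usable only if it is finite (not INF, not NaN)
  ! and is not the missing-data flag.
  valid = ieee_is_finite(obs) .and. (obs /= missing)

  if (count(valid) == 0) error stop 'no valid observations'

  ! SUM with a mask skips the flagged elements entirely.
  mean = sum(obs, mask=valid) / real(count(valid))
  print *, 'valid points:', count(valid), ' mean:', mean

  if (abs(mean - 12.0) > 1.0e-4) error stop 'unexpected mean'
end program missing_data_demo
```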
NetCDF Data • NetCDF is a data format and I/O library that is widely used in the earth sciences. • Once the library is installed, you can use its procedures to open and access the files • Each file is "self-describing": all of the data is annotated (dimension, units, range of values, missing data values, etc...) • Examples: read_netCDF.f90 with data from NCEP (NCEP.Precip.0100-1204.nc)
Zonal Average Example • Modelers and Dynamicists like to look at the atmosphere in latitudinal bands. • Don’t have to worry about missing data here... • Loading in precip data is pretty simple if you know the parameters. • When you do a zonal average, first average in time at each point and then average across all longitudes. • Could come up with a less memory intensive way to get the same result... • Example: PlayWithPrecip.f90
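The two-step average described above can be sketched with the SUM intrinsic and its dim argument. This is not the course's PlayWithPrecip.f90: the (lon, lat, time) dimension order, the tiny grid, and the synthetic data are all assumptions.

```fortran
program zonal_mean_demo
  implicit none
  integer, parameter :: nlon = 4, nlat = 3, ntime = 2
  real :: precip(nlon, nlat, ntime)
  real :: time_mean(nlon, nlat)
  real :: zonal_mean(nlat)
  integer :: lon, lat, t

  ! Synthetic data standing in for a real precipitation field.
  do t = 1, ntime
    do lat = 1, nlat
      do lon = 1, nlon
        precip(lon, lat, t) = real(lat) + 0.1 * real(lon) + real(t)
      end do
    end do
  end do

  ! Step 1: average over time (dimension 3) at every grid point.
  time_mean = sum(precip, dim=3) / real(ntime)

  ! Step 2: average across all longitudes (dimension 1), leaving
  ! one value per latitude band.
  zonal_mean = sum(time_mean, dim=1) / real(nlon)
  print *, zonal_mean

  ! For this synthetic field the zonal mean at each latitude
  ! index works out to lat + 1.75.
  if (any(abs(zonal_mean - (/ (real(lat) + 1.75, lat = 1, nlat) /)) > 1.0e-4)) then
    error stop 'unexpected zonal mean'
  end if
end program zonal_mean_demo
```

The intermediate time_mean array is what makes this version memory hungry; accumulating directly into zonal_mean inside a loop would avoid it at the cost of slightly longer code.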
What did we talk about? • Searching in Unix • Grep, Regular Expressions • Multi-Dimensional Arrays • User Defined Datatypes • Missing data • Reading and writing scientific data