Programming options in Unix
Last revision August 5, 2004
If you do a lot of data analysis or computation in your research or classes,
eventually you may need to write some kind of program of your own.
- You may need to do things that cannot be done with existing programs.
- You may find that existing programs can do what you want in a pipeline, but
the pipeline is too long or cumbersome to type on the command line.
- You may find yourself needing to do a repetitive task that involves putting
together existing programs to operate on different files. Here, you do not want
to have to retype the pipeline or set of commands every time.
There are three basic alternatives for writing your own programs.
- Write a shell script, which is simply a set of normal shell commands collected
together into a file that can be executed by typing the name of the file as a command.
- The ability to specify arguments to the script and to substitute their values
within the script makes it possible to generalize a sequence of commands.
- Shell scripts also have testing and flow-control statements (if-then-else;
loops) that make them able to conditionally or repetitively execute programs.
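As a rough illustration of the shell-script features listed above, here is a minimal
sketch, assuming a Bourne-style shell (sh). The function name count_lines and its
output format are hypothetical and chosen only for this example. It substitutes the
arguments ("$@"), loops over them, and uses if-then-else to test each file.

```shell
#!/bin/sh
# Hypothetical example: report the number of lines in each file named
# as an argument. Not taken from any existing script.

count_lines() {
    # Test: were any arguments supplied at all?
    if [ "$#" -eq 0 ]; then
        echo "usage: count_lines file ..." >&2
        return 1
    fi

    # Loop: "$@" expands to every argument, one word per argument.
    for f in "$@"; do
        if [ -r "$f" ]; then
            n=$(wc -l < "$f")
            # $((n)) strips the padding some versions of wc add.
            echo "$f: $((n)) lines"
        else
            echo "$f: cannot read" >&2
        fi
    done
}

# When saved in a file and run as a command, pass the script's own
# arguments to the function.
if [ "$#" -gt 0 ]; then
    count_lines "$@"
fi
```

If this were saved in a file, made executable with chmod +x, and placed in a directory
on your path, typing the file name followed by one or more data file names would run
it like any other command.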
- awk is a pattern-scanning and processing language. It is very
good for writing small programs to transform a data file from one format to another.
awk scripts are often run from shell scripts that first set up the
file arguments correctly.
- The current (1985) version of awk has extensions that allow it
to process arguments and call other programs, like a shell script, and use functions
like a compiled programming language.
- awk can also be used to prototype programs that can then be re-written
in a compiled language (like C) to execute more efficiently.
- Another good interpreted language that is becoming very popular is perl.
This language combines the capabilities of shell scripts and awk with
many features of the C language. It is very powerful, but correspondingly complex.
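To illustrate the awk usage described above, where a shell script sets up the file
arguments and then hands them to awk, here is a minimal sketch. The function name
to_csv and the input format (whitespace-separated columns, with comment lines that
start with #) are assumptions made only for this example. The awk program converts
each data line to comma-separated form.

```shell
#!/bin/sh
# Hypothetical example: convert whitespace-separated data files to
# comma-separated form. The shell checks the file arguments; awk does
# the format conversion.

to_csv() {
    if [ "$#" -eq 0 ]; then
        echo "usage: to_csv file ..." >&2
        return 1
    fi

    # Check every file argument before starting awk.
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "to_csv: cannot read $f" >&2
            return 1
        fi
    done

    awk 'BEGIN  { OFS = "," }        # output fields are joined with commas
         /^#/   { next }             # skip comment lines
         NF > 0 { $1 = $1; print }   # rebuild the line using OFS, skip blanks
        ' "$@"
}

# When saved in a file and run as a command, pass the script's own
# arguments to the function.
if [ "$#" -gt 0 ]; then
    to_csv "$@"
fi
```

The assignment $1 = $1 looks like a no-op, but it makes awk reassemble the line from
its fields using the output separator, which is what replaces the blanks and tabs
with commas.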
- Use a high-level compiled language. The basic choices are C,
C++, and Fortran. We also have Pascal
and Java compilers on pangea. We do not support other languages such
as Basic or Modula-2 on pangea. Translator programs
are sometimes available to convert the less commonly used languages to C,
which can then be compiled. One example is p2c,
which translates Pascal programs into C programs.
- These languages offer more power than the scripting languages. Complicated
programs also execute much faster because they have been compiled, or translated,
into the native machine language of the computer.
- There are well developed tools for maintaining large programming projects
in these languages (e.g., make) and for debugging executing programs
(e.g., dbx).
- The C language was developed in conjunction with Unix
by many of the same people. It is the language of choice for Unix because it has
the most power, flexibility, and portability.
- The C Programming Language, by Brian W. Kernighan and Dennis
M. Ritchie, is the standard reference book for the language; Ritchie was the
language's principal designer. The second edition of this book describes ANSI
standard C. Many readers find the book difficult. There is a wide selection of
other books available in the bookstore.
- The ANSI standard Fortran known as Fortran 77
is also implemented on Unix in a way that produces object code compatible with
C language routines.
- Fortran is known and preferred by many because much existing
software has been written in Fortran and because it was the first
high-level language implemented on many machines.
- Any good book that describes Fortran 77 can be used as a reference
for syntax and techniques. A standard reference used by many here is
Fortran 77, by Harry Katzan, which is more of a reference than a tutorial.
Another promising reference is the Professional Programmer's Guide to
Fortran 77, by Clive Page. Its author has made it freely available on the
web, along with links to many other Fortran references and resources, at
http://www.star.le.ac.uk/~cgp/fortran.html