General
Nature: systems language; procedural language
History: C was developed from 1969 to 1972 by Dennis Ritchie of Bell
Telephone Laboratories for use in systems programming for UNIX. Brian W.
Kernighan later co-wrote The C Programming Language (1978) with Ritchie.
“C is a computer programming language. It was developed out of the construction of
the UNIX operating system. It has a modular programming structure and is thus
useful in object oriented programming, as well as in developing graphical user
interfaces. C++ is a superset of C. Other dialects
include Small-C and Visual C.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.
Hello World example
#include <stdio.h>    /* declares printf */

int main(void)
{
    printf("Hello World\n");
    return 0;
}
Structure
Format: free form
Lexical elements
Source code character set: A C compiler may use any character set that
includes at least the following characters: the 52 upper case and lower case
alphabetic characters ( A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c
d e f g h i j k l m n o p q r s t u v w x y z ), the 10 decimal digits (
0 1 2 3 4 5 6 7 8 9 ), the blank or space character, and 29 designated graphic
characters ( ! # % ^ & * ( ) - _ + = ~ [ ] \ | ; : ' " { } , . < > / ? ). Five
formatting characters (backspace, horizontal tab, vertical tab, form feed, and
carriage return) are often used in C; they are treated as spaces. The dollar
sign ($) and the at sign (@) are also commonly used, but they are not required
by the standard. Some form of line separator is required, but it doesn’t have
to be an actual character or character sequence.
Execution character set: The execution character set for C is required to have
the standard characters of the source code character set, plus a null character
and a newline character. The null character must have the value 0 and is used
to mark the end of strings. The newline character is used to divide character
streams into lines during input or output. Run-time libraries may convert
between the newline character and some other character(s), or no character at
all, during execution: for example, compacting the carriage return/line feed
combination into the newline character, generating the newline character at
the end of a logical record, or translating between various record separators
and the newline character.
White space: White space in C includes the blank (space character), horizontal
tab, end-of-line,
vertical tab, form feed, and comments. White space is ignored by the compiler
(except when required to separate tokens or when used in a character or string
constant), and therefore can be used freely by the programmer to make the
program easy for a human to read. Some implementations of C treat nonstandard
source characters as either white space or line breaks.
Line termination: Each line in a C source program is terminated with an
end-of-line character or character sequence. Optionally, certain formatting
characters (such as carriage return, form feed or vertical tab) can also
terminate lines. An empty line consists only of a line termination, optionally
preceded by white space. A logical source line can be continued past a line
termination by placing the backslash character (\) or the ANSI C trigraph ??/
immediately before the line termination. String constants and preprocessor
command lines can cross line breaks through the use of logical source lines. In
some implementations of C, tokens can also cross line breaks through the use of
logical source lines.
Line length: Many C compilers impose a maximum line length (both for physical
source lines and for logical source lines). ANSI C requires that compilers
accept logical source lines of at least 509 characters.
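Escape characters: The backslash character (\) is used as an escape character
in character and string constants, allowing a programmer to include characters
that would otherwise have a special meaning to the compiler (such as the
quotation mark, written \") or that have no printable form (such as the
newline, written \n).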
Alternative characters: ANSI C includes nine trigraphs (three-character
sequences) for encoding required characters that lie outside the ISO 646-1983
Invariant Code Set. Every trigraph starts with two consecutive question marks.
They are:
Trigraph   Normal
??(        [
??)        ]
??<        {
??>        }
??/        \
??!        |
??'        ^
??-        ~
??=        #
Multibyte characters: ANSI C supports both wide characters and multibyte
characters. Wide characters (type wchar_t) are characters whose representation
occupies more than one byte, typically used for expressing large alphabets.
Multibyte characters are the external representation of wide characters as
sequences of one or more bytes, in either the source or execution character set.
Comments: A comment starts with the two-character sequence /* anywhere other
than within a character or string constant, and ends with the two-character
sequence */. ANSI C requires that each comment be replaced with a single space
character, but many C compilers remove comments without inserting a space
character. Some non-UNIX C compilers allow “nestable comments”, which violate
both original and ANSI C. To comment out large sections of a C program, use
preprocessor commands:
#if 0
…
#endif
Tokens
A C compiler always collects characters into the longest possible tokens, even
if the result is not valid C. White space always divides tokens, so it can be
used to prevent misinterpretation of C source code. For example, x--y would be
tokenized as the illegal x -- y (the two hyphens combine into a single token),
while x - -y would be tokenized as the valid x - - y. White space must be used
to separate an identifier, reserved word, integer constant, or floating point
constant from a following identifier, reserved word, integer constant, or
floating point constant. Although ANSI C requires that comments be replaced by
white space, many compilers don’t, which can lead to unwanted token merging.
Operators: C has 15 simple operators ( ! % ^ & * - + = ~ | . < > / ? ), 10
compound assignment operators ( += -= *= /= %= <<= >>= &= ^= |= ), and 11 other
compound operators ( -> ++ -- << >> <= >= == != && || ).
Separators: C has 9 separator tokens ( ( ) [ ] { } , ; : ).
Creating a program
The typical steps for compiling a program in C on a UNIX machine are:
step | command | input | output
create source code | ed, emacs, or any text editor | typed from keyboard or terminal | source code
check (for lexical errors) | lint | source code file | listing with warnings
preprocess | cc (or cpp) | source code file | C code file
compile (convert to assembly for specific hardware platform) | cc2 | C code file | assembly source code file
assemble (for specific hardware platform) | asm (or as) (or masm) | assembly language file | a.out object code file
link | ld (usually invoked through cc) | object code file | executable code
run | program name | file with executable code | results of program
Porting
www.digital.com/info/porting_assistant “The Digital Porting Assistant
(available for Digital UNIX 3.2, and shipped as part of the developer toolkit
on Digital UNIX 4.0) is a graphical environment which aids in the porting
process. In addition to doing lint-like checking of C and Fortran code, it also
contains extensive on-line help regarding developing software on Digital UNIX.”