C


General


Nature: systems language; procedural language


History: C was developed from 1969 to 1972 by Dennis Ritchie (with assistance from Brian W. Kernighan) of Bell Telephone Laboratories for use in systems programming for UNIX.
“C is a computer programming language. It was developed out of the construction of the UNIX operating system. It has a modular programming structure and is thus useful in object oriented programming, as well as in developing graphical user interfaces. C++ is a superset of C. Other dialects include Small-C and Visual C.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

Hello World example


#include <stdio.h>

int main(void)
{
   printf("Hello World\n");
   return 0;
}

 

Structure


Format: free form

Lexical elements


Source code character set: A C compiler may use any character set that includes at least the following characters: the 52 upper case and lower case alphabetic characters ( A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z ), the 10 decimal digits ( 0 1 2 3 4 5 6 7 8 9 ), the blank or space character, and 29 designated graphic characters ( ! # % ^ & * ( ) - _ + = ~ [ ] \ | ; : ' " { } , . < > / ? ). Five formatting characters (backspace, horizontal tab, vertical tab, form feed, and carriage return) are often used in C (formatting characters are treated as spaces). The dollar sign ($) and the at sign (@) are also commonly used (but not required by the standard). Some form of line separator is required, but it doesn’t have to be an actual character or character sequence.

Execution character set: The execution character set for C is required to have the standard characters of the source code character set, plus a null character and a newline character. The null character must have the value 0 and is used to mark the end of strings. The newline character is used to divide character streams into lines during input or output. Run time libraries may convert between the newline character and some other character(s) (or lack of characters) during execution (such as compacting the carriage return/line feed combination into the newline character or generating the newline character at the end of a logical record or transforming between various record separators and the newline character).

White space: White space in C includes the blank (space character), horizontal tab, end-of-line, vertical tab, form feed, and comments. White space is ignored by the compiler (except when required to separate tokens or when used in a character or string constant), and therefore can be used freely by the programmer to make the program easy for a human to read. Some implementations of C treat nonstandard source characters as either white space or line breaks.

Line termination: Each line in a C source program is terminated with an end-of-line character or character sequence. Optionally, certain formatting characters (such as carriage return, form feed or vertical tab) can also terminate lines. An empty line is a line that consists of only a terminating character or character sequence or white space and line termination. A logical source line can be continued past a line termination by using the backslash character (\) or the ANSI C trigraph ??/ immediately before the line termination. String constants and preprocessor command lines can cross line breaks through the use of logical source lines. In some implementations of C, tokens can also cross line breaks through the use of logical source lines.

Line length: Many C compilers impose a maximum line length (both for physical source lines and for logical source lines). ANSI C requires logical source lines of at least 509 characters.

Escape characters: The backslash character (\) is used as an escape character, allowing a programmer to include characters that would normally have a special meaning for the compiler.

Alternative characters: ANSI C includes nine trigraphs (three character sequences) for encoding required characters outside of the ISO 646-1983 Invariant Code Set. These trigraphs always start with two consecutive question marks. These are:

Trigraph    Normal
??(         [
??)         ]
??<         {
??>         }
??/         \
??!         |
??'         ^
??-         ~
??=         #

Multibyte characters: ANSI C supports both wide characters and multibyte characters.

Wide characters are binary representations of characters that occupy more than one byte, typically used for expressing large alphabets.

A multibyte character is the external representation of a wide character, in either the source or execution character set.

Comments: Comments are started by the occurrence of the two character sequence /* at any time other than within a character or string constant. Comments are terminated by the two character sequence */. ANSI C requires that comments be replaced with a single space character, but many C compilers remove comments without inserting a space character. Some non-UNIX C compilers allow “nestable comments”, which violates both original and ANSI C. To comment out large sections of a C program, use preprocessor commands:

#if 0
 
#endif

Tokens


A C compiler always collects characters into the longest possible tokens, even if the result is not valid C. White space always divides tokens. White space can be used to prevent misinterpretation of C source code (for example, x--y would be tokenized as the illegal x -- y [combining the two hyphens into a single token], while x - -y would be tokenized as the valid x - - y). White space must be used to separate an identifier, reserved word, integer constant, or floating point constant from a following identifier, reserved word, integer constant, or floating point constant. Although ANSI C requires that comments be replaced by white space, many compilers don’t, which can lead to unwanted token merging.

Operators: C has 15 simple operators ( ! % ^ & * - + = ~ | . < > / ? ), 10 compound assignment operators ( += -= *= /= %= <<= >>= &= ^= |= ), and 11 other compound operators ( -> ++ -- << >> <= >= == != && || ).

Separators: C has 9 separator tokens ( ( ) [ ] { } , ; : ).

 

Creating a program


The typical steps for compiling a program in C on a UNIX machine are:
Step                        Command                 Input                     Output
--------------------------  ----------------------  ------------------------  --------------------------
create source code          ed, emacs, or any       typed from keyboard       source code
                            text editor             or terminal
check                       lint                    source code file          listing with warnings
  (for lexical errors)
preprocess                  cc (or cpp)             source code file          C code file
compile                     cc2                     C code file               assembly source code file
  (convert to assembly for
  specific hardware
  platform)
assemble                    asm (or as, or masm)    assembly language file    a.out (object code file)
  (for specific hardware
  platform)
link                        link                    object code file          executable code
run                         program name            file with executable      results of program
                                                    code

 

Porting

www.digital.com/info/porting_assistant “The Digital Porting Assistant (available for Digital UNIX 3.2, and shipped as part of the developer toolkit on Digital UNIX 4.0) is a graphical environment which aids in the porting process. In addition to doing lint-like checking of C and Fortran code, it also contains extensive on-line help regarding developing software on Digital UNIX.”