=================================== ===== Learning Manual for the ===== == SAFE-C Programming Language == =================================== This document describes Samuro's SAFE-C programming language for people who wish to learn the complete language. Summary: the SAFE-C programming language keeps the simplicity, fast speed and flavor of the C programming language while slightly modifying rules so that memory access becomes safe. The goal of Safe-C is that faulty programs are stopped immediately either at compilation or at the runtime point of the error so that correction is easy and fast. The benefit is that the maintenance cost of a software product drastically drops and customer satisfaction increases. =========================================================================== Belgium, March 2011, final version 1.41, contact: marcsamu@hotmail.com =========================================================================== 0. Concepts of Safe-C 1. Lexical Elements 2. Data Types 3. Declarations 4. Names 5. Primaries 6. Expressions 7. Statements 8. Packages 9. Generics 10. Compilation Units 11. Implementation Issues 12. Libraries Appendix A : International support B : Unicode conversion for source files =========================================================================== 0. Concepts of Safe-C ===================== Here's a small reminder of the C language family evolution : 1972 : C 1983 : C++ 1995 : Java 2001 : C# - C provides the minimum basic concepts to write a program. - C++ adds classes and exceptions. - Java and C# introduce a new syntax, the concepts of memory safety, virtual machine, and automatically reclaimed heap objects. As time goes on, programming languages become larger and more complex to understand. Many new concepts are introduced for "marketing reasons", because other competing languages "have this too".
If you look at programs developed by the really skilled programmers, like those who develop the Windows or Unix operating systems, you will perhaps be surprised to learn that they don't use any of these new languages. Yes, they still use the C from 1972. Why ? Because C gives the skilled programmers the freedom to write the best programs, whose speed can't be matched by any other language available. Yes, the skilled professional programmers all use C, because all the concepts added by the other languages "are not really needed" ! So why do we introduce Safe-C, if C is that good ? C is not a very strict language : a C compiler allows the programmer to make mistakes without complaining. Later on, the program might crash, or worse it might have defects that allow hackers to gain access to your computer. A professional skilled C programmer knows this, and is prepared to take this risk in order to have a program that is faster than its competitors. However things have changed since 1972 : average programs have become "much larger" in size, and even skilled programmers who make very few mistakes will still leave a number of undetected errors in any larger project. Programs then require frequent updates to correct these mistakes. Safe-C wants to drastically reduce these errors through stricter language rules, but without adding complex concepts like other languages do because we want to keep the fast speed, simplicity and flavor of C. The goal of Safe-C is that faulty programs are stopped immediately either at compilation or at the runtime point of the error, so that correction is easy and fast. The benefit is that the maintenance cost of a software product drastically drops and customer satisfaction increases. =========================================================================== 1. Lexical Elements =================== 1.1 Source File --------------- The language supports source files in ANSI, as well as in UTF-8 and UTF-16 Unicode encodings.
1.2 Comments ------------ Single-line comments begin with "//" and continue to the end of the line. Multi-line comments begin with "/*" and end with "*/". Multi-line comments do not nest. The compiler might generate a warning if the characters /* are found in a comment. Example: // this is a single-line comment /* this is a multi-line comment */ 1.3 Identifiers --------------- Identifiers are composed of letters, digits or underscores. They must not begin with a digit. Example: total_amount MAX buffer2 Lower and upper case letters are distinct. 1.4 Integer Literals -------------------- Integer literals can have decimal, hexadecimal or binary forms. Example: 64 // compatible with any signed or unsigned integer 65_000_000 // underscores for improved readability 0xFF // hexadecimal value 0b1111_0000 // binary value 176L // long Underscores are allowed in integer literals for improved readability. The most negative integer value (-2^63) is not representable by an integer literal; the expression long'min can be used instead. If the literal has a suffix L it is of type long. C's octal numbers are not supported; the compiler will give a warning for any integer literal that begins with a zero digit followed by other digits. 1.5 Floating-Point Literals --------------------------- A floating-point literal can have the following types : - if a suffix f or F is provided, it has type 'float', - if a suffix d or D is provided, it has type 'double', - if no suffix is provided, its type depends on the context. Example: 1.2f 3.023_223E+2d 0x80.0p-1 A hexadecimal floating-point literal begins with 0x and is used for precise machine representations : the mantissa is specified in base 16, and the exponent is specified in base 10. 1.6 Character Literals ---------------------- Character literals are constant values of type char (or wchar in case of prefix 'L').
Example: 'a' 'é' '\n' L'\x00FF' // 'L' prefix indicates type wchar The following escape sequences exist : \' 0x0027 Single quote \" 0x0022 Double quote \\ 0x005C Backslash \0 0x0000 Null \a 0x0007 Alert \b 0x0008 Backspace \f 0x000C Form feed \n 0x000A New line \r 0x000D Carriage return \t 0x0009 Horizontal tab \v 0x000B Vertical tab hex_escape_sequence: \x hex_digit hex_digit (for type char) \x hex_digit hex_digit hex_digit hex_digit (for type wchar) 1.7 String Literals ------------------- String literals are constant values of type string (or wstring in case of prefix 'L'). Example: "A" "Hello\n" "Hello" + " World" L"Hallöchen" L"\x1234" String literals longer than 128 characters cause a compilation error, longer constant string literals can be built using the '+' concatenation operator. 1.8 Keywords ------------ The following keywords are reserved and cannot be used as identifiers : #begin #define #elif #else #end #endif #error #if #warning Lnul _asm abort assert body bool break byte case char clear const continue default double else end enum false float for free from generic if inline int int1 int2 int4 int8 long new nul null object out package packed public ref return run short sleep string struct switch tiny true typedef _unused uint uint1 uint2 uint4 union use ushort void volatile wchar while wstring note: "all" and "as" are not keywords. 1.9 Delimiters -------------- The following character sequences have special meanings as delimiters : # (starts directive) { } ( ) [ ] ' : , ; ? ~ . .. + += ++ - -= -- -> * *= / /= // /* (start comment) % %= ! != = == => & && &= | || |= ^ ^= < <= << <<= > >= >> >>= 1.10 Preprocessing directives ----------------------------- Compilation symbols have a constant boolean value (true or false). They are only used within preprocessing directives. 
There are two kinds of compilation symbols : a) global symbols - must be written all in UPPER CASE; - are defined in the make configuration file (mk.cfg); - are global to the whole project. Example: #if WIN from std_windows use io; #elif UNIX from std_unix use io; #elif MAC from std_mac use io; #else #error os not supported #endif b) local symbols - must be written all in lower case; - are defined using the preprocessing directive #define; - are local to the source file where they are defined. Example: #define debug 0 // turn off debugging for this source file #if debug // do this #else // do that #endif #if !debug // do this #endif #if 0 // remove this part of source code // ... #if true // ... #endif #endif Preprocessor directives cannot be used for macro replacement as in C. 1.11 Unsafe directive --------------------- The directives "#begin unsafe" and "#end unsafe" enclose unsafe source code. Example: #begin unsafe char* p; #end unsafe Within those unsafe regions, the following operations are possible : - declaring and calling extern and callback functions. - declaring and using unsafe pointers. 1.12. Style Rules ----------------- The following style recommendations apply in Safe-C : - matching { } pairs should appear either on the same column or on the same line. - matching "if" and "else" should start on the same column. - all cases of a switch statement should start on the same column. =========================================================================== 2.
Data Types ============= 2.1 integer 2.2 enumeration 2.3 floating-point 2.4 array, open array, jagged array 2.5 struct, open struct, jagged struct 2.6 union 2.7 pointer 2.8 incomplete 2.9 function pointer 2.10 opaque 2.11 object 2.12 generic 2.13 unsafe pointer 2.1 Integer Types ----------------- The following integer types are available : Signed Integer types -------------------- tiny 1 byte (-128 to +127) short 2 bytes (-32_768 to +32_767) int 4 bytes (-2_147_483_648 to +2_147_483_647) long 8 bytes (-9_223_372_036_854_775_808 to +9_223_372_036_854_775_807) (18-19 digits) Unsigned Integer types ---------------------- byte 1 byte (0 to 255) ushort 2 bytes (0 to 65_535) uint 4 bytes (0 to 4_294_967_295) Synonyms -------- int1 1 byte (-128 to +127) int2 2 bytes (-32_768 to +32_767) int4 4 bytes (-2_147_483_648 to +2_147_483_647) int8 8 bytes (-9_223_372_036_854_775_808 to +9_223_372_036_854_775_807) (18-19 digits) uint1 1 byte (0 to 255) uint2 2 bytes (0 to 65_535) uint4 4 bytes (0 to 4_294_967_295) Attributes 'min 'max -------------------- The attributes 'min and 'max return the minimum and maximum values of an integer type or variable. N'min : minimum value (most negative) N'max : maximum value (most positive) Example: printf ("the maximum value of an int is : %d\n", int'max); 2.2 Enumeration types --------------------- An enumeration declaration declares an enumeration type and a list of enumeration literals. Example: enum Color {RED, BLUE, GREEN}; Each enumeration literal declares a constant value of the enumeration type. The following enumeration types are predefined : bool 1 byte (0 to 1) char 1 byte (0 to 255) wchar 2 bytes (0 to 65535) The type 'wchar' stores a symbol from the Unicode character set which covers all spoken languages worldwide (see Appendix A for more details). enum bool (uint1) {false, true}; enum char (uint1) {nul, .., 'A', 'B', .. }; enum wchar (uint2) {Lnul, .., L'A', L'B', ..
}; false and true are the literals of enumeration type bool; nul and Lnul are the first literals of the types char and wchar, respectively. The first enumeration literal has always value 0, the second one has value 1, the third one has value 2, etc. Enumeration values are not compatible with integer types but they can be converted. Example: char c = (char)65; Attributes 'first 'last ----------------------- The attributes 'first and 'last return the first and last enumeration literals of an enumeration type. N'first : first enumeration literal N'last : last enumeration literal N denotes an enumeration type or variable. Example: enum Color {RED, GREEN, BLUE}; Color color; color'first // RED color'last // BLUE Color'first + 1 // GREEN Color'last - 1 // GREEN Color'first + 2 // BLUE Color'first + 3 // ERROR: compilation error : overflow Color'last - 3 // ERROR: compilation error : overflow Attribute 'string ----------------- The attribute 'string converts an enumeration value into its literal string. E'string : string representing the enumeration literal E E denotes a value of any enumeration type except char and wchar. Example: color = RED; printf ("color : %s\n", color'string); A compile-time or runtime error occurs if E is larger than the last enumeration literal. Representation -------------- By default, an enumeration type is mapped to the base type uint4. However, the syntax permits the specification of another base type which must be one of : uint1, uint2, uint4 (or synonyms: byte, ushort, uint). Example: enum Color (byte) {RED, GREEN, BLUE}; 2.3 Floating-point types ------------------------ The following floating-point types are predefined : float 4 bytes (1.5E-45 to 3.4E+38, 7 digits precision) double 8 bytes (5.0E-324 to 1.7E+308, 15-16 digits precision) Synonyms -------- float4 4 bytes (same as float) float8 8 bytes (same as double) 2.4 Array types --------------- An array E[L] is defined by an element type E and a length L. 
Example: char buffer[80]; // an array of 80 char int t1[10], t2[10]; // two arrays of 10 int char[80] buffer2; // an array of 80 char int[10] t3, t4; // two arrays of 10 int char screen1[25][80]; // a screen of 25 lines of 80 char char[80] screen2[25]; // same char[80][25] screen3; // same The length L of an array must have type int or uint and its value must be between 0 and 2_147_483_647 (int'max). Furthermore, an array's size cannot exceed int'max bytes. attribute 'length or 'length(N) ------------------------------- The attribute 'length returns an int value indicating the array length. The optional constant N specifies the array dimension (default N is 1). Example: buffer'length equals 80 t1'length equals 10 screen1'length equals 25 screen1'length(2) equals 80 Example: const char[5] welcome = "Hello"; // constant array of length 5 char[64] student; // variable of length 64 char[80]^ line = new char[80]; // heap object typedef char[10] NAME; // array type void print (char[20] title) // parameter { ref char[3] prefix = title[0:3]; // reference } array length matching --------------------- Array lengths must match exactly in assignments, function call parameters, aggregates, qualified expressions, initial expression of objects, or default expression of parameters. Example: char[5] name1 = "ABCDE"; // OK char[5] name2 = "AB"; // ERROR: array length (5 != 2) name2 = {'A', 'B', 'C'}; // ERROR: array length (5 != 3) void print (char[4] title); // parameter print ("ABCD"); // OK print ("DO"); // ERROR: array length (2 != 4) Open Arrays ----------- An open array is an array where the length is unspecified : typedef char[] string; // string is an open array of char This can also be written as : typedef char string[]; Open types can be used to declare constants, parameters and references. However, they cannot be used to declare variables or heap objects without a length specification. 
Example: const string welcome = "Hello"; // constant string student; // ERROR: missing length string(64) student; // OK string^ line = new string; // ERROR: missing length string^ line = new string(len); // OK string^ line = new char[len]; // OK void print (string title) // parameter { ref string prefix = title[0:3]; // reference (first 3 chars) } A function having an open array parameter (like 'print' above) can be called by passing an array of any length. Internally, the function will receive an additional hidden length parameter. Similarly, an open array reference will keep an intern variable with the actual array length (unless the length is constant). Jagged Arrays ------------- A jagged array is an array where the element type is an open or jagged type. Each array element can have a different length. Example: const string table[4] = {"This", "is", "an", "example"}; Jagged arrays are always constant but they can also appear as parameters of mode 'in' or references to these. Note that string(2) and string[2] are not the same : the first is an array of two chars, the second is an array of two strings. Open Jagged Arrays ------------------ An array can be open and jagged at the same time. Example: const string table[] = {"This", "is", "an", "example"}; constructed constants --------------------- Constants can be constructed from aggregates of other constants, or from array elements, slices or struct fields. Example: const string(3) str = "abc"; const string(3)[2] str3 = {str, str}; const string(2) str4 = str3[0][0:2]; String Variables ---------------- Important: ========== String variables have an OPTIONAL trailing nul character which is ONLY present if the string variable is not full. 
Example: from std use strings; void main() { string(4) name; strcpy (out name, "Luc"); // will copy "Luc" with trailing nul strcpy (out name, "Marc"); // will copy "Marc" (no nul) strcpy (out name, "Jacques"); // will cause a runtime error } In the above example, strcpy receives the length of both arrays and can perform the necessary checks. // the compilation unit 'strings' contains functions to handle string // variables with optional trailing nul : strcpy(out dest,src); : copy a string strcat(ref dest,src); : append a string to another len = strlen(s) : returns active length of string cmp = strcmp(s1,s2) : compare two strings cmp = stricmp(s1,s2) : compare two strings with insensitive case pos = strchr(s,c) : find a char in a string pos = strstr(s1,s2) : find a string s2 in a string s1 ... note: strcpy and strcat cause a runtime error if the target buffer is too small. note: similar functions exist for the wchar/wstring types: wstrcpy, wstrcat, wstrlen, wstrcmp, wstricmp, wstrchr, wstrstr. 2.5. Struct types ----------------- A struct type defines a collection of fields. Example: struct NODE { char[20] name; int count; byte[1024] free_text; NODE^ prev, next; } A struct's size cannot exceed int'max bytes. Open Structs ------------ Open struct types have a parameter called 'discriminant' whose value decides which struct fields of a switch variant exist. Example: enum TypeShape {POINT, SQUARE, CIRCLE, TRIANGLE}; struct Shape (TypeShape kind) { int x, y; switch (kind) { case POINT: null; case SQUARE: int side; case CIRCLE: int radius; case TRIANGLE: int base, height; } } The discriminant 'kind' is a parameter used to create the open struct type 'Shape'. A struct switch variant must be present if and only if there is a discriminant part, the struct type denotes then an open struct type. The keyword "null" indicates a variant without any fields. Example: // the struct type Square defines the fields x, y and side. 
typedef Shape(SQUARE) Square; const Square square = {x => 10, y => 10, side => 5}; Shape sh; // ERROR: missing discriminant Shape(SQUARE) sq; // OK void put_square (ref Square s); // pass address void put_shape (ref Shape s); // pass address + discriminant Shape^ p1 = new Shape; // ERROR: missing discriminant Shape^ p2 = new Shape(k); // OK (k has type TypeShape) Jagged structs -------------- A jagged struct is a struct or open struct type containing fields of an open or jagged type. Example: struct info { int code; string title; // field has open type Shape shape; // field has open type } const info msg = {10, "square", Square'{0,0,10}}; Keyword 'packed' ---------------- The keyword 'packed' can be prefixed to a struct declaration. Example: struct Foo1 // size = 8 bytes, alignment at 4 { float f; char c; } packed struct Foo2 // size = 5 bytes, alignment at 1 { float f; char c; } Use of the keyword 'packed' has several important effects : 1) the compiler will not insert alignment gaps or trailing bytes in the struct. 2) packed types are implicitly convertible to the type "byte[]" in function calls, so they can be passed from and to the outside world (network, files, ..) using many i/o functions. 3) pointer and function pointer fields are forbidden in packed structs so that they cannot be corrupted. Passing a pointer to the outside world is probably a mistake anyway. Thanks to the packing, the intern representation is precisely defined (except for MSB/LSB byte ordering) and the i/o is portable. constructed constants --------------------- Constants can be constructed from aggregates of other constants, or from array elements, slices or struct fields. Example: const Shape(SQUARE)[2] sh3 = {sh, sh}; jagged structs -------------- A constant jagged struct is a struct where at least one field is an open type. Example: const Shape[2] sh4 = {sh, sh}; 2.6 Union types --------------- Union types have all their fields mapped to the same memory area.
The size of the union type is computed by taking the largest size of all its fields. Example: union U { int a; char b; } Union types are always packed implicitly. There are no constants of union types. 2.7. Pointer types ------------------ Pointers are used to access anonymous objects allocated on the heap; pointers cannot access global or local declared variables. A pointer is declared by using the symbol ^ (pronounced caret). Example: Shape^ prev, next; // two pointers to Shape Shape prev^, next^; // same Shape table[10]^; // table of 10 pointers to Shape Shape^ table[10]; // same Shape^[10] table; // same string^ name = new string (80); strcpy (out name^, "Hello"); free name; Pointers to "open type" and pointers to "non-open type" are not compatible because heap objects of an open type have an additional intern header field. Example: string^ p1; string(10)^ p2; p1 = p2; // ERROR: pointers are not compatible 2.8 Incomplete types -------------------- Incomplete types are used to declare types having mutual references. Example: typedef node2; // incomplete type struct node1 { int count; node1^ up; // pointer to struct type itself node2^ left, right; // uses the incomplete type } struct node2 { int count; node1^ left, right; } 2.9. Function Pointers ---------------------- Function pointers allow function calls that dispatch to a given function. Example: void treat_node (Shape s); // function declaration typedef void TREAT (Shape s); // function pointer type void treatment () { TREAT treat; // function pointer variable treat = null; treat = treat_node; // parameter modes and types must match if (treat != null) treat (s); } 2.10. Opaque types ------------------ Opaque types are used to declare structs whose inner fields are hidden. Opaque types can only be declared in .h units or in package declarations.
Example: ---------------------------------------------- // drawing.h struct DRAW_CONTEXT; // opaque type void init (out DRAW_CONTEXT d); void circle (ref DRAW_CONTEXT d, int x, int y, int radius); ---------------------------------------------- Later, the corresponding full struct type can be declared : Example: ---------------------------------------------- // drawing.c struct DRAW_CONTEXT // full struct type { int x, y, dx, dy; IMAGE^ image; } public void init (out DRAW_CONTEXT d) { // .. } public void circle (ref DRAW_CONTEXT d, int x, int y, int radius) { // .. } ---------------------------------------------- Outside the declaration scope of the full struct, it is not allowed to take copies of the opaque type, for example using an assignment. Example: use drawing; void main() { DRAW_CONTEXT a, b; init (out a); b = a; // ERROR : assignment not allowed for opaque types } The programmer can however provide an explicit clone function within the declaration scope. 2.11. Object type ----------------- The type 'object' is predefined as : typedef byte[] object; // open array of byte The jagged type 'object[]' is used to declare functions like sprintf() which take a variable number of parameters. Example: int sprintf (out string buffer, string format, object[] arg); int sscanf (string buffer, string format, out object[] arg); int trace (string format, object[] arg); A parameter of type 'object[]' can for example be used in the following contexts : arg'length // as prefix of the 'length attribute, // to query the number of actual parameters. arg[i] // as prefix of an array element : the result has type // 'byte[]' and is convertible to any packed type. func (arg) // as an actual parameter in a function call 2.12. Generic type ------------------ Generic types are used to parametrize the types of an algorithm, see the chapter on generics. 2.13. Unsafe pointer type ------------------------- Unsafe pointers are similar to C pointers. 
Example: int i, p*; // note: the * symbol appears as suffix p = &i; i = *p; i = p[0]; p++; Using unsafe types is only allowed within unsafe regions. (see 1.11) =========================================================================== 3. Declarations =============== 3.1. Typedef declaration ------------------------ A typedef declaration can be used to declare an alias name for a type, to add a type specifier to a type, or to declare a function pointer type. Example: typedef int ENTIER; // alias name typedef string(20) STUDENT_NAME; typedef string^ PSTRING; typedef int SUM (int a, int b); // function pointer type 3.2. Object declaration ----------------------- An object declaration declares a constant or a variable. Example: const int MAX = 100; string buffer(MAX); int i=0, j=0; scope ----- An object can be declared at the global level (inside compilation units or packages) or at the local level (inside functions and block statements). At the global level, the initial expression, if present, must be constant. constant and variable --------------------- If the keyword 'const' is specified, the object declaration declares a constant whose value is computed at compile-time and never changes. If no keyword 'const' is specified, the object declaration declares a variable; if an expression is specified, it is the initial value of the object. runtime ------- Global objects (and also heap objects) have a unique occurrence; they can be accessed by several threads that can perform simultaneous updates. Local objects can have multiple occurrences : a new occurrence is created each time the declaring function is called; each local occurrence can only be accessed by a single thread. The keyword 'volatile' should be specified for global variables that can be modified by several threads, or that might be modified by operating system calls.
It instructs the compiler to always load a fresh copy of those variables, and to save them to memory immediately, instead of keeping them in registers. Local objects larger than 4 Kbytes are transparently allocated on the heap and automatically freed when leaving their declarative scope. initialization -------------- Global variables, if no expression is specified in their declaration, are always initialized with binary zeroes. Local variables, if no expression is specified, have an initially undefined value. Before a local variable can be read, the full variable must receive a value. Initialization of single array elements, slices or fields does not count as an initialization of the whole variable. Taking a variable's address using the unary operator (&) is considered as a write access followed by a read access of the variable. A parameter of mode 'out' must receive a value in all possible statement flows before the function terminates. Objects that are never read or used in an attribute might generate a warning saying that they are unused. (see 7.19. Unused statement) 3.3. Reference declaration -------------------------- A reference declares an alias to an existing object. References can be used to rename objects into shorter names. Reference declarations are only allowed inside a function body. Example: ref char ch = buffer[i]; // reference to array element ref string str = buffer[0:3]; // reference to array slice ref int age = student.age; // reference to age field The reference cannot be changed after creation, i.e. it always references the same object. Assignment to a reference assigns a new value to the referenced object. References to heap objects "lock" the heap object until the reference declaration goes out of scope, thus freeing the heap object before this point causes a runtime error. Example: { ref int count = p^.count; count = 2; free p; // runtime error ! } 3.4.
Function declaration ------------------------- A function declaration specifies the parameters, return value and options of a function. Example: int start_treatment (); int treat (char count); void sum (int a, int b, out int c); void operate (ref TABLE table); void print (int value, int width = 1); Parameters ---------- There are 3 passing modes for parameters : mode access to parameter ==== =================== in read-only (by copy for simple types, else by reference) ref read+write (by reference) out write first, then read+write (by reference) If no mode is specified, then mode "in" is implicit. If a parameter has a default constant expression (ex: 'width' above), it must have mode 'in'. A corresponding argument is then optional when calling the function. Return type ----------- A function's return type can be either a simple type (thus array, struct, union, opaque or generic are not allowed), or void if the function returns no value. Options ------- The following options can be specified : 1) inline --------- inline int treat (char count); indicates that the function should be inserted inline instead of being called. 2) callback ----------- [callback] int MainWin (HWND hwnd, UINT message, WORD w, LONG l); indicates that the function will be called by the operating system. 3) extern --------- [extern "KERNEL32.DLL"] int GetStdHandle (int nStdHandle); indicates that the body of the function is in an extern DLL. 3.5. Function body ------------------ The function body declares the inner workings of a function. Example: void sum (int a, int b, out int c) { c = a+b; } The keyword "public" is mandatory for a function body if a corresponding function declaration appeared earlier in a .h unit or package declaration.
Example: // p.h void sum (int a, int b, out int c); // declaration // p.c public void sum (int a, int b, out int c) // body with keyword "public" { c = a + b; } The asymmetry between declaration and body is intentional : the pragmatic goal here is to keep the use of keyword 'public' to a minimum. function return --------------- A function having a non-void return type must have a return or abort statement as last statement. =========================================================================== 4. Names ======== 4.1. Expanded names ------------------- When an identifier declared in a unit or package is ambiguous, it can be prefixed with the unit or package name to make it unique. It is then called an expanded name. Example: strings.strcpy 4.2. Object and Value Names --------------------------- The following names can be built using objects : Array Element -------------- Example: array [ index ] The prefix must denote an array object (or an unsafe pointer object). The index must have type int or uint. An error occurs if the index is outside the range of the array. Array Slice ----------- Example: array [ offset : length ] The prefix must denote an array object (or an unsafe pointer object). Offset is the lower bound of the slice, length is the result length of the slice : both must have type int or uint. Example: given S="ABCDE", S[2:3] evaluates to "CDE". Empty slices are allowed: S[5:0] evaluates to "". An error occurs if offset or length have illegal values. Discriminant value ------------------ Example: open_struct . discriminant The prefix must denote an object of an open struct type. Struct field ------------ Example: struct . field The prefix must denote an object of a struct or union type. If the field belongs to a variant part of an open struct type, a runtime check is done to verify that the open struct's discriminant value is appropriate for accessing this field, unless this can be ensured at compile-time.
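The struct field rule can be sketched with the type Shape of section 2.5. The function names grow, set_radius and demo are invented for this illustration, initializing a variable with a named aggregate is assumed to work as it does for constants, and the comments describe the checks as stated above, not the output of an actual compiler run.

Example:

    void grow (ref Shape s)
    {
      s.radius = s.radius * 2;   // variant field : runtime check that
                                 // s.kind is CIRCLE
    }

    void set_radius (ref Shape s, int r)
    {
      s.x = 0;                   // field outside the switch : always present
      if (s.kind == CIRCLE)      // reads the discriminant value
        s.radius = r;            // only reached for kind CIRCLE
    }

    void demo ()
    {
      Shape(CIRCLE) c = {x => 1, y => 2, radius => 3};
      Shape(SQUARE) q = {x => 1, y => 2, side => 4};

      c.radius = 5;              // kind is known at compile-time
      grow (ref c);              // OK : c has kind CIRCLE
      set_radius (ref q, 8);     // OK : the test on s.kind skips the field
      grow (ref q);              // runtime error : q has kind SQUARE
    }

Testing the discriminant before touching a variant field, as set_radius does, keeps a function usable with every kind of Shape.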
Dereferenced object ------------------- Example: pointer ^ The prefix must denote a pointer object or value. The result is the heap object designated by the pointer. An error occurs if the prefix evaluates to null, or if the heap object was already freed earlier. Postfixed object ---------------- Example: int p*, q*; *p++ = *q++; The prefix must denote an object having an integer or enumeration type, or an unsafe pointer type. The object will be converted to a value, after which the object will be incremented or decremented. In case of an unsafe pointer the pointer value will be incremented/decremented by the size of the accessed type. Deref unsafe field ------------------ Example: unsafe_pointer -> discriminant_or_field The operator "->" is equivalent to the succession of the two operators "*" and ".". Function name ------------- A function name evaluates to a value of type function pointer which corresponds to the executable code start of the corresponding function body. Attribute --------- The following attributes exist: min, max, first, last, length, length(N), byte, string, size. attributes 'min 'max -------------------- I'min : minimum value (most negative) I'max : maximum value (most positive) I must denote an integer type or a variable (a constant is not allowed). attributes 'first 'last ----------------------- E'first : first enumeration literal E'last : last enumeration literal E must denote an enumeration type or a variable (a constant is not allowed). attribute 'length 'length(N) ----------------------------- A'length returns the number of elements of an array A. A must denote an array object, or a non-open array type. The optional integer constant N specifies an array dimension (>= 1) which is only valid if A is an N-dimensional array. attribute 'byte --------------- O'byte converts an object O of any packed type into an array of byte (byte[]) having the same intern representation and the same size. 
Example:
  int i = 4;
  float f;
  f'byte = i'byte;               // copy i into f (4 bytes)
  if ((f'byte[3] & 128) != 0)    // assumes little-endian representation
    printf ("negative");

note: the 'byte attribute will return different results on big-endian
      and little-endian computer architectures.

attribute 'string
-----------------
Example:
  printf ("color : %s\n", color'string);

E'string converts an enumeration value E into its literal string.
The prefix must denote a value of any enumeration type, except char and
wchar.
An error occurs if the prefix value is larger than the last enumeration
literal.

attribute 'size
---------------
O'size : returns the number of bytes used by object or type O.
The result is a value of type uint.

4.3. Function call
------------------
A function call calls a function and optionally returns a value.

Example:
  divide (i, j, out ans, out r);
  retrieve_object ( id => nr, out item => my_item);

Some actual parameters may use a "named form", i.e. the actual parameter is
prefixed with the identifier of the formal parameter followed by "=>".

All parameters must be provided, in order, except possibly the last
parameters if a default constant expression is defined for them in the
function declaration.

The name of a function call must be a name that denotes a value of type
function pointer; if it evaluates to null, an error occurs.

Example:
  p^.func(exp)^.f("a");
  (*p)("a");

evaluation order
----------------
Actual parameters are evaluated from left to right, as they appear in the
source text.

Example:
  int i = 0;
  f (i++, i++, i++);    // same as f (0, 1, 2);

conversion
----------
If either the formal or the actual parameter has type array of byte,
then conversion is possible from/to any packed type.

Example:
  void put (byte[] b);
  put (2);                // type int
  put (2L);               // type long
  put ((byte)2);          // type byte

  void put2 (int i);
  put2 ( byte'{0,0,0,1} );

type 'object[]'
---------------
If the last formal parameter has type object[], it can accept a list of
actual parameters of packed types.
Example:
  void f (string format, object[] arg)
  {
    ...
    arg'length
    arg[i]'byte
  }

A literal string format parameter preceding an object[] parameter of mode
'in' is checked using the following format patterns :

  %[flags][width][.precision]type

  type  argument                      output
  ----  --------                      ------
  %                                   '%%' will write '%'
  d     any signed integer            decimal number (-61)
  u     any unsigned integer or enum  unsigned number (12)
  x     any integer or enum           hexadecimal number (7fa)
  e     float, double                 scientific (3.9265e+2)
  f     float, double                 floating-point (392.65)
  c     char or string                all char, or 'precision' char
  C     wchar or wstring              all wchar, or 'precision' wchar
  s     string                        stops at nul, or 'precision' char
  S     wstring                       stops at Lnul, or 'precision' wchar

width
-----
  (number)  Minimum number of characters to be printed (padded with spaces
            or zeroes if necessary). The value is not truncated even if
            the result is larger.
  *         The width is not specified in the format string but as an
            additional int value parameter preceding the parameter to be
            formatted.

precision
---------
  .number   A dot without number means precision zero.
            For f   : this is the number of digits to be printed after the
                      decimal point. Default precision is 6. No decimal
                      point is printed for precision zero.
            For s/S : this is the maximum number of characters to be
                      printed. Default is all characters.
  .*        The precision is not specified in the format string but as an
            additional int value parameter preceding the parameter to be
            formatted.

flags
-----
  -   for all           : left-justify within the given field width
                          (default is right-justify).
  +   for d, e, f       : prefix '+' for positive or zero numbers.
  0   for d, e, f, x, u : prefix '0' digits instead of spaces
                          (not valid together with flag '-').

The following format patterns are supported for a parameter of mode 'out'
or 'ref':

white space (ascii 1 to 32)
---------------------------
ignore zero or more white space characters.
Non-whitespace character, except percentage sign (%)
----------------------------------------------------
must match, or the function fails.

Format specifiers: %[*][width]type

  *      data is read but not stored (there is no corresponding actual
         parameter).
  width  maximum number of characters to be read.

  type  argument                      input
  ----  --------                      -----
  %                                   '%%' will read '%'
  d     any signed integer            decimal number preceded by optional
                                      + or - (-61)
  u     any unsigned integer or enum  unsigned number (12)
  x     same as d or u                hexadecimal number (7fa)
  f     float or double               floating-point number (0.5) (12.4E+3)
  e     same as f
  c     char or string                fill arg, or read max 'width' chars.
  C     wchar or wstring              fill arg, or read max 'width' wchars.
  s     char or string                same as c but stop at first white-space.
  S     wchar or wstring              same as C but stop at first white-space.

4.6. Run call
-------------
A run call starts a thread. One optional parameter can be passed to the
thread function.

Example:
  void my_thread (int i)
  {
  }

  void main()
  {
    int rc;
    rc = run my_thread (23);
  }

A run call returns an int value indicating if the thread started correctly
(zero indicates success, a negative value indicates an error).

===========================================================================

5. Primaries
============
A primary denotes a value and can be one of the following :

integer literal
---------------
Example:  123  0x10  10L

floating-point literal
----------------------
Example:  1.23  3.0d

character literal
-----------------
Example:  'a'  L'a'

string literal
--------------
Example:  "Hello"  L"Hi"

nul/Lnul
--------
The reserved keywords 'nul' and 'Lnul' denote a value equal to the first
enumeration literal of the types 'char' and 'wchar' respectively.

null
----
The reserved keyword 'null' denotes a special pointer value that does not
point at anything. It is compatible with all pointer, function pointer and
unsafe pointer types.

aggregate
---------
An aggregate is an array or struct compound value.
array aggregate
---------------
Two forms of array aggregates exist :

a) positional array aggregate
-----------------------------
Example:
  tab = {'A', 'B', c1, 'D', 'I', ch1, ch2};

note: the syntax allows an extra comma ',' before the closing '}'.

b) open array aggregate
-----------------------
Example:
  tab = {all => c};

struct aggregate
----------------
Two forms of struct aggregates exist :

a) positional struct aggregate
------------------------------
Example:
  struct Node
  {
    float length;
    char  letter;
    bool  active;
  }

  Node n = {1.0, 'A', b};    // b can be a variable

note: the syntax allows an extra comma ',' before the closing '}'.

b) named struct aggregate
-------------------------
Example:
  Node n2 = {length => 1.0, letter => 'A', active => b};

All fields in the aggregate must appear in the same order as in the struct
declaration.

qualified expression
--------------------
A qualified expression explicitly shows the type of an expression.

a) qualified parenthesized expression
-------------------------------------
Example:
  t = tiny ' (2);
  d = double ' (f+1);
  p = ptr ' (null);
  str3 = string(3) ' ("ABC");
  s = Shape ' (sh);

b) qualified aggregate
----------------------
Example:
  string(3) str3;
  Shape(CIRCLE) sh;

  str3 = string ' {'a', 'b', 'c'};
  str3 = string(3)' {'a', 'b', 'c'};
  str3 = string ' {all => '*'};           // uses length from context
  sh = Shape'{0,0,12};
  str3[ofs:len] = string ' {all => c};    // runtime length
  pstr^ = string ' {all => c};            // runtime length

new allocator
-------------
A 'new' allocator creates a heap object.
Examples with initial expression :

  string^ name1 = new string'{'a', 'b', 'c'};
  string^ name2 = new char[]'{'a', 'b', 'c'};
  string^ name3 = new string'("Hello");
  string^ name4 = new string'(name2^);
  string^ name5 = new char[len] ' { all => c };
  string^ name6 = new string'(s);    // length depends on s'length

  p = new Shape(CIRCLE) ' { x => x, y => y+1, radius => r };
  p = new Shape ' (sh);              // size depends on discriminant of sh

Examples without initial expression (the object is initialized with zeroes):

  NODE^ p = new NODE;                   // pointer to a new NODE
  Shape^ sh = new Shape(d);             // size is computed using a table
  string^ name7 = new string;           // ERROR (open types not allowed)
  char[100]^ name8 = new char[100];
  char[100]^ name9 = new string(100);
  string^ name10 = new string(100);
  string^ name11 = new string(len);

Note that pointers to open arrays are not compatible with pointers to
arrays.

Example:
  string^ buffer = new string(30);         // context : open array.
  string(30)^ buffer2 = new string(30);    // context : non-open array
  buffer2 = buffer;                        // ERROR: non-compatible

===========================================================================

6. Expressions
==============
The following operators are defined.

operator priorities
-------------------
  Unary              + - ! ~ & * -- ++
  Term               * / %
  Sum                + -
  Simple Expression  << >>
  Relation           == != < > <= >=
  Conditional        && || & | ^
  Conditional Test   ? :

evaluation
----------
Expressions are evaluated from left to right, as they appear in the source
text.

promotion of integer types
--------------------------
signed and unsigned types cannot be mixed : they always require a
conversion.

Example:
  i + u         // ERROR: int and uint cannot be mixed
  i + (int)u    // OK
  i < u         // ERROR: int and uint cannot be compared
  i + 1         // integer literal is assumed to be int
  u + 1         // integer literal is assumed to be uint
  1 - 2         // is equivalent to integer literal -1
  u + (-1)      // ERROR: -1 not in range of uint

semantics of operators
----------------------

operator ? :
-------------
effect: if (arg1) return arg2; else return arg3;

operators && and ||
-------------------
effect:
  && : if (!left) return false; else return right;
  || : if (left) return true; else return right;

operators & | ^
---------------
1) for integer
   effect: perform integer promotion on left & right arguments, giving
           result. logical bitwise operators : and, or, xor.

2) for bool
   effect: logical operators : and, or, xor.

operators == !=
----------------
effect: boolean comparison.

operators < > <= >=
-----------------------
effect: boolean comparison.

operators << >>
-----------------
effect: left and right shift operators.

operators + -
--------------
1) for integer, floating-point
   effect: addition, subtraction.

2) for enumeration types:
   left   : enumeration
   right  : integer
   result : same as left
   effect : addition, subtraction.

3) for unsafe pointer types:
   left   : unsafe pointer
   right  : integer type, but type long is not allowed.
   result : same as left
   effect : convert the right operand to a signed int4, multiply it by the
            accessed type's size, possibly truncate the result to an int4;
            then add/subtract this offset to/from the left address operand.

4) for unsafe pointer ("-" only) types:
   left   : unsafe pointer
   right  : same as left
   result : uint
   effect : subtraction of both operands, conversion to an int4, division
            by the size of the accessed type, conversion to a uint4.
            A runtime error occurs if the accessed type has size zero.

5) for constant string ("+" only) types:
   left   : constant string/wstring/char/wchar
   right  : constant string/wstring/char/wchar
   result : constant string or wstring
   effect : concatenation of constant strings.
            if any argument is wide, then the result will have type
            wstring, otherwise the result has type string.

   Example:
     "Marie-Jeanne is a friendly person." + (CR+LF)
     + "She often comes around to see us." + CR+LF

binary operators * /
----------------------
effect: multiplication, division.

operator %
-----------
effect: remainder of division.
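The arithmetic described in rules 3) and 4) for unsafe pointers under
"operators + -" can be traced with a small sketch in plain C (not Safe-C).
It assumes a 32-bit memory model in which an address fits in a uint32_t;
the function names wrap_to_int32, scaled_offset and element_difference are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterprets a 32-bit pattern as a signed int4 (two's complement)
   without relying on implementation-defined conversions. */
static int32_t wrap_to_int32(uint32_t v)
{
    if (v <= (uint32_t)INT32_MAX)
        return (int32_t)v;
    return (int32_t)(v - 2147483648u) - 2147483647 - 1;
}

/* Rule 3: the byte offset for "pointer + n" is n (as int4) multiplied by
   the size of the accessed type, truncated to an int4. */
static int32_t scaled_offset(int32_t n, uint32_t elem_size)
{
    uint64_t product = (uint64_t)(uint32_t)n * elem_size;
    return wrap_to_int32((uint32_t)product);
}

/* Rule 4: "left - right" subtracts the addresses, converts to int4,
   divides by the size of the accessed type, then converts to uint4.
   The assert models the runtime error for a size of zero. */
static uint32_t element_difference(uint32_t left, uint32_t right,
                                   int32_t elem_size)
{
    assert(elem_size > 0);
    int32_t byte_diff = wrap_to_int32(left - right);
    return (uint32_t)(byte_diff / elem_size);
}
```

For example, scaled_offset (-2, 8) yields -16, and subtracting a higher
address from a lower one gives a large uint value, because the signed
quotient is converted to uint4 as the last step.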
unary operators + -
-------------------
effect:
  + : no effect.
  - : negation

unary operator ~
----------------
effect: integer not (inverts all bits)

unary operator !
-----------------
effect: inverts the bool value (zero becomes one and non-zero becomes zero).

unary operator &
-----------------
effect: take the address of the object.
restriction: this operator is only allowed in unsafe regions (see 1.11).

unary operator *
----------------
effect: the argument is evaluated, a run-time error occurs if it has a null
        value; the result is the object accessed by the pointer value.
restriction: this operator is only allowed in unsafe regions (see 1.11).

Example:  *p = 2;

unary operators --/++
---------------------
effect: The object will be incremented / decremented; then it is converted
        into a value.

Example:  --i;

conversion
----------
Conversion is allowed for the following types:

  source type          target type
  -----------          -----------
  integer          ->  integer
  enumeration      ->  integer
  floating-point   ->  integer
  integer          ->  enumeration
  floating-point   ->  floating-point
  integer          ->  floating-point
  unsafe pointer   ->  unsafe pointer

Conversion from float to integer truncates (it does not round).
Conversion to an enumeration type might create a value for which no
enumeration literal exists.
No runtime check is done during a conversion.

Further conversions that do not change the byte representation are possible
using the attribute 'byte.

Example:
  int i;
  char c;
  c = (char)i;

===========================================================================

7. Statements
=============

7.1. Null statement
-------------------
The null statement has no effect.

Example:
  if (test)
    ;          // null statement
  else
    a = 1;

7.2. Clear statement
--------------------
Example:
  clear table, a, b, c;

The clear statement fills all specified objects with binary zeroes.

7.3. Assignment statement
-------------------------
An assignment statement copies the value of the expression into the
variable.
Example:
  a = 1;

Array Assignment
----------------
Example:
  char[80] line1, line2;
  line1 = line2;    // copy full array variable

In case of an array, a check is done that both lengths match.

7.4. pre or postfixed_statement
-------------------------------
The effect of the statement is to increment/decrement the object's value;
overflows are ignored.

Example:
  --(*p)^.count;
  b[k]++;

7.5. Function call statement
----------------------------
A function call statement calls the specified function.

Example:
  unlink (filename);
  (void) unlink (filename);
  p[k](i);
  f()(i);
  f(k)^.func(i);
  (*p)('a');

7.6. Return statement
---------------------
A return statement terminates the current function and optionally returns
a value.

Example:
  return;
  return 1;

7.7. Block statement
--------------------
A block statement groups a series of statements.

Example:
  {    // block statement
    H();
    I();
  }

7.8. If statement
-----------------
An if statement allows conditional execution.

Example:
  if (b)
  {
    // ...
  }

  if (b)
    ;
  else
    ;

7.9. Switch statement
---------------------
A switch statement allows conditional execution with multiple branches.

Example:
  switch (e)
  {
    case 0:
      f();
      break;
    case 1:
      g();
      break;
    default:
      break;
  }

7.10. While statement
---------------------
The while statement allows basic repetition of statements.

Example:
  while (count < 10)
    count++;

7.11. For statement
-------------------
The for statement allows a more packed form of the while statement.

Example:
  for (i=0; i

  // generic type ELEMENT
  int compare (ELEMENT a, ELEMENT b);    // return -1 if a<b, 0 if a==b, +1 if a>b
  package BubbleSort
    void sort (ref ELEMENT table[]);
  end BubbleSort;

The package body of a generic package has the same syntax as a non-generic
one.
Example:
  // bubble.c
  package body BubbleSort

    public void sort (ref ELEMENT table[])
    {
      int i, j;
      ELEMENT temp;

      for (i=1; i<table'length; i++)
      {
        for (j=i; j>0; j--)
        {
          if (compare (table[j-1], table[j]) <= 0)
            break;
          temp = table[j-1];
          table[j-1] = table[j];
          table[j] = temp;
        }
      }
    }

  end BubbleSort;

Example (an algorithm for managing a balanced tree) :

  // btree.h
  generic
    int compare (KEY a, KEY b);
  package BalancedTree

    struct BALTREE;    // opaque type

    int create   (out BALTREE bt);
    int close    (ref BALTREE bt);
    int insert   (ref BALTREE bt, KEY k, ELEMENT e);
    int remove   (ref BALTREE bt, KEY k);
    int update   (ref BALTREE bt, KEY k, ELEMENT e);
    int retrieve (ref BALTREE bt, KEY k, out ELEMENT e);

  end BalancedTree;

In the above examples, KEY and ELEMENT are generic types; Swapping,
BubbleSort and BalancedTree are generic packages.

Instantiation of generic packages
---------------------------------
A generic instantiation creates a copy of a generic package by replacing
each generic type by an actual type and each generic function declaration
by an actual function declaration.

The generic association list must be given in the order in which the
identifiers appear in the generic package declaration.

Example :
  int compare_int (int a, int b)
  {
    if (a < b)
      return -1;
    if (a > b)
      return +1;
    return 0;
  }

  package Sort_int = new BubbleSort (ELEMENT => int, compare => compare_int);

  void main()
  {
    int table[5] = {2, 19, 3, 9, 4};
    sort (ref table);    // must be written Sort_int.sort if ambiguous
  }

===========================================================================

10. Compilation Units
=====================

10.1. Units
-----------
Two kinds of source files exist :
  - .h : interface
  - .c : implementation

A .h source file can appear alone.
A .c source file can, but need not, appear alone (for the main program).
When .h and .c source files appear together, they must be stored in the
same directory.

10.2. Startup
-------------
The main entry point in a program is a function called 'main' declared in
a .c source file :
  - there must be only one function with this name.
  - it must have return type void or int.
  - it must have either no parameters or a single parameter of mode in and
    type string[] that will receive a list of the program's start
    parameters.

Example:
  // hello.c
  from std use console;

  void main()
  {
    printf ("Hello World !\n");
  }

10.3. Import clauses
--------------------
Two forms of import clauses exist :

a) import from .lib library files
---------------------------------
Example:
  from std use strings, console;
  from be.msc.webcam use cam/webcam;

note: the library files "std.lib" and "be.msc.webcam.lib" must be present
      in one of the directories listed in the config file mk.cfg

b) import from .h source files
------------------------------
Example:
  use util, util2;      // "util.h" and "util2.h" in current directory
  use cam/webcam;       // "webcam.h" in subdirectory "cam"
  use ../interface;     // "interface.h" in directory above this one.
  use /src/testing;     // ERROR: absolute paths are NOT allowed.

note: absolute paths are not allowed because the project could then not be
      moved easily from one directory to another.

All identifiers appearing in library names and source names must be in
lower case to avoid compatibility problems on different file systems.

10.4. Aliases
-------------
In case two units with the same last name are imported (possibly from two
different directories), an error occurs. An alias name can then be
specified to disambiguate them.

Example:
  from std use bintree;
  from std use new/bintree as bintree2;

  bintree2.insert ( .. );

10.5. Effect of import
----------------------
An import of unit B makes all entities of B visible. However, it does not
make visible a unit C imported by unit B, although unit C with its entities
is loaded too, but in some unreachable scope.

In short : only identifiers of directly imported units are visible.
If several interface (.h) compilation units depend on each other in a
circular fashion, a compilation error occurs.

A compilation error occurs if a unit imports itself or if the same unit is
imported twice, even using different aliases.

The order of imports is not important (reordering imports cannot create
errors).

10.6. Compiling
---------------
Only the .h / .c source files and an optional make configuration file
(mk.cfg) must be provided to compile a project.

Example:
  C:\test> mk hello

The make utility (mk.exe) will compile "hello.c" and create "hello.exe".

Dependencies are automatically traced and the corresponding source files
are recompiled when needed. A manual make file is not necessary.

All source filenames must be in lower case.

10.7. Make configuration file (mk.cfg)
--------------------------------------
The make configuration file must be present in the current directory, or
its filename must be supplied to the compiler through a compiler option;
it contains all compiler options, global compilation symbols and a list of
directories to search for imported libraries :

  - memory model (32 or 64 bit) (default is 32)
    this has an effect on the size of pointer, function pointer and unsafe
    pointer types.
  - flag indicating if pointers are checked using a safe tombstone
    mechanism (default is yes)
  - flag indicating if array indexes are checked (default is yes)
  - flag indicating if assertions are checked (default is yes)
  - a list of directories in which the compiler searches for imported
    libraries.
The following global symbols are predefined and cannot be redefined in
mk.cfg :

  WINDOWS  (1 for windows compiler version, 0 otherwise)
  UNIX     (1 for any unix compiler version, 0 otherwise)
  MEM32    (1 for 32-bit memory model, 0 otherwise)
  MEM64    (1 for 64-bit memory model, 0 otherwise)

Example:
  // mk.cfg

  [symbols]          ; global compilation symbols (must be in upper case)
  DEBUG = 1
  T24 = 1

  [options]
  memory_model = 32        ; 32 or 64 (default is 32)
  pointer_check = yes      ; default is yes
  array_check = yes        ; default is yes
  assertion_check = yes    ; default is yes

  [library]          ; libraries are searched in the following directories :
  dir = ./                 ; default if mk.cfg is not provided
  dir = c:/safe-c/lib/

===========================================================================

11. Implementation Issues
=========================

11.1 Data Alignment
-------------------
Structures that are declared with the keyword 'packed' are the only ones
portable on different systems.
Note that all simple types are packed by default.

Advantage of packed types
-------------------------
  - packed types can be converted to byte[] using attribute 'byte.
  - packed types can be passed to a formal parameter of type byte[]
    (in a function call)
  - packed types can be passed to a formal parameter list of type object[]
    (in a function call)
  - a formal parameter of a packed type can receive an actual parameter
    of type byte[].

Pointers and function pointers are not allowed to appear in packed
structures and be passed to I/O routines because they could get corrupted
otherwise.

11.2 Implementation of pointers and heap objects
------------------------------------------------
Internally, pointers don't contain the address of the actual heap object.
Instead, they point at Tombstone structures which guarantee safe pointer
access.

11.3. Run-Time Errors
---------------------
When a runtime error occurs, the library unit "exception" will locate the
precise source line location of the error and write a corresponding file
CRASH-REPORT.TXT in the application's current directory.

===========================================================================

12. Libraries
-------------
The standard library "std" contains the following .h units :

  aes       : aes encryption (advanced encryption standard)
  bintree   : binary tree
  bsearch   : binary search (search element in sorted array)
  calendar  : calendar (get_datetime)
  clipboard : clipboard
  console   : printf & scanf for console applications
  crc       : md5, adler and crc checksum
  db        : simple database (isam files)
  des       : DES encryption
  draw      : drawing in memory
  ebcdic    : ebcdic conversion
  engine    : 3D engine
  exception : exception handler
  files     : files, directories and disks
  fixed     : large fixed-point type
  float1    : large floating-point type
  float2    : large floating-point type 2
  ftp       : file transfer protocol client
  ftps      : file transfer protocol server
  http      : internet client and server
  image     : image (jpg, gif, png, tif, bmp) & image operations (copy, rotate)
  inifile   : .ini files
  integer   : large integer type
  interval  : intervals
  math      : math
  net       : network card (iprtrmib.h)
  odbc      : sql database
  printer   : printer
  process   : process
  random    : random numbers
  rsa       : rsa encryption (asymmetric keys)
  selfile   : select file dialog box
  service   : background processes (Windows services)
  sorting   : array sorting (bubblesort, heapsort, quicksort)
  sound     : micro, speaker
  strings   : char and string support (sprintf, strcpy, isdigit,..)
  tcpip     : tcp/ip layer, ipv4 & ipv6
  text      : text (storing lines of text)
  thread    : threads, synchronization, timer
  tracing   : trace files
  url       : convert internet url's
  utf       : ansi, utf8, utf16 conversions
  webcam    : webcam (vfw, directx)
  win       : graphic user interface
  xml       : xml reader
  zip       : zip, unzip

===========================================================================

Appendix A : International support
----------------------------------
Traditional programs used to store characters in single bytes because they
were developed mainly for the US or single European countries. This has
changed dramatically since Unicode normalized all character sets.
Currently, the Unicode standard defines a code space of 1,114,112 code
points (0 to 1114111).

A.1 Unicode for intern use
--------------------------
UTF-16 (type wchar) is used for storing Unicode characters internally.

A.2 Unicode for extern use
--------------------------
When transferring Unicode between computer systems, a packing in so-called
"UTF-8 character strings" is widely used on the internet.

In UTF-8, each Unicode character is encoded in 1 to 4 bytes :

  Unicode Character    Byte1     Byte2     Byte3     Byte4
  -----------------    --------  --------  --------  --------
  0 to 127             0xxxxxxx
  128 to 2047          110yyyxx  10xxxxxx
  2048 to 65535        1110yyyy  10yyyyxx  10xxxxxx
  65536 to 1114111     11110zzz  10zzyyyy  10yyyyxx  10xxxxxx

Converting between UTF-16 and UTF-8 is straightforward and available in
standard unit "utf".

===========================================================================

Appendix B : Unicode conversion for source files
------------------------------------------------
Source files can be either in ANSI, UTF-8 or UTF-16 encoding.

===========================================================================
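As an illustration of the UTF-8 byte layout tabulated in section A.2, here
is a minimal sketch in plain C (not Safe-C, and not the code of the standard
unit "utf"). The function name utf8_encode is hypothetical. Rejecting the
surrogate range 0xD800 to 0xDFFF is an addition that goes beyond the table:
those values are reserved for UTF-16 and are not valid characters on their
own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point into out[0..3] following the table in A.2.
   Returns the number of bytes written (1 to 4), or 0 if cp is not encodable
   (above 1114111, or inside the UTF-16 surrogate range). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                        /* surrogate */
    if (cp <= 0x7F) {                                    /* 0 to 127 */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                                   /* 128 to 2047 */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                                  /* 2048 to 65535 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                                /* 65536 to 1114111 */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                            /* out of range */
}
```

For instance, the euro sign (code point 0x20AC) falls in the third row of
the table and is encoded as the three bytes E2 82 AC.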