Q. How reliable are floating-point comparisons?
Floating-point numbers are the "black art" of computer programming. One reason is that there is no optimal way to represent an arbitrary number. The Institute of Electrical and Electronics Engineers (IEEE) has developed a standard for the representation of floating-point numbers (IEEE 754), but you cannot guarantee that every machine you use will conform to the standard.
Even if your machine does conform to the standard, there are deeper issues. It can be shown mathematically that there are an infinite number of "real" numbers between any two numbers. For the computer to distinguish between two numbers, the bits that represent them must differ. To represent an infinite number of different bit patterns would take an infinite number of bits. Because the computer must represent a large range of numbers in a small number of bits (usually 32 to 64 bits), it has to make approximate representations of most numbers.
Because floating-point numbers are so tricky to deal with, it's generally bad practice to compare a floating-point number for equality with anything. Inequalities are much safer. If, for instance, you want to step through a range of numbers in small increments, you might write this:
#include <stdio.h>

const float first = 0.0f;
const float last = 70.0f;
const float small = 0.007f;

int main(void)
{
    float f;

    /* Unsafe: f may step over last without ever being equal to it. */
    for (f = first; f != last && f < last + 1.0; f += small)
        ;
    printf("f is now %g\n", f);
    return 0;
}
However, rounding errors and small differences in the representation of the variable small might cause f to never be equal to last (it might go from being just under it to being just over it). Thus, the loop would go past the value last. The inequality f < last + 1.0 has been added to prevent the program from running on for a very long time if this happens. If you run this program and the value printed for f is 71 or more, this is what has happened.
A safer way to write this loop is to use the inequality f < last to test for the loop ending, as in this example:
float f;
for (f = first; f < last; f += small)
;
You could even precompute the number of times the loop should be executed and use an integer to count iterations of the loop, as in this example:
float f;
int count = (int)((last - first) / small);
for (f = first; count-- > 0; f += small)
    ;
Q. How can you determine the maximum value that a numeric variable can hold?
The easiest way to find out the largest or smallest number that a particular type can hold is to use the values defined in the ANSI standard header file limits.h. This file contains many useful constants defining the values that can be held by various types, including these:
Value | Description
CHAR_BIT | Number of bits in a char
CHAR_MAX | Maximum decimal integer value of a char
CHAR_MIN | Minimum decimal integer value of a char
MB_LEN_MAX | Maximum number of bytes in a multibyte character
INT_MAX | Maximum decimal value of an int
INT_MIN | Minimum decimal value of an int
LONG_MAX | Maximum decimal value of a long
LONG_MIN | Minimum decimal value of a long
SCHAR_MAX | Maximum decimal integer value of a signed char
SCHAR_MIN | Minimum decimal integer value of a signed char
SHRT_MAX | Maximum decimal value of a short
SHRT_MIN | Minimum decimal value of a short
UCHAR_MAX | Maximum decimal integer value of an unsigned char
UINT_MAX | Maximum decimal value of an unsigned int
ULONG_MAX | Maximum decimal value of an unsigned long int
USHRT_MAX | Maximum decimal value of an unsigned short int
For integral types, on a machine that uses two's complement arithmetic (which is just about any machine you're likely to use), a signed type can hold numbers from -2^(number of bits - 1) to +2^(number of bits - 1) - 1.
An unsigned type can hold values from 0 to +2^(number of bits) - 1. For instance, a 16-bit signed integer can hold numbers from -2^15 (-32768) to +2^15 - 1 (32767).
Q. Are there any problems with performing mathematical operations on different variable types?
C has three categories of built-in data types: pointer types, integral types, and floating-point types. Pointer types are the most restrictive in terms of the arithmetic operations that can be performed on them. They are limited to the following:
- Subtraction of two pointers, valid only when both pointers point to elements in the same array. The result is the same as subtracting the integer subscripts corresponding to the two pointers.
+ Addition of a pointer and an integral type. The result is a pointer to the element that many positions further along in the array.
Floating-point types consist of the built-in types float, double, and long double. Integral types consist of char, unsigned char, short, unsigned short, int, unsigned int, long, and unsigned long. All of these types can have the following arithmetic operations performed on them:
+ Addition
- Subtraction
* Multiplication
/ Division
Integral types also can have those four operations performed on them, as well as the following operations:
% Modulo or remainder of division
<< Shift left
>> Shift right
& Bitwise AND operation
| Bitwise OR operation
^ Bitwise exclusive OR operation
! Logical negation operation
~ Bitwise "one's complement" operation
Although C permits "mixed mode" expressions (an arithmetic expression involving different types), it actually converts the operands to the same type before performing the operation (except for the case of pointer arithmetic described previously). The process of automatic type conversion is called "operator promotion"; the C standard refers to these rules as the "usual arithmetic conversions."
Q. What is operator promotion?
If an operation is specified with operands of two different types, both operands are converted to a common type, roughly the smallest type that can hold both values. The result of the operation has that same type. To interpret the rules, read the following table from the top down, and stop at the first rule that applies.
If Either Operand Is | And the Other Is | Change Them To
long double | any other type | long double
double | any smaller type | double
float | any smaller type | float
unsigned long | any integral type | unsigned long
long | unsigned > LONG_MAX | unsigned long
long | any smaller type | long
unsigned | any signed type | unsigned
The following example code illustrates some cases of operator promotion. The variable f1 is set to 3/4. Because both 3 and 4 are integers, integer division is performed, and the result is the integer 0. The variable f2 is set to 3/4.0. Because the constant 4.0 is a double, the number 3 is converted to a double as well, and the result is the double 0.75, which is then converted to a float when it is stored in f2.
#include <stdio.h>

int main(void)
{
    float f1 = 3 / 4;    /* integer division, result 0 */
    float f2 = 3 / 4.0;  /* 3 is converted to double, result 0.75 */

    printf("3 / 4 == %g or %g depending on the type used.\n", f1, f2);
    return 0;
}