CSAPP-chap5- Optimizing Program Performance
Capabilities and Limitations of Optimizing Compilers
limitation due to security
Invoking gcc with option-O1 or higher(e.g.,-O2 or-O3) will cause it to apply more extensive optimizations. These can further improve program performance, but they may expand the program size and they may make the program more difficult to debug using standard debugging tools.
Compilers must make safe optimization for the program, which ignored possible optimziations.
memory alising
example:
1 | int add1(long *a, long *b) { |
if *a and *b points to the same location in the memory, the result of the 2 functions are different, so the compiler cannot optimize add1 to add2.
function calls
example:
1 | long f(); |
It is normal that the func1() should be optimized to func2, because of fewer jumps to the function. However, If f() changed a part of the global varibles, it would result in difference in the return value, for example,
1 | long ans = 0; |
gcc compilers always perform unprogressive compilations because of the reasons like this.
Expressing Program Performance
we introduce the metric cycles per element, abbreviated CPE, to measure it
example:
1 | /* Compute prefix sum of vector a */ |
Using a least squares fit, we find that the run times (in clock cycles) for psum1 and psum2 can be approximated by the equations 368+9.0n and 368 +6.0n.
some ways to optimize a program
1 |
|
Eliminating Loop Inefficiencies
the test condition must be evaluated on every iteration of the loop. On the other hand, the length of the vector does not change as the loop proceeds. We could therefore compute the vector length only once and use this value in our test condition.
optimization of the code:
1 | /* Move call to vec_length out of loop */ |
This optimization is an instance of a general class of optimizations known as code motion. They involve identifying a computation that is performed multiple times and remove it outside of the loop.
Reducing Procedure Calls
As we have seen, procedure calls can incur overhead and also block most forms of program optimization.
Eliminating Unneeded Memory References
The program can use register instead of the heap memory to store the data.
example:
1 | /* Accumulate result in local variable */ |
Understanding Modern Processors
As we seek to push the optimization further, we need to optimize the microarchitecture of the processor.
TODO
