C++ Systems (C++ Compiler Performance Comparison)

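The mainstream C/C++ compilers on the market today include Microsoft's cl, gcc, Intel's icl, PGI's pgcc, and CodeGear's bcc (formerly owned by Borland). On Windows the most widely used is naturally cl, while across the broader range of platforms gcc is the first choice for C/C++. When it comes to optimization capability, however, the ranking does not necessarily match market share.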
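    I compared the numerical performance of these compilers. The test code is a numerical integration program taken from the sample programs shipped with the Intel compiler, with one header file changed so that every compiler can build it.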
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>
    // Function to be integrated
    // Define and prototype it here
    // | sin(x) |
    #define INTEG_FUNC(x) fabs(sin(x))
    // Prototype timing function
    double dclock(void);
    int main(void)
    {
    // Loop counters and number of interior points
    unsigned int i, j, N;
    // Stepsize, independent variable x, and accumulated sum
    double step, x_i, sum;
    // Timing variables for evaluation
    double start, finish, duration, clock_t;
    // Start integral from
    double interval_begin = 0.0;
    // Complete integral at
    double interval_end = 2.0 * 3.141592653589793238;
    // Start timing for the entire application
    start = clock();
    printf(" \n");
    printf(" Number of | Computed Integral | \n");
    printf(" Interior Points | | \n");
    for (j=2;j<27;j++)
    {
    printf("------------------------------------- \n");
    // Compute the number of (internal rectangles + 1)
    N = 1 << j;
    // Compute stepsize for N-1 internal rectangles
    step = (interval_end - interval_begin) / N;
    // Approx. 1/2 area in first rectangle: f(x0) * [step/2]
    sum = INTEG_FUNC(interval_begin) * step / 2.0;
    // Apply midpoint rule:
    // Given length = f(x), compute the area of the
    // rectangle of width step
    // Sum areas of internal rectangle: f(xi + step) * step
    for (i=1;i<N;i++)
    {
    x_i = i * step;
    sum += INTEG_FUNC(x_i) * step;
    }
    // Approx. 1/2 area in last rectangle: f(xN) * [step/2]
    sum += INTEG_FUNC(interval_end) * step / 2.0;
    printf(" %10u | %14e | \n", N, sum);
    }
    finish = clock();
    duration = (finish - start);
    printf(" \n");
    printf(" Application Clocks = %10e \n", duration);
    printf(" \n");
    return 0;
    }
    Since this code comes from Intel, it is naturally well suited to Intel's compiler. The tests below were run on an Intel Core 2 Duo.
    Compiler   gcc 4.3.0               VC 9.0                Intel        PGI             CodeGear
               (GCC TDM-2 for MinGW)   (cl 15.00.21022.08)   (icl 10.1)   (pgcc 7.16)     (bcc32 6.10)

    Optimization disabled
    Options    -O0                     /Od                   -Od          -O0             -Od
               17161                   14461                 12441        10514           13400
               17133                   14430                 11687        9956            12917
               17155                   14476                 11871        10099           13026

    Compiled with -O2
               13011                   7737                  4540         9348            12636
               16571                   7706                  4185         9148            13026
               16573                   7706                  4042         9183            13057

    Platform-specific optimization
    Options    -march=core2 -O2        /arch:SSE2 /O2        -QxT         -tp core2 -O2   none
               16060                   7710                  1938         9578
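    The results show that Intel's compiler is very strong at numerical computation. Its CPU-specific optimization in particular pays off: -QxT more than halves the -O2 time. GCC, by contrast, is somewhat disappointing. Comparing the runs with optimization disabled against -O2, Intel's and Microsoft's compilers gain markedly from optimization, while the other compilers improve only slightly. Ranked, the order would be icl > cl > pgcc > bcc > gcc.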
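
    To put a number on how much each compiler gains from optimization, one option is to average the three runs per configuration and divide the unoptimized mean by the optimized mean. The sketch below does that for the Core 2 Duo figures of icl and gcc. The helper names mean_ticks and speedup and the array names are assumptions made for illustration; only the timing values come from the measurements above.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n timing samples (raw "Application Clocks" values). */
double mean_ticks(const double *samples, size_t n)
{
    assert(n > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += samples[i];
    return total / (double)n;
}

/* Speedup of the optimized build: mean(baseline) / mean(optimized). */
double speedup(const double *baseline, const double *optimized, size_t n)
{
    return mean_ticks(baseline, n) / mean_ticks(optimized, n);
}

/* Core 2 Duo figures copied from the measurements above. */
const double icl_Od[3] = {12441, 11687, 11871};
const double icl_O2[3] = {4540, 4185, 4042};
const double gcc_O0[3] = {17161, 17133, 17155};
const double gcc_O2[3] = {13011, 16571, 16573};
```

    With these inputs, speedup(icl_Od, icl_O2, 3) comes out at roughly 2.8, against roughly 1.1 for speedup(gcc_O0, gcc_O2, 3), which matches the observation that icl benefits far more from -O2 than gcc does.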
    In addition, on a 1.5 GHz Pentium 4 machine running Linux, the results were:
    Compiler   gcc                     icc         pgCC

    Options    -O2                     -O2         -O2
               24920000                10840000    22270000

    Options    -O0                     -O0         -O0
               28290000                19210000    24320000

    Options    -march=pentium4 -O2     -xN         -tp piv -O2
               24990000                6640000     22150000
    Again Intel's compiler performs best, and gcc is the slowest.
    I then ran the test on an Athlon X2 4800+ under Linux, with the following results:
    Compiler   gcc                     icc           pgcc

    Options    -O0                     -O0           -O0
               9390000                 14950000      9950000

    Options    -O2                     -O2           -O2
               8910000                 9240000       9400000

    Options    -march=amdfam10 -O2     -msse3 -O2    -tp k8-32 -O2
               8800000                 3800000       9030000
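    Although icc mainly targets Intel processors, it can also deliver a large performance gain on AMD CPUs as long as the right optimization options are chosen. gcc is back at an ordinary level here. The odd one out is the PGI compiler; I probably have not found good options for it yet.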
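
    One caveat when comparing the figures: the program prints raw clock() ticks, and the length of a tick depends on the platform. Microsoft's C runtime defines CLOCKS_PER_SEC as 1000, while POSIX (XSI) systems such as Linux define it as 1000000, which is most likely why the Windows figures are around 10^4 and the Linux figures around 10^7. A minimal sketch of reporting seconds instead is shown below. The helpers ticks_to_seconds and time_cpu_seconds and the work callback are hypothetical names, not part of the original program.

```c
#include <assert.h>
#include <time.h>

/* Convert a clock() tick difference to seconds using the platform's CLOCKS_PER_SEC. */
double ticks_to_seconds(clock_t start, clock_t finish)
{
    assert(finish >= start);
    return (double)(finish - start) / (double)CLOCKS_PER_SEC;
}

/*
 * Time one call of a workload and return the CPU time in seconds.
 * `work` is a hypothetical callback standing in for the integration loop.
 */
double time_cpu_seconds(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t finish = clock();
    return ticks_to_seconds(start, finish);
}
```

    In the benchmark itself, the equivalent change would be to print ticks_to_seconds(start, finish) rather than the bare difference, so that results from Windows and Linux builds can be read on the same scale.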