Multi-processor optimization?

Steve Hayashi sth at
Sun Jan 7 17:08:32 PST 2001

Well, that really depends on your cache size, doesn't it?  The difference
between -O2 and -O3, as I read it, is this: loop unrolling, inlining of
functions, and renaming of registers.  -O3 does them and -O2 doesn't
(the exact list varies between gcc releases, so check the docs for yours).

The advantage of loop unrolling is that the processor has fewer branches
to deal with (the "loop again or not?" test at the end of each iteration).
Due to the nature of pipelined processors, branches can be costly in terms
of speed, so minimizing the number of branch instructions the processor
has to execute is good.  The disadvantage is that the unrolled code takes
more space.  I don't know how many iterations gcc unrolls, but there IS a
disadvantage to unrolling too much: if the enlarged loop no longer fits in
the instruction cache, you will keep getting instruction cache misses
rather than hits.
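
For what it's worth, here is roughly what unrolling looks like if you do
it by hand in C.  This is only a sketch: the function names and the factor
of 4 are made up for illustration, and gcc picks its own factor.  In gcc
the transformation is requested with -funroll-loops; check your version's
docs for whether a given -O level turns it on.

```c
#include <stddef.h>

/* Plain loop: one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by 4 (factor chosen for illustration only):
 * one compare-and-branch per four elements, but a larger loop body. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    /* Leftover 0-3 elements when n is not a multiple of 4. */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions return the same sum.  The second one takes the loop branch
about a quarter as often, at the cost of roughly four times as many
instructions in the loop body, which is exactly the space-for-speed trade
described above.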

Inlining of functions can be explained by someone who knows more C than I
do (I've forgotten a lot).

Renaming of registers is an optimization technique that I believe makes
things a lot better, but I'm not entirely sure what it's supposed to mean
here, since I thought that renaming registers was something linkers did
to correct a technique used by C -> assembly compilers.  Anyway, as I
HEARD it, when you need a register, you don't really care what the name
of the register is, so you say "Gimme a free register," and register
renaming ensures that you use all the registers you have free, thus
minimizing the number of memory accesses you have to make.  This shouldn't
affect caching at all, with the possible exception of reducing the number
of cache requests.

I have no idea how much instruction cache the average processor has, so
you'll have to look that up.  If it's decently large, then I wouldn't be
too concerned, but if it's not, you may wanna just stick with -O2.
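
If you'd rather ask the machine than dig through a data sheet, here is one
possible way to look it up on Linux.  This is a sketch under assumptions:
_SC_LEVEL1_ICACHE_SIZE is a glibc extension (not POSIX) that only exists
in reasonably recent glibc versions, it may report 0 when the size is
unknown, and the helper name l1_icache_bytes is made up for this example.

```c
#include <unistd.h>

/* Returns the L1 instruction cache size in bytes, or -1 if this system
 * does not expose it.  _SC_LEVEL1_ICACHE_SIZE is a glibc extension, so
 * it is guarded and the function falls back to -1 when it is absent. */
long l1_icache_bytes(void)
{
#ifdef _SC_LEVEL1_ICACHE_SIZE
    long size = sysconf(_SC_LEVEL1_ICACHE_SIZE);
    return size > 0 ? size : -1;   /* treat 0 or an error as "unknown" */
#else
    return -1;
#endif
}
```

Call it from a small main() and print the result with printf("%ld\n", ...).
A result of -1 just means the library couldn't tell you, in which case
you're back to the processor documentation.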


On Sun, 7 Jan 2001, Tony Karakashian wrote:

> That's what I figured, too.  Was hoping I was wrong. :)
> While on this topic, a search for information led me to the following
> tidbit:  "-O2 may provide better optimization because -O3 can result in
> larger binaries.  The cache-hit performance loss will then typically offset
> any optimization gain."  True?
> -T
> -----Original Message-----
> From: lfs-apps-owner at
> [mailto:lfs-apps-owner at]On Behalf Of Ghovs
> Sent: Sunday, January 07, 2001 10:26 AM
> To: lfs-apps at
> Subject: Re: Multi-processor optimization?
> Well...
> As far as I understand, optimizing for multiple processors is done by making
> applications threaded so that one process can be shared over multiple
> processors.
> Many, very many coders dream of an optimization flag to do this for them.
> rgds,
> Peter de Freitas
> Tony Karakashian wrote:
> > Having tired of dual-booting, I decided to pull my old server out of the
> > closet and use it for LFS.  I was just wondering if there were CFLAGS
> > similar to the "-mcpu=i586 -march=i586" to optimize for multi-processor
> > systems like this one?  If there is, I'd love to re-do from scratch as
> > optimized as possible. :)
> >
> > Thanks,
> >
> > -T
> --
> Unsubscribe: send email to lfs-apps-request at
> and put unsubscribe in the subject header of the message
