### A puzzle for everyone: the first to solve it gets 100 points, haha ###
Source: Internet  Published: 2015-10-30
A colleague asked me an interview question today; I thought about it all through lunch and couldn't crack it. The first person to solve it gets 100 points.
Write a function with at most 3 lines of code, using no loops and no recursion.
The function's input is a number (0-255); its output is the count of 1s in that number's binary representation.
For example, input 5 (binary 101) gives output 2.
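For what it's worth, here is one sketch that meets the stated constraints (three statements, no loops, no recursion), using the same pairwise-summing trick the counters answer below explains in depth; the function name `count_ones` is my own choice:

```c
/* Count the 1 bits in a byte (0-255) in three statements:
   sum adjacent bits, then adjacent 2-bit sums, then the two nibbles. */
unsigned count_ones(unsigned x)
{
    x = (x & 0x55) + ((x >> 1) & 0x55);    /* sums of adjacent bit pairs  */
    x = (x & 0x33) + ((x >> 2) & 0x33);    /* sums of adjacent 2-bit sums */
    return (x & 0x0f) + ((x >> 4) & 0x0f); /* final sum of the 2 nibbles  */
}
```

For input 5 (binary 101) this returns 2, as the question asks.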
|
Table Lookup
Use a pre-built lookup table of the 1-bit counts for every possible byte, then index into it for each byte that makes up the word. This is the fastest method (slightly) for 32 bits on both Intel and Sparc, and (even more slightly) the fastest for 64 bits on Sparc, falling to second fastest for 64 bits on Intel. Changing the lookup table's element type to anything but unsigned or int makes it a little slower (because of the extra casting and byte loading the compiler is forced to add).
unsigned numbits_lookup_table[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2,
3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3,
4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5,
6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5,
6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5,
3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3,
4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6,
6, 7, 6, 7, 7, 8
};
unsigned numbits_lookup(unsigned i)
{
unsigned n;
n = numbits_lookup_table[i & 0xff];
n += numbits_lookup_table[i>>8 & 0xff];
n += numbits_lookup_table[i>>16 & 0xff];
n += numbits_lookup_table[i>>24 & 0xff];
return n;
}
Counters
If you want a full explanation of how this works, read my writeup at counting 1 bits; suffice it to say that you essentially partition the word into groups and combine the groups by adding them together in pairs until you are left with only one group, which is the answer. (Performance notes are in the next section.)
unsigned numbits(unsigned i)
{
const unsigned MASK1 = 0x55555555;
const unsigned MASK2 = 0x33333333;
const unsigned MASK4 = 0x0f0f0f0f;
const unsigned MASK8 = 0x00ff00ff;
const unsigned MASK16 = 0x0000ffff;
i = (i&MASK1 ) + (i>>1 &MASK1 );
i = (i&MASK2 ) + (i>>2 &MASK2 );
i = (i&MASK4 ) + (i>>4 &MASK4 );
i = (i&MASK8 ) + (i>>8 &MASK8 );
i = (i&MASK16) + (i>>16&MASK16);
return i;
}
Optimized Counters
call pointed out in counting 1 bits that you could optimize the Counters method further if you pay attention to which bits you care about and which you don't, which allows you to skip applying some of the masks.
Some symbols that I'll use to represent what's going on:
0: bits we know are zero from the previous step
o: bits we know are zero due to masking
-: bits we know are zero due to shifting
X: bits that might be 1 and we care about their values
x: bits that might be 1 but we don't care about their values
So a 0 plus a 0 is still a 0, obviously; the tricky ones are the others, but they're not so bad either. 0 plus X is X: whether X is a 0 or a 1, adding it to 0 passes it through unchanged. However, X plus X is XX, because the sum can range from 0 (0+0) to binary 10 (1+1). The same holds for the xs, once those show up.
Step 1:
oXoXoXoXoXoXoXoXoXoXoXoXoXoXoXoX
+ -XoXoXoXoXoXoXoXoXoXoXoXoXoXoXoX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Step 2:
ooXXooXXooXXooXXooXXooXXooXXooXX
+ --XXooXXooXXooXXooXXooXXooXXooXX
0XXX0XXX0XXX0XXX0XXX0XXX0XXX0XXX
Step 3:
oooo0XXXoooo0XXXoooo0XXXoooo0XXX
+ ----0XXXoooo0XXXoooo0XXXoooo0XXX
0000XXXX0000XXXX0000XXXX0000XXXX
Step 4:
oooooooo0000XXXXoooooooo0000XXXX
+ --------0000XXXXoooooooo0000XXXX
00000000000XXXXX00000000000XXXXX
Step 5:
oooooooooooooooo00000000000XXXXX
+ ----------------00000000000XXXXX
00000000000000000000000000XXXXXX
You'll notice that the higher the step, the more known zeros (0) there are. call's suggestion was to change step 5 to this:
Step 5:
ooooooooooooxxxx00000000000XXXXX
+ ----------------00000000000XXXXX
000000000000xxxx0000000000XXXXXX
(mask) ooooooooooooooooooooooooooXXXXXX
(where "(mask)" means "after adding, apply a mask".)
However, you can go back even further and apply the same technique, all the way back to step 3, in fact. The best optimization I can think of changes the last three steps into the following:
Step 3:
0xxx0XXX0xxx0XXX0xxx0XXX0xxx0XXX
+ ----0XXX0xxx0XXX0xxx0XXX0xxx0XXX
0xxxXXXX0xxxXXXX0xxxXXXX0xxxXXXX
(mask) 0000XXXX0000XXXX0000XXXX0000XXXX
Step 4:
0000xxxx0000XXXX0000xxxx0000XXXX
+ --------0000XXXX0000xxxx0000XXXX
0000xxxx000XXXXX000xxxxx000XXXXX
Step 5:
0000xxxx000xxxxx000xxxxx000XXXXX
+ ----------------000xxxxx000XXXXX
0000xxxx000xxxxx00xxxxxx00XXXXXX
(mask) ooooooooooooooooooooooooooXXXXXX
Anyway, that's all very lovely, but here's the C to do it:
unsigned numbits(unsigned i)
{
const unsigned MASK1 = 0x55555555;
const unsigned MASK2 = 0x33333333;
const unsigned MASK4 = 0x0f0f0f0f;
const unsigned MASK6 = 0x0000003f;
i = (i&MASK1 ) + (i>>1 &MASK1 );
i = (i&MASK2 ) + (i>>2 &MASK2 );
i = (i + (i>>4))&MASK4;
i = (i + (i>>8));
i = (i + (i>>16)) &MASK6;
return i;
}
The performance of this method is marginally worse than the lookup method in the 32-bit cases, slightly better than lookup on 64-bit Intel, and right about the same on 64-bit Sparc. Of note is the fact that loading one of these bitmasks into a register actually takes two instructions on RISC machines, and a longer-than-32-bit instruction on Intel, because it's impossible to pack an opcode and 32 bits' worth of data into a single 32-bit instruction. See the bottom of jamesc's writeup at MIPS for more details on that...
Subtract 1 and AND
See counting 1 bits SPOILER for a fuller explanation of this one, but basically the lowest set bit gets zeroed out on every iteration, so when you run out of 1s to zero, you've iterated exactly as many times as there are 1 bits in the word. Clever. Unfortunately, it's not terribly fast; it's roughly two to three times slower than the lookup and counters methods on both architectures.
unsigned numbits_subtraction(unsigned i)
{
unsigned n;
for(n=0; i; n++)
i &= i-1;
return n;
}
Straightforwardly Examine Each Bit
The most easily understood, and slowest, method: iterate over all the bits in the word; if the current bit is a 1, increment the counter; otherwise, do nothing. Here that's done by looking at the least-significant bit on each iteration, shifting right by one, and iterating until there are no 1 bits left in the word. There's a little optimization in the #ifndef here: instead of doing if (i & 1) n++;, which uses a branch instruction, just add the value of the least-significant bit to the counter ( n += (i & 1); ), since it is 1 exactly when you want to add 1 and 0 when you don't. (We're just twiddling bits anyway, so why not?) This makes the processor do more adds, but adding is fast and branching is slow on modern processors, so it turns out to be about twice as fast. Even "twice as fast", however, is still four to five times slower than the lookup method, again on all architectures.
unsigned numbits(unsigned i)
{
unsigned n;
for(n=0; i; i >>= 1)
#ifndef MORE_OPTIMAL
if (i & 1) n++;
#else
n += (i & 1);
#endif
return n;
}
Now, why does this all matter? It doesn't, really, but it was sure a good way to waste some time, and maybe someone learned some optimizing tricks from it... (Well, I did, actually - so I hope someone else did as well.)
This counts the 1 bits in an unsigned int, so it works for 0-255 as well.
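As a side note (my addition, not part of the original answer): GCC and Clang also expose a builtin that does this directly; where the hardware has a popcount instruction it compiles to that, otherwise to a fallback. A minimal sketch:

```c
/* __builtin_popcount is a GCC/Clang extension, not standard C;
   it returns the number of 1 bits in its unsigned int argument. */
unsigned numbits_builtin(unsigned i)
{
    return (unsigned)__builtin_popcount(i);
}
```

It is not a three-line-of-logic answer in the spirit of the puzzle, but in real code it is usually the right choice.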
|
struct haha
{
    unsigned int a:1;
    unsigned int b:1;
    unsigned int c:1;
    unsigned int d:1;
    unsigned int e:1;
    unsigned int f:1;
    unsigned int g:1;
    unsigned int h:1;
};
/* In C the struct tag is required on the pointer declaration;
   inputNumber is the 0-255 input value from the question. */
struct haha *s = (struct haha *)&inputNumber;
int count = s->a + s->b + s->c + s->d + s->e + s->f + s->g + s->h;
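A self-contained sketch of this bit-field answer (function and type names are my own). One caveat worth stating: bit-field allocation order is implementation-defined in C, and the cast is technically an aliasing violation in strict ISO C; the trick assumes the eight 1-bit fields map onto the low byte, which holds on common little-endian ABIs such as x86 with GCC/Clang.

```c
/* Assumes the eight 1-bit fields cover the low byte of an unsigned int
   (true on common little-endian ABIs); layout is implementation-defined. */
struct bits8
{
    unsigned int a:1, b:1, c:1, d:1,
                 e:1, f:1, g:1, h:1;
};

unsigned count_bits_via_fields(unsigned int inputNumber) /* 0-255 */
{
    struct bits8 *s = (struct bits8 *)&inputNumber;
    return s->a + s->b + s->c + s->d + s->e + s->f + s->g + s->h;
}
```

Note that this also satisfies the puzzle's constraints: no loop, no recursion, and the function body is two lines.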