[1] Configuration Parameters: What can you just ignore?
Configuration Parameters: What can you just ignore?

http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/

 

Configuring a Hadoop cluster is something akin to voodoo. There are a large number of variables in hadoop-default.xml that you can override in hadoop-site.xml. Some specify file paths on your system, but others adjust levers and knobs deep inside Hadoop’s guts. Unfortunately, there’s little or no documentation on how to set them well. Is there a single optimal configuration? Are there some settings that can just be “set to 11?”

At Cloudera, we’re working hard to make Hadoop easier to use and to make configuration less painful. Our Hadoop Configuration Tool gives you a web-based guide to help set up your cluster. Once it’s running, though, you might want to look under the hood and tune things a bit.

The rest of this post discusses why it’s a bad idea to just set all the limits as high as they’ll go, and gives you some pointers to get started on finding a happy medium.

Why can’t you just set all the limits to 1,000,000?

Increasing most settings has a direct impact on memory consumption. Increasing DataNode and TaskTracker settings therefore has an adverse impact on the RAM available to individual MapReduce tasks. On large hardware they can be set generously high, but in general, unless you have several dozen or more nodes working together, dialing settings up very high wastes system resources like RAM that could be better applied to running your mapper and reducer code.

That having been said, here’s a list of some things that can be cranked up higher than the defaults by a fair margin:

File descriptor limits

A busy Hadoop daemon might need to open a lot of files. The open fd ulimit in Linux defaults to 1024, which might be too low. You can set it to something more generous, say 16384. Setting this an order of magnitude higher (e.g., 128K) is probably not a good idea: no individual Hadoop daemon should need hundreds of thousands of fds, and if one is consuming that many, there’s probably an fd leak or other bug that needs fixing. A huge limit would just mask the true problem until errors started showing up somewhere else.

You can view your ulimits in bash by running:

$ ulimit -a

To set the fd ulimit for a process, you’ll need to be root. As root, open a shell, and run:

# ulimit -n 16384

You can then run the Hadoop daemon from that shell; the ulimit will be inherited, e.g.:

# sudo -u hadoop $HADOOP_HOME/bin/hadoop-daemon.sh start namenode

You can also set the ulimit for the hadoop user in /etc/security/limits.conf; this mechanism will set the value persistently. Make sure pam_limits is enabled for whatever auth mechanism the hadoop daemon is using. The entry will look something like:

hadoop hard nofile 16384
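In practice you’ll usually want the soft limit raised too, and pam_limits enabled in the appropriate PAM session file. A sketch (the PAM file path varies by distribution, so treat these locations as assumptions, not gospel):

hadoop soft nofile 16384
hadoop hard nofile 16384

# in /etc/pam.d/common-session (Debian-style) or /etc/pam.d/system-auth (Red Hat-style):
session required pam_limits.so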

If you’re running our distribution, we ship a modified version of Hadoop 0.18.3 that includes HADOOP-4346, a fix for the “soft fd leak” that has affected Hadoop since 0.17, so this should be less critical for our users. Users of the official Apache Hadoop release are affected by the fd leak for all 0.17, 0.18, and 0.19 versions. (The fix is committed for 0.20.) For the curious, we’ve published a list of all differences between our release of Hadoop and the stock 0.18.3 release.

If you’re running Linux 2.6.27, you should also set the epoll limit to something generous; maybe 4096 or 8192.

# echo 4096 > /proc/sys/fs/epoll/max_user_instances

Then put the following text in /etc/sysctl.conf:

fs.epoll.max_user_instances = 4096

See http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/ for more details.

Internal settings

If there is more RAM available than is consumed by task instances, set io.sort.factor to 25 or 32 (up from 10), and set io.sort.mb to 10 * io.sort.factor. Don’t forget to multiply io.sort.mb by the number of concurrent tasks to determine how much RAM you’re actually allocating here, and keep the total low enough to prevent swapping. (So 10 task instances with io.sort.mb = 320 means you’re actually allocating 3.2 GB of RAM for sorting, up from 1.0 GB.) An open ticket on the Hadoop bug tracking database suggests making the default value here 100; that would likely result in a lower per-stream cache size than 10 MB.
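As a sketch, those two overrides in hadoop-site.xml might look like the following (the values are the illustrative ones from the paragraph above, not universal recommendations):

<property>
  <name>io.sort.factor</name>
  <value>32</value>
</property>
<property>
  <name>io.sort.mb</name>
  <!-- 10 * io.sort.factor; with 10 concurrent tasks this allocates 3.2 GB for sorting -->
  <value>320</value>
</property>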

io.file.buffer.size – this is one of the more “magic” parameters. You can set this to 65536 and leave it there. (I’ve profiled this in a bunch of scenarios; this seems to be the sweet spot.)

If the NameNode and JobTracker are on big hardware, set dfs.namenode.handler.count to 64, and do the same with mapred.job.tracker.handler.count. If you’ve got more than 64 GB of RAM in this machine, you can double it again.

dfs.datanode.handler.count defaults to 3 and could be set a bit higher (maybe 8 or 10). More than that takes up memory that could be devoted to running MapReduce tasks, and I don’t know that it gives you any more performance. (An increased number of HDFS clients implies an increased number of DataNodes to handle the load.)
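Pulling the last few suggestions together, a hypothetical hadoop-site.xml fragment (tune the handler counts to your own hardware; the values below just echo the suggestions above):

<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
</property>
<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>64</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>8</value>
</property>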

mapred.child.ulimit should be 2–3x higher than the heap size specified in mapred.child.java.opts and left there to prevent runaway child task memory consumption.
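Note that mapred.child.ulimit is expressed in kilobytes of virtual memory, so the two settings use different units. A sketch assuming a 512 MB child heap (the heap size here is an assumption for illustration):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.child.ulimit</name>
  <!-- 1,572,864 KB = 1.5 GB, i.e. 3x the 512 MB heap -->
  <value>1572864</value>
</property>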

Setting tasktracker.http.threads higher than 40 will deprive individual tasks of RAM, and you won’t see a positive impact on shuffle performance until your cluster approaches 100 nodes or more.

Conclusions

Configuring Hadoop for “optimal performance” is a moving target, and depends heavily on your own applications. There are settings that need to be moved off their defaults, but finding the best value for each is difficult. Our configurator for Hadoop will do a reasonable job of getting you started.

We’d love to hear from you about your own configurations. Did you discover a combination of settings that really made your cluster sing? Please share in the comments.


    
[2] The meaning of M_PI, M_PI_2, and M_PI_4
The meaning of M_PI, M_PI_2, and M_PI_4

M_PI represents π

M_PI_2 represents π/2

M_PI_4 represents π/4
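These constants are defined in math.h. A minimal C snippet to print them (assuming a POSIX-style libm where M_PI and friends are available without extra defines):

#include <math.h>
#include <stdio.h>

int main(void) {
    printf("M_PI   = %.6f\n", M_PI);    /* 3.141593 */
    printf("M_PI_2 = %.6f\n", M_PI_2);  /* 1.570796 */
    printf("M_PI_4 = %.6f\n", M_PI_4);  /* 0.785398 */
    return 0;
}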


    
[3] Understanding NSString’s copy property
Understanding NSString’s copy property

By convention, NSString properties are declared with copy. In theory that means the string is copied rather than just having its reference count incremented, but in practice the distinction only matters when you assign an NSMutableString to an NSString property.

 

// Manual reference counting (pre-ARC): one property retained, one copied
@interface Demo : NSObject
{
    NSString *retainString;
    NSString *copyString;
}

@property (nonatomic, retain)NSString *retainString;
@property (nonatomic, copy)NSString *copyString;
@end

@implementation Demo
@synthesize retainString;
@synthesize copyString;
-(void)dealloc
{
    [retainString release];
    [copyString release];
    [super dealloc];
}

@end

Demo *o = [[Demo alloc] init];
NSMutableString *s1 = [[NSMutableString alloc] initWithCapacity:100];
[s1 setString:@"original"];
o.retainString = s1;  // retains the mutable string itself
o.copyString = s1;    // stores an immutable snapshot of the string
NSLog(@"retain string is %@", o.retainString);
NSLog(@"copy string is %@", o.copyString);
[s1 setString:@"changed"];  // mutate the original string
NSLog(@"retain string is %@", o.retainString);  // now prints "changed"
NSLog(@"copy string is %@", o.copyString);      // still prints "original"
[s1 release];
[o release];

This makes the difference visible: with retain, when the NSMutableString’s contents change, the NSString property, which is semantically supposed to be immutable, changes along with it; with copy, the property keeps the contents it had at the moment of assignment.

 

If you use copy on an object whose actual type is an immutable NSString, it is effectively just retain; you can confirm this by watching the reference count. Semantically that is perfectly fine, and it avoids the cost of an unnecessary string copy.
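A minimal sketch of that last point: under manual reference counting, copying an already-immutable NSString typically hands back the same object, so a pointer comparison makes the retain-like behavior visible:

NSString *a = @"immutable";
NSString *b = [a copy];
// copy on an immutable NSString is effectively retain: the same pointer comes back
NSLog(@"same object? %d", a == b);  // prints 1
[b release];  // balance the copy under MRC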


    