How to integrate MATLAB with a Hadoop cluster?

Hi, I am trying to run MATLAB on a Hadoop cluster. When I run my MATLAB program on the cluster, I get the following errors on the data nodes.
I am using MATLAB R2016b and Hadoop 2.7.2.
Error 1:
2017-02-02 15:52:41,962 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : com.mathworks.toolbox.parallel.hadoop.worker.RemoteMvm$CommunicationLostException
    at com.mathworks.toolbox.parallel.hadoop.worker.RemoteMvm.feval(Unknown Source)
    at com.mathworks.toolbox.parallel.hadoop.link.HadoopMatlabWorker.configureWorker(Unknown Source)
    at com.mathworks.toolbox.parallel.hadoop.link.HadoopMatlabWorker.<init>(Unknown Source)
    at com.mathworks.toolbox.parallel.hadoop.link.MatlabWorkerSingleton.getOrCreateWorker(Unknown Source)
    at com.mathworks.toolbox.parallel.hadoop.link.MatlabMapper.setup(Unknown Source)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at com.mathworks.toolbox.parallel.hadoop.MatlabReflectionMapper.run(Unknown Source)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error 2:
2017-02-02 15:53:34,345 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave2/10.70.0.102 to slave3:40326 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
    at com.sun.proxy.$Proxy8.getTask(Unknown Source)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 4 more

2 Comments

Rick Amos, 2017-02-15
This error message typically means the MATLAB worker either was killed or crashed, and there are several things to check. First, do the errors go away when you run the following command in MATLAB just before running mapreduce?
distcomp.feature('HadoopReuseWorker', 'No')
If the errors still appear, do you see crash dumps when you look at the failed-task view in the Hadoop scheduler web UI? It can be accessed as follows:
- Go to the Hadoop scheduler web UI (typically hostname:8088).
- Find one of your jobs; it should have the name "MATLAB Parallel Computing Job" and be submitted by you.
- Click "History" (or "ApplicationMaster" if the job is still running).
- In the "Attempt Type" table, click the number in the "Failed" column.
- This shows a table of task attempts. Click "logs" for any one of the attempts.
- This lists the log files, including stderr/stdout/syslog. If the page says "Aggregation is not enabled", you will need to either configure the cluster with log aggregation or perform these steps while the job is still running.
- If there is a matlab_crash_dump file, it will contain useful information about what went wrong.
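If the web UI reports "Aggregation is not enabled", YARN log aggregation can be turned on in yarn-site.xml on the cluster nodes. A minimal sketch; the retention value below is illustrative, and the NodeManagers need a restart for it to take effect:

```xml
<!-- yarn-site.xml: sketch for enabling YARN log aggregation.
     The retention value is illustrative; adjust for your cluster. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- keep aggregated logs for 7 days (value is in seconds) -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```

Once aggregation is enabled, the logs of a finished job can also be fetched from the command line with `yarn logs -applicationId <application id>`.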
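For the second error (Connection refused from slave2 to slave3:40326), a useful first step is to check name resolution and port reachability from the node named in "Call From". A minimal bash sketch; the host and port here are copied from this particular log and will differ for each failure, since the port is typically an ephemeral one opened by the MapReduce ApplicationMaster:

```shell
# Host/port copied from the error message
# ("Call From slave2/10.70.0.102 to slave3:40326 ... Connection refused");
# run this from the node named in "Call From" (here, slave2).
host=slave3
port=40326
# bash's /dev/tcp pseudo-device attempts a TCP connection to host:port
if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
  status=reachable
else
  status=unreachable
fi
echo "$host:$port is $status"
```

If the port is unreachable, check that every node resolves the other nodes' hostnames consistently (e.g. in /etc/hosts) and that firewalls allow node-to-node traffic on the cluster's internal port range.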
0 Answers