Computation errors

Current eOn code and BOINC distributed computing


Computation errors

Postby rebirther » Sun Sep 05, 2010 4:05 pm

Got some more computation errors here.

Looks like it's not the app, just a wrong parameter.
rebirther
 
Posts: 3
Joined: Sun Sep 05, 2010 9:12 am

Re: Computation errors

Postby matt » Mon Sep 06, 2010 6:24 pm

Are you finding that the majority of your work units are crashing or just a few?
matt
 
Posts: 37
Joined: Thu Jul 17, 2008 10:51 pm

Re: Computation errors

Postby rebirther » Mon Sep 06, 2010 7:31 pm

matt wrote:Are you finding that the majority of your work units are crashing or just a few?


Around 10 out of 100, but it's still lost computation time.
rebirther
 
Posts: 3
Joined: Sun Sep 05, 2010 9:12 am

Re: Computation errors

Postby matt » Wed Sep 08, 2010 3:39 pm

The issue should be fixed. Are you still getting bad WUs?
matt
 
Posts: 37
Joined: Thu Jul 17, 2008 10:51 pm

Re: Computation errors

Postby rebirther » Wed Sep 08, 2010 6:02 pm

matt wrote:The issue should be fixed. Are you still getting bad WUs?


My latest files are looking good!
rebirther
 
Posts: 3
Joined: Sun Sep 05, 2010 9:12 am

Re: Computation errors

Postby Paladin* » Fri Sep 10, 2010 8:54 am

I'm still getting them this morning; it seems anything starting with 129 errors out. Here is one box full of errors: http://eon.ices.utexas.edu/eon2/results.php?hostid=411
Paladin*
 
Posts: 3
Joined: Sat Sep 04, 2010 7:50 am

Re: Computation errors

Postby Sorceress » Fri Sep 10, 2010 3:04 pm

I am having computation errors as well. WU3770305 errored out. Can someone tell me what's wrong before I go much further? I have another one waiting.
Sorceress
 
Posts: 9
Joined: Fri Sep 10, 2010 1:39 pm
Location: Tennessee USA

Re: Computation errors

Postby b0b3r » Fri Sep 10, 2010 3:59 pm

Hello Everyone

I also have problems with the new application version for Windows:

http://eon.ices.utexas.edu/eon2/results.php?hostid=1100

As you can see, version 5.04 worked fine, but with 5.07 every task ends in an error.
b0b3r
 
Posts: 1
Joined: Fri Sep 10, 2010 3:36 pm

Re: Computation errors

Postby PinkPenguin » Fri Sep 10, 2010 6:02 pm

I can confirm Paladin's observation that all the WUs that error out begin with 129. This seems to happen only on Windows boxes, as my Linux and Mac boxes are returning them OK, with one exception.

It looks like a memory-handling error, since the Windows WUs all fail with the following error when trying to read address 0 (a null pointer dereference):
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00414168 read attempt to address 0x00000000

The only Linux WU that failed had a segmentation violation, which tends to confirm the Windows error. The following output seems to relate to Matter.cpp:
This code only supports cubic cells.
This code only supports cubic cells.
This code only supports cubic cells.
This code only supports cubic cells.
A carriage return ('\r') has been detected. To work correctly, new lines should be indicated by the new line character (\n).
A carriage return ('\r') has been detected. To work correctly, new lines should be indicated by the new line character (\n).
This code only supports cubic cells.
This code only supports cubic cells.
A carriage return ('\r') has been detected. To work correctly, new lines should be indicated by the new line character (\n).
A carriage return ('\r') has been detected. To work correctly, new lines should be indicated by the new line character (\n).
A carriage return ('\r') has been detected. To work correctly, new lines should be indicated by the new line character (\n).
A carriage return ('\r') has been detected. To work correctly, new lines should be indicated by the new line character (\n).
This code only supports cubic cells.
This code only supports cubic cells.
This code only supports cubic cells.
This code only supports cubic cells.
06:14:58 (28199): called boinc_finish
SIGSEGV: segmentation violation
Stack trace (2 frames):
[0x80a36e7]
[0xc37400]

Exiting...
This is the Linux example: http://eon.ices.utexas.edu/eon2/result.php?resultid=3755192

All other 129... WUs on Linux seem to be OK according to the validator, although every WU on the Linux boxes, 129... or otherwise, generates a SIGSEGV segmentation violation even when it validates correctly... ;)

Only the Mac WUs don't generate errors...
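The repeated carriage-return warning in the Linux output suggests that at least some input files reached the client with Windows-style (CRLF) line endings. As a sketch only, assuming the inputs are plain-text files that are meant to use LF endings, a pre-processing step on the project side could normalize them before work units are built. The helper names normalize_newlines and normalize_file are hypothetical and not part of eOn.

```python
import sys
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF (hypothetical helper)."""
    # Replace the CRLF pair first so it collapses to a single LF,
    # then turn any remaining lone CR into LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path: Path) -> bool:
    """Rewrite a file in place with LF endings; return True if it changed."""
    original = path.read_bytes()
    fixed = normalize_newlines(original)
    if fixed != original:
        path.write_bytes(fixed)
        return True
    return False


if __name__ == "__main__":
    # Usage: python normalize_newlines.py FILE [FILE ...]
    for name in sys.argv[1:]:
        changed = normalize_file(Path(name))
        print(f"{name}: {'fixed' if changed else 'already LF'}")
```

Working on raw bytes avoids any text-mode newline translation by Python itself, so the file contents are changed only where a CR actually appears.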
PinkPenguin
 
Posts: 3
Joined: Fri Sep 10, 2010 1:25 pm

Re: Computation errors

Postby Sorceress » Sun Sep 12, 2010 4:29 am

Why are we still getting the 129*** WUs? Now, instead of erroring out, they are being aborted by the project. Please stop sending them if they aren't any good.
I haven't had a single 129** WU validate! Geesh...
Sorceress
 
Posts: 9
Joined: Fri Sep 10, 2010 1:39 pm
Location: Tennessee USA

Re: Computation errors

Postby chill » Sun Sep 12, 2010 4:51 am

Sorceress wrote:Why are we still getting the 129*** WUs? Now, instead of erroring out, they are being aborted by the project. Please stop sending them if they aren't any good.
I haven't had a single 129** WU validate! Geesh...


Our project is still working on making our calculations as efficient as possible on a distributed computing platform. Currently, in order to parallelize as much as possible, we are creating more work units than we actually need. I added the ability to cancel these extra work units once they are no longer needed, but I didn't realize that users would then not get credit for the work they had already done. I have disabled cancellation for the 129* work units, so they will no longer be canceled. Soon this won't be a problem, as we are implementing a method that creates only as many work units as we need.

We have gotten many users on this project very quickly! We are working hard to make good use of all this new processing power.
chill
 
Posts: 80
Joined: Tue Jul 28, 2009 9:04 pm

Re: Computation errors

Postby mitrichr » Tue Sep 14, 2010 1:55 am

I guess you can add me to the list. I did not keep the WU IDs, but I just signed on to your project. I have a wee bit of credit, so some were OK. But I had several that had run for over 30 minutes with no work done, so I aborted them.
mitrichr
 
Posts: 27
Joined: Mon Sep 13, 2010 10:42 pm

Re: Computation errors

Postby mitrichr » Tue Sep 14, 2010 2:03 am

Sorry for the bother; I could not find how to contact the admins. My signature is not showing up, even though I selected the option to show it by default.

>>RSM
mitrichr
 
Posts: 27
Joined: Mon Sep 13, 2010 10:42 pm

Re: Computation errors

Postby Neil Polson » Thu Sep 16, 2010 8:48 pm

Just got another error. stderr output:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Error: size of hessian is zero. Try with smaller min_Displacement_Hessian

</stderr_txt>
]]>
I get a few of these each day, and I've noticed a lot of others getting them too. The last one ran for nearly 15 minutes. Losing credit when it's not my fault is annoying. Any chance of these being eliminated?
Neil Polson
 
Posts: 4
Joined: Wed Sep 15, 2010 11:00 am

Re: Computation errors

Postby mitrichr » Thu Sep 16, 2010 9:06 pm

I show a whole bunch of WUs completed successfully. But whenever I check my running tasks in BOINC Manager, I see tasks that have accumulated run time with 0% progress. This is a waste of my time. My computers run 24/7; I do not, and cannot, watch them all the time.

You have been given precious space on my new i7-840QM. If I cannot be sure that I can go away for 10-12 hours and things will be OK, I cannot afford the waste of time and electricity.
Here are two randomly selected tasks: one succeeded, the other failed.

Succeeded: 809877401_669_7571100

No work done, aborted: 1728637339_675_27309900

>>RSM
mitrichr
 
Posts: 27
Joined: Mon Sep 13, 2010 10:42 pm
