all Works 2136951010_ .... in errors

Current eOn code and boinc distributed computing

OlaV_Ouafouaf
Posts: 3
Joined: Wed Sep 15, 2010 5:36 pm

all Works 2136951010_ .... in errors

Post by OlaV_Ouafouaf »

Three work units (2136951010_5_36020_0, 2136951010_14_83920_0 and 2136951010_45_363770_0)
did not start and reported this error:
Host Project Date Message
maison eon2 23/09/2010 21:01:37 Starting task 2136951010_45_363770_0 using client version 508
maison eon2 23/09/2010 21:01:38 Computation for task 2136951010_45_363770_0 finished
maison eon2 23/09/2010 21:01:38 Output file 2136951010_45_363770_0_0 for task 2136951010_45_363770_0 absent
maison eon2 23/09/2010 21:01:38 Output file 2136951010_45_363770_0_1 for task 2136951010_45_363770_0 absent
maison eon2 23/09/2010 21:01:38 Output file 2136951010_45_363770_0_2 for task 2136951010_45_363770_0 absent
maison eon2 23/09/2010 21:01:38 Output file 2136951010_45_363770_0_3 for task 2136951010_45_363770_0 absent
maison eon2 23/09/2010 21:01:38 Output file 2136951010_45_363770_0_4 for task 2136951010_45_363770_0 absent

("absent" means: not present)

host : http://eon.ices.utexas.edu/eon2/results.php?hostid=1973

I looked for a log and didn't find it.

Hope this will help
matt
Posts: 37
Joined: Thu Jul 17, 2008 10:51 pm

Re: all Works 2136951010_ .... in errors

Post by matt »

Thanks for reporting these. Have you seen many more failed work units? Our work units can encounter computation errors due to edge cases and numerical errors. Currently, our client returns a non-zero exit code when this happens, which BOINC interprets as a failed work unit. We are working on a major update to the client code. Once this is done, work units should no longer fail for these reasons.
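
To illustrate the exit-code convention described above, here is a minimal Python sketch of a supervisor that treats any non-zero exit status as a failure. It is not BOINC's or the eOn client's actual code: `run_task` and the two stand-in child commands are hypothetical names chosen for this example.

```python
import subprocess
import sys


def run_task(args):
    """Run a child process and classify the result by its exit status."""
    completed = subprocess.run(args)
    # Common convention: 0 means success, any other status means failure.
    if completed.returncode == 0:
        return "success"
    return f"failed (exit code {completed.returncode})"


# Hypothetical stand-ins for a science application: one exits with
# status 1 (as in the reports in this thread), the other exits cleanly.
failing_task = [sys.executable, "-c", "import sys; sys.exit(1)"]
clean_task = [sys.executable, "-c", "pass"]

failing_status = run_task(failing_task)
clean_status = run_task(clean_task)
```

Under this convention, a task that stops early with status 1 is reported as failed even if the application itself only hit a recoverable numerical edge case.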
OlaV_Ouafouaf
Posts: 3
Joined: Wed Sep 15, 2010 5:36 pm

Re: all Works 2136951010_ .... in errors

Post by OlaV_Ouafouaf »

Some clarifications:
Only work units beginning with "2136951010_" fail, and none of them runs correctly.
All my other WUs run correctly.

The exit code is "error 1 (0x1)".

I have a fourth WU of this family in error:
Project Date Message
eon2 24/09/2010 03:43:29 Starting 2136951010_68_569672_0
eon2 24/09/2010 03:43:29 Starting task 2136951010_68_569672_0 using client version 508
eon2 24/09/2010 03:43:30 Computation for task 2136951010_68_569672_0 finished
eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_0 for task 2136951010_68_569672_0 absent
eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_1 for task 2136951010_68_569672_0 absent
eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_2 for task 2136951010_68_569672_0 absent
eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_3 for task 2136951010_68_569672_0 absent
eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_4 for task 2136951010_68_569672_0 absent
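
For anyone who wants to check whether failures cluster around one family of work units, the message log lines above can be grouped with a short Python sketch. It assumes the log format shown in this thread and assumes that the part of the task name before the first underscore identifies the family; `failed_tasks_by_family` is a hypothetical helper, not part of BOINC.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "... Output file <file> for task <task> absent"
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def failed_tasks_by_family(log_lines):
    """Group tasks with absent output files by the prefix of the task name."""
    families = defaultdict(set)
    for line in log_lines:
        match = ABSENT.search(line)
        if match:
            task = match.group(2)
            # Assumption: the text before the first "_" names the family.
            family = task.split("_", 1)[0]
            families[family].add(task)
    return {family: sorted(tasks) for family, tasks in families.items()}


log = [
    "eon2 24/09/2010 03:43:29 Starting task 2136951010_68_569672_0 using client version 508",
    "eon2 24/09/2010 03:43:30 Computation for task 2136951010_68_569672_0 finished",
    "eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_0 for task 2136951010_68_569672_0 absent",
    "eon2 24/09/2010 03:43:30 Output file 2136951010_68_569672_0_1 for task 2136951010_68_569672_0 absent",
]

result = failed_tasks_by_family(log)
```

Each task is counted once per family, however many of its output files are reported absent.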
John Hunt
Posts: 1
Joined: Sat Sep 25, 2010 12:38 pm

Re: all Works 2136951010_ .... in errors

Post by John Hunt »

Received a few of these myself -
http://eon.ices.utexas.edu/eon2/result. ... id=5401572
All other WUs behaving OK.
Hope you can fix this!
Dataman
Posts: 1
Joined: Sun Sep 26, 2010 9:51 pm

Re: all Works 2136951010_ .... in errors

Post by Dataman »

Is there an estimate for when the comp errors will be fixed? I have 26 cores here at the moment and about 10% fail. Cheers!
matt
Posts: 37
Joined: Thu Jul 17, 2008 10:51 pm

Re: all Works 2136951010_ .... in errors

Post by matt »

Right now, we are restructuring the client code and cleaning up these errors along the way. This process could take a few weeks, but once finished, we should have longer and more stable work units. (We'll also be able to use more accurate simulation methods!)