Runaway WUs

eOn code for long time scale dynamics

Moderator: moderators

Paratima
Posts: 7
Joined: Mon May 02, 2011 4:15 pm

Runaway WUs

Post by Paratima »

Yesterday morning, I found these two WUs, both running on the same system.

82347095 82343175

Both had run-times of over 6 hours. Neither was using any CPU time, just idling along, preventing other WUs from being processed.
I aborted both. Haven't seen any others on my three active systems running EON.
I will cheerfully provide any supporting information you would like.
Les
Paratima
Posts: 7
Joined: Mon May 02, 2011 4:15 pm

Re: Runaway WUs

Post by Paratima »

Caught and aborted another one this morning, after one hour and fifty minutes.
WU# 84912243
Same symptoms: run length over an hour and not using any CPU time.

If the admins are aware of this problem and have sufficient samples and info to solve it, just let me know and I'll shut up about it.
Paratima
Posts: 7
Joined: Mon May 02, 2011 4:15 pm

Re: Runaway WUs

Post by Paratima »

Yet another. WU# 88214266
Aborted after 2+ hours.
Paratima
Posts: 7
Joined: Mon May 02, 2011 4:15 pm

Re: Runaway WUs

Post by Paratima »

Still another. WU# 96019272.
Aborted after 2 hours, 39 minutes.
Just spinning - no work being done.
Cheers.
Paratima
Posts: 7
Joined: Mon May 02, 2011 4:15 pm

Re: Runaway WUs

Post by Paratima »

Two more tasks found running for 3+ hours, using no CPU time, just idling.
Getting REALLY tired of having to check all my machines every few hours.

Is there a fix for this or am I just wasting my time reporting it?

I'll CHEERFULLY post WU ID numbers and any other desired information.
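
The symptom reported throughout this thread (elapsed time keeps growing while CPU time stays near zero) can be checked mechanically instead of by hand. Below is a minimal sketch, not an official eOn or BOINC tool. The `Task` record, the `find_stalled` helper, and both thresholds are assumptions made for illustration, and how the elapsed and CPU times are collected from each machine is left open.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record for one running work unit."""
    name: str
    elapsed_s: float  # wall-clock run time in seconds
    cpu_s: float      # CPU time actually consumed, in seconds


def find_stalled(tasks, min_elapsed_s=3600.0, max_cpu_fraction=0.01):
    """Return names of tasks that have run a long time but used almost no CPU.

    Both thresholds are illustrative assumptions, not project-recommended values.
    """
    stalled = []
    for t in tasks:
        # Ignore tasks that have not been running long enough to judge.
        if t.elapsed_s < min_elapsed_s:
            continue
        # A healthy compute task uses CPU for most of its elapsed time.
        if t.cpu_s / t.elapsed_s <= max_cpu_fraction:
            stalled.append(t.name)
    return stalled


if __name__ == "__main__":
    # Made-up sample data, not taken from the WUs reported in this thread.
    sample = [
        Task("wu_a", 6 * 3600, 2.0),     # six hours elapsed, almost no CPU
        Task("wu_b", 2 * 3600, 7000.0),  # busy the whole time
        Task("wu_c", 600, 0.0),          # too young to judge
    ]
    print(find_stalled(sample))
```

Requiring a minimum elapsed time avoids flagging tasks that have only just started and have not yet accumulated CPU time.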
upstatelabs
Posts: 20
Joined: Wed Oct 27, 2010 2:07 pm

Re: Runaway WUs

Post by upstatelabs »

Same issue here.

Here is one WU that I aborted after 14 hours:
985262977_14115_189180275

Time was running but CPU was not in use for the WU.
Tex1954
Posts: 24
Joined: Fri May 27, 2011 9:47 pm

Re: Runaway WUs

Post by Tex1954 »

I get one of those sometimes, but lately the bigger problem has been the server aborting a lot of WUs and getting overloaded. I often see this in the log:


519 eon2 6/16/2011 9:53:32 AM Started upload of 985262977_14643_201901525_0_0
520 eon2 6/16/2011 9:53:32 AM Started upload of 453769022_18500_197012431_0_0
521 eon2 6/16/2011 9:53:34 AM [error] Error reported by file upload server: can't parse config file
522 eon2 6/16/2011 9:53:34 AM [error] Error reported by file upload server: can't parse config file
523 eon2 6/16/2011 9:53:34 AM Temporarily failed upload of 985262977_14643_201901525_0_0: transient upload error
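
Log excerpts like the one above can be summarized with a short script when reporting the problem. This is a rough sketch that assumes the line layout shown in this post; the `summarize` helper and its regular expression are illustrative and not part of the BOINC client.

```python
import re

# Matches lines such as:
#   "... Temporarily failed upload of <file>: <reason>"
FAILED_UPLOAD = re.compile(r"Temporarily failed upload of (\S+): (.+)$")


def summarize(lines):
    """Count '[error]' lines and map each failed upload to its stated reason."""
    errors = 0
    failed = {}
    for line in lines:
        if "[error]" in line:
            errors += 1
        match = FAILED_UPLOAD.search(line)
        if match:
            failed[match.group(1)] = match.group(2).strip()
    return errors, failed


if __name__ == "__main__":
    log = [
        "519 eon2 6/16/2011 9:53:32 AM Started upload of 985262977_14643_201901525_0_0",
        "521 eon2 6/16/2011 9:53:34 AM [error] Error reported by file upload server: can't parse config file",
        "523 eon2 6/16/2011 9:53:34 AM Temporarily failed upload of 985262977_14643_201901525_0_0: transient upload error",
    ]
    error_count, failed_uploads = summarize(log)
    print(error_count, failed_uploads)
```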