
Random fpops_est

Posted: Sat Jan 29, 2011 3:47 pm
by Ananas
The fpops_est (used to calculate the estimated runtime) seems to be quite random here :

    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>156749973470.000000</rsc_fpops_est>
    <rsc_fpops_est>151272401416.000000</rsc_fpops_est>
    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>369909759747.000000</rsc_fpops_est>
    <rsc_fpops_est>161091959979.000000</rsc_fpops_est>
    <rsc_fpops_est>1706232249440.000000</rsc_fpops_est>
This - together with the extremely short deadline - often forces BOINC into panic mode,
i.e. EDF / Earliest Deadline First.

24 hours is the critical lower deadline limit for panic mode. A deadline 6 hours longer,
together with more constant fpops_est values, would avoid this "selfishness" of the eOn workunits.

Re: Random fpops_est

Posted: Sun Jan 30, 2011 4:09 pm
by chill
The rsc_fpops_est will vary per WU, because WUs belong to different simulations and each simulation has its own simple algorithm for guessing the average fpops per WU.

I have increased the delay bound to 30 hours. I hope this helps with the panic mode.

Re: Random fpops_est

Posted: Wed Feb 02, 2011 9:50 am
by Ananas
Yes, thanks - that helps, no panic mode anymore :-)

About the fpops_est thing :

The results usually take between 1.5 and 5 minutes on my box, but now and then one with a really low fpops_est value must occur. That pushes up the duration correction factor (DCF) and causes the estimated runtime (roughly derived from : fpops_est / benchmark * duration correction factor) to shoot up to something like 35 hours (I have even seen 128 hours once).
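To make the estimate concrete, here is a minimal sketch. It assumes the client multiplies the raw estimate (fpops_est / benchmark) by the DCF. The function name and the 2 GFLOPS benchmark figure are made up for illustration and are not taken from the BOINC sources.

```python
def estimated_runtime_s(fpops_est, benchmark_flops, dcf=1.0):
    """Estimated runtime in seconds: raw estimate scaled by the DCF.

    benchmark_flops is the host's measured floating point speed
    (operations per second); dcf is the duration correction factor.
    """
    return fpops_est / benchmark_flops * dcf


if __name__ == "__main__":
    fpops_est = 369909759747.0   # one of the values quoted in the first post
    benchmark = 2.0e9            # hypothetical host: 2 GFLOPS
    for dcf in (1.0, 100.0, 600.0):
        hours = estimated_runtime_s(fpops_est, benchmark, dcf) / 3600.0
        print(f"DCF {dcf:6.1f} -> estimated runtime {hours:8.2f} h")
```

With these assumed numbers the raw estimate is about 185 seconds, so only a very large DCF turns it into an estimate of many hours.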

The BOINC handler for this DCF works like this :

- result runs longer than calculated from benchmark result and fpops_est => the new DCF is calculated directly from this one result

- result runs a bit shorter than calculated from benchmark result and fpops_est => the new DCF is influenced by this result by 10%

- result runs much shorter than calculated from benchmark result and fpops_est => the new DCF is influenced by this result by just 1%

So if fpops_est is way too low just once, the DCF jumps up immediately and it takes tons of results to bring it back to a correct value - this influences the client side cache handler as well as the work scheduler.
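The three rules above can be sketched roughly like this. It is only an illustration of the asymmetry, not the actual BOINC code: the function names are made up, and the cut-off between "a bit shorter" and "much shorter" (here a tenth of the current DCF) is an assumption.

```python
def update_dcf(dcf, actual_s, raw_estimate_s):
    """Return the new DCF after one finished result.

    raw_estimate_s is the uncorrected estimate (fpops_est / benchmark).
    """
    ratio = actual_s / raw_estimate_s      # DCF implied by this result alone
    if ratio > dcf:
        return ratio                       # ran longer: jump straight up
    if ratio > 0.1 * dcf:                  # assumed cut-off for "a bit shorter"
        return 0.9 * dcf + 0.1 * ratio     # 10% influence
    return 0.99 * dcf + 0.01 * ratio       # much shorter: 1% influence


def results_until_recovered(dcf, target=2.0):
    """Count correctly estimated results (ratio 1.0) until dcf <= target."""
    count = 0
    while dcf > target:
        dcf = update_dcf(dcf, actual_s=1.0, raw_estimate_s=1.0)
        count += 1
    return count


if __name__ == "__main__":
    # One result whose fpops_est was 600 times too low ...
    dcf = update_dcf(1.0, actual_s=600.0, raw_estimate_s=1.0)
    print("DCF after one bad result:", dcf)
    # ... followed by correctly estimated results.
    print("results needed to get back below 2:", results_until_recovered(dcf))
```

A single underestimated result moves the DCF in one step, while the way back down goes through the 1% rule first, which is why recovery needs so many results.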

Re: Random fpops_est

Posted: Tue Feb 08, 2011 7:48 am
by Ananas
Currently it works like a charm, no jumpy estimated runtimes anymore and no panic mode :-)