Server issues?

Current eOn code and boinc distributed computing

Moderator: moderators

Saenger
Posts: 29
Joined: Thu Sep 02, 2010 4:23 pm

Server issues?

Post by Saenger »

I have two files stuck with a download (DL) error, with these messages in BOINC:


Fr 03 Sep 2010 17:15:44 CEST	eon2	Started download of parameters_passed_1826389523_0_20650.dat
Fr 03 Sep 2010 17:15:46 CEST	eon2	Temporarily failed download of parameters_passed_1826389523_0_20650.dat: HTTP error
Fr 03 Sep 2010 17:15:46 CEST	eon2	Backing off 3 hr 43 min 9 sec on download of parameters_passed_1826389523_0_20650.dat
Fr 03 Sep 2010 17:17:26 CEST	eon2	Sending scheduler request: Requested by project.
Fr 03 Sep 2010 17:17:26 CEST	eon2	Not reporting or requesting tasks
Fr 03 Sep 2010 17:17:31 CEST	eon2	Scheduler request completed
As you can see, a scheduler contact succeeded shortly afterwards, so I don't know what kind of HTTP error occurred.
Last edited by Saenger on Fri Sep 03, 2010 8:03 pm, edited 2 times in total.
matt
Posts: 37
Joined: Thu Jul 17, 2008 10:51 pm

Re: Server issues?

Post by matt »

We're working on the problem. In the meantime, you can work around this issue by resetting and then updating the project.
Saenger
Posts: 29
Joined: Thu Sep 02, 2010 4:23 pm

Re: Server issues?

Post by Saenger »

matt wrote :
> We're working on the problem. In the meantime, you can work around this
> issue by resetting and then updating the project.

Only after the other 5 already sitting on my machine are through ;)


Edit:
I simply aborted the stuck WU. That is probably the best way to get rid of this fault.
It doesn't seem to have been resent to me. In case you miss it, it's this one: http://eon.ices.utexas.edu/eon2/workuni ... id=3125206
chill
Posts: 96
Joined: Tue Jul 28, 2009 9:04 pm

Re: Server issues?

Post by chill »

We had a problem with the permissions on some of the files on our server. Those files couldn't be downloaded, which resulted in stuck work units.
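The post doesn't say which permissions were wrong. A minimal sketch of how one might list files that a web server could fail to serve, assuming the download files live under a hypothetical directory and that the server reads them as an "other" (non-owner) user, so a missing world-read bit would block the download:

```python
import os
import stat


def find_unreadable_files(root: str) -> list[str]:
    """Return paths under `root` whose permission bits deny read access to
    'other' users (e.g. a web server running under a different account)."""
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if not mode & stat.S_IROTH:  # world-read bit is not set
                unreadable.append(path)
    return sorted(unreadable)


if __name__ == "__main__":
    # Hypothetical download directory; the real server layout is not given.
    for path in find_unreadable_files("/var/www/eon2/download"):
        print("not world-readable:", path)
```

This only checks file mode bits; directory execute permissions, ownership, and web-server configuration could also block downloads.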
Paladin*

Re: Server issues?

Post by Paladin* »

The server almost chokes off my Internet connection. The WUs are very slow to upload or download, and when I check the server status it says the upload and download part is shut off. That can't be true, though, because the server did give me work and keeps trying to give me more WUs. But with the WUs trying to upload and download, everything just gets choked off until I have to stop new work and wait a few hours for everything from eon to clear out. I can hardly even use my web browser, things get so choked up by eon WUs...
carlos
Posts: 11
Joined: Wed May 25, 2005 5:25 pm

Re: Server issues?

Post by carlos »

graeme,

You need to explain here how eon works (interaction between server and clients).

Is the following description still valid (despite wrong links)?


The EON project, as the name suggests, is trying to simulate atomistic systems over long time scales.

The time scale problem is a major challenge for atomistic simulations in many fields, including chemistry, physics, and materials science. Put simply, atoms are light and they move very quickly. In solids, atoms vibrate about 10^15 times per second. If we used computers to directly calculate the motion of atoms, it would take many thousands of years to simulate how they move over a fraction of a second. Since interesting things happen on timescales of seconds or minutes, we need new methodology to investigate these long (for atoms) time scales.

The time scale problem differs from the problems in other projects that can easily be handled with distributed computing. With SETI, for example, there are many independent work units that can be given to different processors. In the EON project, it is long times that need to be simulated. Because time is cumulative, and what happens at one point in time affects the future, we can't just give a little bit to each processor.

EON uses a less direct approach to determine how atomic systems behave over long times. In the EON project, all processors are working on the same project, simulating the same system, all at the same time. Each work unit is a task to figure out one possible way the system can evolve.

In terms of the physics, the current state of the system (the location of all the atoms) is a basin or valley on a potential energy landscape. A mountain landscape makes a good analogy. The system wants to stay as low on the landscape as possible. But, thermal energy causes the system to move randomly so that every once in a while it moves enough to cross a ridge on the landscape and fall into another valley. For the atomic system, such an event would correspond to one or a set of atoms moving in a crystal or rearranging on a surface. This event can be characterized by a saddle point on the landscape -- the lowest point on a ridge separating two valleys. Mathematically, each processor working on the EON project is finding a saddle point, which corresponds to a possible event available to the system.

The server keeps track of the current state, which is shown graphically at http://eon.chem.washington.edu/current.php. Each processor determines a possible way that the system can evolve, and calculates the rate for that process. When the server has enough processes to ensure accurate dynamics (typically after 1 hour), it chooses which process should occur using a random number algorithm called kinetic Monte Carlo. At this point, the system is updated to reflect the occurrence of the chosen process, the simulation clock is advanced by the appropriate amount, and the cycle is repeated.

We have been using this system to simulate the growth of metal films, something that is particularly important for catalysis. Understanding how atoms evolve at these small scales over long times is important, and will become increasingly important for many fields, particularly in nanoscience, chemistry and material science. 
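The kinetic Monte Carlo step described above can be sketched as follows. This is the generic textbook KMC step (pick a process with probability proportional to its rate, then advance the clock by an exponentially distributed residence time), not EON's actual server code. The Arrhenius-style rate formula and its default prefactor of 10^13 1/s are assumptions for illustration; the description only says each processor "calculates the rate".

```python
import math
import random

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_rate(barrier_ev, temperature_k, prefactor_hz=1e13):
    """Rate (1/s) for crossing a saddle `barrier_ev` above the current basin,
    using an Arrhenius form with an assumed attempt-frequency prefactor."""
    return prefactor_hz * math.exp(-barrier_ev / (BOLTZMANN_EV * temperature_k))


def kmc_step(rates, rng=random):
    """Pick one process with probability proportional to its rate and return
    (chosen_index, time_increment_in_seconds)."""
    total = sum(rates)
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against float round-off
    for i, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            chosen = i
            break
    # Residence time is exponentially distributed with mean 1/total.
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


if __name__ == "__main__":
    # Hypothetical barriers (eV) for three processes found by saddle searches.
    barriers = [0.5, 0.6, 0.8]
    rates = [arrhenius_rate(b, 300.0) for b in barriers]
    index, dt = kmc_step(rates)
    print(f"process {index} occurs; clock advances by {dt:.3e} s")
```

Low-barrier processes dominate the selection, and the clock advance shrinks as the total rate grows, which is how a KMC simulation covers long times in few steps.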
chill
Posts: 96
Joined: Tue Jul 28, 2009 9:04 pm

Re: Server issues?

Post by chill »

The text posted by carlos is a good description of the types of calculations we are currently doing in the eon project. However, our focus has shifted to studying nanoparticles for catalysis. We hope to post more information soon.
Paladin* wrote:The server almost chokes off my Internet connection. The WUs are very slow to upload or download, and when I check the server status it says the upload and download part is shut off. That can't be true, though, because the server did give me work and keeps trying to give me more WUs. But with the WUs trying to upload and download, everything just gets choked off until I have to stop new work and wait a few hours for everything from eon to clear out. I can hardly even use my web browser, things get so choked up by eon WUs...
Paladin, how fast is your internet connection? Our servers are located here at the University of Texas (in Austin) and have extremely fast internet connectivity. The results from our work units can be as large as a megabyte. If your upload speed were something like 128 kbps (16 KB/s), it would take at least a minute to upload one result. One thing I have been thinking about is having the BOINC client compress (zip) the result files (they are highly compressible) to make them smaller and faster to upload. This would put some additional strain on the server, since it would have to decompress them, but I think we have enough idle CPU time to handle it.

So if you could just let me know how fast your internet connection is and maybe how many results you might be trying to report at once, I can figure out where the problem is.
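The upload-time estimate above is simple arithmetic. A small sketch, assuming decimal units (1 MB = 1,000,000 bytes, 1 kbps = 1000 bits/s) and ignoring protocol overhead and latency:

```python
def upload_seconds(file_bytes, upload_kbps, n_results=1):
    """Seconds to upload `n_results` files of `file_bytes` each at
    `upload_kbps` kilobits per second (no overhead, no latency)."""
    bytes_per_second = upload_kbps * 1000 / 8
    return n_results * file_bytes / bytes_per_second


if __name__ == "__main__":
    # 1 MB result over a 128 kbps uplink (16 KB/s), as in the post.
    print(upload_seconds(1_000_000, 128))
    # The same result over a 1 Mb/s ADSL uplink.
    print(upload_seconds(1_000_000, 1000))
```

At 128 kbps one 1 MB result takes about 62.5 s, consistent with "at least a minute"; a backlog of many results multiplies that linearly.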
carlos
Posts: 11
Joined: Wed May 25, 2005 5:25 pm

Re: Server issues?

Post by carlos »

chill,

As I stated before, someone needs to explain how the client interacts with the server. The upload issue Paladin* is talking about has been an eOn issue for a long time. eOn is the kind of project that needs a permanent Internet connection; it uses a lot of bandwidth and slows down the connection when surfing. With two or three machines running eOn you start to feel your Internet connection slowing down; imagine when you have more than 5 machines. My connection is ADSL: 24 Mb/s download, 1 Mb/s upload.

This issue will get worse as the project grows. To minimize it, the WUs should take longer to process.


Carlos
Saenger
Posts: 29
Joined: Thu Sep 02, 2010 4:23 pm

Re: Server issues?

Post by Saenger »

chill wrote:One thing I have been thinking about is having the BOINC client compress (zip) the result files (they are highly compressible) to make them smaller and faster to upload. This would put some additional strain on the server, since it would have to decompress them, but I think we have enough idle CPU time to handle it.
carlos wrote:This issue will get worse as the project grows. To minimize it, the WUs should take longer to process.
Both would be good solutions.
The WUs are currently extremely short; on the cruncher side they could easily be ten times as long (as long as the traffic doesn't scale up by the same factor).
And not everyone has a real flat rate; a lot of people connect with volume-limited plans. Decreasing the traffic should be an important goal.
Simone [3dz2]
Posts: 1
Joined: Sun Sep 05, 2010 1:21 pm

Re: Server issues?

Post by Simone [3dz2] »

I strongly agree with increasing the duration of work units.

Bye, Bye
Simone
Paladin*

Re: Server issues?

Post by Paladin* »

chill wrote:The text posted by carlos is a good description of the types of calculations we are currently doing in the eon project. However, our focus has shifted to studying nanoparticles for catalysis. We hope to post more information soon.
Paladin, how fast is your internet connection? Our servers are located here at the University of Texas (in Austin) and have extremely fast internet connectivity. The results from our work units can be as large as a megabyte. If your upload speed were something like 128 kbps (16 KB/s), it would take at least a minute to upload one result. One thing I have been thinking about is having the BOINC client compress (zip) the result files (they are highly compressible) to make them smaller and faster to upload. This would put some additional strain on the server, since it would have to decompress them, but I think we have enough idle CPU time to handle it.

So if you could just let me know how fast your internet connection is and maybe how many results you might be trying to report at once, I can figure out where the problem is.
I have Comcast @ 8MB, so that shouldn't be a problem; I don't have problems with other sites. But now that you mention being located in Texas, that is probably the problem, as I do at times have problems with connections to Texas. The problem probably isn't on my end, but Comcast seems to have it at times. It seems to be better this morning, though, and I've put a few more boxes on the project to see whether the uploads back up or not...

When the uploads start to back up, there can be 300+ files trying to upload at once, each at only 1kb/s or less. That's what happens: the more uploads are waiting, the slower the upload speeds get, which is probably normal... But the downloads start to slow down too when there are that many files to upload...
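The slowdown Paladin describes fits a simple fair-sharing picture: if the upstream link is split evenly among active transfers, each of 300 uploads gets 1/300 of the link. The sketch below uses that idealised model (real TCP sharing is messier), with hypothetical file sizes and link speeds. It also shows that, in this model, running fewer uploads at once doesn't change the total drain time but lets individual files finish much sooner on average.

```python
def per_transfer_kbps(link_kbps, active):
    """Average rate each upload sees if `active` uploads share the link evenly."""
    return link_kbps / active


def mean_completion_s(n_files, file_kbit, link_kbps, max_concurrent):
    """Mean time until each file finishes uploading, assuming equal-sized
    files, a link shared evenly among active transfers, and files sent in
    consecutive batches of at most `max_concurrent`."""
    finished_at = []
    clock = 0.0
    remaining = n_files
    while remaining > 0:
        batch = min(max_concurrent, remaining)
        clock += batch * file_kbit / link_kbps  # the batch shares the link evenly
        finished_at.extend([clock] * batch)
        remaining -= batch
    return sum(finished_at) / n_files


if __name__ == "__main__":
    # Hypothetical: 300 files of 1000 kbit each over a 1000 kbps uplink.
    print(per_transfer_kbps(1000, 300))
    print(mean_completion_s(300, 1000, 1000, 300))  # everything at once
    print(mean_completion_s(300, 1000, 1000, 2))    # two at a time
```

With all 300 uploading together every file finishes at the very end (300 s here); two at a time brings the average completion down to about 151 s for the same total traffic.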
chill
Posts: 96
Joined: Tue Jul 28, 2009 9:04 pm

Re: Server issues?

Post by chill »

carlos wrote: As I stated before, someone needs to explain how the client interacts with the server. The upload issue Paladin* is talking about has been an eOn issue for a long time. eOn is the kind of project that needs a permanent Internet connection; it uses a lot of bandwidth and slows down the connection when surfing. With two or three machines running eOn you start to feel your Internet connection slowing down; imagine when you have more than 5 machines. My connection is ADSL: 24 Mb/s download, 1 Mb/s upload.

This issue will get worse as the project grows. To minimize it, the WUs should take longer to process.
In our simulations we need to perform work (saddle searches) on the current configuration of the chemical system in order to find which processes (events that change the configuration of the system, such as an atom hopping on a surface) are available. Once we have done enough searches to meet our confidence goal, we choose one of these processes and advance to a new state. Once we are in a new state, no more saddle searches need to be done for the previous state. The work units are these saddle searches, of which we need some number per state on average. This means that, to be useful, a work unit must be completed within roughly the time it takes us to complete all of the searches for a state.
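The per-state cycle described above (keep searching until a confidence goal is met, then move on) can be sketched as a loop. The post doesn't say how the confidence goal is computed, so the stopping rule here is hypothetical: stop after `patience` consecutive searches that find no new process, or after `max_searches`. `run_search` is a stand-in for one saddle search, not an eOn function.

```python
from collections.abc import Callable, Hashable


def explore_state(run_search: Callable[[], Hashable], patience: int,
                  max_searches: int) -> set:
    """Run saddle searches for one state until `patience` consecutive searches
    find nothing new (a hypothetical confidence rule) or `max_searches` is hit.
    Returns the set of distinct processes found."""
    found = set()
    since_new = 0
    for _ in range(max_searches):
        process = run_search()  # stand-in for one saddle search result
        if process in found:
            since_new += 1
            if since_new >= patience:
                break
        else:
            found.add(process)
            since_new = 0
    return found


if __name__ == "__main__":
    import random
    # Fake search: each call lands on one of 5 possible processes.
    processes = explore_state(lambda: random.randrange(5), patience=20,
                              max_searches=1000)
    print(sorted(processes))
```

Once the loop returns, the server would pick one process from the set and move to the new state; any searches still running for the old state are wasted, which is why long work units are a poor fit here.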

Currently, for the systems we are studying and the potentials we are using, a single saddle search takes less than a minute, and we are making each work unit do 10-50 saddle searches. Each saddle search produces the same amount of data. Simply making each work unit take longer by doing more saddle searches would have two effects. First, the work unit results would be that much larger to upload (since the files grow linearly with the number of searches). Second, we would get our searches back in even bigger chunks. We might only need several thousand searches per state, which means that for a given simulation (and we run more than one at a time on BOINC) we never need more than several thousand searches being performed at once. So we would like to keep the number of saddle searches per work unit small in order to parallelize the problem more efficiently.
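The trade-off above can be put in numbers. A sketch with hypothetical inputs: "several thousand" searches per state taken as 3000, and an assumed output of 20 KB per search (the post gives no per-search size). Result size grows linearly with searches per work unit, while the number of work units that can run in parallel on one state shrinks.

```python
import math


def work_unit_tradeoff(searches_per_state, searches_per_wu, kb_per_search):
    """Return (work units needed per state, result size in KB per work unit)."""
    wus_per_state = math.ceil(searches_per_state / searches_per_wu)
    result_kb = searches_per_wu * kb_per_search
    return wus_per_state, result_kb


if __name__ == "__main__":
    for k in (10, 50, 500):
        wus, size = work_unit_tradeoff(3000, k, 20)
        print(f"{k:>3} searches/WU -> {wus:>3} WUs per state, {size} KB per result")
```

Going from 50 to 500 searches per work unit cuts the parallelism per state from 60 hosts to 6 and makes each upload ten times larger, which matches the argument against simply lengthening work units.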

Another way of making work units take longer is to use more accurate potentials. The potential is the way the energy and forces in the chemical system are calculated. Right now we are using a very simple one, which is why a single saddle search takes less than a minute. We hope to use more accurate ones soon, where one search might take many minutes or hours while still producing the same amount of output as our current searches that take seconds. One work unit would then equal one search, producing very little data compared to the current work units while taking much longer. These types of work units are better suited to distributed computing.

However, for now I plan on implementing server- and client-side compression to help with this issue somewhat.
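The planned client-side compression and server-side decompression can be sketched with Python's standard `gzip` module. This is an illustration, not how BOINC or eOn implement it. The sample data is hypothetical repetitive numeric text standing in for a result file, and the real compression ratio depends on the actual file contents.

```python
import gzip


def compress_result(data: bytes, level: int = 9) -> bytes:
    """Client side: gzip a result file's contents before upload."""
    return gzip.compress(data, compresslevel=level)


def decompress_result(blob: bytes) -> bytes:
    """Server side: restore the original result file contents."""
    return gzip.decompress(blob)


if __name__ == "__main__":
    # Hypothetical result-like text: many lines of similar numeric columns.
    sample = "".join(f"{i} 0.000000 1.250000 -3.500000\n"
                     for i in range(5000)).encode()
    blob = compress_result(sample)
    assert decompress_result(blob) == sample
    print(f"{len(sample)} bytes -> {len(blob)} bytes")
```

Decompression is cheap relative to compression, so most of the extra CPU cost lands on the client.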
DigiK-oz
Posts: 5
Joined: Sat Sep 04, 2010 8:51 am

Re: Server issues?

Post by DigiK-oz »

On the matter of longer/bigger work units, will checkpointing be implemented in those? As far as I can tell, the current work units do not checkpoint, so on a system restart (or even a swap to a different project running on the same PC?) they will happily restart from zero. That's no real problem with small WUs, but as they get longer it will become annoying and waste a lot of CPU power.
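A minimal sketch of the checkpointing being asked about: periodically save progress to a file and resume from it on restart. The state layout, file name, and the toy loop are hypothetical. A real BOINC application would write its checkpoint in its own format and coordinate with the client through the BOINC API (for example `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`).

```python
import json
import os


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def save_checkpoint(path, state):
    """Write the state atomically: write a temp file, flush it to disk,
    then os.replace() swaps it in, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path, total_steps, every=100):
    """Toy long computation that resumes from its last checkpoint."""
    state = load_checkpoint(path) or {"step": 0, "value": 0.0}
    while state["step"] < total_steps:
        state["value"] += 0.5  # stand-in for one iteration of real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the machine restarts mid-run, calling `run()` again loses at most `every` steps instead of the whole work unit.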
Saenger
Posts: 29
Joined: Thu Sep 02, 2010 4:23 pm

Re: Server issues?

Post by Saenger »

DigiK-oz wrote:or even a swap to a different project running on the same PC?
Checkpointless WUs should not swap to another project. But if you have "Keep WUs in memory" selected, it wouldn't matter anyway.
matt
Posts: 37
Joined: Thu Jul 17, 2008 10:51 pm

Re: Server issues?

Post by matt »

When we have much longer WUs (detailed quantum mechanical calculations), we will use checkpointing.