I want to highlight some aspects of performance and server load during the event. The server runs on a single, fairly standard desktop machine (not server-grade hardware):
- CPU - Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
- Memory - 6 GB
- Hard disk - 1 TB
- Ethernet - 100 Mbit/s
After the press published the news on their sites, users started hitting the server. Google Analytics registered the following data that day from 16:30 to 00:00:
- 3402 visits
- 3154 visitors
- 13861 page views
The problem was a spike right after the media sites published the news, at which point the site started to lag. I did not know how many users to expect, so the approach was "let's wait and see". I did run various "ab" benchmarks beforehand, but I did not know how many users would actually use the service or where the bottleneck would be: CPU, RAM, IO or bandwidth. We also run other services such as WMS/WFS that are CPU and RAM heavy. The problem with static tiles is that a single visitor produces about 30 HTTP requests each time they zoom.
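To get a feel for what the "30 requests per zoom" figure means for the server, here is a rough back-of-the-envelope sketch. Only the visit count, the time window and the requests-per-zoom figure come from this post; the number of zooms per visit, the peak factor and the average tile size are pure assumptions chosen for illustration, and the function name `estimate_load` is made up for this example.

```python
def estimate_load(visits, window_hours, zooms_per_visit,
                  requests_per_zoom, peak_factor, avg_tile_kb):
    """Rough estimate of tile request rate and bandwidth.

    All results are only as good as the guessed inputs.
    """
    # Total tile requests over the whole window.
    total_requests = visits * zooms_per_visit * requests_per_zoom
    # Average requests per second if the load were spread evenly.
    avg_rps = total_requests / (window_hours * 3600)
    # The real load was a spike, so scale the average by a guessed factor.
    peak_rps = avg_rps * peak_factor
    # Bandwidth at peak: kB -> kbit (x8) -> Mbit (/1000).
    peak_mbit = peak_rps * avg_tile_kb * 8 / 1000
    return {
        "total_requests": total_requests,
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        "peak_mbit": peak_mbit,
    }


if __name__ == "__main__":
    result = estimate_load(
        visits=3402,           # from Google Analytics (this post)
        window_hours=7.5,      # 16:30 to 00:00 (this post)
        zooms_per_visit=5,     # assumption, not measured
        requests_per_zoom=30,  # approximate figure from this post
        peak_factor=10,        # assumption: spike vs. average
        avg_tile_kb=20,        # assumption: average tile size
    )
    for name, value in result.items():
        print(f"{name}: {value:.1f}")
```

Changing the guessed inputs shows how quickly a press spike can approach the limits of a 100 Mbit/s link or a single hard disk, even when the daily totals look modest.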
Munin graphs for the event:
As you can see from the graphs, the problem was IO: our hard disk just could not handle the load. Traffic was pretty high too. All in all I am pretty happy with the results, and I now have a better understanding of how to scale the portal horizontally in case the load grows, but that's for another post.



Thanks, it was an eye opener.
nice, but you are missing some good layers, i.e. public transport - http://openmap.lt/?zoom=8&lat=46.94402&lon=28.92876&layers=B000FTT