One of the things that seems extremely popular in today's competitive server market is the publishing of "independent" benchmarking tests for every new system. The epitome of these publicity campaigns, and part of an ongoing battle that pits the open source community against its arch rival, is the series of Mindcraft benchmarks that compare NT/IIS performance against that of Linux/Apache.
I'm not a Linux evangelist, nor do I kneel at the church of Microsoft. I believe in using the right solution for the right set of circumstances, so I'd like to explain why a benchmark can actually be extremely misleading.
The first thing to know is that tests such as these are mainly sponsored by a company that wants the tests to "find" a certain fact: namely, that its product is better than the competition. However, there's a great difference between testing something to find out what the results are, and finding results that prove what you want to prove.
Server benchmarks are, unfortunately for consumers, a great example of using a test to find specific results. It's extremely easy to say that a server is capable of serving 100 million static HTML pages in 24 hours, but look a little further at the results.
Firstly, very, very few sites with a large amount of content use static pages now. The outdated method of writing and updating every page separately has been replaced by dynamic technologies such as ASP and Cold Fusion. These technologies place a very different load on the server than static pages do, and have different requirements, because each page must be processed on the server before it is sent.
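To put the headline figure of 100 million pages in 24 hours into perspective, it helps to convert it into an average rate. The short Python sketch below does only that arithmetic; the function and variable names are illustrative, and it assumes the load is spread evenly across the whole test period, which a real site's traffic would not be.

```python
# Convert a headline "pages served in N hours" figure into an average rate.
# Assumption: requests are spread evenly over the whole test window.

def average_rate_per_second(total_pages: int, hours: float = 24.0) -> float:
    """Return the average number of pages served per second."""
    return total_pages / (hours * 3600)

headline_pages = 100_000_000  # the figure quoted in the text
rate = average_rate_per_second(headline_pages)

print(f"{rate:.1f} pages per second on average")  # prints "1157.4 pages per second on average"
```

The average alone says nothing about peak load, page size, or how the same server would cope if each of those pages had to be generated dynamically, which is exactly the detail a headline number leaves out.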
It is also extremely important to understand the circumstances of the test, and how the hardware used can be weighted towards performance gains for certain software. To take a recent example, the Mindcraft tests used a RAID 0 disk array for the server. RAID 0 systems show significant performance differences depending on the operating system. However, they are not known for stability in situations that require it (such as a web server that needs continuous uptime). Most system administrators prefer to use RAID 5, which displays different characteristics under the tested systems, but is much more stable in real-world applications. The choice of disk array, although not realistic, could heavily weight a benchmark test in favour of a specific system.
The ways in which benchmarks can be subtly manipulated to produce certain results are numerous, and the practice of using tests in this way is limited neither to certain companies nor to the software industry. As long as marketing departments are willing to pay for publicity, the onus will remain with consumers to try to make sense of the information that we are fed. In situations where your choice of hardware is mission-critical, it will always pay to get a second opinion.