As with most things in life, computing resources are finite and there is a limit to how much can be done in any given period of time by any system. The key to understanding performance issues on servers is to know:
Some things are easier to rectify than others, and knowing when and how to upgrade or replace systems or components is important to ensure that any investments that are made will actually solve the problem that is being experienced.
Unfortunately, there is no single solution that will address all performance issues. The right approach depends on several factors, including the application mix, the number of users, the hardware itself and external factors such as network topology. Of the many parts of a server, there are four key elements that in practice tend to influence the performance of the system as a whole: processor, memory, disk I/O and network I/O.
Each of these has its limits, and the overall performance of the server will be determined by whichever of them is exhausted first. Table 1 below shows typical situations in which each resource type is likely to be in high demand, along with factors that might cause issues that are unlikely to be the result simply of under-specified hardware.
Windows Server Performance Monitoring
Table 1 – Server Resource Demands
| Resource Type | High demand usage pattern | Potential design and implementation issues |
|---|---|---|
| Processor | Mathematical computation, modelling, simulation | Poor application coding, inefficient algorithms |
| Memory | Heavy application load, high number of users | Too many applications sharing a server |
| Disk I/O | Large server databases, frequent copies between physical volumes | Poorly indexed databases, overloaded I/O channels, backups over-running |
| Network I/O | Streaming media, heavy file sharing load, file-based databases (e.g., Access) | Network topology that causes bottlenecks, poor network segmentation, possible malware infection |
So, enough of the generalities. Where should you look, and what should you be looking for, in each case?
Processor
This one’s easy, right? Open up Performance Monitor and look at the CPU utilization. Simple! Well, in reality, prolonged periods of high utilization are certainly a bad thing, but periods of 100% CPU utilization are quite normal and simply mean that the applications are making full use of the capacity of the system. Monitoring the CPU utilization is useful, but even at 100% the system can still be responsive as long as the applications are well-written. In any multi-tasking operating system, applications get allocations of processor time in turn, and as long as the system can switch between them quickly enough it will still give acceptable performance.
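The distinction between short spikes and prolonged high utilization can be sketched in a few lines of Python. This is only an illustration: the function name, the 90% threshold and the 80% "most of the window" fraction are assumptions chosen for the example, not values taken from any Windows tool.

```python
from typing import Sequence


def is_sustained_high(
    samples: Sequence[float],
    threshold: float = 90.0,     # assumed utilization threshold, in percent
    min_fraction: float = 0.8,   # assumed share of samples that must be high
) -> bool:
    """Return True if most samples in the window are at or above the threshold.

    A handful of 100% readings in an otherwise quiet window is treated as a
    normal spike; only a window dominated by high readings counts as sustained.
    """
    if not samples:
        return False
    high_count = sum(1 for value in samples if value >= threshold)
    return high_count / len(samples) >= min_fraction


# Hypothetical CPU utilization readings (percent), one per sampling interval.
spiky_window = [20, 100, 25, 30, 100, 15]
busy_window = [95, 100, 98, 100, 97, 99]

spiky_result = is_sustained_high(spiky_window)
busy_result = is_sustained_high(busy_window)
```

Here the first window contains two readings of 100% but is not flagged, while the second window is, which mirrors the point that occasional full utilization is normal and prolonged high utilization is the warning sign.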
Performance really starts to be impacted when applications are queuing for processor time and being held up by one another. Fortunately, modern versions of Windows are much better at scheduling applications, but there are limits to what can be achieved. For that reason, one of the most useful parameters to monitor is the processor queue length. If the operating system is struggling to balance the demands of multiple applications, the applications will be queued for access to the processor and the queue length will go up. As a general rule, if the processor queue length exceeds twice the number of CPUs and/or processor cores in the system, then performance will drop off dramatically.
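The rule of thumb above is simple enough to express directly. The following Python sketch assumes the queue length has already been read from a monitoring tool (in Performance Monitor the relevant counter is System\Processor Queue Length); the function and parameter names are hypothetical.

```python
def processor_queue_overloaded(queue_length: float, processor_count: int) -> bool:
    """Apply the rule of thumb: a queue longer than twice the number of
    CPUs/cores suggests that applications are being held up waiting for
    processor time.
    """
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return queue_length > 2 * processor_count


# Hypothetical readings for a four-core server.
at_limit = processor_queue_overloaded(queue_length=8, processor_count=4)
over_limit = processor_queue_overloaded(queue_length=9, processor_count=4)
```

As with utilization, a single high reading matters less than a queue length that stays above the limit over a period of time.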
If it is shown that the processor capacity in a system is the underlying problem, the options are fairly straightforward:
One thing to note is that the difference between having one processor and more than one can be very significant. In a multi-tasking operating system, such as all modern versions of Windows, the OS will schedule tasks for access to the processor and if there is only one processor in the system a single application can hog the processor despite the best efforts of the OS to schedule other tasks. With two or more processors the OS has more flexibility in allocating tasks so even if the utilization is very high the system can be more responsive.
Memory
Modern operating systems all use virtual memory to manage applications and data being processed by the system. This is a good thing in that it prevents the system from stopping if there is insufficient physical memory available – unfortunately the fact that virtual memory is based on using disk space as a substitute for physical memory means that if virtual memory is being used to a significant extent then system performance will be impacted dramatically.
Again, it might appear simple to know when a shortage of memory is proving to be a problem – look at the memory allocated and compare it to the physical memory in the machine; if it is higher, then add more memory. However, as with CPU utilization, it is a little more complicated. For example, most versions of Microsoft Exchange will allocate as much memory as they can to maximize throughput and responsiveness (through the store.exe process). If another application requires more memory then store.exe is supposed to release memory to avoid the need to swap memory to disk. Hence in most servers running Exchange it will almost always appear that all physical memory is allocated all the time.
Note: Adding more memory in this situation will make little or no difference to the performance of the server.
Significant over-allocation of memory may be an indication of a problem, but the key question is how much of that memory is in active use. If an application has requested a memory allocation but is almost completely idle, then the memory will be swapped out to disk and does not have to be recalled, so performance will not be overly affected. Therefore, the other key parameter that is useful to monitor in relation to memory is the page fault rate: the frequency with which the operating system has to move some data from physical memory to disk storage in order to recall other data that is required in memory at that time. Every time this happens the system has to wait until the swap has completed before it can carry on with the next processing task, and performance will be much slower.
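To make the reasoning concrete, here is a minimal Python sketch that derives a page fault rate from two readings of a cumulative fault count and combines it with the allocation check. All names are hypothetical, and the default limit of 100 faults per second is purely a placeholder assumption to be tuned for a given server, not a recommended value.

```python
def page_fault_rate(faults_start: int, faults_end: int, interval_seconds: float) -> float:
    """Return page faults per second between two cumulative counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if faults_end < faults_start:
        raise ValueError("counter went backwards; readings are not comparable")
    return (faults_end - faults_start) / interval_seconds


def memory_looks_constrained(
    allocated_bytes: int,
    physical_bytes: int,
    faults_per_second: float,
    fault_rate_limit: float = 100.0,  # placeholder assumption, tune per server
) -> bool:
    """Flag a memory shortage only when memory is over-allocated AND the
    system is paging heavily. Over-allocation on its own is not treated as
    a problem, because idle allocations can stay swapped out.
    """
    over_allocated = allocated_bytes > physical_bytes
    paging_heavily = faults_per_second > fault_rate_limit
    return over_allocated and paging_heavily


GIB = 2**30
rate = page_fault_rate(faults_start=1_000, faults_end=4_000, interval_seconds=60.0)
quiet_server = memory_looks_constrained(12 * GIB, 8 * GIB, faults_per_second=rate)
busy_server = memory_looks_constrained(12 * GIB, 8 * GIB, faults_per_second=500.0)
```

The first server is over-allocated but rarely has to recall data from disk, so it is not flagged; the second shows the same allocation with heavy paging, which is the combination that points to a real shortage.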
The solution to a memory problem is simple, at least up to a point. Adding more memory is often the simplest and cheapest way to boost the performance of a system, but with 32-bit versions of Windows the address space is limited to 4 GB, of which (by default) half is reserved for the system address space, leaving just 2 GB for each application. On such systems, adding more memory beyond 4 GB will generally not bring about any further improvement, so alternative approaches such as splitting application load across multiple servers must be employed. Of course, 64-bit versions of Windows don’t have this problem, being able to address 16 TB directly.
Disk I/O
Hard drives are one of the slowest components in a server as they rely on the physical movement of heads over platters to access the positions at which data is to be read or written. Like most other components of a multi-tasking system, different applications can access disks simultaneously and it is up to the operating system to manage the requests for access to disk resources. As with other shared resources, the operating system maintains a queue for these requests and handles them in sequence, routing the data between the disk controller and the applications requesting disk access.
The raw throughput of a disk drive or array and the associated disk controller(s) clearly have a significant effect on performance but these do not change as the system is used, so it is more useful to look at the amount of time that the disks spend servicing requests, reflected in the percentage disk time. If this value is consistently high it will usually indicate that the disk system is working flat out to process all the data transfers that are being requested.
Another useful indicator that disk performance is limiting overall system performance is the length of the read and write queues. As with processor queues, if the counters start to show values significantly in excess of the number of devices (in this case, drives) in the system, this indicates that the disks are not able to process requests as fast as the system is making them, and therefore applications are likely to be held up waiting for data to be delivered from the disks.
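The queue-length check for disks follows the same pattern as the processor check. In this Python sketch the names are hypothetical, and the factor of 2 used to stand in for "significantly in excess" is an assumption made for the example; the text above does not give an exact figure.

```python
def disk_queue_overloaded(
    read_queue_length: float,
    write_queue_length: float,
    drive_count: int,
    excess_factor: float = 2.0,  # assumed meaning of "significantly in excess"
) -> bool:
    """Return True if either the read or the write queue is well above the
    number of drives, suggesting requests arrive faster than they are served.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    limit = excess_factor * drive_count
    return read_queue_length > limit or write_queue_length > limit


# Hypothetical readings for a four-drive array.
healthy = disk_queue_overloaded(read_queue_length=1.0, write_queue_length=0.5, drive_count=4)
backed_up = disk_queue_overloaded(read_queue_length=9.5, write_queue_length=0.5, drive_count=4)
```

Checking the read and write queues separately is a design choice for this sketch; it makes it easier to see whether reads or writes are the ones backing up.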
The solution to disk bottlenecks will depend on the underlying problem.
Network
Identifying network bottlenecks is generally fairly straightforward – any network link has a finite amount of bandwidth, and the higher the proportion of this that is used up, the more applications will be slowed down in communicating with other devices on the network. The actual value of utilization that indicates a serious problem depends on the network topology – switched networks can sustain significantly higher utilization than shared networks, so whereas anything over about 30-40% in a shared network will indicate a problem, that would not be the case in a switched environment.
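Link utilization is a straightforward calculation once the throughput and link speed are known. The Python sketch below uses hypothetical names; the 30% limit for shared networks is the lower end of the range quoted above, while the 70% limit for switched networks is only an assumed placeholder, since no specific figure is given for that case.

```python
def link_utilization_percent(bytes_per_second: float, link_speed_bits_per_second: float) -> float:
    """Convert observed throughput in bytes/s into a percentage of link capacity."""
    if link_speed_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return bytes_per_second * 8 * 100 / link_speed_bits_per_second


def network_overloaded(
    utilization_percent: float,
    shared_network: bool,
    shared_limit: float = 30.0,    # lower end of the 30-40% range in the text
    switched_limit: float = 70.0,  # assumed placeholder for switched networks
) -> bool:
    """Compare utilization against a limit that depends on the topology."""
    limit = shared_limit if shared_network else switched_limit
    return utilization_percent > limit


# Hypothetical reading: 5 MB/s observed on a 100 Mbit/s link.
utilization = link_utilization_percent(5_000_000, 100_000_000)
problem_if_shared = network_overloaded(utilization, shared_network=True)
problem_if_switched = network_overloaded(utilization, shared_network=False)
```

The same 40% reading is flagged on a shared segment but not on a switched one, which is the topology dependence described above.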
Modern PCs and servers have sufficient performance that they can saturate a network link quite quickly if they are performing sustained network activity such as large file copies. For an individual PC, the impact is limited to that PC, but a server has to service requests from all the other machines on a network, so it is a point of concentration. If several client machines are making heavy network I/O demands on the server, these must all be channeled through the server’s NIC and the switch port to which it is connected.
There are a number of ways to overcome network utilization issues including installing higher throughput components, such as Gigabit Ethernet devices; network reconfiguration, to use multiple network segments to divide up the traffic across multiple network interfaces; and teaming of network interfaces, to utilize more than one physical network port on a single subnet. Which solution is practical and appropriate will require a detailed understanding of the current network topology and application environment.
Summary
In conclusion, a number of parameters can affect the overall performance of a system. Understanding performance issues and how to solve them depends on analyzing the underlying root cause and identifying the most effective solution to it. While there are a great number of performance metrics that can be used, in reality a relatively small number of them are enough to get a view of how a machine is performing and which of its components is becoming overloaded. In most systems there will be spikes of high utilization on all of the parameters, but performance will only be impacted noticeably when one starts to show signs of consistent overloading. The use of effective tools that monitor performance metrics over a period of time and can show when these signs are developing is a valuable addition to the network monitoring process.