An examination of the dual-core capability of the HP xw9300 Workstation. By employing single- and dual-core AMD Opteron™ processor technology, the xw9300 gives users multiple processing power options in a high-end, ultra-high-performance personal workstation.
Introduction
Single vs. Multi-core Technology
    Introduction
    Overview of Dual-core Technology
    Multitasking and Multithreading
    Applications Environment
    Applications Licensing
Performance Comparison
    Application benchmarks
        Ansys
        3D Studio Max
        GeoProbe/SeisWorks
    Performance Summary
Conclusion
For more information
Introduction
Performance requirements for applications in the major market segments for personal workstations continue to grow. In fact, there is a “leapfrog” phenomenon at work: the more powerful the hardware becomes, the larger the problems that can be solved, and the more functionality software vendors add to applications. As users take on these larger problems and use the added functionality, workstation technology scrambles to increase performance, and the cycle repeats itself.
The two most common methods of increasing performance are (a) increasing the clock speed of the system’s processor(s) and/or (b) increasing the number of processors. (This excludes changes to the underlying processor architecture, such as pipeline length; here, the discussion focuses on increasing performance within a specific architecture.) Increasing performance through higher clock speeds is often impractical, because higher frequencies raise power consumption and heat output. Increasing the number of processors, by contrast, often works quite well, especially in multitasked and multithreaded environments (see below).
The latest high-end workstation from HP, the HP xw9300 (Figure 1), uses dual-core technology to double the number of processor cores (from two to four) available in the same physical enclosure. Doubling the number of cores provides ultra-high-end levels of performance for scientists, engineers, designers, and digital artists with extremely complex analysis and/or advanced visualization requirements.
Because different applications take advantage of multiple processors in different ways, it is useful to examine how multiple processors are used and where the benefit of their use lies. This paper discusses uses of dual-core technology, describes the processor options available on the HP xw9300, and offers suggestions to help customers select the appropriate processor configuration.
Figure 1. The current line of HP personal
workstations and the position of the HP xw9300.

Single vs. Multi-core Technology
Introduction
The availability of dual-core processors has a simple implication: more processors are available to do work in a given environment, which should translate into more performance. However, one of the first difficulties is determining what is really meant by “performance.” For example, the popular SPECfp2000 benchmark from the Standard Performance Evaluation Corporation (SPEC) is often used to determine the relative floating-point performance of a system. Assuming the benchmark has not been parallelized (see below), a multiprocessor system would record the same SPECfp2000 result whether it had one processor or one hundred.
Another standard benchmark, SPECfp_rate2000, runs multiple copies of the SPECfp2000 benchmark and reports results based on the total throughput (number of jobs) a system can execute in a fixed amount of time. In this case, the more processors, the better the result. Therefore, “performance” can depend both on how fast a single job completes and on how quickly many jobs complete.
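A simple model makes the difference between single-job speed and throughput concrete. The Python sketch below makes two assumptions. First, all jobs are identical, are not parallelized, and have a fixed run time on one processor. Second, jobs running at the same time do not slow one another down (no memory or I/O contention). It illustrates the idea only. It is not SPEC’s scoring formula, and the function names are hypothetical.

```python
import math


def single_job_seconds(job_seconds: float, processors: int) -> float:
    """Elapsed time for ONE non-parallelized job: extra processors do not help."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return job_seconds


def batch_seconds(job_seconds: float, processors: int, copies: int) -> float:
    """Elapsed time to finish `copies` identical jobs, one job per processor at a time."""
    if processors < 1 or copies < 1:
        raise ValueError("need at least one processor and one copy")
    # Copies run in "waves" of up to `processors` simultaneous jobs.
    waves = math.ceil(copies / processors)
    return waves * job_seconds


def jobs_per_hour(job_seconds: float, processors: int, copies: int) -> float:
    """Throughput of the batch above, in completed jobs per hour."""
    return copies * 3600.0 / batch_seconds(job_seconds, processors, copies)
```

Take four copies of a hypothetical 600-second job. `single_job_seconds` stays at 600 on one, two, or four processors, while `jobs_per_hour` rises from 6 to 12 to 24.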
Performance is discussed further in the “Applications Environment” and “Performance Comparison” sections below. For now, keep in mind that performance is nearly always a combination of multiprocessor throughput and single-application performance. In this paper, we generally characterize performance primarily as multiprocessor throughput (that is, aggregate system performance).
Overview of Dual-core Technology
Dual-core technology places two processor cores on a single die, in general using the same package as a single-core processor (Figure 2). In fact, AMD’s first dual-core processors are socket-compatible with the corresponding single-core processors[1]. Dual-core processors deliver higher performance per watt, which allows HP to provide more aggregate performance in the same workstation enclosure used for single-core processors.
Figure 2. Dual-core technology places two processor cores on a
single die.

Multitasking and Multithreading
Successfully employing multiple processors to increase performance always involves splitting work across the system’s processors, whether the pieces of work are processes (jobs) or threads (portions of a single process). The former is called multitasking (or multiprocessing[2]); the latter is called multithreading. Figure 3 illustrates multitasking: three tasks can use multiple processors to get more work done in less time.
A routine set of operations in a Digital Content Creation (DCC) environment is a good example of multitasking. In a typical video editing workflow, the artist must render segments of video, compress video streams, and capture video from an external source, all of which are compute- and I/O-intensive applications. In a multitasking environment, all of these processes may execute at the same time, with each process scheduled for some amount of time on the available processors. The result is an overall reduction in the time needed to complete the entire set of tasks.
Figure 3. Overall system performance (throughput) can be
increased through multitasking.
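You can see why running the DCC tasks concurrently shortens the whole job by estimating the total elapsed time (the makespan) of a set of independent tasks on a given number of processors. The sketch below uses greedy longest-task-first list scheduling. This is a textbook heuristic, not a model of any real operating system scheduler, and it ignores I/O waits, contention, and time-slicing. The task durations are hypothetical.

```python
import heapq


def makespan(durations: list[float], processors: int) -> float:
    """Estimate elapsed time to finish independent tasks on `processors` CPUs.

    Longest-processing-time-first (LPT) list scheduling: each task, longest
    first, goes to the processor that currently finishes earliest.
    """
    if processors < 1:
        raise ValueError("need at least one processor")
    finish_times = [0.0] * processors  # min-heap of per-processor finish times
    heapq.heapify(finish_times)
    for d in sorted(durations, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + d)
    return max(finish_times)


# Hypothetical DCC workload, in minutes: render, compress, capture.
workflow = [40.0, 30.0, 20.0]
```

For these hypothetical 40-, 30-, and 20-minute tasks, the estimate is 90 minutes on one processor, 50 on two, and 40 on four. Beyond that point the longest single task sets the floor, and only multithreading within that task would shorten it further.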

Multithreading, by contrast, speeds up a single application. In the DCC workflow example, rendering video frames is a highly parallelizable operation. The frames in a video segment are largely independent, so each frame, or group of frames, can be assigned to a different thread of the rendering application. Employing multiple processors therefore increases the performance (reduces the time-to-result) of the rendering application.
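One simple way to distribute frames is to split the frame range into contiguous groups, one group per thread. The helper below is a hypothetical illustration: `split_frames` is not part of any rendering package. Real renderers may instead hand out frames dynamically, so that threads finishing early pick up more work.

```python
def split_frames(first: int, last: int, workers: int) -> list[tuple[int, int]]:
    """Split the inclusive frame range [first, last] into contiguous groups.

    Returns up to `workers` (start, end) pairs whose sizes differ by at most one.
    """
    if workers < 1 or last < first:
        raise ValueError("need workers >= 1 and last >= first")
    total = last - first + 1
    base, extra = divmod(total, workers)
    groups = []
    start = first
    for i in range(workers):
        # The first `extra` groups each take one additional frame.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer frames than workers
        groups.append((start, start + size - 1))
        start += size
    return groups
```

For example, `split_frames(1, 10, 4)` returns `[(1, 3), (4, 6), (7, 8), (9, 10)]`.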
More generally, some compilers (HP Fortran and C, for example) automatically identify parallelism in a program and generate thread-level parallel code. In addition, industry-standard application programming interfaces (APIs) and preprocessors allow programmers to parallelize applications explicitly.
Figure 4.
Performance of a single application can be accelerated by multithreading
(parallelism).
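The paper does not name specific parallel APIs; for C and Fortran, OpenMP is one widely used industry-standard example. As an analogous illustration in Python, the sketch below uses the standard library’s `concurrent.futures` module to run a per-frame function on a pool of worker threads. `render_frame` is a hypothetical placeholder for real work. In the standard CPython build, threads do not speed up pure-Python, CPU-bound code because of the global interpreter lock. `ProcessPoolExecutor` offers the same interface using separate processes.

```python
from collections.abc import Iterable
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame: int) -> str:
    """Hypothetical stand-in for rendering one frame; returns an output file name."""
    return f"frame_{frame:04d}.png"


def render_all(frames: Iterable[int], workers: int) -> list[str]:
    """Render frames on a pool of `workers` threads.

    Executor.map submits the calls concurrently but returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, frames))
```

Because `map` preserves input order, `render_all(range(1, 4), 2)` returns the three file names in frame order even if the frames finish out of order.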

Modern operating systems use both multitasking and multithreading. For example, the UNIX operating system is itself multithreaded, so parts of the operating system can run on multiple processors. UNIX also supports multiprocessing, so multiple jobs can run simultaneously when multiple processors are available. As shown in Figure 5, a multitasking, multithreaded environment allows multiple threads of multiple tasks to be scheduled on the available processors.
Figure
5. System and application performance can be increased through
both multitasking and multithreading.

In Figure 5, tasks 1, 2, and 3 are broken into multiple threads (represented by different shades of each task’s color), and these threads are scheduled across the processors. For simplicity, this example assumes that the overhead of parallelization is minimal. In reality, multithreading can carry substantial overhead, for example from creating and synchronizing threads. This overhead is one reason the performance of multithreaded applications is difficult to predict.
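Amdahl’s law is a common way to reason about this. If a fraction f of a program’s single-processor run time parallelizes perfectly, the speedup on p processors is at most 1 / ((1 - f) + f/p). The sketch below adds a crude overhead term, c(p - 1), that grows with each extra processor. That linear overhead model is an assumption for illustration, not a measured property of any application.

```python
def estimated_speedup(processors: int, parallel_fraction: float,
                      overhead_per_extra_cpu: float = 0.0) -> float:
    """Amdahl's law plus an ASSUMED linear overhead term.

    speedup = 1 / ((1 - f) + f / p + c * (p - 1))
      p: processors or cores (p >= 1)
      f: fraction of single-processor run time that parallelizes perfectly
      c: assumed extra time, as a fraction of single-processor run time,
         added for each processor beyond the first
    """
    if processors < 1 or not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("need processors >= 1 and 0 <= parallel_fraction <= 1")
    serial = 1.0 - parallel_fraction
    parallel = parallel_fraction / processors
    overhead = overhead_per_extra_cpu * (processors - 1)
    return 1.0 / (serial + parallel + overhead)
```

With f = 0.9 and no overhead, four processors give about a 3.08x speedup. With f = 0.5 and c = 0.1, four processors (about 1.08x) are slower than two (about 1.18x), which shows how overhead can erase the benefit of added processors.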
Applications Environment
A critical criterion in deciding which processor configuration to purchase is performance on the user’s key applications. Users are strongly encouraged to ask their software vendors about performance on multiprocessor systems, because the improvement varies widely from application to application. Figure 6 summarizes some of the market segments for which the HP xw9300 is primarily suited[3] and indicates how applicable a dual-core processor configuration is to each segment.
Figure 6. General applicability of different application segments
to dual-core processor technology.
| Application Segment | General Characteristics | Applicability to Dual-core Processors |
|---|---|---|
| Mechanical Computer-Aided Engineering (MCAE) | MCAE applications require high processor performance, and many are designed to run on multiprocessor systems. | High |
| Digital Content Creation (DCC) | The DCC applications that demand the most of a workstation processor are generally complex animation, rendering, and physics systems, as well as video effects. Most DCC applications are multithreaded. Rendering is a highly parallelizable operation and benefits from multiple processors. | High |
| Scientific Research | Scientific Research is something of a “catch-all” segment. However, many software developers in this segment, such as those in imaging and the life and materials sciences, employ parallelism (both thread- and process-level). | High |
| Oil and Gas | The oil and gas industry makes heavy use of both 32- and 64-bit applications, and many of these applications are designed to use parallelism to increase performance[4]. | High |
Another important performance consideration is response time for the workstation user. All of today’s popular operating systems are multithreaded to some degree. Multithreading allows multiple operating system functions (e.g., file system access, window management, printing) to be carried out simultaneously, so systems with multiple processors generally provide better response time to the workstation user.
Applications Licensing
Another important issue when comparing single-core and multi-core processor technology is application licensing. Some software vendors license applications per computer system, some per processor, and some per core. It is prudent to check with your independent software vendor (ISV) before making a decision. Customers who use software licensed per core may face increased software costs when upgrading to multi-core processor systems.
AMD recommends that software developers license their software by socket and schedule threads by available cores[5]. At least one major software vendor, Microsoft, has announced licensing based on the number of processor chips, regardless of the number of cores on each chip[6]. Thus, from a licensing standpoint, Microsoft operating systems and applications install and run on multi-core systems just as they do on current single-core systems.
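The effect of the licensing model is simple arithmetic. The sketch below counts licenses for one two-socket workstation under per-system, per-socket (per-chip), and per-core models. The price is a hypothetical placeholder. Real ISV terms vary, so check them directly.

```python
def license_units(model: str, systems: int, sockets_per_system: int,
                  cores_per_socket: int) -> int:
    """Licenses required under a per-system, per-socket (per-chip), or per-core model."""
    if model == "system":
        return systems
    if model == "socket":
        # Per processor chip, regardless of how many cores each chip has.
        return systems * sockets_per_system
    if model == "core":
        return systems * sockets_per_system * cores_per_socket
    raise ValueError(f"unknown licensing model: {model!r}")


PRICE_PER_LICENSE = 1000  # hypothetical price, for illustration only

# One two-socket workstation: single-core chips versus dual-core chips.
for model in ("system", "socket", "core"):
    single_core = license_units(model, 1, 2, 1) * PRICE_PER_LICENSE
    dual_core = license_units(model, 1, 2, 2) * PRICE_PER_LICENSE
    print(f"{model:>6}: single-core {single_core}, dual-core {dual_core}")
```

Moving from single-core to dual-core chips leaves the per-system and per-socket counts unchanged. The per-core count doubles, from 2 to 4 licenses.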
Performance Comparison
Many benchmarks attempt to predict the performance of an application on a specific platform. Users are strongly advised to test their own applications on specific architectures, especially since the benefit of dual-core technology depends heavily on how an application uses multiple processors.
Application benchmarks
As described in the section “Single vs. Multi-core Technology” above, performance generally reflects some mixture of multithreading and multitasking. Workstation users will nearly always benefit from multiple processors, if only because multiple processors allow multiple operating system tasks to execute simultaneously. With multiple tasks executing concurrently, the operating system can respond more quickly to interactive requests and/or provide more resources to applications. To assess the performance of specific applications, several benchmarks are shown below.
Ansys
The first benchmark, shown in Figure 7, measures performance on a suite of twenty-six engineering simulation programs using the Ansys 9.0 application[7]. For each processor configuration, the figure shows the total run time of all twenty-six benchmarks, so a smaller result means better performance. Because Ansys has been optimized with multiprocessor configurations in mind, it benefits substantially from the multiprocessor, and specifically the dual-core, configurations of the xw9300 workstation.
Figure 7. Comparison of performance using Ansys 9.0 standard
benchmark suite[8].
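Figure 7 reports the total run time of the whole suite, so comparing two configurations comes down to summing the per-benchmark times and taking a ratio. The sketch below labels configurations in the nP/mC notation of footnote 8, for example two single-core processors (2P/2C) versus two dual-core processors (2P/4C). The three run times per configuration are made-up placeholders, not the measured Ansys results.

```python
def hp_config_label(processors: int, cores: int) -> str:
    """HP notation nP/mC: n processor modules, m total cores."""
    return f"{processors}P/{cores}C"


def total_runtime(seconds_per_benchmark: list[float]) -> float:
    """Total suite run time; smaller is better."""
    return sum(seconds_per_benchmark)


def relative_speedup(baseline_total: float, candidate_total: float) -> float:
    """How many times faster the candidate finishes the suite than the baseline."""
    return baseline_total / candidate_total


# Placeholder run times (seconds) for a three-benchmark suite: NOT measured data.
runs = {
    hp_config_label(2, 2): [120.0, 300.0, 180.0],
    hp_config_label(2, 4): [70.0, 160.0, 100.0],
}
baseline = total_runtime(runs["2P/2C"])
for label, times in runs.items():
    total = total_runtime(times)
    print(label, total, round(relative_speedup(baseline, total), 2))
```

With these placeholder numbers, the 2P/4C total of 330 seconds is about 1.82 times faster than the 2P/2C total of 600 seconds.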

3D Studio Max
The second application benchmark, shown in Figure 8, is based on the SPECapc (Application Performance Characterization) suite[9]. It measures performance on the workload of a typical workstation user running 3D Studio Max (v6)[10]. The 3D Studio Max benchmark includes functions such as wireframe modeling, shading, texturing, lighting, blending, inverse kinematics, object creation and manipulation, editing, scene creation, particle tracing, animation, and rendering.
The elapsed time, in seconds, for each test is normalized against a reference machine, and composite scores are computed from the normalized results. Separate composite scores are reported for the rendering and interactive tests, along with an overall composite score.
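The normalization step can be sketched as follows. The paper does not say exactly how SPECapc combines the normalized results, so this sketch rests on two assumptions. First, each test’s score is the reference time divided by the measured time, so higher is better. Second, a composite is the geometric mean of those ratios, which is a common choice in SPEC benchmarks such as SPEC CPU2000. Consult the SPECapc documentation for the actual method. The test names and times below are hypothetical.

```python
from statistics import geometric_mean


def normalized_scores(reference_seconds: dict[str, float],
                      measured_seconds: dict[str, float]) -> dict[str, float]:
    """Per-test ratio of reference time to measured time (higher is better)."""
    return {test: reference_seconds[test] / measured_seconds[test]
            for test in reference_seconds}


def composite(scores: dict[str, float], tests: list[str]) -> float:
    """ASSUMED composite: geometric mean of the selected tests' normalized scores."""
    return geometric_mean([scores[t] for t in tests])


# Hypothetical test names and times (seconds), for illustration only.
reference = {"render_a": 100.0, "render_b": 80.0, "interactive_a": 60.0}
measured = {"render_a": 50.0, "render_b": 10.0, "interactive_a": 60.0}
scores = normalized_scores(reference, measured)  # ratios 2.0, 8.0, 1.0

rendering = composite(scores, ["render_a", "render_b"])
interactive = composite(scores, ["interactive_a"])
overall = composite(scores, list(scores))
```

Here the two rendering ratios (2.0 and 8.0) give a rendering composite of about 4.0, while the interactive composite stays at 1.0. A large parallel rendering speedup can therefore coexist with little interactive change.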
As shown in Figure 8, rendering benefits greatly from the dual-core implementation on the xw9300, because the rendering algorithms are highly parallelizable and 3D Studio Max implements them with multiple threads. The interactive component, which is dominated by graphics and single-threaded portions of the application, does not benefit as much from multiple processors.
Figure 8. Comparison of performance for different workloads using
the 3D Studio Max application.

GeoProbe/SeisWorks
Another benchmark that shows increased performance through dual-core technology uses the GeoProbe and SeisWorks applications from Landmark Graphics[11]. These applications allow interpreters to view multi-attribute/multi-volume seismic data, well data, cultural data, and reservoir models simultaneously. The benchmarks cover two different methods of processing and operate on fairly large (10 GByte) data sets. The applications are designed to take advantage of multiple processors or cores automatically when they are present in the system on which the application is executing.
Figure 9. Comparison of
performance for different configurations of the xw9300 workstation on the
Landmark Graphics applications GeoProbe and SeisWorks.
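One common way for an application to use whatever processors are present is to size its worker pool from the processor count the operating system reports. This matches AMD’s advice to schedule threads by available cores. The Python sketch below is not how GeoProbe or SeisWorks are implemented. It assumes that one worker per available logical processor is a reasonable default, which may not hold for I/O-heavy or memory-bound work. The function names are hypothetical.

```python
import os
from typing import Optional


def available_processors() -> int:
    """Logical processors this process may run on (cores, plus hardware threads if any)."""
    # sched_getaffinity respects CPU-affinity restrictions but exists only on
    # some platforms, such as Linux.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


def default_worker_count(max_workers: Optional[int] = None) -> int:
    """One worker per available processor, optionally capped at max_workers."""
    n = available_processors()
    if max_workers is None:
        return n
    return max(1, min(n, max_workers))
```

Consider a workstation with two dual-core processors, no hardware multithreading, and no affinity restrictions. There, `default_worker_count()` would return 4, and the result can be passed as `max_workers` to an executor such as `ThreadPoolExecutor`.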

Performance Summary
As expected, users in a multitasking environment, or with applications that employ multitasking or multithreading, will benefit from dual-core configurations; applications that are not so structured will not. As we have seen, many off-the-shelf applications are structured this way, and operating systems such as Microsoft Windows and Linux also use multiple processors in day-to-day activities. Nonetheless, it is prudent for users to check with their software vendors to determine the benefit of dual-core technology for a specific application.
Conclusion
The introduction of multi-core processors promises higher levels of performance for workstation users. Users benefit from reduced response times, faster job turnaround, and the ability to perform multiple tasks simultaneously. As more applications are architected specifically for multiprocessor systems, users will also see shorter turnaround times on compute-intensive applications.
For scientists, engineers, and content creators who require the highest levels of performance, the HP xw9300 can provide that performance. HP’s close partnership with independent software vendors ensures that critical applications are designed for optimal performance, including the use of multiple processors. Further, HP’s expertise with multiprocessor technology and 64-bit operating systems combines to deliver strong problem-solving performance, high reliability, and extreme graphics capabilities. Users who are considering high-end performance with superior price/performance are urged to evaluate the HP xw9300 workstation.
For more information
http://www.hp.com/workstations/ (HP’s personal workstations home page)
http://www.hp.com/workstations/pws/xw9300/index.html (HP xw9300 personal workstation specifications)
http://www.amd.com/us-en/Processors/ProductInformation (AMD Opteron™ processor product page)
http://multi-core.amd.com/ (AMD multi-core processor technology home page)
http://www.microsoft.com/licensing/highlights/multi-core.mspx (a statement from Microsoft on licensing issues related to multi-core processors)
© 2005 Hewlett-Packard Development
Company, L.P. The information contained herein is subject to change without
notice. The only warranties for HP products and services are set forth in
the express warranty statements accompanying such products and services.
Nothing herein should be construed as constituting an additional warranty.
HP shall not be liable for technical or editorial errors or omissions
contained herein. Itanium is a trademark or registered
trademark of Intel Corporation in the U.S. and other countries and is used
under license. AMD, the AMD Arrow logo, AMD Opteron, combinations thereof,
are trademarks of Advanced Micro Devices, Inc. XXXX-XXXXENW, 06/2005

[1] See http://multi-core.amd.com/Products/
[2] The terms multitasking and multiprocessing are used synonymously in this paper.
[3] The xw9300 workstation is well suited for a wide variety of applications; for brevity, we present the most common.
[4] Some use thread-level parallelism; others use process-level parallelism.
[5] See http://multi-core.amd.com/Technology/SoftwareLicensing/
[6] See http://www.microsoft.com/licensing/highlights/multi-core.mspx
[7] See http://www.ansys.com
[8] HP notation for the number of processors/cores is “nP/mC,” where n = total number of processor modules and m = total number of cores.
[9] See http://www.spec.org/gpc/apc.static/max6info.html
[10] See http://www4.discreet.com/3dsmax/
[11] See http://www.lgc.com/