August 02, 2004
Grid Computing



by Jack Horgan - Contributing Editor
Posted anew every four weeks or so, the EDA WEEKLY delivers to its readers information about the latest happenings in the EDA industry, covering vendors, products, finances and new developments. Frequently, feature articles on selected public or private EDA companies are presented. Brought to you by EDACafe.com. If we miss a story or subject that you feel deserves to be included, or you just want to suggest a future topic, please contact us. Thank you!


Introduction

Nearly twenty years ago I was running the development organization of a CAE startup. One of the programmers had a fascination with fractals, which were receiving a lot of press at the time. In particular, using certain mathematical formulas one could generate some very intriguing pictures. Unfortunately, it took, and still takes, considerable computing time to generate these images. Back then the IBM PC AT was our development environment, a very anemic machine by today's standards in terms of CPU horsepower, available memory, and disk storage capacity and speed. We were using SCO UNIX as our operating system. This ingenious programmer found a way to distribute the execution of his fractal program so that it tapped the unused CPU cycles in our network of computers to generate his pictures.

In general, to improve the speed of a single computer one looks to a more powerful CPU, faster memory and disk devices, configurations with greater memory capacity, memory caching and so forth. One can also look to special-purpose co-processors to offload the CPU. Twenty years ago, the PC AT had an optional floating point co-processor. At that time supercomputers were typically array processors. They operated (add, subtract, multiply, invert, …) on very large matrices in the same manner that conventional computers operated on integers. In order to use these array processors, a computer program had to be compiled and linked with an appropriate library of routines. This meant that the software vendor and the array processor vendor had to cooperate to provide versions to end users that would leverage the power of the array processor.

The startup I co-founded developed a box full of electronics that included both a computational engine and a graphics processor. We were able to provide interactive display and interrogation of complex images generated by modeling and analysis programs such as shaded and hidden line images of solid objects, contour stress plots, dynamic mode shapes and assembly sequences. While such capabilities were available on expensive engineering workstations, we were the first to provide them in the PC environment with our supercharged computer.

Another approach to improving performance is to employ multiple processors in a single machine. These multiple processors can be used to support multiple independent program executions or to execute multiple subtasks within a single executing program.

The last approach, and the topic of this week's editorial, is “grid computing”. Here the multiple processors do not reside in a single machine but are distributed across numerous machines linked by a network.
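
To make the idea concrete, here is a minimal, hypothetical sketch of the work-distribution pattern behind that fractal story: an image is split into independent tiles and farmed out to whatever processors happen to be free. The render_tile function is a stand-in, and for simplicity the "grid" here is just the local CPU cores via Python's standard library; a real grid scheduler would hand the same independent work units to idle machines across a network.

    from concurrent.futures import ProcessPoolExecutor

    def render_tile(tile_index):
        # Stand-in for an expensive, independent computation (one fractal tile).
        return tile_index, sum(i * i for i in range(100_000))

    if __name__ == "__main__":
        tiles = range(64)  # the image split into 64 independent work units
        with ProcessPoolExecutor() as pool:
            results = dict(pool.map(render_tile, tiles))
        # In a real renderer the tiles would now be assembled into one image.
        print(f"rendered {len(results)} tiles")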



Grid Computing

According to the Grid Computing Info Centre, “Computational Grids enable the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources (such as supercomputers, compute clusters, storage systems, data sources, instruments, people) and presents them as a single, unified resource for solving large-scale compute and data intensive computing applications. This idea is analogous to electrical power network (grid) where power generators are distributed, but the users are able to access electric power without bothering about the source of energy and its location.”

Grid computing allows one to unite pools of servers, storage systems, and networks into a single large system so that the resources of multiple systems can be delivered to a single user point for a specific purpose. To a user or an application, the system appears to be a single, enormous virtual computing system. Virtualization enables companies to balance the supply and demand of computing cycles and resources by providing users with a single, transparent, aggregated source of computing power.

Applications that can benefit from grid computing include computation intensive applications such as simulation and analysis, data intensive applications such as experimental data and image/sensor analysis, and distributed collaboration such as online instrumentation, remote visualization and engineering. High-throughput computing and on demand computing to meet peak resource requirements will also benefit.

Grid computing must deal effectively and efficiently with issues of security, workload management, scheduling, data management and resource management.



Examples of Grid Computing

A well publicized use of distributed computing is connected with SETI, the Search for Extraterrestrial Intelligence, a scientific effort seeking to determine if there is intelligent life outside Earth. SETI researchers use many methods. One popular method, radio SETI, listens for artificial radio signals coming from other stars. UC Berkeley has the task of analyzing vast quantities of radio data from the Arecibo Observatory in Puerto Rico. SETI@home, launched in May 1999, is a project that lets anyone with a computer and an Internet connection participate in this effort. Participants download a special screensaver. Every SETI@home participant receives a "work unit" from the project's lab (about 300 kilobytes of data), which is then processed by the PC whenever that user's machine is idle. Once the SETI@home screensaver completes its analysis, the client relays the processed information back to the lab at UC Berkeley. When the analyzed data is successfully uploaded, the Space Sciences Lab sends another work unit back to the participant's PC so that the process can be repeated.
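
The SETI@home cycle is easy to picture as a loop. The sketch below is purely illustrative; the fetch, idle-check, analyze and upload helpers are hypothetical stand-ins for the real client and server protocol, which are not public APIs.

    import random
    import time

    def fetch_work_unit():
        # Stand-in for downloading ~300 KB of radio data from the project server.
        return [random.random() for _ in range(1000)]

    def machine_is_idle():
        # Stand-in for the screensaver's idle detection.
        return True

    def analyze(work_unit):
        # Stand-in for the signal analysis done on the participant's PC.
        return max(work_unit)

    def upload_result(result):
        # Stand-in for sending the processed result back to the lab.
        print(f"uploaded candidate signal strength: {result:.4f}")

    for _ in range(3):                     # the real client loops indefinitely
        if machine_is_idle():
            result = analyze(fetch_work_unit())
            upload_result(result)          # return the result, then ask for more work
        time.sleep(0.1)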

A second example would be FightAIDS@Home, the first biomedical distributed computing project launched by Entropia. It is now run by the Olson Laboratory at the Scripps Research Institute, and uses idle computer resources to assist fundamental research to discover new drugs, using our growing knowledge of the structural biology of AIDS.

Another example is Genome@home, which uses a computer algorithm based on the physical and biochemical rules by which genes and proteins behave to design new proteins (and hence new genes) that have not been found in nature. By comparing these "virtual genomes" to those found in nature, we can gain a much better understanding of how natural genomes have evolved and how natural genes and proteins work.

Parabon Computation, Inc. is a commercial company that is building a similar distributed computing platform called Frontier using idle time on individual computers. A downloadable compute engine runs like a screen saver on a client machine, processing tasks only when the machine is idle. Results are uploaded to the server and new tasks downloaded. On July 13, 2000, Parabon launched its Compute Against Cancer program, in which it provides several research organizations searching for new and better cancer treatments with compute resources for analyzing the massive quantities of data they collect. Parabon is now recruiting commercial clients. Parabon will pay for CPU usage, or payments can be donated to one of its nonprofit partners.



Government Programs related to Grid Computing

Started in 1997, the Partnership for Advanced Computational Infrastructure (PACI) is a program of the NSF's Directorate for Computer and Information Science and Engineering (CISE). PACI is creating the foundation for meeting the expanding need for high-end computation and information technologies required by U.S. academic researchers. PACI partners contribute to the development of the information infrastructure by developing, applying and testing the necessary software, tools, and algorithms that contribute to the further growth of this "national grid" of interconnected high-performance computing systems.

PACI offers more than 22 high-performance computing systems that represent an unprecedented amount of computational resources made available by the NSF. The following are PACI's national partnerships and leading-edge sites: National Computational Science Alliance (Alliance), National Partnership for Advanced Computational Infrastructure (NPACI) and the Pittsburgh Supercomputing Center.

TeraGrid is a multi-year effort to build and deploy the world's largest, most comprehensive, distributed infrastructure for open scientific research. The TeraGrid project was launched by the NSF in August 2001 with $53 million in funding to four sites. By 2004, the TeraGrid will include 20 teraflops of computing power distributed at nine sites, facilities capable of managing and storing nearly 1 petabyte of data, high-resolution visualization environments, and toolkits for grid computing. Four new TeraGrid sites, announced in September 2003, will add more scientific instruments, large datasets, and additional computing power and storage capacity to the system. All the components will be tightly integrated and connected through a network that operates at 40 gigabits per second.



Grid Related Organizations

The Global Grid Forum (GGF) is a community-initiated forum of thousands of individuals from industry and research leading the global standardization effort for grid computing. GGF's primary objectives are to promote and support the development, deployment, and implementation of Grid technologies and applications via the creation and documentation of "best practices" - technical specifications, user experiences, and implementation guidelines.

The GGF has defined the Open Grid Services Architecture (OGSA), a services-oriented architecture. The OGSA standard defines the basics of a grid application structure that can be applied to any grid system, i.e. what grid services are, what they should be capable of, and what technologies they should be based on. OGSA, however, does not go into the technical specifics. The companion implementation standard, the Open Grid Services Infrastructure (OGSI), consists of specifications on how work is managed and distributed, and how service providers and grid services are described. Web services, particularly the Simple Object Access Protocol (SOAP) and the Web Services Description Language (WSDL), are a major part of this specification. Currently, both OGSA and OGSI are works in progress.
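
Since OGSI leans on SOAP and WSDL, a grid service call ultimately travels as an XML envelope over HTTP. The sketch below builds such an envelope with Python's standard library; only the Envelope/Body structure is standard SOAP, while the service namespace and the submitJob operation are hypothetical placeholders rather than part of any published OGSI interface.

    from xml.etree import ElementTree as ET

    SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
    SVC_NS = "http://example.org/grid/job-service"   # placeholder namespace

    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SVC_NS}}}submitJob")       # hypothetical operation
    ET.SubElement(request, f"{{{SVC_NS}}}executable").text = "/bin/hostname"
    ET.SubElement(request, f"{{{SVC_NS}}}count").text = "4"

    # Print the message a WSDL-described grid service might receive.
    print(ET.tostring(envelope, encoding="unicode"))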

The Globus Alliance conducts research and development to create fundamental technologies behind the "Grid," which lets people share computing power, databases, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy. Based at Argonne National Laboratory, USC, the University of Chicago, the University of Edinburgh and the Swedish Center for Parallel Computers, the alliance produces open-source software that is central to science and engineering activities totaling nearly a half-billion dollars internationally and is the substrate for significant Grid products offered by leading IT companies. Sponsors include federal agencies such as DOE, NSF, DARPA, and NASA, along with commercial partners such as IBM and Microsoft.

The Globus Alliance is organized around four main activities: research in areas such as resource management, security, information services, and data management; software tools; large-scale testbeds; and applications.

The project has spurred a revolution in the way science is conducted. High-energy physicists designing the Large Hadron Collider at CERN are developing Globus-based technologies through the European Data Grid and U.S. efforts like the Grid Physics Network (GriPhyN) and the Particle Physics Data Grid. Other large-scale e-science projects relying on the Globus Toolkit include the Network for Earthquake Engineering and Simulation (NEES), FusionGrid, the Earth Systems Grid, the DOE Science Grid, the NSF Middleware Initiative and its GRIDS Center, and the National Virtual Observatory.

The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory (“Grid3”) that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2,800 processors, workloads from 10 different applications exceeding 1,300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day.

The Globus Alliance has developed the Globus Toolkit, now at version 3, as the first full-scale implementation of the OGSI standard. The toolkit is a community-based, open-architecture, open-source set of services and software libraries designed to support grids and grid applications. The toolkit addresses issues of security, information discovery, resource management, data management, communication, fault detection, and portability. Major toolkit components include:
The Grid Resource Allocation and Management (GRAM) protocol provides resource allocation and process creation, monitoring, and management services.

The Grid Security Infrastructure (GSI) protocol provides single sign-on, run-anywhere authentication and communication protection with support for local control.

The Monitoring and Discovery Service (MDS) provides a framework for discovering and accessing system configuration and status information, such as compute server configuration, network status, or the locations of replicated datasets.


The Enterprise Grid Alliance (EGA), an open consortium of leading vendors and customers, was launched in April 2004 to accelerate the adoption of grid computing in the enterprise. The initial focus areas include reference models, provisioning, security and accounting. The Alliance will address obstacles that organizations face using enterprise grids, by looking at best practices and solutions that are open and interoperable. The EGA membership includes EMC, Fujitsu Siemens Computers, and NEC, but does not include such grid supporters as IBM, Microsoft and SAP. Both IBM and Microsoft are pursuing their own grid platforms, and are backers of the Global Grid Forum, a group with interests similar to those of the EGA.



Platform Computing

I interviewed Peter Nichol, the General Manager of the Electronic Market Initiatives. The Toronto-based company was launched in 1992. The company sold and OEM'ed its base product, LSF (Load Sharing Facility), across several industries. More recently it has targeted five primary vertical industries (Government and Research, Financial Services, Industrial Manufacturing, Life Sciences and Electronics) and developed and tuned both products and services for these industries. The company has around 400 employees, annual revenue of about $60 million and 1,600 customers. According to Peter, the Electronics industry generates approximately 50% of the firm's revenue, and 16 of the top 20 semiconductor design firms have standardized on LSF for grid computing. Customers include AMD, ARM, ATI, Cisco, NVIDIA, Synopsys and Texas Instruments.

Platform Computing has strategic relationships with IBM, HP, SGI, Dell and SAS, to support emerging computing models such as on demand, utility computing and e-business.

At DAC in June Platform announced availability of Platform LSF Electronics Edition. This included high performance capabilities; integrations with key application tools; premium support and maintenance with customized electronics consulting services; and an electronics-oriented best practices guide. Platform claims seamless integrations for this package with simulation, synthesis and verification applications from leaders such as Altera, Cadence, Denali, IBM, Magma, Mentor, Nassda, Synopsys, and Verisity.

Peter cited NVIDIA as an example of a customer success story. Three years ago NVIDIA had a cluster of 500 CPUs running 30,000 jobs per day. Today it has 5,000 CPUs, a ten-fold increase, and runs 500,000 jobs per day. He also cited Infineon Technologies AG as a success story. Infineon has over 45 R&D sites on three continents and more than 6,500 of its 32,000 employees in this business area. The company first implemented Platform LSF eight years ago. Today its infrastructure comprises 18 clusters worldwide, and LSF is deployed in every Infineon development center. This amounts to several thousand CPUs running on Solaris and Linux systems.

Peter sees widespread adoption of grid computing occurring in three phases. The first is Enterprise Computing with virtual design organizations spread over different time zones and geographies. The need and desire to share resources is obvious. Firms require the flexibility to adjust to peak demands by sharing resources rather than configuring each site individually and independently to meet its own likely peak resource requirements. The second is Partner Grids where separate firms or organizations wish to collaborate on projects. The third phase is Service Grids where grids operate as a utility model.

Platform LSF is software for managing and accelerating batch workload processing for compute- and data-intensive applications. With Platform LSF, users can intelligently schedule and guarantee completion of batch workload across a distributed, virtualized IT environment. Platform LSF fully utilizes all IT resources regardless of operating system, including desktops, servers and mainframes, to ensure policy-driven, prioritized service levels for always-on access to resources. Platform LSF is based on the production-proven, open, grid-enabling Virtual Execution Machine (VEM) architecture.
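
As an illustration of how a design flow might hand work to such a batch system, the hedged sketch below shells out to LSF's standard bsub and bjobs client commands from Python. The queue name and the simulate.sh script are placeholders, and the output parsing assumes LSF's usual "Job <id> is submitted" reply.

    import subprocess

    # Submit a hypothetical simulation run to an LSF cluster (placeholders throughout).
    submit = subprocess.run(
        ["bsub",
         "-q", "normal",          # target queue (placeholder name)
         "-n", "8",               # request eight job slots
         "-o", "sim.%J.out",      # capture stdout; %J expands to the LSF job ID
         "./simulate.sh"],        # the workload itself (hypothetical script)
        capture_output=True, text=True, check=True)
    print(submit.stdout.strip())

    # LSF normally replies "Job <123> is submitted to queue <normal>."
    job_id = submit.stdout.split("<")[1].split(">")[0]
    print(subprocess.run(["bjobs", job_id],
                         capture_output=True, text=True).stdout)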

Platform LSF Analytics uses workload and license management data to provide design centers with analytics support for project planning decisions. Platform LSF Analytics assists engineering managers in estimating project completion times, provisioning hardware, charging back costs, forecasting usage and so forth. Out-of-the-box reporting capabilities on system metrics include workload, hardware performance, cluster performance, software license usage, and resource consumption.

Platform LSF MultiCluster extends an organization's reach to share virtualized resources beyond a single Platform LSF cluster to span geographical locations. With Platform LSF MultiCluster, local ownership and control is maintained ensuring priority access to any local cluster while providing global access across an enterprise grid.

Platform LSF License Scheduler optimizes the usage of all application licenses across Platform LSF clusters by allocating a virtualized pool of licenses to users based on an organization's established distribution policy.

From Sun Microsystem: “In the past, high-performance computing (HPC) was primarily the domain of supercomputers - dedicated, specialized number-crunching machines housed in a special environment. It was a monolithic system, complete with data storage, the compute power required to manipulate that data, and a way to extract the results of complex computations. ... Today, HPTC employs an architecture comprised of several tiers that replaces the supercomputers and workstations of yesteryears and brings an abundance of compute power to mainstream commercial markets. ... Broadly speaking, the current HPTC environment is comprised of three building blocks - massive data storage and retrieval, fast throughput computation, and high-end rendering of results through visualization techniques. Now HPTC applications are being powered by a host of heterogeneous resources of varied configurations - large and small - scattered throughout a network and pooled together through virtualization for maximum utilization.

Essentially, grid computing is at the heart of today's HPTC environment. It makes data, compute, and visualization resources available to HPTC applications - wherever and whenever they're needed.”


Sun claims Mentor Graphics, Synopsys and Motorola as customers of its Grid Everywhere Initiative announced in November 2003.



Fractals

The word fractal was coined by Benoit Mandelbrot in 1975. Fractals are geometric objects whose shape is irregular or fractured. The object appears the same at all scales of magnitude (self-similarity), i.e. it looks the same no matter how far you zoom in, yet reveals increasing detail when magnified. The size of the object in terms of perimeter and area is indeterminate. Lastly, fractals are generated by a recursive or iterative process.

Mandelbrot's papers examined the question "How long is the coastline of Great Britain?" raised by Lewis Richardson. The answer depends on the scale of measurement: the finer the instrument used to measure the coastline, the larger the answer. Examples of fractals are presented below.

Koch Snowflake: Start with a large equilateral triangle. Divide one side of the triangle into three equal parts and remove the middle section. Replace it with two lines the same length as the section you removed. Do this to all three sides of the triangle. You now have a star.



Repeat again and again.


Koch Snowflake
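
The construction above is also a handy way to see Richardson's coastline effect in numbers. Assuming a starting triangle with unit-length sides, each iteration replaces every edge with four edges a third as long, so the perimeter grows by a factor of 4/3 while the enclosed area stays finite:

    # Perimeter of the Koch snowflake after each iteration,
    # starting from an equilateral triangle with side length 1.
    perimeter = 3.0
    for iteration in range(1, 9):
        perimeter *= 4 / 3    # every edge becomes four edges, each one third as long
        print(f"after iteration {iteration}: perimeter = {perimeter:.3f}")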


Sierpinski's Triangle starts as a filled triangle; each iteration connects the midpoints of the sides of every remaining triangle and removes the central triangle this creates. The Sierpinski Triangle contains an infinite number of triangles.
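
A quick way to see why the count of triangles is unbounded: each iteration turns every remaining filled triangle into three smaller ones, so the count triples. A tiny sketch:

    # Number of filled triangles in the Sierpinski construction after each iteration.
    triangles = 1
    for iteration in range(1, 11):
        triangles *= 3
        print(f"iteration {iteration}: {triangles} filled triangles")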



The Mandelbrot set is a fractal defined as the set of points c in the complex number (x+iy) plane for which the iteratively defined sequence z(n+1) = z(n)^2 + c, starting from z(0) = 0, remains bounded.



Mandelbrot Set
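
For the curious, the escape-time recipe implied by that definition takes only a few lines of code. The sketch below iterates z -> z*z + c for a grid of sample points and prints a rough ASCII rendering; the window, resolution and iteration cap are arbitrary choices, and a serious renderer is exactly the kind of embarrassingly parallel workload my programmer colleague spread across idle machines twenty years ago.

    # Rough ASCII rendering of the Mandelbrot set using the escape-time test.
    WIDTH, HEIGHT, MAX_ITER = 78, 30, 60

    for row in range(HEIGHT):
        line = []
        for col in range(WIDTH):
            # Map this character cell to a point c in the complex plane.
            c = complex(-2.2 + 3.0 * col / WIDTH, -1.2 + 2.4 * row / HEIGHT)
            z = 0j
            for _ in range(MAX_ITER):
                z = z * z + c
                if abs(z) > 2.0:          # escaped: c is outside the set
                    line.append(" ")
                    break
            else:
                line.append("*")          # still bounded: treat c as inside the set
        print("".join(line))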



Intel

Last week I wrote an editorial on Intel. In it I recounted the firm's recent missed schedules and design problems (the company recalled some of its 915 G/P and 925X chipsets because of a flaw in the I/O controller that prevented some computers from starting normally, at a cost of $34 million). On July 21st, in an internal memo posted on the company's intranet, CEO Craig Barrett wrote in part, “There are many reasons for these product delays and manufacturing issues. In the end, the reasons don't matter because the result is less-satisfied customers and a less-successful Intel," and “Therefore, it is critical that everyone - beginning with senior management but extending to all of you - focus intensely on actions and attitudes that will continue Intel's strong record of technology leadership and customer satisfaction.”

Separately, on July 28 AMD announced the availability of its Sempron processors, a new family of processors aimed at value-conscious buyers of desktop and notebook PCs. Broadly, this market consists of buyers of desktop systems below $549 and notebook PCs below $999. The Semprons will all cost less than $126, less than the AMD Athlon XP and, in many cases, less than Intel's low-priced Celeron PC processor. Although the Sempron name (derived from the Latin word “semper,” for always) is new, the technology behind the chips is proven. Desktop Semprons will use the same basic underpinnings as desktop Athlon XP processors, while notebook Semprons will be based on mobile Athlon 64 processors but lack certain features, such as 64-bit addressing. All of the Sempron chips will run at slower clock speeds and have less cache than Athlon XP or mobile Athlon 64 chips.

Lenovo Group Ltd. is currently offering AMD Sempron processor-based desktop systems in China. Acer, Medion, Twinhead, HP and Compaq and many more OEMs and system builders are also expected to offer Sempron processor-based notebook and desktop systems in the second half of this year.

This gives AMD two brands: Athlon for the performance conscious market and Sempron for the value conscious market. The Athlon XP will likely be phased out. These brands will go head to head with Intel Pentium and Celeron. Intel commands nearly 83% of the PC microprocessor market with AMD a distant second at ~15%. However, AMD has been making inroads with its 64-bit chips that support 32-bit applications as discussed in earlier editorials.



Weekly Highlights

Accelerated Technology's Nucleus RTOS and Development Tools Now Available for Networking Developers Using Freescale Semiconductor's ColdFire Processors

SMIC Adopts Synopsys' Proteus OPC Software for 130 Nanometer Node

NanoCool Low Power Design Seminar Series Kicks off in San Jose; Sequence, Artisan, Sun, Novas, Golden Gate, Tensilica to Provide Blueprint for Defeating Design Enemy #1

AWR To Develop Design Platform for TSMC Silicon Germanium Process

Gartner Says Second Quarter Worldwide DRAM Revenue Increased 20 Percent from First Quarter of 2004; Hynix Semiconductor Moved Into the No. 2 Vendor Ranking

WJ Communications Introduces Seven Intermediate Power Amps; Ideal for Next Generation and Multi-carrier 3G Base Stations

Cypress Samples Industry's Highest-Performance NSEs With Dual LA-1 Interfaces; 9-Mbit and 18-Mbit NSEs Enable Most Scalable Search Solution for IPV6 Applications

TI and iBiquity Introduce Industry's Lowest Cost Single-Chip AM/FM and HD Radio Baseband




-- Jack Horgan, EDACafe.com Contributing Editor.