
Thursday, August 31, 2006

TORQUE Reaches Milestone - 50,000 Downloads in One Year

Cluster Resources, Inc., a leading provider of cluster, grid, and utility computing software, announced today that TORQUE Resource Manager passed a new milestone in its continued success, reaching 50,000 downloads since August 2005.

Terascale Open-Source Resource and QUEue Manager, more commonly known as TORQUE, is a resource manager derived from the original OpenPBS project that provides control over batch jobs and distributed compute nodes. With more than 2,500 patches and enhancements since its release in 2004, TORQUE has incorporated significant advances in the areas of scalability, flexibility, and feature extensions.

"We are pleased to be part of the TORQUE community project that continues to provide a leading resource management solution for Top 500 systems and thousands of other clusters worldwide," said David Jackson, CTO of Cluster Resources. "Combining professional development, testing, support, and documentation with the extensive support and development contributions of the TORQUE community, has proven to be highly successful."

Over the past six months, TORQUE downloads have accelerated sharply. Cluster Resources recorded approximately 33,000 downloads from February through July, nearly double the previous six months' total. TORQUE is also included for download in most of the major cluster-building kits, including ROCKS, OSCAR, xCAT, and others. When these kits are included, downloads over the past year are estimated at more than 100,000.

Cluster Resources - providers of the Moab family of workload management products - professionally maintains and develops TORQUE, incorporating hundreds of feature extension patches from NCSA, OSC, USC, the U.S. Department of Energy, Sandia, PNNL, the University at Buffalo, TeraGrid, and many other leading HPC institutions and individuals in the user community. Cluster Resources also supports and maintains the TORQUE documentation and user lists, and provides current versions and patches at http://www.clusterresources.com/torque.

Through TORQUE's user lists and documentation wiki, community members can submit or view patches, suggestions, and questions in the archive, and contribute new information to the user manual, providing an active forum for the development of new ideas.

Garrick Staples, a lead developer of TORQUE, attributes the continued development of TORQUE to the collaborative efforts of the user community and Cluster Resources.

"Since many TORQUE users are the administrators of their own clusters, their needs often drive the competitive edge of our development focus," Staples said. "TORQUE also has the strong backing of Cluster Resources, through whose leadership, the collective wisdom and requirements of thousands of sites worldwide are being plugged directly into TORQUE."

To facilitate community involvement in TORQUE development, Cluster Resources recently began using Subversion, an open-source version control system for source configuration management. By offering anonymous checkout through Subversion, users can more easily access the TORQUE source code to test and implement their own improvements.
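For readers who want to experiment, the short sketch below shows what an anonymous checkout might look like when driven from Python; the repository URL is purely illustrative (the announcement does not give one), and only the standard svn checkout command is assumed.

    # Minimal sketch of an anonymous TORQUE source checkout, driven from Python.
    # The repository URL below is hypothetical; consult the Cluster Resources
    # site for the actual Subversion location.
    import subprocess

    REPO_URL = "svn://svn.clusterresources.com/torque/trunk"  # hypothetical URL
    subprocess.run(["svn", "checkout", REPO_URL, "torque-trunk"], check=True)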

"We take suggestions for improvements seriously," said Josh Butikofer, a product manager for Cluster Resources. "Whenever users make suggestions and improvements, or request that TORQUE be able to perform an additional task, the community works together to try and find a way to make it happen."

Before making any changes to the original source code, Cluster Resources tests submitted enhancements to ensure TORQUE's continued reliability and functionality.

Cluster Resources focuses on enabling long-term core enhancements to TORQUE in the areas of scalability, security, reliability and usability. In recent versions of TORQUE, Cluster Resources has implemented a number of significant feature enhancements including tight PAM integration, improved SGI CPUSet support, initial job array support, and dynamic resource definitions. The newest version of TORQUE, 2.1.2, offers X11 forwarding and also supports client commands on Windows using Cygwin.
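As a small illustration of the new X11 forwarding feature, the sketch below launches an interactive job with forwarding enabled. It assumes a TORQUE 2.1.2 or later qsub client is on the PATH and that its -I (interactive) and -X (X11 forwarding) options are available on the system; the resource request is illustrative only.

    # Illustrative sketch: start an interactive TORQUE job with X11 forwarding.
    # Assumes qsub from TORQUE 2.1.2+ is on PATH; -I requests an interactive
    # session and -X enables X11 forwarding.
    import subprocess

    subprocess.run(
        ["qsub", "-I", "-X", "-l", "nodes=1:ppn=1,walltime=00:30:00"],
        check=True,
    )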

TORQUE is currently in use at hundreds of leading government, academic, and commercial sites throughout the world and is used on many of the world's largest clusters and grids. TORQUE scales from single SMP machines and clusters to sites with tens of thousands of jobs and nearly 10,000 processors.

"We look forward to continuing the same level of development excellence that has led to TORQUE's success." David Jackson said. "We welcome cluster users everywhere to try out TORQUE, to get involved and to help cultivate the type of ideas that have produced one of the best community resource management solutions in the HPC industry."

What is TORQUE?

TORQUE is an open-source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Department of Energy, Sandia, PNNL, the University at Buffalo, TeraGrid, and many other leading-edge HPC organizations. This version may be freely modified and redistributed subject to the constraints of the included license.
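To make the "control over batch jobs" part concrete, here is a minimal sketch of submitting a batch job from Python. It assumes the standard qsub client is installed and uses ordinary PBS directives (#PBS -N, -l, -j); the job contents and resource values are illustrative, not taken from the announcement.

    # Minimal sketch: write a tiny PBS job script and submit it with qsub.
    # Assumes TORQUE's qsub client is on PATH; job details are illustrative.
    import subprocess
    import tempfile

    JOB_SCRIPT = """#!/bin/sh
    #PBS -N hello_torque
    #PBS -l nodes=1:ppn=2,walltime=00:10:00
    #PBS -j oe
    echo "Running on $(hostname)"
    """

    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(JOB_SCRIPT)
        script_path = f.name

    # qsub prints the new job identifier (e.g. "1234.server") on success.
    result = subprocess.run(
        ["qsub", script_path], capture_output=True, text=True, check=True
    )
    print("Submitted job:", result.stdout.strip())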

TORQUE can integrate with Moab Workload Manager® to improve overall utilization, scheduling and administration on a cluster. Customers who purchase Moab Workload Manager also receive free support for TORQUE.

Feature Set
TORQUE provides enhancements over standard OpenPBS in the following areas:

Fault Tolerance
  - Additional failure conditions checked and handled
  - Node health check script support (see the health check sketch after this list)
Scheduling Interface
  - Extended query interface providing the scheduler with additional and more accurate information
  - Extended control interface allowing the scheduler increased control over job behavior and attributes
  - Allows the collection of statistics for completed jobs
Scalability
  - Significantly improved server-to-MOM communication model
  - Ability to handle larger clusters (over 15 TF / 2,500 processors)
  - Ability to handle larger jobs (over 2,000 processors)
  - Ability to support larger server messages
Usability
  - Extensive logging additions
  - More human-readable logging (i.e., no more 'error 15038 on command 42')
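The node health check support listed above is typically wired up through a script that the pbs_mom daemon runs periodically. The sketch below shows what such a script might look like; treat the configuration detail (a $node_check_script entry in the MOM config) and the convention that output beginning with "ERROR" marks the node unhealthy as assumptions based on common TORQUE setups, not statements from this announcement.

    #!/usr/bin/env python
    # Sketch of a node health check script of the kind pbs_mom can be
    # configured to run periodically. The "ERROR" output convention and the
    # $node_check_script configuration knob are assumptions for illustration.
    import os

    def enough_scratch_space(path="/tmp", min_free_bytes=1 << 30):
        """Return True if the filesystem holding `path` has >= 1 GiB free."""
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize >= min_free_bytes

    if not enough_scratch_space():
        # Output starting with "ERROR" is conventionally taken as a failed check.
        print("ERROR insufficient free space in /tmp")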
