Workshops - Wednesday, October 15
Presentation slides are now being made available within the abstracts; click each link to view them.
Workshop on Programming Models for Modern Architectures
Los Alamos Computer Science Symposium, October 14-15, Santa Fe, New Mexico
Workshop Chairs: Ben Bergen, Los Alamos National Laboratory
Pat McCormick, Los Alamos National Laboratory
Computer architecture development is currently undergoing something of a Cambrian explosion, with many different approaches competing in an effort to maintain performance advances relative to transistor count and Moore's Law. One thing that is clear is that developers of algorithms and numerical methods will need to identify greater concurrency to efficiently utilize the computing resources of the future. It also seems clear that, at least for the time being and in the absence of a clear winner, developers will need to be able to adapt their codes to run on multiple, and likely very different, architectures. Consequently, researchers in all areas of scientific computing are struggling to design strategies that can address these issues while still offering extensibility and preserving the intellectual and financial investments that go into developing scientific codes.
This workshop will continue the dialog that has been developing over the past several years in the HPC community. Some specific questions that we will attempt to address are:
- Where are current architectural trends converging? How can we exploit this convergence to increase code portability? How can we attempt to future-proof our codes?
- How do we preserve the substantial intellectual and financial investment represented by our existing legacy codes?
- How do we enable current and future simulation-based discoveries without forcing scientists to become experts in computer science?
This will be a full-day workshop, with invited speakers followed by an informal gathering where beer, wine, and light appetizers will be served to stimulate open dialog between the participants. Please join us for a lively discussion of the ideas that will help see us through this time of change.
7:30 - 8:30 |
Continental Breakfast |
8:30 - 9:00 |
Mattan Erez
Parallelism isn't Enough: An Architect's Perspective on Building and Programming Terascale Processors and Petascale Systems
Abstract:
While Moore's Law is still going strong, device and circuit designers cannot maintain traditional CMOS VLSI scaling trends. As a result, architects, programmers, and algorithm developers must work together to increase efficiency and continue to deliver better and faster solutions. This talk will focus on the architecture perspective and its implications for programming and algorithms, with emphasis on future designs and potential long-term performance portability. I will describe the characteristics of modern VLSI processes and future projections, and introduce architectural techniques, based on the fundamental principles of locality, parallelism, and hierarchical control (LPH), to improve performance and efficiency. I will also discuss trends in current processor architecture and argue why I believe we are converging to two extremes in execution models: threading and bulk-streaming. Based on this observation I will advocate a hybrid bulk/threading architecture model and explain how it can lead to scalable processors and systems while simplifying writing, compiling, and maintaining performance-portable codes.
Presentation Slides (pdf)
Biography:
Mattan Erez is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin. Mattan received a B.Sc. in Electrical Engineering and a B.A. in Physics from the Technion, Israel Institute of Technology in 1999. He subsequently received his M.S. and Ph.D. in Electrical Engineering from Stanford University in 2002 and 2007, respectively. His experience includes working as a computer architect in the Israeli Processor Architecture Research team at Intel Corporation. As a Ph.D. candidate at Stanford University he participated in the Smart Memories project and was the student leader of the Merrimac Stanford Streaming Supercomputer project, where his work spanned the entire system from Stream Processor microarchitecture and architecture to the programming model and applications, including the Brook and Sequoia systems. At UT Austin, his research focus in computer architecture is on the critical aspects of locality, parallelism, and efficient control hierarchies, and on improving the cooperation between the hardware, compiler, and programmer.
|
9:00 - 9:30 |
Gustavo Espinosa
Larrabee: A Many-Core x86 Architecture for High Performance Computing
Abstract:
This talk presents the Larrabee architecture, a many-core hardware architecture and programming model for visual and other high performance computing applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as fixed-function co-processors. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads and greatly increases the flexibility and programmability of the architecture compared to other data-parallel processor architectures.
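As a concrete (and purely illustrative) picture of the data-parallel model the abstract describes, the C sketch below blocks a simple kernel by an assumed vector width so that each inner loop corresponds to one wide vector operation on a core; the VLEN value and function name are assumptions, not details from the talk.

    #include <stddef.h>

    /* Hypothetical 16-wide vector width, matching a 512-bit unit operating on
     * 32-bit floats; the real width used here is an assumption. */
    #define VLEN 16

    /* y[i] += a * x[i], written as an outer loop over vector-width chunks so
     * each inner iteration maps naturally onto one wide vector operation. */
    void saxpy_wide(size_t n, float a, const float *x, float *y)
    {
        size_t i, j;
        for (i = 0; i + VLEN <= n; i += VLEN)
            for (j = 0; j < VLEN; j++)   /* one wide vector op, conceptually */
                y[i + j] += a * x[i + j];
        for (; i < n; i++)               /* scalar remainder */
            y[i] += a * x[i];
    }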
Biography:
Gus has been an Intel employee for over 15 years. He is currently leading the architecture development of the Larrabee processor, the first product of a high-end, many-core chip architecture targeted at graphics and other high-performance computing applications. He has worked on all of Intel's major microprocessor designs since the 486 generation and served as chief architect of several Intel® Pentium® III and Intel® Pentium® 4 processors. His areas of expertise are in computer architecture, processor microarchitecture, microcode, and performance analysis. Prior to Intel, Gus worked as a hardware engineer at Data General Corporation designing minicomputer systems.
Gus holds a Bachelor of Science degree in Electrical Engineering from Cornell University and a Master of Science degree in Computer Engineering from Boston University. |
9:30 - 10:00 |
Richard Barrett
HPCS Languages: Potential for Scientific Computing
Abstract:
Three new programming languages, initiated under the DARPA HPCS program, are being developed with the goal of easing the burden of creating and maintaining scientific applications for use on distributed memory, parallel processing architectures. In this talk I will show how these languages (Chapel from Cray, X10 from IBM, and Fortress from Sun) can be used to express key computations found in scientific application programs. This will provide a basis for discussing the potential for achieving the goals set forth by the HPCS program.
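For contrast with what these languages aim to simplify, here is a minimal message-passing baseline in C with MPI for one such key computation, a global dot product; the local problem size and test data are illustrative. The HPCS languages aim to let the same global operation be written without the explicit data decomposition and reduction call shown here.

    #include <mpi.h>
    #include <stdio.h>

    /* Each rank owns a slice of x and y, computes its partial dot product,
     * and MPI_Allreduce combines the results across all ranks. */
    int main(int argc, char **argv)
    {
        enum { N_LOCAL = 1000 };            /* illustrative local problem size */
        double x[N_LOCAL], y[N_LOCAL], local = 0.0, global = 0.0;
        int i, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < N_LOCAL; i++) {     /* fill with simple test data */
            x[i] = 1.0;
            y[i] = (double)rank;
        }
        for (i = 0; i < N_LOCAL; i++)
            local += x[i] * y[i];

        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("global dot product = %f\n", global);
        MPI_Finalize();
        return 0;
    }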
Presentation Slides (pdf)
Biography:
Richard Barrett is a Senior Research Scientist in the National Center for Computational Sciences at Oak Ridge National Laboratory. His interests span several areas required for creating effective scientific applications on current and future highest-performance computing platforms. Of special interest are the use of programming models and languages, such as explicit message passing, partitioned global address space languages, and emerging languages such as those in the DARPA High Productivity Computing Systems program; code development tools; performance modeling, analysis, and optimization; the solution of large-scale linear systems; and the bridge between research and production computing.
|
10:00 - 10:30 |
Coffee Break |
10:30 - 11:00 |
Joel Morrissette
Abstract:
Recent advances in high-throughput gene sequencing platforms have resulted in a dramatic increase in the volume of sequence data that must be analyzed for a single sequencing run, causing a "data deluge". Highly accurate, computationally-intensive alignment algorithms like Smith-Waterman are no longer able to yield satisfactory performance on even the most current CPU architectures when processing this volume of data. Significant performance improvements can be realized by applying targeted hardware acceleration to the problem, including field-programmable gate arrays (FPGAs). In the past, adopting programmable logic in the HPC environment has presented significant challenges to developers, requiring a specialized set of skills and tools and a monolithic design methodology that resulted in hardware lock-in. We propose a heterogeneous development environment that incorporates QuickAssist-based hardware accelerators, the Intel AAL SDK and OpenCL as a complete vertical solution that partitions the design problem into algorithmic and performance solution spaces, while maintaining portability of the final application through abstraction of the underlying memory and messaging architecture.
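For context, the Smith-Waterman inner recurrence that dominates the runtime is sketched below in plain C; the scoring constants and maximum sequence length are illustrative. Each cell depends only on its three upper-left neighbors, which is exactly the dependence pattern that maps well onto systolic, FPGA-style acceleration.

    #include <stdio.h>
    #include <string.h>

    #define MAX2(a,b)   ((a) > (b) ? (a) : (b))
    #define MAX3(a,b,c) MAX2(MAX2(a,b),(c))

    /* Core Smith-Waterman recurrence with a linear gap penalty:
     * H[i][j] = max(0, H[i-1][j-1] + score(a[i],b[j]),
     *                  H[i-1][j] - gap, H[i][j-1] - gap).            */
    int sw_score(const char *a, const char *b)
    {
        enum { MATCH = 2, MISMATCH = -1, GAP = 1, MAXLEN = 256 };
        static int H[MAXLEN + 1][MAXLEN + 1];
        int la = (int)strlen(a), lb = (int)strlen(b), best = 0;

        memset(H, 0, sizeof H);
        for (int i = 1; i <= la; i++)
            for (int j = 1; j <= lb; j++) {
                int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
                int v = MAX3(H[i - 1][j - 1] + s, H[i - 1][j] - GAP, H[i][j - 1] - GAP);
                H[i][j] = MAX2(0, v);
                best = MAX2(best, H[i][j]);      /* best local-alignment score */
            }
        return best;
    }

    int main(void)
    {
        printf("%d\n", sw_score("ACACACTA", "AGCACACA"));
        return 0;
    }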
|
11:00 - 11:30 |
Chris Baker
An Abstract Node API for Heterogeneous and Multi-core Computing
Abstract:
We outline an abstract node API to support the implementation of large-scale linear algebra primitives on a wide range of HPC architectures. This API is primarily concerned with the issues related to memory management, data communication, and computational abstraction that arise in support of architectures employing a mixture of distributed, SMP/multi-core, and attached processors (GPU/Cell). This discussion takes place in the context of Tpetra, a templated next-generation implementation of the Epetra parallel linear algebra library in the Trilinos project.
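A minimal sketch of what such a node abstraction might look like is given below in C; the names and interface are illustrative assumptions, not the actual Tpetra API. A kernel written against this interface never allocates raw memory or launches work directly, so the same linear algebra code can be instantiated over a serial host node, a multi-core node, or an accelerator-backed node.

    #include <stdlib.h>
    #include <string.h>

    /* A hypothetical "node" bundles the memory-management and kernel-launch
     * operations that differ between host-only, multi-core, and
     * accelerator-attached configurations. */
    typedef struct Node {
        void *(*alloc)(size_t bytes);                 /* device or host buffer    */
        void  (*release)(void *buf);
        void  (*copy_in)(void *dst, const void *src, size_t bytes);   /* host->node */
        void  (*copy_out)(void *dst, const void *src, size_t bytes);  /* node->host */
        void  (*parallel_for)(size_t n, void (*body)(size_t i, void *ctx), void *ctx);
    } Node;

    /* Serial host implementation: memory is ordinary host memory and the
     * "parallel" loop is just a sequential loop. */
    static void host_copy(void *d, const void *s, size_t n) { memcpy(d, s, n); }
    static void host_for(size_t n, void (*body)(size_t, void *), void *ctx)
    {
        for (size_t i = 0; i < n; i++) body(i, ctx);
    }

    static const Node serial_node = { malloc, free, host_copy, host_copy, host_for };

A GPU- or Cell-backed node would supply its own allocation, transfer, and launch functions behind the same interface, which is the portability argument the abstract makes.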
Presentation Slides (pdf)
Biography:
Chris Baker is a postdoctoral researcher in the Scalable Algorithms department at Sandia National Laboratories. He completed his PhD in Computational Science at Florida State University under Kyle Gallivan in 2008. His research focused on the optimization of functions defined on Riemannian manifolds, with particular emphasis on applications to eigenvalue and singular value problems. He is currently a developer of Tpetra, the successor to Trilinos’s Epetra parallel linear algebra library, as well as the Anasazi large-scale eigensolver package.
|
11:30 - 12:00 |
Peter Messmer
|
12:00 - 1:30 |
Lunch |
1:30 - 2:00 |
John Clark
Sequoia on Roadrunner
Abstract:
An emerging class of high performance architectures expose a hierarchy of distinct memories managed explicitly in software. Sequoia is a machine independent programming language designed to efficiently abstract the programming details of memory management and movement among these various levels. Sequoia is being ported to the multilevel Roadrunner platform; we discuss this ongoing work and preliminary performance results on several standard benchmarks.
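To illustrate the programming model (this is not Sequoia syntax), the C sketch below mimics the hierarchical task structure: an inner task splits its work into subtasks until each fits the assumed capacity of the next memory level, where a leaf task runs on local data. On Roadrunner the leaf level would correspond to, for example, a Cell SPE's local store; the capacity constant and names are illustrative.

    #include <stddef.h>

    /* Illustrative capacity of the lower memory level (elements that fit in a
     * fast local store); the value is a placeholder, not a Roadrunner figure. */
    #define LEAF_CAPACITY 4096

    static void vadd_leaf(const double *a, const double *b, double *c, size_t n)
    {
        for (size_t i = 0; i < n; i++) c[i] = a[i] + b[i];   /* runs on local data */
    }

    /* Inner task: either small enough to run as a leaf, or split into
     * subtasks mapped onto the next level down the memory hierarchy. */
    static void vadd_task(const double *a, const double *b, double *c, size_t n)
    {
        if (n <= LEAF_CAPACITY) {
            vadd_leaf(a, b, c, n);    /* data would be DMA'd in, computed, DMA'd out */
            return;
        }
        size_t half = n / 2;
        vadd_task(a, b, c, half);
        vadd_task(a + half, b + half, c + half, n - half);
    }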
Presentation Slides (pdf)
Biography:
John Clark is a second-year master's student in computational mathematics at Stanford University's ICME. John has worked on the Sequoia port to Roadrunner as a collaborative effort between Stanford University's Sequoia project in the CS department and LANL's CCS-1.
|
2:00 - 2:30 |
Greg Bronevetsky
Static Communication-Sensitive Dataflow for Message Passing Applications
Abstract:
The dramatic growth in the popularity of parallel computing has driven significant work in the development of runtime systems and hardware for productively running parallel applications. At the same time, compiler techniques for analyzing and optimizing parallel applications have seen comparatively little growth. The reason is that parallel applications present a significant challenge to compiler developers because in addition to the traditional complications of sequential code, they have a very large number of possible executions. Furthermore, popular parallelism models such as MPI, shared memory and PGAS present additional complications by placing no bounds on the number of processes that may be used at runtime and by allowing processes and threads to choose their communication partners using arbitrary arithmetic expressions.
I will present a novel technique that extends the traditional dataflow compiler analysis framework to parallel applications. This framework identifies connections between send and receive operations in the application source code by symbolically matching their expressions. This information is used to propagate dataflow information throughout the application, taking into account the application's communication topology. Because the analysis does not depend on knowing the number of processes to be used at runtime, it works equally well for applications at any scale. Furthermore, as an extension of traditional dataflow, this framework is useful both for extending the power of existing analyses and to develop new analyses that specifically focus on parallel applications.
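A minimal example of the kind of code such an analysis must handle is sketched below in C with MPI (illustrative, not taken from the talk): the send and receive partners are arithmetic expressions over the rank, and the analysis matches them symbolically so the result holds for any number of processes.

    #include <mpi.h>
    #include <stdio.h>

    /* Ring shift: each rank sends to (rank+1) mod size and receives from
     * (rank-1+size) mod size.  Matching these send/receive expressions
     * symbolically is what lets dataflow facts propagate along the ring. */
    int main(int argc, char **argv)
    {
        int rank, size, out, in;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        out = rank;
        MPI_Sendrecv(&out, 1, MPI_INT, (rank + 1) % size, 0,
                     &in,  1, MPI_INT, (rank - 1 + size) % size, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %d\n", rank, in);
        MPI_Finalize();
        return 0;
    }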
Presentation Slides (pdf)
Biography:
Greg Bronevetsky graduated from Cornell University in 2006 under the direction of Keshav Pingali. He currently holds a Lawrence Post-doctoral Fellowship at the Lawrence Livermore National Laboratory. Greg's work focuses on compiler analyses for parallel applications and scalable fault tolerance techniques.
|
2:30 - 3:00 |
Chen Ding
Suggestion-based Program Parallelization
Abstract:
Although parallelism exists in scientific computations, extracting parallel tasks is difficult because modern systems, especially when written in C and C++, are complex and may make extensive use of pointers, exception handling, custom memory management, and third-party libraries. To address these problems, we have been developing a system for suggestion-based parallelization. Unlike traditional programming, SPO allows a user or a profiling tool to parallelize or optimize a program based on *partial* information about the program code and the input. It uses software speculation and modern parallel processors to protect program correctness and to hide the overhead of dynamic checking and error recovery.
The basic components of suggestion-based parallelization include the marking of possibly parallel regions, on-line monitoring, correctness checking, and user feedback. The original system was described in a PLDI 2007 paper. Recent improvements include a variable-size activity window and special memory management. A prototype system has been evaluated on a number of large benchmarks including open-source utility programs, scientific benchmarks, and the Intel MKL numerical library. Experiments on a multi-core, multi-processor system show significant performance improvement, with each processor accruing on average 30% to 60% of the sequential speed.
Presentation Slides (pdf)
|
3:00 - 3:30 |
Coffee Break |
3:30 - 4:00 |
Paul Woodward
Simulating Compressible Turbulent Mixing with Multifluid PPM on the Los Alamos Roadrunner Machine
Abstract:
Over the last two years, my team at the University of Minnesota’s Laboratory for Computational Science & Engineering (LCSE) has been redesigning our PPM gas dynamics codes for execution on multicore systems. We have focused especially on IBM’s Cell processor, as the most extreme representative of this new type of CPU, and have worked with colleagues at Los Alamos over the last several months to adapt our codes to the new Roadrunner system. We have found that our entire multifluid hydrodynamics computation can be pipelined so that it operates in a mode where data is continually streamed in and out of the Cell processor SPU’s on-chip memory while the compiled code resides permanently on chip. How the code is transformed to work in this fashion will be briefly described, and the very substantial performance benefits quantified. The hybrid design of the Roadrunner machine offers both challenges and benefits. Our adaptation of our codes to this hybrid environment will be described, particularly as it relates to message passing and I/O. The implications of our code redesign for scaling to hundreds of thousands of processor cores will also be discussed in terms of a performance model for the code, informed by experience on available systems. There is a dramatic benefit arising from a significant reduction of the granularity of tasks for individual cores, but this places very substantial stress upon the interconnect network. Finally, we will present results from running the code on Roadrunner to simulate the compressible, turbulent mixing of two gases of different densities at an unstable multifluid interface.
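The sketch below gives a rough, host-side illustration of the double-buffered streaming pattern the abstract describes; the names, strip size, and the use of memcpy in place of asynchronous DMA are assumptions made purely for illustration.

    #include <stddef.h>
    #include <string.h>

    #define CHUNK 2048   /* illustrative strip size that fits in on-chip memory */

    /* Stand-in for the PPM update applied to one strip held on chip. */
    static void kernel(double *strip, size_t n)
    {
        for (size_t i = 0; i < n; i++) strip[i] *= 2.0;
    }

    /* Double buffering: while the kernel works on one on-chip buffer, the
     * next strip of the grid is being fetched into the other.  On a real SPU
     * the memcpy calls would be asynchronous DMA transfers overlapping the
     * compute, and the code itself stays resident on chip. */
    void stream_grid(double *grid, size_t n_strips)
    {
        static double buf[2][CHUNK];
        int cur = 0;

        memcpy(buf[cur], grid, CHUNK * sizeof(double));            /* prefetch first strip */
        for (size_t s = 0; s < n_strips; s++) {
            int nxt = 1 - cur;
            if (s + 1 < n_strips)                                   /* fetch next strip */
                memcpy(buf[nxt], grid + (s + 1) * CHUNK, CHUNK * sizeof(double));
            kernel(buf[cur], CHUNK);                                /* compute current strip */
            memcpy(grid + s * CHUNK, buf[cur], CHUNK * sizeof(double)); /* write back */
            cur = nxt;
        }
    }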
Biography:
Paul Woodward received his Ph.D. in physics from the University of California at Berkeley in 1973. He has focused his research on simulations of compressible flows in astrophysics, studying problems in star formation, supersonic jet propagation, convection in stars, and astrophysical turbulence. To carry out this work, he has developed, with various collaborators, numerical methods for fluid flow simulations, the best known of which is the PPM gas dynamics scheme, and has implemented these on a variety of very large computing systems. After working 11 years at the Lawrence Livermore National Laboratory over the period from 1968 to 1985, he joined the University of Minnesota faculty in 1985 as a Minnesota Supercomputer Institute Fellow in the department of astronomy. He was Director of Graphics and Visualization at the University’s Army High Performance Computing Research Center from 1990 to 1995, and founded the University’s Laboratory for Computational Science & Engineering (LCSE), which he directs, and which is now part of the University’s Digital Technology Center. In 1994, in collaboration with Silicon Graphics, his team developed the PowerWall visualization system, and the first PowerWall system was installed at the LCSE in 1995. The LCSE concentrates on high performance parallel computation and the data analysis and visualization that this requires. Woodward received the IEEE’s Sidney Fernbach award in large-scale computing in 1995 and, with 12 collaborators at Livermore, Minnesota, and IBM, received the Gordon Bell prize in the performance category in 1999.
|
4:00 - 4:30 |
David Bader
Accelerating Scientific Computing with the Cell Broadband Engine Processor
Abstract:
In this talk, we discuss the use of Cell for solving challenging combinatorial scientific computing applications. Our algorithms are tested on a cluster of 14 IBM QS20 Cell Blades with 24 Cell BE processors.
Georgia Tech is one of the first universities to deploy the IBM BladeCenter QS2x Server for production use, through the Sony-Toshiba-IBM (STI) Center of Competence for the Cell Broadband Engine (http://sti.cc.gatech.edu/) at Georgia Tech. The QS20 uses the same ground-breaking Cell/B.E. processor appearing in products such as Sony Computer Entertainment's PlayStation3 computer entertainment system, and Toshiba's Cell Reference Set, a development tool for Cell/B.E. applications. The IBM Cell/B.E. is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD coprocessing units (SPEs) integrated on-chip. Because of the performance capabilities of the Cell/B.E., it has been adopted as an application accelerator for next-generation petascale supercomputers. In this talk, we discuss our research on several new Cell-optimized multi-core applications in areas such as digital content creation, gaming and entertainment, security, scientific and technical computing, biomedicine, and finance, which are developed using the Georgia Tech CellBuzz cluster of 28 Cell/B.E. 3.2GHz processors. An upgrade to CellBuzz adds several IBM QS22 blades, each with two double-precision PowerXCell 8i processors.
Presentation Slides (pdf)
Biography:
David A. Bader is Executive Director of High-Performance Computing and a Professor in Computational Science and Engineering, a division within the College of Computing, at Georgia Institute of Technology. Dr. Bader also serves as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor located at Georgia Tech. He received his Ph.D. in 1996 from The University of Maryland and was awarded a National Science Foundation (NSF) Postdoctoral Research Associateship in Experimental Computer Science. He is an NSF CAREER Award recipient, an investigator on several NSF and NIH awards, was a distinguished speaker in the IEEE Computer Society Distinguished Visitors Program, and a member of the IBM PERCS team for the DARPA High Productivity Computing Systems program. Dr. Bader serves on the Research Advisory Council of Internet2 and the Steering Committees of the IPDPS and HiPC conferences, and will be the General Chair of IPDPS 2010. David has chaired several major conference program committees and has served on numerous conference program committees related to parallel processing and high performance computing. He is an associate editor for several high-impact publications, including the IEEE Transactions on Parallel and Distributed Systems (TPDS), the ACM Journal of Experimental Algorithmics (JEA), IEEE DSOnline, and Parallel Computing, and is a Senior Member of the IEEE Computer Society and a Member of the ACM. Dr. Bader has been a pioneer in the field of high-performance computing for problems in bioinformatics and computational genomics. He has co-chaired a series of meetings, the IEEE International Workshop on High-Performance Computational Biology (HiCOMB), written several book chapters, and co-edited special issues of the Journal of Parallel and Distributed Computing (JPDC) and IEEE TPDS on high-performance computational biology. He has co-authored over 90 articles in peer-reviewed journals and conferences and is the editor of the recent book Petascale Computing: Algorithms and Applications. His main areas of research are parallel algorithms, combinatorial optimization, and computational genomics.
|
4:30 - 5:00 |
Douglas Doerfler
Adapting Codes for a Heterogeneous Multi-core Red Storm
Abstract:
Sandia and Cray have recently upgraded a large fraction of Red Storm's compute partition to quad-core processors, while the rest remains dual-core. The quad-core processors differ from the dual-core in clock rate, memory specifications, and FLOPS per clock. This presentation will show our experience with running applications on both processor types and will discuss what we have done to run HPL on the whole machine in a manner that tries to use the processors in the most efficient manner.
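One simple way to reason about mixing the two processor types is to split the work in proportion to each partition's delivered rate. The back-of-the-envelope sketch below uses made-up rates and node counts (none of these numbers appear in the abstract) just to show the arithmetic; the actual HPL run uses its own distribution machinery.

    #include <stdio.h>

    /* Static load balancing for a machine with two node types.  All rates and
     * counts are illustrative placeholders; only the proportional split matters. */
    int main(void)
    {
        const double rate_quad = 36.8, rate_dual = 19.2;  /* GF/s per node, illustrative */
        const int    n_quad = 6000,   n_dual = 3000;      /* node counts, illustrative */
        const long   total_work = 1000000;                /* abstract work units */

        double total_rate = n_quad * rate_quad + n_dual * rate_dual;
        long   work_quad  = (long)(total_work * (n_quad * rate_quad) / total_rate);
        long   work_dual  = total_work - work_quad;

        printf("quad-core partition gets %ld units, dual-core partition gets %ld units\n",
               work_quad, work_dual);
        return 0;
    }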
Presentation Slides (pdf)
Biography:
Douglas Doerfler is a Principal Member of Technical Staff at Sandia National Laboratories. His research interests are high performance computer architectures and performance analysis. Current assignments include being a member of the NNSA New Mexico Alliance for Computing at Extreme Scale (ACES) design team, which has responsibility for acquiring the NNSA's next capability computing platform, and a member of the steering committee for the Institute for Advanced Architectures and Algorithms (IAA), a collaboration between Sandia and Oak Ridge National Laboratory to foster the integrated co-design of architectures and algorithms in the exascale era to enable more efficient and timely solutions to DOE's mission-critical problems.
|
Workshop on Performance Analysis of Extreme-Scale Systems and Applications
Los Alamos Computer Science Symposium, October 14-15, Santa Fe, New Mexico
Organizers: Adolfy Hoisie (Los Alamos) and Jeff Hollingsworth (Maryland)
Building extreme-scale parallel systems and applications that can achieve high performance is a dauntingly difficult task. Today's systems have complex processors, deep memory hierarchies and heterogeneous interconnects requiring careful scheduling of an application's operations, data access and communication to achieve a significant fraction of potential performance. Furthermore, the large number of components in extreme-scale parallel systems makes failures inevitable; therefore, achieving fault-tolerance in hardware and/or system software becomes an integral part of the performance landscape.
In addition to "classical" performance considerations, the notion of high productivity of systems at scale is now of paramount importance. Productivity encompasses availability, fault tolerance, ease of use, upward portability (including performance portability), programming environments, as well as code development time. A related workshop on programming models for hybrid and heterogeneous systems will also occur at LACSS.
Given this multi-disciplinary mix of performance and productivity, in this workshop we will consider their interplay across system architecture, network, applications, and system software design. The invited speakers will not only cover these areas, but will also address the state-of-the-art in methodologies for performance analysis and optimization, including benchmarking, modeling, tools development, tuning and steering, as well as metrics for productivity.
The invited speakers will include researchers from academia, national laboratories, and funding agencies, as well as R&D staff representing computer vendors.
7:30 - 8:30 |
Continental Breakfast |
8:30 - 9:00 |
Harvey Wasserman (NERSC)
Recent Workload Characterization Activities at NERSC
Abstract:
The NERSC approach to supercomputer acquisition involves capturing the performance-critical characteristics of a highly diversified user workload in a relatively small set of application benchmark programs that can be easily packaged and ported to a variety of architectures. This talk will describe the process that NERSC used to derive the latest suite of benchmark programs. Some preliminary performance data will be presented along with an overview of the NERSC methodology for aggregating performance data.
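As a generic illustration of aggregating per-benchmark results into a single figure of merit (this is not necessarily NERSC's exact methodology), the snippet below computes a geometric mean of benchmark rates, a common choice because it is insensitive to the choice of reference machine.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double rates[] = { 120.0, 85.0, 240.0, 60.0, 150.0 };  /* illustrative GF/s */
        int n = sizeof rates / sizeof rates[0];
        double log_sum = 0.0;

        for (int i = 0; i < n; i++)          /* geometric mean via log average */
            log_sum += log(rates[i]);
        printf("geometric mean rate = %.1f\n", exp(log_sum / n));
        return 0;
    }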
Presentation Slides (pdf) |
9:00 - 9:30 |
Pat Worley (ORNL)
Performance Optimization at Scale - Recent Experiences
Abstract:
Recent activities in performance characterization and optimization on the Cray XT4 and the IBM BG/P will be described, focusing on issues that arise when optimizing performance at scale. Examples will be drawn from work with climate and fusion simulation codes.
Presentation Slides (pdf)
|
9:30 - 10:00 |
Celso Mendes (UIUC)
Abstract:
Multi-PetaFLOPS supercomputers are being planned for deployment during the next few years. Since these are expensive systems, designed as capability computers, one would like key applications to start running efficiently on the full system from the first month (if not day) of operation. One of the challenges in developing applications that will run on such future machines is how to tune them for the machine before the machine is available. Even after the machine is deployed, frequent tuning and diagnostic runs on the full machine may be difficult to come by. We present a methodology for accomplishing this for a large subset of applications with BigSim, a system for early application development and identification of performance problems. BigSim consists of an emulator and a trace-driven simulator, which leverage the virtualization support of Charm++. Another challenge is how to handle the amount of performance data that one can obtain on these large machines. Clearly, scalable performance analysis tools are needed. We describe some of the ideas that we are exploring to handle large performance data volumes and to highlight the "interesting" events and processors that may deserve detailed analysis. Such scalable analysis techniques can be applied both to data obtained from execution on a real machine and to data obtained in BigSim simulations.
Presentation Slides (pdf)
|
10:00 - 10:30 |
Coffee Break |
10:30 - 11:00 |
Karen Karavanic (PSU)
Environment Aware Performance Diagnosis
Abstract:
Environment Aware Performance Diagnosis is a new approach for automatically diagnosing the performance of high end applications. We integrate data from all layers of the runtime system to reach an accurate characterization that takes into account factors such as operating system interference, resource contention, and hardware failure. In this talk I will describe our approach, early results, and current efforts.
Presentation Slides (pdf)
|
11:00 - 11:30 |
Al Malony (Oregon)
Targeting TAU for Extreme Scale
Abstract:
As high-end computing (HEC) evolves to extreme-scale parallel systems with tightly integrated hardware, layered operating systems, and close coupling of application code, libraries, and runtime software, what are the challenges facing parallel performance technology? It is tempting to believe we can just "scale up" traditional performance diagnosis and tuning approaches used on terascale systems, but these "solutions" must address the fundamental problems of increased data volume and analysis complexity that come with extreme scaling. Moreover, the tighter interplay of the application with the petascale architecture and system that is necessary to reach high performance levels will likely require more dynamic and whole-system optimizations than are supported in tools today.
This talk will describe current and future efforts to transition the TAU Performance System to address the challenges of petascale performance analysis.
Presentation Slides (pdf)
|
11:30 - 12:00 |
Guojing Cong (IBM)
Automated performance analysis and tuning through the IBM high productivity computing system toolkit (HPCST)
Abstract:
In this talk I will present our research under the DARPA HPCS project for automated performance analysis and tuning. Our goal is to improve the productivity of performance engineers when they deploy an application onto potentially complex HPC architectures. We are in the process of designing and implementing an open framework for capturing performance bottleneck patterns and discovering possible remedies for these performance problems. In performance analysis, our framework facilitates combining compiler analysis, runtime information collected by existing performance tools, and expert knowledge for accurate diagnosis. Our solution determination mechanism is able to modify the source code, or to work closely with the compiler in suggesting and implementing solutions. With HPC systems becoming increasingly large and complex, we believe our approach will significantly boost productivity in achieving high application performance on target machines.
Presentation Slides (pdf)
|
12:00 - 1:30 |
Lunch |
1:30 - 2:00 |
Greg Bronevetsky (LLNL)
Accurate Prediction of Soft Error Vulnerability of Scientific Applications
Abstract:
Understanding the soft error vulnerability of supercomputer applications is critical as these systems are using ever larger numbers of devices that have decreasing feature sizes and, thus, increasing frequency of soft errors. As many large scale parallel scientific applications use BLAS and LAPACK linear algebra routines, the soft error vulnerability of these methods constitutes a large fraction of the applications' overall vulnerability. This talk analyzes the vulnerability of these routines in the context of overall application error vulnerability. We develop a novel technique that uses vulnerability profiles of individual routines to model the propagation of errors through chained invocations of them. We use our propagation models to assemble vulnerability profiles of arbitrary scientific applications that are primarily composed of calls to BLAS and LAPACK. We demonstrate that the resulting application vulnerability profiles are highly accurate while having very low overhead. |
2:00 - 2:30 |
Vladimir Getov (Westminster)
Integrated Framework for Development and Execution of Component-based Applications
Abstract:
Component-based software technologies have emerged as a modern approach to software development for parallel and distributed applications. However, the lack of longer-term experience and the complexity of the target systems demand more research results in the field. This paper provides an overview of three different approaches to developing component-based complex applications. In order to re-use legacy codes, the wrapper software approach can be adopted in its two flavours – hand-written or automatically generated wrapper code. Another approach applicable to existing object-oriented software is to componentise the code by introducing appropriate modifications. The third approach is component-oriented development from scratch. We compare and contrast the three approaches and highlight their advantages and weaknesses.
Presentation Slides (pdf) |
2:30 - 3:00 |
Jeff Carver (Alabama)
Software Development Environments for Scientific and Engineering Software: A Series of Case Studies
Abstract:
This presentation will discuss the findings from a series of case studies. As part of the DARPA HPCS project, I, along with some colleagues, studied a series of Computational Science and Engineering (CS&E) projects. The main goal of this investigation was to better understand what makes these projects unique. This type of understanding helps the software engineering and CS&E communities develop more effective approaches for creating CS&E software. In this presentation, I will first introduce the process through which these case studies were conducted. I will then provide an overview of the projects studied. Finally, I will discuss 9 or 10 characteristics of CS&E software that make it more difficult to develop than other types of software.
Presentation Slides (pdf)
|
3:00 - 3:30 |
Coffee Break |
3:30 - 4:00 |
Alan Snavely (SDSC)
Performance Modeling on the Path from Petascale to Exascale
Abstract:
This talk describes two record-breaking Petascale calculations, finalists respectively in the last Gordon Bell competition (2007) and the upcoming one (2008), and explains how performance modeling techniques helped to enable the tuning that resulted in the highest resolution non-hydrostatic weather simulation ever attempted (2007), and the highest frequency seismic wave ever propagated (in simulation) clear through the earth (2008). It goes on to provide some performance modeling forecasts summarized from a just completed DARPA study "ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems" in which the author participated--in essence, the Von Neumann Bottleneck will remain with us, and will require modeling to identify the opportunities, and evaluate the strategies needed to mitigate it in hardware and software.
Presentation Slides (pdf)
|
4:00 - 4:30 |
Jeanine Cook (NMSU)
Monte Carlo Processor Modeling of Contemporary Architectures
Abstract:
With the increasing complexity of past-generation processor architectures came the need for a feasible method for performance modeling. Complex, superscalar, out-of-order execution processors were difficult to accurately simulate due to extreme time perturbation of the simulation. In the present age of multi-core, multi-threaded architectures, although individual cores may be simpler than in the past, accurately modeling and/or simulating the behavior of these cores and their interactions is still a problem. We have developed a processor modeling methodology based on a statistical technique that enables accurate performance prediction and analysis of single-/multi-core and multi-threaded architectures for any application. Models are based on processor characteristics that can be obtained from either the manual or from micro-benchmarking; inputs to the model are gathered from the application. Models predict performance very accurately and give a prediction in seconds. With respect to high-performance computing systems, these models can be practically incorporated into existing communication models to form composite system models for performance prediction and acquisition.
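A toy example of the statistical flavor of such models is sketched below (it is not the authors' model): instruction costs are sampled from a distribution parameterized by an application characteristic, here a cache miss rate, and the sample average yields a CPI estimate in seconds rather than hours of cycle-accurate simulation. All numbers are illustrative.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const double miss_rate = 0.03;    /* fraction of instructions missing in cache */
        const double hit_cpi = 1.0, miss_penalty = 200.0;
        const long samples = 1000000;
        double total = 0.0;

        srand(12345);
        for (long i = 0; i < samples; i++) {
            double u = (double)rand() / RAND_MAX;            /* Monte Carlo draw */
            total += hit_cpi + (u < miss_rate ? miss_penalty : 0.0);
        }
        printf("estimated CPI = %.3f\n", total / samples);
        return 0;
    }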
Presentation Slides (pdf)
|
4:30 - 5:00 |
Kevin J. Barker (LANL)
Application Performance Modeling: Predictive Accuracy in the Presence of Simplifying Abstractions
Abstract:
This talk will examine how abstractions can be used to simplify the expression of an application performance model without negatively impacting its predictive capability. We will use two application case studies: KRAK, a hydrodynamics code developed at LANL, and HYCOM, an ocean modeling code. Abstractions employed include simplifying both the computation and communication components of the performance model. Performance predictions are compared against measurements taken on large-scale hardware systems to ascertain how and under what conditions simplifications impact the accuracy of the resulting performance models.
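As a point of reference (this generic form is an assumption, not the specific models used for KRAK or HYCOM), application performance models are often built from a compute term plus a latency/bandwidth communication term:

$$ T(P) \approx \frac{W}{P\,r} + m(P)\,\alpha + \frac{v(P)}{\beta}, $$

where $W$ is the total work, $r$ the per-processor rate, $m(P)$ and $v(P)$ the per-step message count and communication volume, and $\alpha$ and $\beta$ the network latency and bandwidth. Abstraction, in this view, amounts to simplifying the functions $m$, $v$, and $r$ while preserving predictive accuracy.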
Presentation Slides (pdf)
|
Workshop on Resiliency for Petascale HPC
HPC Resiliency Summit: Workshop on Resiliency for Petascale HPC
Recent trends in high-performance computing (HPC) systems have clearly indicated that future increases in performance, in excess of those resulting from improvements in single-processor performance, will be achieved through corresponding increases in system scale, i.e., using a significantly larger component count. As the raw computational performance of the world's fastest HPC systems increases from today's tera-scale to next-generation peta-scale capability and beyond, their number of computational, networking, and storage components will grow from the ten-to-one-hundred thousand compute nodes of today's systems to several hundreds of thousands of compute nodes and more in the foreseeable future. This substantial growth in system scale, and the resulting component count, poses a challenge for HPC system and application software with respect to fault tolerance and resilience.
Furthermore, recent experiences on extreme-scale HPC systems with non-recoverable soft errors, i.e., bit flips in memory, cache, registers, and logic, have added another major source of concern. The probability of such errors not only grows with system size, but also with increasing architectural vulnerability caused by employing accelerators, such as FPGAs and GPUs, and by shrinking nanometer technology. Reactive fault tolerance technologies, such as checkpoint/restart, are unable to handle high failure rates due to associated overheads, while proactive resiliency technologies, such as preemptive migration, simply fail as random soft errors can't be predicted. Moreover, soft errors may even remain undetected, resulting in silent data corruption.
The goal of the Workshop on Resiliency for Petascale HPC is to bring together experts in the area of fault tolerance and resiliency for high-performance computing from national laboratories and universities to present their achievements and to discuss the challenges ahead. The secondary goal is to raise awareness in the HPC community about existing solutions, ongoing and planned work, and future research and development needs. The workshop program consists of a series of invited talks by experts and a round table discussion.
Web sites:
Los Alamos Computer Science Symposium
Important dates:
- October 13-15: Los Alamos Computer Science Symposium
- October 15: Workshop on Resiliency for Petascale HPC
Program topics:
- Current system and application resiliency
- Application-level fault handling
- MPI-level fault handling
- System-level checkpoint/restart
- System-level preemptive migration
- System health monitoring
- System log analysis
- System failure analysis
- HPC resiliency standards
- Soft error issues
- Computational redundancy concepts
- Resiliency for HPC file/storage systems
Workshop general co-chairs:
- Stephen L. Scott
Computer Science and Mathematics Division
Oak Ridge National Laboratory, USA
scottsl@ornl.gov
- Chokchai (Box) Leangsuksun
eXtreme Computing Research Group
Computer Science Program
Louisiana Tech University, USA
box@latech.edu
Program co-chairs:
- Mihaela Paun
Mathematics and Statistics Program
Louisiana Tech University, USA
mpaun@latech.edu
- Christian Engelmann
Computer Science and Mathematics Division
Oak Ridge National Laboratory, USA
engelmannc@ornl.gov
7:30 - 8:30 |
Continental Breakfast |
8:15 - 8:30 |
Welcome |
8:30 - 9:00 |
John T. Daly (LANL)
Resilience: Sacrificing Previous Convictions About Physical Laws
Abstract:
Without a new paradigm of resilience, fleshed out through methods and tools that keep the application running in spite of underlying component failures, the end-to-end performance of extreme-scale compute platforms will plateau and eventually decline as a result of these increasing interrupt rates. We will attempt to examine these concepts rigorously and demonstrate in quantifiable ways why approaches emerging from traditional paradigms of reliability will not continue to be cost-effective or power-efficient means of keeping large applications running.
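One quantitative way to see why the traditional approach stops being cost effective (the formula is the well-known Young/Daly first-order estimate, cited here for illustration rather than taken from the abstract): for a checkpoint write time $\delta$ and a mean time to interrupt $M$, the overhead-minimizing checkpoint interval is approximately

$$ \tau_{opt} \approx \sqrt{2\delta M}, $$

and the fraction of wall-clock time lost to checkpointing and rework is roughly $\delta/\tau_{opt} + \tau_{opt}/(2M)$. As $M$ shrinks with component count while $\delta$ grows with memory footprint, this lost fraction climbs toward levels where the machine spends more time protecting the application than advancing it.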
Biography:
John Daly is a technical staff member in the HPC division at the Los Alamos National Laboratory. His research interests include application fault tolerance, system reliability, application resilience, calculational correctness, and mathematical modeling of application throughput. John is experienced in porting and running large-scale simulations to a variety of architectures and developing metrics and methods for measuring and enhancing HPC utilization. He has accumulated in excess of 40 million processor hours of computer time running on Red Storm, Purple, and BG/L. John holds degrees in engineering from Caltech and Princeton University, where he studied computational fluid dynamics under Antony Jameson. He has also worked as an application analyst and software developer for Raytheon Intelligence and Information Systems. |
9:00 - 9:30 |
Garth Gibson (Carnegie Mellon University/Panasas, Inc.)
Failure in Supercomputers and Supercomputer Storage
Abstract:
The largest computer systems have entered the era of Peta operations per second and will climb to Exa operations per second over the next decade, largely on the strength of more cores per chip and more chips per system. The inevitable consequence of increasing component counts is more parts that can fail, higher failure rates, more concurrent failures and more effort devoted to coping with and recovering from failures -- a key role for storage systems. In this talk I will review historical data on failure rates in supercomputers to project future failure rates, review growing limitations on traditional fault tolerance strategies for supercomputers based on high-speed checkpointing to parallel storage systems, and address the increasing failure issues in storage components.
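A simple illustration of the scaling argument, with hypothetical numbers not taken from the talk: if each node fails independently with a mean time between failures of 25 years (about 219,000 hours), a system of 100,000 nodes has a mean time to interrupt of roughly 219,000 / 100,000 ≈ 2.2 hours, so failure handling, and the storage bandwidth needed to checkpoint against it, becomes a continuous rather than an occasional concern.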
Biography:
Garth Gibson is a professor of Computer Science and Electrical and Computer Engineering at Carnegie Mellon University (CMU) and co-founder and Chief Technology Officer at Panasas Inc. Garth received a Ph.D. in Computer Science from the University of California at Berkeley in 1991. While at Berkeley he did the groundwork research and co-wrote the seminal paper on RAID, then Redundant Arrays of Inexpensive Disks, for which he received the 1999 IEEE Reynold B. Johnson Information Storage Award for outstanding contributions in the field of information storage. Joining CMU's faculty in 1991, Garth founded CMU's Parallel Data Laboratory (www.pdl.cmu.edu), academia’s premiere storage systems research center, and co-led the Network-Attached Storage Device (NASD) research project that became the basis of the recently standardized T10 (SCSI) Object-based Storage Devices (OSD) command set for storage. At Panasas (www.panasas.com) Garth led the development of the Active Scale Storage Cluster in use in government and commercial high-performance computing sites, including the world’s first Petaflop computer, Roadrunner, at Los Alamos National Laboratory. Panasas products provide scalable performance using a simply managed, blade server platform. Through Panasas, Garth co-instigated the IETF's emerging open standard for parallelism in the next generation of Network File Systems (NFSv4.1). Garth is also principal investigator of the Department of Energy's Petascale Data Storage Institute (www.pdsi-scidac.org) in the Scientific Discovery through Advanced Computing program and co-director of the Institute for Reliable High Performance Information Technology, a joint effort with Los Alamos. Garth has sat on a variety of academic and industrial service committees including the Technical Council of the Storage Networking Industry Association and the program and steering committee of the USENIX Conference on File and Storage Technologies (FAST). |
9:30 - 10:00 |
Paul Hargrove, (LBNL)
System-level Checkpoint/Restart with BLCR
Abstract:
Berkeley Lab Checkpoint/Restart (BLCR, http://ftg.lbl.gov/checkpoint) is a DOE funded effort to produce a production-quality system-level checkpointing implementation suitable for use in preemptive scheduling, migration and fault tolerance. The BLCR implementation work is part of a larger multi-institution effort to define a "Fault Tolerance Backplane" (FTB) for HPC platforms, and to provide implementations of the system components that interact with the FTB (including batch scheduler, checkpointer, and MPI implementations among others). This talk will describe the goals of BLCR, its status, and its future directions.
Biography:
Paul H. Hargrove has been a full-time Principal Investigator at Lawrence Berkeley National Laboratory since September 2000, and since June 2005 has held an appointment in the Computer Science Division at the University of California, Berkeley. His current research interests include checkpoint/restart for Linux, and high-performance cluster networks such as InfiniBand. Current projects include Berkeley Lab Checkpoint/Restart (BLCR) for Linux, Global Address Space Networking (GASNet), and Berkeley Unified Parallel C (UPC). |
10:00 - 10:30 |
Coffee Break |
10:30 - 11:00 |
Stephen L. Scott (ORNL)
Process-Level Fault Tolerance for Job Healing in HPC Environments
Abstract:
As the number of nodes in high-performance computing environments keeps increasing, faults are becoming commonplace. Frequently deployed checkpoint/restart mechanisms generally require a complete restart. Yet, some node failures can be anticipated by detecting a deteriorating health status in today's systems, which can be exploited by proactive fault tolerance (FT). Our work proposes novel, scalable mechanisms in support of proactive FT and significant enhancements to reactive FT. The contributions are three-fold. First, we provide a transparent job pause service allowing live nodes to remain active and roll back to the last checkpoint while failed nodes are dynamically replaced by spares before resuming from the last checkpoint. Second, we complement reactive with proactive FT by a process-level live migration mechanism that supports continued execution of an application during much of the migration. Third, we develop incremental checkpointing techniques that capture only data changed since the last checkpoint, to reduce the cost of reactive FT.
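A minimal illustration of the incremental checkpointing idea is sketched below in C: it hashes fixed-size blocks and rewrites only blocks whose hash has changed since the previous checkpoint. Real implementations typically track dirty pages through memory protection or kernel support rather than hashing, and all names and sizes here are illustrative, not the mechanism described in the talk.

    #include <stdio.h>
    #include <string.h>

    #define BLOCKS     64
    #define BLOCK_SIZE 4096

    static unsigned char memory[BLOCKS][BLOCK_SIZE];   /* stand-in for application state */
    static unsigned long last_hash[BLOCKS];

    static unsigned long hash_block(const unsigned char *p)
    {
        unsigned long h = 5381;                        /* djb2 hash */
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            h = h * 33 + p[i];
        return h;
    }

    static void incremental_checkpoint(FILE *out)
    {
        int written = 0;
        for (int b = 0; b < BLOCKS; b++) {
            unsigned long h = hash_block(memory[b]);
            if (h != last_hash[b]) {                   /* block changed since last checkpoint */
                fwrite(&b, sizeof b, 1, out);
                fwrite(memory[b], BLOCK_SIZE, 1, out);
                last_hash[b] = h;
                written++;
            }
        }
        printf("checkpointed %d of %d blocks\n", written, BLOCKS);
    }

    int main(void)
    {
        FILE *out = fopen("ckpt.dat", "wb");
        if (!out) return 1;
        incremental_checkpoint(out);              /* first checkpoint writes every block */
        memset(memory[3], 0xAB, BLOCK_SIZE);      /* application modifies one block */
        incremental_checkpoint(out);              /* second checkpoint writes only that block */
        fclose(out);
        return 0;
    }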
Biography:
Dr. Stephen L. Scott is a Senior Research Scientist and team leader of the System Software Research Team in the Computer Science Group of the Computer Science and Mathematics Division at the Oak Ridge National Laboratory (ORNL). Dr. Scott’s research interest is in experimental systems with a focus on high performance distributed, heterogeneous, and parallel computing. He is a founding member of the Open Cluster Group (OCG) and Open Source Cluster Application Resources (OSCAR). Within this organization, he has served as the OCG steering committee chair, as the OSCAR release manager, and as working group chair. Dr. Scott is the project lead principal investigator for the Reliability, Availability and Serviceability (RAS) for Petascale High-End Computing research team. This multi-institution research effort, funded by the Department of Energy – Office of Science, concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale scientific high-end computing (HEC) as part of the Forum to Address Scalable Technology for Runtime and Operating Systems (FAST-OS). Dr. Scott is also principal investigator of a project investigating techniques in virtualized system environments for petascale computing and is Co-PI of a related storage effort, funded by the National Science Foundation, which is investigating the advantages of storage virtualization in petascale computing environments. Dr. Scott serves on a number of scientific advisory boards and is presently serving as the chair of the international Scientific Advisory Committee for the European Commission’s XtreemOS project. Stephen has published over 100 peer-reviewed papers in the areas of parallel, cluster and distributed computing and holds both the Ph.D. and M.S. in computer science. |
11:00 - 11:30 |
Rinku Gupta (LANL)
A Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)
Abstract:
The need for leadership-class fault tolerance has steadily increased and continues to increase as emerging high performance systems move towards offering petascale level performance. While most high-end systems do provide mechanisms for detection, notification and perhaps handling of hardware and software related faults, the individual components present in the system perform these actions separately. Knowledge about occurring faults is seldom shared between different programs and almost never on a system-wide basis. A typical system contains numerous programs that could benefit from such knowledge, including applications, middleware libraries, job schedulers, file systems, math libraries, monitoring software, operating systems, and checkpointing software. The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS) initiative provides the foundation necessary to enable systems to adapt to faults in a holistic manner. CIFTS achieves this through the Fault Tolerance Backplane (FTB), providing a unified management and communication framework, which can be used by any program to publish fault-related information. In this talk, I will present some of the work done by the CIFTS group towards the development of FTB and FTB-enabled components.
Biography:
Rinku Gupta is a senior scientific developer at Argonne National Laboratory and the lead developer for the Fault Tolerance Backplane project. She received her MS degree in Computer Science from Ohio State University in 2002. She has several years of experience developing systems and infrastructure for enterprise high-performance computing. Her research interests primarily lie towards middleware libraries, programming models and fault tolerance in high-end computing systems. |
11:30 - 12:00 |
Greg Koenig (ORNL)
Towards Support for Fault Tolerance in the MPI Standard
Abstract:
As the number of components comprising computer systems has grown, so has the need to deal with component failure for applications to utilize the full capabilities of these systems. As we face an explosion in system size, it is important to consider fault-tolerance through the full stack, from the hardware clear to the application, if we are to use the full capabilities of these emerging systems. The MPI Forum is currently considering what changes to make to the MPI standard to deal with failure. This talk will present the direction being taken by the MPI Forum's Fault Tolerance working group for responding to failures.
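For orientation, the sketch below shows in C what today's MPI standard already provides, an error handler that returns codes instead of aborting, and, by omission, what it does not: the standard currently leaves the state of the communicator and of surviving ranks after a failure undefined, which is the gap the working group is addressing. The example is illustrative only.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, err, len;
        char msg[MPI_MAX_ERROR_STRING];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Return error codes to the caller instead of aborting the job. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        err = MPI_Barrier(MPI_COMM_WORLD);   /* would fail if a peer has died */
        if (err != MPI_SUCCESS) {
            MPI_Error_string(err, msg, &len);
            fprintf(stderr, "rank %d saw error: %s\n", rank, msg);
            /* application-level recovery would have to go here, but MPI does
             * not yet define what communication is still possible */
        }

        MPI_Finalize();
        return 0;
    }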
Biography:
Gregory A. Koenig is an R&D Associate at Oak Ridge National Laboratory where his work involves developing scalable runtime systems and parallel tools for ultrascale-class parallel computers. His interests also include middleware for grid and on-demand/utility computing incorporating technologies such as virtualization, fault detection and avoidance, and resource scheduling. He holds a PhD (2007) and MS (2003) in computer science from the University of Illinois at Urbana-Champaign as well as three BS degrees (mathematics, 1996; electrical engineering technology, 1995; computer science, 1993) from Indiana University-Purdue University Fort Wayne. |
12:00 - 1:30 |
Lunch Break |
1:30 - 2:00 |
Adam J. Oliner (Stanford University)
Studying Systems as Artifacts
Abstract:
Imperfections are an unavoidable characteristic of complex systems; the costs of these imperfections make it imperative for us to devise generic methods for effectively detecting and isolating them. Toward this end, we present a technique that infers the dependency structure of a system by looking for anomalous behavior correlated in time across components. I'll present some early results on a supercomputer and an autonomous vehicle, as well as provide a motivational survey of my work on system management: job scheduling, quality of service guarantees, checkpointing, and log analysis.
Biography:
Adam Oliner is a third-year PhD student in the Computer Science Department at Stanford University, working with Alex Aiken. He is a DOE High Performance Computer Science Fellow and honorary Stanford Graduate Fellow. Before coming to Stanford, he earned a Master of Engineering in electrical engineering and computer science at MIT, where he also received undergraduate degrees in computer science and mathematics. He interned several times at IBM with the Blue Gene/L system software team and spent a summer studying supercomputer logs at Sandia National Labs. |
2:00 - 2:30 |
Jim Brandt, (SNL)
Combining System Characterization and Novel Execution Models to Achieve Scalable Robust Computing
Abstract:
New platforms are growing in both size and complexity, both within a node element and within the high-bandwidth, low-latency networks which provide the communication paths between node elements. Multi-core architectures add even more diversity to communication paths and contention for resources as the core count per socket continues to grow. Furthermore, corresponding growth in component count contributes to an ever shrinking system wide mean time to component failure. Understanding the heterogeneous and hierarchical nature of the platform will allow better utilization of the underlying platform resources and better handling of failure or expected failure situations.
This talk presents our ongoing work on using system characterization and resource state monitoring and analysis, in conjunction with intelligent resource management and existing and new programming models, to make applications not only more resilient to system faults but also more efficient.
Biography:
Jim Brandt has been involved in research in high-performance computing platforms, performance optimization tools, and informatics for over 10 years. He is the lead of Sandia's OVIS (http://ovis.ca.sandia.gov) project which is developing an open-source tool for Intelligent Real-time Monitoring and Analysis of Large HPC clusters. OVIS has been used for analyzing system data from Sandia's Red Storm, Thunderbird, TLCC, and Talon clusters as well as chemical sensor data in conjunction with Sandia's SNIFFER project. Jim's relevant workshop organization activities include: organizer of the 2006 Tri-lab RAS workshop, chair of the 2008 Sandia Workshop on Data Mining and Data Analysis, and organizer of the 2007 Red Storm performance optimization workshop. |
2:30 - 3:00 |
Jon Stearley (SNL)
Root Cause Analysis
Abstract:
Because the functional interdependencies among components are numerous, complex, and dynamic, determining the root cause of failures on HPC systems requires extensive knowledge, unwavering tenacity, and often, a good "hunch". The difficulty of this task on future systems, however, grows not simply with the increasing number of components, but combinatorially with their interdependencies. Furthermore, as global checkpoint/restart overheads increase, the importance of a focused response to faults increases, which requires root cause determination. Consider a supercomputer as a graph where vertices are components (hardware or software), edges are dependencies (physical or functional), and labels are symptomatic factors (text, numeric thresholds, waveforms, etc.) - is this model useful towards determining the root cause of failures within HPC systems, to the benefit of human or automated responders?
Biography:
Jon Stearley enjoys variety and challenge, having worked in areas ranging from electrical engineering and neuroimaging programming to infrastructure architecture and resilient supercomputing. Having spent the majority of recent efforts on log analysis (http://www.cs.sandia.gov/sisyphus), he is currently seeking to expand the scope of system information he computes upon, focusing on novel methods to determine the root cause of failures. |
3:00 - 3:30 |
Coffee Break |
3:30 - 4:00 |
Greg Bronevetsky (LLNL)
Accurate Prediction of Soft Error Vulnerability of Scientific Applications
Abstract:
Understanding the soft error vulnerability of supercomputer applications is critical as these systems are using ever larger numbers of devices that have decreasing feature sizes and, thus, increasing frequency of soft errors. As many large scale parallel scientific applications use BLAS and LAPACK linear algebra routines, the soft error vulnerability of these methods constitutes a large fraction of the applications' overall vulnerability. This talk analyzes the vulnerability of these routines in the context of overall application error vulnerability. We develop a novel technique that uses vulnerability profiles of individual routines to model the propagation of errors through chained invocations of them. We use our propagation models to assemble vulnerability profiles of arbitrary scientific applications that are primarily composed of calls to BLAS and LAPACK. We demonstrate that the resulting application vulnerability profiles are highly accurate while having very low overhead.
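As a rough illustration of how per-routine vulnerability profiles might compose (this simple independence model is an assumption for exposition, not the propagation model developed in the talk): if an application executes a chain of routines $1, \dots, k$ and routine $i$ has vulnerability $v_i$, the probability that a soft error during its execution corrupts its output, then under independence the probability that the final result is corrupted is approximately

$$ V \approx 1 - \prod_{i=1}^{k} (1 - v_i). $$

The propagation models described in the talk refine this by accounting for how errors introduced in one routine are amplified, masked, or detected by the routines that follow.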
Biography:
Greg Bronevetsky graduated from Cornell University in 2006 under the direction of Keshav Pingali. He currently holds a Lawrence Post-doctoral Fellowship at the Lawrence Livermore National Laboratory. Greg's work focuses on compiler analyses for parallel applications and scalable fault tolerance techniques. |
4:00 - 4:30 |
Christian Engelmann (ORNL)
Modular Redundancy in HPC Systems: Why, Where, When and How?
Abstract:
In order to address anticipated high failure rates, resiliency characteristics have become an urgent priority for next-generation high-performance computing (HPC) systems. One major source of concern is non-recoverable soft errors, i.e., bit flips in memory, cache, registers, and logic. The probability of such errors not only grows with system size, but also with increasing architectural vulnerability caused by employing accelerators and by shrinking nanometer technology. Reactive fault tolerance technologies, such as checkpoint/restart, are unable to handle high failure rates due to associated overheads, while proactive resiliency technologies, such as preemptive migration, simply fail because random soft errors cannot be predicted. This talk proposes a new, bold direction in resiliency for HPC: it targets resiliency for next-generation extreme-scale HPC systems at the system software level through computational redundancy strategies, i.e., dual- and triple-modular redundancy.
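The sketch below shows the basic idea of triple-modular redundancy at the task level: run the same computation three times and majority-vote the result, so a single corrupted replica is out-voted and a full disagreement is at least detected. This is a generic illustration of modular redundancy, not the system-software design proposed in the talk.

```python
# A minimal, generic triple-modular redundancy (TMR) voter for a task result.

from collections import Counter

def tmr(compute, *args):
    """Execute `compute` three times and return the majority result.
    A three-way disagreement is reported as a detected, uncorrectable error."""
    results = [compute(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes >= 2:
        return value  # a single corrupted replica is out-voted
    raise RuntimeError("TMR voter: no majority -- uncorrectable soft error detected")

# usage: protect a (hashable) reduction result
print(tmr(sum, range(1_000_000)))
```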
Biography:
Christian Engelmann is an R&D Staff Member in the System Research Team of the Computer Science Research Group in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). He holds an MSc in Computer Science from the University of Reading and an MSc in Computer Systems Engineering from the Technical College for Engineering and Economics (FHTW) Berlin. As part of his research activities at ORNL, Christian is currently pursuing a PhD in Computer Science at the University of Reading. His research aims at providing high-level reliability, availability, and serviceability for next-generation supercomputers to improve their resiliency (and ultimately efficiency) with novel high availability and fault tolerance system software solutions. Another research area concentrates on "plug-and-play" supercomputing, where transparent portability eliminates most of the software modifications caused by diverse platforms and system upgrades. |
4:30 - 5:00 |
James Elliott (Louisiana Tech University)
Making Resilience a Reality Through a Resilience Consortium
Abstract:
The study of large-scale systems is challenging, and attempting to draw objective conclusions is even more difficult. To better understand these systems and provide meaningful information to the entire HPC community, some basic guidelines should be defined. From the data in the log files to the reports presented, a standard set of terminology and metrics with unified semantics should be introduced. There should also be cohesion among the various researchers and industry personnel to ensure that resilience research continues to grow. To initiate this process, a consortium of researchers and industry personnel has been formed. This talk will highlight some of the challenges encountered in performing resilience research, and how we plan to address them through the resilience consortium.
Biography:
James Elliott is a PhD student at Louisiana Tech University studying under Dr. Box Leangsuksun. His interests lie in modeling and analyzing resilience mechanisms at various levels of the software stack. |
5:00 - 5:30 |
Discussion and Closing |
Next-Generation Particle-Based Simulations
Abstract:
Particle methods occupy an important sector of HPC application space. Particles can represent reality directly (e.g., atoms in Molecular Dynamics simulations, dust grains in SPH simulations) or can be proxies for a smooth field (e.g., Particle-In-Cell methods, Langevin solvers for Master equations). In addition, the dynamical rules in particle simulations can be very diverse, including smooth as well as stochastic forces, a variety of dissipation mechanisms, and probabilistic (Monte Carlo) techniques, as well as hybrid interactions (particles interacting with a smooth field). Particle codes have been developed aggressively on each succeeding generation of HPC platforms and, viewed as a class, have consistently played a dominant role in defining the performance envelope for large-scale computing applications. The rapid development of hybrid and heterogeneous systems -- as exemplified by LANL's Roadrunner platform -- offers new opportunities and corresponding challenges to the current paradigms of particle simulation.
With the scale of simulations now at the trillion-particle frontier, there are many questions that need to be confronted aside from how to make the simulation bigger and/or faster. As the dynamic range of simulations increases, physical processes that could be ignored earlier must now be included. In fact, the inherent multi-scale nature of many dynamical problems is only now becoming treatable as the dynamic range of the underlying codes has improved by several orders of magnitude. (Additionally, error controls must be proportionally tightened.) These advances, in many cases, have been due not only to improvements in hardware, but also to the development of new methodologies, such as acceleration techniques.
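For readers less familiar with particle methods, the sketch below shows a generic velocity-Verlet time step, the kind of inner loop that such codes scale to trillions of particles. The force law, units, and array shapes are illustrative assumptions only, not tied to any of the codes discussed at the workshop.

```python
# A minimal velocity-Verlet time step for a generic particle method.
# Force law, units, and array shapes are illustrative assumptions.

import numpy as np

def velocity_verlet_step(x, v, force, mass, dt):
    """Advance positions x and velocities v by one step dt under force(x)."""
    a = force(x) / mass
    x_new = x + v * dt + 0.5 * a * dt**2
    a_new = force(x_new) / mass
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new

# usage: N non-interacting particles in a harmonic trap (force = -k x)
N, k, mass, dt = 1000, 1.0, 1.0, 0.01
x = np.random.rand(N, 3)
v = np.zeros((N, 3))
for _ in range(100):
    x, v = velocity_verlet_step(x, v, lambda r: -k * r, mass, dt)
```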
This workshop aims to bring together researchers in this nascent area to share ideas, experiences, and plans for the future. The workshop will consist of a small number of invited talks followed by an open discussion session wherein all attendees are strongly encouraged to participate.
7:30 - 8:30 |
Breakfast |
8:30 - 9:00 |
Salman Habib (LANL)
Introductions/Initial Business |
9:00 - 9:30 |
Sriram Swaminarayan (LANL)
Porting Yesterday’s Molecular Dynamics Codes to Tomorrow’s Machines: A Case Study Using SPaSM |
9:30 - 10:00 |
Paul Mullowney (Tech X Corp.)
Towards Kinetic Modeling of Ion Transport in an ECRIS Plasma |
10:00 - 10:30 |
Coffee Break |
10:30 - 11:00 |
Brian Albright (LANL)
Next-Generation Particle-In-Cell Modeling of Plasma |
11:00 - 11:30 |
Salman Habib (LANL)
Petascale Cosmology Simulation: The Roadrunner Universe Project
Presentation Slides (pdf)
|
11:30 - 12:00 |
Art Voter/Danny Perez (LANL)
Replica Methods |
12:00 - 1:30 |
Lunch Break |
1:30 - 2:00 |
Mark Moraes (D.E. Shaw Research)
Anton, A Special Purpose Molecular Dynamics Machine Capable of Millisecond-Scale Simulation |
2:00 - 3:00 |
Directed open discussion on future challenges and possible collaborations/coordination |
3:00 - 3:30 |
Break |
3:30 - 5:00 |
Individual Discussions |
5:00 - 5:30 |
Discussion and Closing |
*All workshop participants need to register for the symposium.
|
Places to Visit in New Mexico
|