Scaling and Visualization of N-Body Gravitational Dynamics with GalaxSeeHPC


Journal of Computational Science Education, Volume 5, Issue 1, August 2014, ISSN 2153-4136

David A. Joiner
Kean University
1000 Morris Ave
Union, NJ
djoiner@kean.edu

James Walters
Kean University
1000 Morris Ave
Union, NJ
walterj1@kean.edu

ABSTRACT
In this paper, we present GalaxSeeHPC, a new cluster-enabled gravitational N-Body program designed for educational use, along with two potential student experiences that illustrate what students might be able to investigate at larger N than available with earlier versions of GalaxSee. GalaxSeeHPC adds additional force calculation algorithms and input options to the previous cluster-enabled version. GalaxSeeHPC lessons have been developed focusing on two key studies, the structure of rotating galaxies and the large scale structure of the universe. At large N, visualizing the results becomes a significant challenge, and tools for visualization are presented. The canonical lesson in the original version of GalaxSee is the rotation and flattening of a cluster with angular momentum. Model discrepancies that are not obvious at the range of N available in previous versions become quite obvious at large N, and changes to the initial mass and velocity distribution can be seen more readily. For the large scale structure models, while basic clearing and clustering can be seen at around N=5,000, N=50,000 allows for a much clearer visualization of the filamentary structure at large scale, and N=500,000 allows for a more detailed geometry of the knots formed as the filaments combine to form superclusters. For the galactic dynamics simulations, we found that while a flattening due to overall angular momentum can be explored with N=1,000 or smaller, formation of spiral structure requires not only a larger number of objects, typically on the order of 10,000, but also modifications to the default initial mass and velocity distributions used in older versions of GalaxSee.

Keywords
N-Body simulations, Gravitational dynamics, Scaling, Visualization.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright JOCSE, a supported publication of the Shodor Education Foundation Inc.

1. INTRODUCTION

1.1 Motivation
GalaxSee is a gravitational dynamics program initially developed by Mike South and the Shodor Education Foundation, Inc.[5]. The original version was designed for the Macintosh, and focused on allowing users to create small N-Body simulations using a point and click interface, to solve the problem of gravitational dynamics, where the force on any object i due to any other object j is given by:

$$\mathbf{F}_{ij} = G \frac{M_i M_j}{|\mathbf{x}_j - \mathbf{x}_i|^3} (\mathbf{x}_j - \mathbf{x}_i)$$

A wide variety of approaches have been developed to solve the gravitational N-body problem[1], including many state of the art computational tools designed for research (see for example [15]), as well as many educational tools. Most research grade tools for N-Body simulation have obstacles to their adoption as a classroom tool, notably a reliance on non-standard compilers, multiple software dependencies, and non-human-readable file formats. Most educational N-Body tools, however, focus on the use of a graphical user interface to remove obstacles for students, but replace those obstacles with limitations on the size of N, either hard-coded in the tool itself or self-imposed by the CPU requirements of real-time visualization of results.
Typical classroom use simulations for N-Body problems using tools with limits on the size of N range from 2-Body problems, such as the orbit of the Earth around the Sun, up to simulations of simple gravitational dynamics, exotic solutions of the few body problem [3], where users could create initial mass distributions with or without angular momentum and explore the disk formation that resulted from a spinning cluster of gravitationally bound masses, or collisions of disk galaxies under the assumption of small objects orbiting two massive cores [9].

1.2 GalaxSeeHPC Learning Goals
The two scenarios presented in this paper focus on studies of structure, the first of the formation and stability of spiral structure and the second of elements in large scale structure. Both of these are meant to be viewed qualitatively, as there are many physical elements left out of the model. In the case of the spiral structure scenario, the galaxy model presented does not account for drag due to the interstellar medium. The large scale structure scenario assumes Newtonian gravity in a constantly expanding universe. Even with these phenomena left out, however, key concepts in gravitational dynamics can be quickly and easily seen by students. As the tool has been created for general purpose, specific learning goals would be largely implementation specific and would depend on the goals the instructor wanted to emphasize. An instructor focusing on performance or algorithms might have different goals than an instructor focusing on a science lesson. Like many of the tools developed at Shodor, GalaxSee has always followed the paradigm that it should be able to address both (computational science) education and computational (science education).

In the case of spiral structure, students might learn that the formation of spiral arms is a natural occurrence given a velocity profile that is gravitationally stable, and that not all velocity profiles will be gravitationally stable. Students can, in the process of exploring spiral structure, get practice creating velocity curves for model galaxies which could then be compared to those of real galaxies, which might then prompt a discussion of dark matter or other issues of interest. In the case of large scale structure, students can explore the interplay between expansion velocity and initial mass density for an expanding cube with periodic boundary conditions and wrapped gravity. While this leaves out some key features of the Lambda-CDM model, it will allow students to see a trend towards initial clumping along filaments, provided sufficiently high mass density and sufficiently low expansion velocity. The stability of those filaments over time can be seen to be strongly affected, with a tendency towards a big crunch for more dense and more slowly expanding systems.

Both of these cases lead naturally to goal-seeking exercises ("How can I change the velocity profile of this galaxy?" "What if this universe has more mass in a given expanding cube?") that focus on simple conceptual questions related to the balance of gravity, angular momentum, and expansion. In terms of the computational science learning enabled by these lessons, students get practice using tools running at a command line, input file creation, management, and analysis, parallel job submission and monitoring. The data sets created are rich, with significant challenges in the visualization of results.
The simulation includes a variety of force calculation methods, which, while not necessarily state-of-the-art, provide an entry level into two of the key methods used in modern N-Body, tree-based and particle-mesh methods.

1.3 GalaxSee revision history
The original GalaxSee, like many educational N-Body tools, took the approach of a graphical user interface with the ability to precreate systems at random with a small number of parameters. Later versions of the code included GalaxSee 2.0 for Windows, which kept the look and feel of the original, but added the ability to use a Barnes-Hut force calculation, and a Java based web-start version. GalaxSee-MPI was written to explore parallel computing, removing the GUI interface, as well as the Barnes-Hut force calculation, and allowing for MPI based parallelization of a direct force calculation[8]. GalaxSee-MPI was originally intended just as an exploration of parallelism, and lacked any features to control the input to the simulation, nor did it have any advanced features for visualization, limiting itself to a non-interactive top-down/side-view image of the simulation.

1.4 GalaxSeeHPC Software Goals
The purpose in writing GalaxSeeHPC was to provide students with an N-Body code that (a) allowed students to explore the types of problems that cannot be solved at smaller values of N, (b) allowed students to see examples of some of the force calculation algorithms that have allowed for the increased use of N-Body algorithms, (c) was written in code that is designed for readability and modification, (d) had a simplified dependency stack so that some functionality would be available even without any additional code and that other features could be enabled easily as software dependencies were met, and (e) allowed for human-readable input and output files so that students would not have to learn modern hierarchical file structures at the same time as learning either the physics or algorithms of the N-Body problem.
GalaxSeeHPC is a re-write of the GalaxSee-MPI C++ code. Lessons are available from the Blue Waters Petascale Education website[11] and source code is available from Sourceforge[6]. GalaxSeeHPC was written in C to allow for greater portability, and includes both the ability to perform a Barnes-Hut style force calculation algorithm as well as a Particle-Particle Particle-Mesh (PPPM) algorithm. While still command line based, GalaxSeeHPC allows the user to use a text input file to specify model parameters, including changing the scaling and units used for the problem, allowing a linear expansion of the spatial units (e.g. for a simulation in an expanding universe), force calculation method and parameters, softening factors, numerical integration options, and output features. X-Window based output is still available, but a more interactive SDL-based visualization is also an option, as are multiple different graphics and text output options. CMake is used for configuration and build management, and the code can be configured at compile-time to ignore any options that require numerical or graphical libraries not present on the system.

GalaxSeeHPC has been used and tested in multiple sessions for Physics faculty at the SC09 and SC10 education programs. The visualization of results from GalaxSeeHPC has been a feature of multiple SC and NCSI workshops on scientific visualization. GalaxSeeHPC has been used in two successive summer camp environments with high-school age students.

2. GALAXSEEHPC ALGORITHMS
As every object can interact with every other object, this potentially leads to $N(N-1)$ forces that need to be calculated, though in practice half of these forces will be redundant as each force pair is equal and opposite. As an $O(N^2)$ problem, as N grows large the computational time requirements of the problem can quickly grow beyond the limitations of a typical classroom PC or laptop.
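To make the pair counting concrete, the following is a minimal Python sketch of a direct force calculation that visits each pair once and applies the equal-and-opposite contribution to both bodies. It is an illustration only, not the GalaxSeeHPC source (which is written in C); the function name `direct_accelerations` and the choice of G = 1 in model units are assumptions made for this example.

```python
import math

G = 1.0  # gravitational constant in model units (assumed for illustration)


def direct_accelerations(masses, positions):
    """Return one [ax, ay, az] list per body using a direct pairwise sum."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        # j starts at i + 1, so only N(N-1)/2 pairs are evaluated.
        for j in range(i + 1, n):
            d = [positions[j][c] - positions[i][c] for c in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            inv_r3 = 1.0 / (r * r * r)
            for c in range(3):
                # Equal and opposite forces: update both bodies at once.
                acc[i][c] += G * masses[j] * d[c] * inv_r3
                acc[j][c] -= G * masses[i] * d[c] * inv_r3
    return acc


if __name__ == "__main__":
    print(direct_accelerations([1.0, 2.0], [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]))
```

The nested loop is the part whose cost grows as the square of N; this sketch applies no softening, so it assumes no two bodies share a position.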
The three approaches that are used to alleviate this problem are parallelism to spread the work over multiple processes, binary tree based sorting of masses to determine which forces can be approximated by substituting a point mass in place of a large number of distant masses, and spectral techniques that interpolate onto a density grid which can be solved using Fourier techniques.

2.1 Barnes-Hut
The Barnes-Hut algorithm is a tree based approach to approximating the force field due to distant particles[2]. An oct-tree is constructed for the space modeled, with the tree recursively refined until each sub-element contains only one object. As the force is calculated, nearby objects, which typically will be close by on the oct-tree and can be located quickly, are used in a direct force calculation, and as objects are further away, branches of the tree can be approximated as a point mass, averaging the masses and positions of many masses into a single force calculation.

Tree methods work on the principle that one can organize an n-body model in a data structure that ensures that nearest neighbors can be easily defined for any one body, and that distant neighbors can be easily approximated using a center of mass treatment. In one dimension, this can be thought of as a binary tree, which can be extended to three dimensions using an oct-tree structure. A simple implementation of a tree-based structure might assume that physical proximity is equivalent to being leaves on the same branch, but problems can occur for particles at the edge of a high level branch boundary that are physically close to each other, but separated by many branchings on the oct-tree.

A modification of the tree algorithm to take into account issues like this might check to see if a node being tested is close enough to the object of interest to be suspect. As one descends the tree, this closeness radius can get smaller and smaller. If we consider $s_l$ to be the scale of a tree segment at depth $l$, one might attempt the following force calculation method:

1. For a given object, start at the top of the tree.
2. Descend the tree:
   a. If the child node is not a predecessor (along the same branch) of the object being calculated AND the object in question is at a distance greater than $k s_l$ from the center of mass of the node, stop and use the total mass and center of mass of that node.
   b. If the child node is a predecessor (along the same branch) of the object being calculated OR the object in question is at a distance less than $k s_l$ from the center of mass of the node, but the node does not SOLELY contain the object being calculated, descend all children of the node.

The accuracy of the method can be controlled by the closeness criterion $k$. Figure 1 gives a visualization of this in 1 dimension using a binary tree structure. Note that in the case $k = 0$ this reduces to the previous algorithm, and in the case $k \to \infty$ this approaches a direct force calculation.
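The descent rule above can be sketched in one dimension with a binary tree. The Python below is a hedged illustration rather than the GalaxSeeHPC implementation: the names `Node`, `build_tree`, and `tree_accel` are hypothetical, G = 1 is assumed, masses are assumed positive, and positions are assumed distinct and well separated so that the recursive refinement terminates.

```python
G = 1.0  # gravitational constant in model units (assumed for illustration)


class Node:
    """One segment [lo, hi] of a 1-D binary tree over the bodies in `ids`."""

    def __init__(self, lo, hi, ids, xs, ms):
        self.lo, self.hi, self.ids = lo, hi, set(ids)
        self.mass = sum(ms[i] for i in self.ids)
        self.com = sum(ms[i] * xs[i] for i in self.ids) / self.mass
        self.children = []
        if len(self.ids) > 1:
            # Refine until each segment holds exactly one body.
            mid = 0.5 * (lo + hi)
            left = [i for i in self.ids if xs[i] < mid]
            right = [i for i in self.ids if xs[i] >= mid]
            for sub, a, b in ((left, lo, mid), (right, mid, hi)):
                if sub:
                    self.children.append(Node(a, b, sub, xs, ms))


def build_tree(xs, ms):
    """Build the tree over all bodies; the root spans min(xs) to max(xs)."""
    return Node(min(xs), max(xs), range(len(xs)), xs, ms)


def tree_accel(node, i, xs, k):
    """Acceleration on body i from the bodies under `node`, closeness factor k."""
    if node.ids == {i}:
        return 0.0  # this segment holds only the body itself
    dx = node.com - xs[i]
    far = abs(dx) > k * (node.hi - node.lo)  # farther than k times the segment scale
    if not node.children or (i not in node.ids and far):
        # A single other body, or a distant branch: use total mass and center of mass.
        return G * node.mass * dx / abs(dx) ** 3
    # The branch holds body i, or is too close to trust: descend all children.
    return sum(tree_accel(child, i, xs, k) for child in node.children)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.5, 7.0, 7.5]
    ms = [1.0, 2.0, 1.0, 3.0, 1.0]
    root = build_tree(xs, ms)
    for k in (0.0, 1.0, 1e9):
        print(k, [tree_accel(root, i, xs, k) for i in range(len(xs))])
```

With `k = 0.0` every branch that does not contain the body is replaced by a point mass, and with a very large `k` the sum visits every leaf and matches a direct calculation, mirroring the two limits noted in the text.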
The total number of forces to be calculated will scale as $N \log(N)$ in this situation instead of $N^2$ for large models, and the accuracy of the tree calculation (and associated trade-off in speed) can be adjusted by use of the closeness criterion.

Figure 1: Use of a closeness checking factor can eliminate errors due to aggressive tree pruning.

2.2 Particle-Particle Particle-Mesh
Spectral methods, typically solved using the FFT algorithm, reduce the discrete n-body problem to a continuous gravitational problem solved on periodic boundary conditions[4]. Computationally, the advantage of spectral techniques is that they allow you to separate the long-range forces from the short-range forces, and use a direct calculation of short range forces while replacing long range forces with the solution of a potential function that satisfies Poisson's equation,

$$\nabla^2 \Phi = 4 \pi G \rho$$

If a function for the density of space can be approximated, this can be solved easily, as the Fourier transform of the Laplacian of a function is given by

$$\widehat{\nabla^2 \Phi} = -4 \pi^2 k^2 \hat{\Phi}$$

where $\hat{\Phi}$ is the Fourier transform of $\Phi$. This gives for the solution of Poisson's equation

$$\hat{\Phi} = -\frac{G \hat{\rho}}{\pi k^2}$$

which can be solved using a discrete Fourier transform, typically the Fast Fourier Transform (FFT) algorithm.

For the implementation in GalaxSeeHPC, the point mass distribution is first interpolated onto a density grid at evenly spaced intervals in x, y, and z. Each mass is treated as if its mass is spread out over a Gaussian with standard deviation $\sigma = k_\sigma n / S$, where $n$ is the number of grid points in each dimension (assumed to be equal in all dimensions in GalaxSeeHPC), $S$ is the scale of a periodic box in the model, and $k_\sigma$ is a user supplied parameter. The Particle-Particle correction is applied to all points within some distance $\sigma_{near} = k_{near} n / S$, where $k_{near}$ is a user supplied constant. Default values of 2.0 and 1.0 are used in the code for these two constants.
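As a small worked illustration of the spectral step, the Python sketch below solves the periodic Poisson equation on a 1-D grid. It is an assumption-laden toy, not the GalaxSeeHPC solver: it is one-dimensional, it uses a naive discrete Fourier transform instead of an FFT library, the wavenumber k is taken in cycles per unit length (so the transformed Laplacian is −4π²k² times the transformed potential), the mean-density mode is set to zero, and the names `dft` and `solve_poisson_periodic` are hypothetical.

```python
import cmath
import math


def dft(values, sign):
    """Naive O(n^2) discrete Fourier transform; sign=-1 forward, +1 inverse (unscaled)."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * m * idx / n)
            for idx, v in enumerate(values))
        for m in range(n)
    ]


def solve_poisson_periodic(rho, box, G=1.0):
    """Return phi on the grid such that d2(phi)/dx2 = 4*pi*G*rho, periodic in `box`."""
    n = len(rho)
    rho_hat = dft(rho, -1)
    phi_hat = [0j] * n  # the m = 0 (mean) mode is left at zero
    for m in range(1, n):
        freq = m if m <= n // 2 else m - n  # signed mode number
        k = freq / box                      # cycles per unit length
        phi_hat[m] = -G * rho_hat[m] / (math.pi * k * k)
    return [v.real / n for v in dft(phi_hat, +1)]


if __name__ == "__main__":
    n, box = 16, 2.0
    rho = [math.cos(2 * math.pi * idx / n) for idx in range(n)]
    phi = solve_poisson_periodic(rho, box)
    print(phi[0], -(box ** 2) / math.pi)  # the two values should agree
```

For a single cosine density mode of wavelength equal to the box, the returned potential is the density scaled by −G·box²/π, which can be checked by substituting into Poisson's equation by hand.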
For the purposes of the periodic boundary conditions in the PPPM algorithm, a particle is ghosted across a periodic boundary if doing so places it closer to a second particle for the purposes of force calculation.

2.3 Parallelism

2.3.1 Direct Force Calculation
The wall-time when using a direct force calculation is dominated by the nested loop over all particles. This is parallelized in GalaxSeeHPC using MPI, and a round robin scheduling scheme to determine which particle's forces are calculated by which process.

2.3.2 Tree-Based Force Calculation
The tree creation takes sufficiently little time compared to the force calculation that we parallelize only the calculation of the forces from the built tree. The tree is typically built every timestep, but this can be reduced by the user. The loop over all particles to calculate forces from the tree is scheduled using MPI in a round-robin fashion.

2.3.3 PPPM Method
The creation of the density grid and the interpolation of forces from the density grid both consume a significant portion of the force calculation in the PPPM method. Each of these processes is parallelized in MPI using a round robin scheduled loop.
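A round robin schedule of this kind can be sketched without MPI by listing which particle indices each process would take. The Python below is an illustrative assumption about the index pattern only (process `rank` takes particles `rank`, `rank + size`, `rank + 2*size`, and so on); the function name `round_robin_indices` is hypothetical, and the communication step that shares results between processes is not shown.

```python
def round_robin_indices(n_particles, rank, size):
    """Indices of the particles whose forces process `rank` of `size` would compute."""
    return range(rank, n_particles, size)


if __name__ == "__main__":
    n_particles, size = 10, 4
    for rank in range(size):
        # In an MPI program each process would run only its own loop, and the
        # per-particle results would then be exchanged between processes.
        print(rank, list(round_robin_indices(n_particles, rank, size)))
```

Interleaving the indices keeps the per-process particle counts within one of each other for any N and process count.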

2.4 Softened Potentials
An issue occurs due to the $1/r$ potential in the gravitational N-Body problem in that there is a singularity in the force as particles get very close to each other. Typically, one uses some method of altering the potential to remove any singularities. This can be done by one of two methods in GalaxSeeHPC. The first is through use of a shield radius, as is done in previous versions of GalaxSee, in which the user specifies a parameter which defines a cutoff radius, within which forces are ignored. In practice GalaxSeeHPC uses an adaptive algorithm that depends on the central mass causing the force and the timestep being used, and the actual shield radius is given by

$$r_s = k_{sr} \sqrt[3]{G M \Delta t^2}$$

where the shield radius scaling factor $k_{sr}$ is taken to be 5 by default.

Traditionally, most codes in the literature use what is referred to as a softened potential, in which the potential (and hence force) functions can be modified to include a softened distance, effectively treating all distances as if they were some small distance $\epsilon$ greater than they actually are,

$$P = -\frac{G M}{(R^2 + \epsilon^2)^{1/2}}$$

with accelerations

$$\frac{\mathbf{F}_i}{M_i} = \sum_j G M_j \frac{\mathbf{x}_j - \mathbf{x}_i}{(|\mathbf{x}_j - \mathbf{x}_i|^2 + \epsilon^2)^{3/2}}$$

and potential energy

$$PE = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \frac{G M_i M_j}{(|\mathbf{x}_j - \mathbf{x}_i|^2 + \epsilon^2)^{1/2}}$$

3. SCENARIOS
One question that has arisen in many presentations of GalaxSee-MPI to faculty, particularly Physics faculty interested in the science that could be learned by such a simulation rather than computer science faculty interested in scaling properties, has been whether or not students working on projects involving N-body simulations need to run models with enough points to warrant high performance computing resources. A large class of astrophysical problems traditionally fit into what are often described as million-body problems: problems that require enough points for study that statistical or hydrodynamical approaches are not appropriate, but for which using too few points in an N-body solution will result in approximation error such that results are qualitatively incorrect[7].
Two problems are presented here that fit into this category, the modeling of galactic structure and the modeling of large scale structure in the universe.

3.1 Galactic Structure

3.1.1 Potential Learning Goals, Science
Students performing this exploration might, depending on implementation, focus on the velocity profiles required to maintain a gravitationally stable structure and the patterns that develop, as well as how the patterns that develop depend on the initial anisotropy of the mass distribution.

3.1.2 Potential Learning Goals, Skills
As N is increased, the computational overhead of a direct force calculation rapidly will increase the computational requirements of each run. The use of a tree-based method would be appropriate in this case as a periodic solution is not needed and the problem domain will have large regions of physical space in which there are few stars. Students can explore performance of tree-based methods as compared to direct force calculations. The parallelization method currently implemented does not truly split bodies across processors but merely shares the results of force calculation at each step. Students can explore the effect of communication on scaling as the code moves from a computation bound problem to a communication bound problem when increasing the number of processes.

3.1.3 Overview
Galaxies are large collections of stars, gas, and dust surrounded by relatively empty space, typically on the order of many kiloparsecs in size and containing hundreds of billions of stars. A key feature of galactic structure is the shape as classified on a tuning-fork diagram, categorizing galaxies as elliptical, spiral, or barred spiral[1]. (Teachers and students can find public domain images of many of these objects online, organized by galaxy type[13].) A feature of the original GalaxSee code was the exploration of how the interplay between gravity and angular momentum tended to flatten a large rotating mass of gravitationally bound objects.
However, running models larger than a few thousand points was impractical, both due to hard coded features in early versions of the code and the lack of an ability to operate in a command line mode with saved snapshots for models that required longer to run. Additionally, while it was possible to create models with different mass distributions and rotation curves, the default initial mass distributions and rotation curves in GalaxSee did not produce results that could be easily compared to images of spiral galaxies. As GalaxSeeHPC makes for a more practical approach to running models with larger N, simulations were run to test the results at N=5,000, 50,000, and 500,000. Additionally, models were run with the default initial distribution and velocity profile in GalaxSee, with a mass distribution that is more heavily weighted to the center of the initial distribution, and with a velocity profile that is lowered for objects near the center of the mass distribution.

3.1.4 Initial Conditions
The original Windows GalaxSee used as its initial conditions a random uniform distribution within a sphere, and a velocity distribution associated with a circular orbit with centripetal acceleration equal to the central force being provided by gravity. As the number of particles is increased, certain issues related to the default GalaxSee initial conditions are seen. In particular, a uniform distribution does not have enough mass in the core to keep the entire structure cohesively bound, and the distribution breaks up into many small clusters in orbit around each other. Additionally, the assumption of velocity set by centripetal acceleration works well at the edges of the galaxy, but towards the center this overestimates the actual orbital speeds, and simulations see a clearing effect wherein a ring structure is formed as opposed to something that looks like an elliptical, spiral, or lenticular galaxy. As a result, our initial conditions are taken to be normal distributions in x, y, and z for position, parameterized by the standard deviations of the normal distributions $\sigma_x$, $\sigma_y$, and $\sigma_z$.
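Sampling positions from three independent normal distributions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GalaxSeeHPC input generator: the function names `sample_positions` and `axis_spread` are hypothetical, and the default ratios of 0.8 and 0.1 for the y and z standard deviations relative to x are borrowed from the galactic structure run described in the results of this paper.

```python
import random
import statistics


def sample_positions(n, sigma_x, ratio_y=0.8, ratio_z=0.1, seed=None):
    """Draw n (x, y, z) positions from independent normal distributions about the origin."""
    rng = random.Random(seed)  # a fixed seed makes a model reproducible
    sigma_y, sigma_z = ratio_y * sigma_x, ratio_z * sigma_x
    return [
        (rng.gauss(0.0, sigma_x), rng.gauss(0.0, sigma_y), rng.gauss(0.0, sigma_z))
        for _ in range(n)
    ]


def axis_spread(points, axis):
    """Standard deviation of the sampled coordinates along one axis (0, 1, or 2)."""
    return statistics.pstdev([p[axis] for p in points])


if __name__ == "__main__":
    pts = sample_positions(5000, 1.0, seed=1)
    print([round(axis_spread(pts, a), 3) for a in range(3)])
```

Unequal spreads in x and y give the initial distribution the eccentricity that the scenarios below rely on, and the small z spread makes it disk-like.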
Velocities are calculated by modifying the assumption of centripetal acceleration caused by gravitational force to allow for a slower velocity towards the center. With

$$\mathbf{a} = \sum_j G M_j \frac{\mathbf{r}_j}{|\mathbf{r}_j|^3}, \qquad \mathbf{a}_T = [a_x, a_y, 0]^T, \qquad \boldsymbol{\rho} = [x, y, 0]^T,$$

the velocity $\mathbf{v}$ is set from $\mathbf{a}_T$ and $\boldsymbol{\rho}$, scaled by a factor of the form $1 + \operatorname{erf}(\cdot)$ that depends on $\rho$ and $\mathrm{P}$, assuming the entire mass distribution is centered at the origin. $\mathrm{P}$ is the point at which the slower velocities towards the core switch over to a more typical centripetal acceleration-based velocity towards the edges. For each of the models here, we have assumed $\mathrm{P} = \sigma_x / 5$.

3.1.5 Results of Galactic Structure Simulations
A simulation was run with an initial distribution with $\sigma_x = 383$ pc, $\sigma_y = 0.8 \sigma_x$, and $\sigma_z = 0.1 \sigma_x$, at sizes of 1,000, 5,000, 50,000, and 500,000 points (see Figure 2). At 1,000 points, typical of the problem sizes one would use with the Windows version of GalaxSee, the possibility of a spiral structure is hinted at by the results, but cannot be clearly seen with so few points. Increasing the size to 5,000 points makes the spiral structure more visible, and 50,000 points allows for a clear structure of spiral arms with clusters along the arms. Models were run for 1 billion years at a timestep of 500,000 years, using an Adams-Bashforth-Moulton integration scheme and a Barnes-Hut force calculation scheme.

Figure 2: Spiral Galaxy Model with varying values of N. From top left to bottom right N = 1000, 5000, 50000, and 500000.

As can be seen in the comparison of the N=1,000 point and N=5,000 point simulations, 5,000 points was the bare minimum to begin seeing clearly any spiral structure that formed in these models, and on the order of 10,000 points is preferred. A 5,000 point model for GalaxSeeHPC with 8 processes ran in 3 minutes 54 seconds, and a 50,000 point model with 16 processes ran in 43 minutes. For practical use in a classroom lab, 5,000 point models are best run on a multi-core workstation or small cluster, and 50,000 point models are best done either as a single model run at the beginning of a class and analyzed afterwards or overnight, or as part of longer term student projects.
Models with 500,000 points showed more detail, but did not have qualitatively different features for this problem than those with N=50,000, while requiring significantly longer to run.

3.1.6 Scenarios for students to investigate
One key issue in the formation of classic spiral and barred spiral structures is the need for some difference in the scale in the x and y directions for the initial conditions. The lower the eccentricity of the initial material, the less likely it is that the resulting galaxy will have a classic two-armed spiral structure. A second issue for students to study is the distribution of mass in the forming galaxy, looking at the difference between normally distributed matter and uniformly distributed matter. Without an elevated density towards the center of the galaxy there will not be enough gravity to hold the center together, and students will see systems that fragment into many smaller rotating clusters. Both of these are shown in Figure 3.

Figure 3: Simulation of galaxy formation without any eccentricity to induce spiral arm formation (left) and without a higher density in the central region to form a core (right).

Additionally, it is possible to overestimate the acceleration of objects towards the center if one simply sets centripetal force equal to the gravitational force exerted on each object. This can be seen by example with a thought experiment in which two equal stars orbit each other. Since it is not a case of a single object orbiting a more massive one, the actual velocities required to maintain a stable orbit are half what they would be otherwise. This is addressed in the initial velocity function used in this paper by using an error function to create an interior zone where the objects are treated as if they are orbiting each other, and an exterior zone in which objects are orbiting a central mass. Having too little mass in the center can lead to fragmentation of the galaxy being modeled, and having too high of a speed for the interior objects can lead to clearing of the inner regions and thus fragmentation of the galaxy being modeled.

3.2 Large Scale Structure of the Universe
Issues of cosmology on a large scale are both of interest to many students and are well reported in current media and research literature. Recent advances in computational simulations have led to understandings of the structure of the universe and the connection to the ΛCDM model of big-bang cosmology[4]. One of the largest N-body simulations ever run, the Millennium Simulation, focuses on this problem[14]. Modeling of large scale structure is complex: the distance scales change as the universe expands, and results depend sensitively on both the initial anisotropy of the mass distribution and the density. Computationally the problem requires treating the space modeled as a unit cell with periodic boundary conditions. However, students can explore at some level conceptual ideas with a simple Newtonian model.
Our approach in GalaxSee is to let the students explore self-gravitation of a random anisotropic initial mass distribution in an expanding periodic box. Initial student exploration into large scale structure can include an overview of the existing data on large scale structure, and attempts to fit models of big-bang expansion, gravitational condensation of galaxies, and freezing out of structures as the universe expands to that data, particularly with regard to the eventual fate of our universe. While recent studies suggest that there is sufficient inflation to sustain the universe and have it continue its expansion, until recently scientists did not know whether the universe's gravitational pull would ever result in an eventual "big crunch" collapse. This provides a compelling question for students to investigate, and allows them to understand the process by which computational science has informed us about this phenomenon. Even without allowing for either an expanding universe or any inflation to that expansion, students can, with only Newtonian gravity, explore the creation of filamentary structure and the eventual progression to a collapse event without expansion to prevent it. (As of version 1.1, GalaxSeeHPC supports the ability to model an expanding universe with a constant expansion rate, but does not allow for inflation, though this is a modification that a student could make.)

The models used in studying cosmological structure are often referred to as "universe-in-a-box" models, in that they take what might be considered a unit cell of the universe, and approximate the gravitational effect of the surrounding universe by assuming that things are isotropic enough that whatever is happening on the left side of the cell is as good a representation as any of what might be happening beyond the right edge of the unit cell. As such, periodic boundary conditions are applied, in effect giving us a toroidal geometry in order to approximate a piece of a larger universe.
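The periodic wrapping described above can be sketched in a few lines of Python. This is a minimal illustration and not GalaxSeeHPC's implementation: the function names `wrap_position` and `minimum_image` and the cell size `box` are hypothetical, and the sketch covers only the nearest-image separation, whereas a production periodic solver also has to account for more distant images.

```python
def wrap_position(x, box):
    """Map a coordinate back into the unit cell [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Return the separation to the nearest periodic image.

    dx is a raw coordinate difference; the result lies in [-box/2, box/2).
    """
    return (dx + 0.5 * box) % box - 0.5 * box


if __name__ == "__main__":
    box = 1.0  # hypothetical cell size (for example, 1 Mpc)
    # A particle leaving through the right face re-enters on the left.
    print(wrap_position(1.05, box))
    # Two particles near opposite faces are actually close neighbors.
    print(minimum_image(0.95 - 0.05, box))
```

Applying the same two helpers independently to x, y, and z gives the toroidal geometry in three dimensions.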
Students can change the initial mass density and size of this universe in a box, start with a random initial distribution, and simulate the initial clustering and eventual collapse that occurs. Students can see an interim stage before collapse where the types of structures formed closely resemble those from the more accurate cosmological models being run with research codes.

3.2.1 Initial Conditions

The initial mass density and unit cell size were chosen to ensure that the simulation would result in visible creation of filamentary structure, and the images shown were taken at the peak of the filamentary nature of the structure, before further collapse occurred. The models shown here were run with no expansion and 1.0e14 solar masses randomly distributed in a 1 megaparsec cubed box. (Note that these numbers are chosen simply to produce qualitative results and are not meant to be physical. While these initial conditions can qualitatively show filamentary structure, they result in a mass density of the universe that is orders of magnitude greater than observed and not stable for the lifetime of the universe.)

3.2.2 Results of Universe in a Box simulations

Simple effects can be seen with a fairly modest value of N. Consider the following simulation result, using GalaxSeeHPC with the PPPM algorithm and N=5,000. Figure 4 shows the results. A model with N=5,000 using the PPPM algorithm required roughly 1 second per timestep running in serial on a Xeon-based machine, with parallel performance peaking at only a few processes, though larger values of N were able to scale to more processes. With a typical model requiring on the order of a thousand timesteps, this is well within the range of what a student might do in a lab setting, running a simulation every 10-20 minutes on typical hardware.

Figure 4. Large scale structure simulation, N = 5,000, 50,000, and 500,000.

Looking at the same model for greater values of N, students will be able to see more detail. At N=50,000, the knots in the middle of the filaments become more readily apparent and additional structure in the filaments can be seen. The connectedness of the filaments is much clearer. The typical CPU time for models of this size in our tests was on the order of days. Scaling up to 8 processes for this problem size on our test cluster was reasonably efficient, making this a simulation that students could run multiple times in one day on a quad-core or 8-core system. When looking at the simulation results with N=500,000, the structure of the filaments themselves becomes much clearer, as does the morphology of the knots where filaments intersect.
Scaling of this problem to 16 processes was reasonably efficient, and while models with millions of objects might run in days to weeks, depending on the number of timesteps required, students with access to an 8-core system or small cluster could run models in less than a day to a few days.

3.2.3 Scenarios for Students to Investigate

Two key questions students can try to address with these models are the sensitivity of large scale structure to the initial mass density of the universe, and the effect of the expansion of the universe on large scale structure.

An initial study students may make is to look at the timescales needed for gravitational collapse of a large area of the universe with the current mass density and without any expansion. Starting with a random initial configuration, students should see that there is an initial clustering into a filamentary structure and that these filaments feed into superclusters which then themselves combine, but that the timescale for this happening is so short compared to the age of the universe that some degree of expansion is required to understand the structure of our current universe. The mass density used for the models shown in the previous section, for example, requires a mass 4 orders of magnitude greater than observed, and would result in gravitational collapse within a few billion years.

GalaxSeeHPC has an EXPANSION variable in the input file which allows for a constant expansion rate. The timescales in which major change occurs will vary greatly as the universe expands, so for practical purposes it is useful to also have a scaled timestep that gets larger as the model progresses. For that reason the student has the option of setting the timestep as a ratio of the current time using the TIMESTEP_RATIO variable rather than as a fixed number.
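The benefit of a timestep that grows with the current time can be illustrated with a short Python sketch. It assumes the simplest reading of the text, dt = ratio × t. The function name `scaled_steps` and the numerical values are hypothetical and are not taken from GalaxSeeHPC, whose exact handling of TIMESTEP_RATIO may differ.

```python
import math


def scaled_steps(t_start, t_end, ratio):
    """Yield (t, dt) pairs with dt = ratio * t until t_end is reached."""
    t = t_start
    while t < t_end:
        dt = min(ratio * t, t_end - t)  # do not overshoot the end time
        yield t, dt
        t += dt


if __name__ == "__main__":
    # Hypothetical run: start at 1/100 of the final time, 1% steps.
    t0, t1, ratio = 0.01, 1.0, 0.01
    n_scaled = sum(1 for _ in scaled_steps(t0, t1, ratio))
    # A fixed step sized for the earliest times would be used throughout.
    n_fixed = math.ceil((t1 - t0) / (ratio * t0))
    print("scaled steps:", n_scaled, "fixed steps:", n_fixed)
```

Because each step multiplies the time by (1 + ratio), the number of scaled steps grows only with the logarithm of t_end / t_start, while the fixed-step count grows linearly with it.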
Tracking the initial formation of anisotropy back to the point when gravitation began will likely require timesteps and numbers of objects that go beyond the architecture students have available. However, the students can still start with a largely anisotropic random distribution of points at some later time, such as 1/100th the age of the universe, and evolve forward with mass densities near the current mass density of the universe. By changing the initial mass, they can see that the difference between structure never forming, filamentary structure of the type seen today, or a big crunch is only a few orders of magnitude, and that the qualitative types of structures found in more detailed models can be seen as naturally resulting from a combination of self-gravitation, mass density, and expansion.

Care must be taken in interpretation of results. While it is possible to independently set expansion rate and mass density in the GalaxSeeHPC input file, in practice it would be expected that these two parameters are related.

3.3 Scaling of the N-Body problem

An important concern with the N-Body problem is the scaling of the problem, both in terms of how the computing requirements scale with algorithm and problem size and in terms of how well the parallel implementation of the problem scales across a parallel architecture. The first type of scaling is often referred to in terms of the "Big O" of the problem: if one were to write the total number of computations needed as a function of the problem size, which term in that function will dominate as the problem size gets large? In this sense, a direct force calculation is order N^2, and tree and PPPM methods are both order N log(N).
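The N^2 cost of the direct method follows from visiting every pair of bodies, which can be made concrete with a short Python sketch. This is an illustration only and not code from GalaxSeeHPC: the function name `direct_accelerations`, the unit choice G = 1, and the softening length are assumptions made for the example.

```python
import math


def direct_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-sum gravitational accelerations, visiting every ordered pair.

    Returns (accelerations, pair_count); pair_count is N * (N - 1).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_count = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
            pair_count += 1
    return acc, pair_count


if __name__ == "__main__":
    pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
    mass = [1.0, 2.0, 3.0, 4.0]
    _, pairs = direct_accelerations(pos, mass)
    print("pair evaluations for N=4:", pairs)
    # Compare the growth of N^2 and N log2(N) for the problem sizes in the text.
    for n in (5_000, 50_000, 500_000):
        print(n, n * n, round(n * math.log2(n)))
```

The double loop is the whole story: doubling N quadruples the number of pair evaluations, which is why tree and PPPM methods become necessary at the larger problem sizes discussed here.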

Parallel scaling, on the other hand, is typically referred to as either "weak" or "strong." Parallel implementations with weak scaling allow for larger problems to be solved in roughly equal time on larger (i.e. more CPU cores) systems. Parallel implementations with strong scaling allow for same-sized problems to be solved in less time on larger systems.

GalaxSeeHPC allows students to explore the "big O" scaling of direct, tree-based, and PPPM methods, and to begin exploring questions related to parallel scaling. It should be noted that the parallel implementation used in GalaxSeeHPC is limited in its parallel scaling, particularly for moderate and large clusters. Students and teachers interested in pursuing questions related to state-of-the-art tools that exhibit strong scaling on larger systems are encouraged to look at the many professional-grade N-Body solvers. Of particular note is Gadget-2, which compiles with standard C compilers on many systems and has a fairly small number of dependencies required to run. GalaxSeeHPC includes an option to translate its own input files into Gadget-2 format.

3.3.1 Timing and Scaling of Galactic Structure Simulations

Running a simulation with 1,000 points and a Barnes-Hut calculation as described in the previous section, GalaxSee for Windows required roughly 17 minutes on an Eee PC with 1 1.7GHz Atom chip running Windows XP. Similar speeds with GalaxSee for Windows were seen on an HP EliteBook with a 2.5 GHz Centrino running Windows Vista. GalaxSeeHPC running on a single process on a Dell PowerEdge 1850 with a 2.7GHz Xeon running RedHat Linux finished in 1.7 minutes, and on 4 processes finished in 31 seconds. For comparison on similar hardware, GalaxSee-MPI (which was largely based on the Windows GalaxSee codebase) using Barnes-Hut and a 4th order Runge Kutta took 4 minutes 18 seconds (GalaxSee-MPI does not currently support ABM integration methods). GalaxSeeHPC using Runge Kutta 4 took 3 minutes 4 seconds.
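The single-process and four-process timings quoted above can be turned into the usual strong-scaling measures with a few lines of Python. The function names here are hypothetical helpers for illustration. Only the two wall times come from the text.

```python
def speedup(t_serial, t_parallel):
    """Strong-scaling speedup: serial wall time over parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processes):
    """Parallel efficiency: speedup per process (ideal value is 1.0)."""
    return speedup(t_serial, t_parallel) / processes


if __name__ == "__main__":
    t1 = 1.7 * 60.0  # single-process run quoted in the text, in seconds
    t4 = 31.0        # four-process run quoted in the text, in seconds
    print(f"speedup    = {speedup(t1, t4):.2f}")
    print(f"efficiency = {efficiency(t1, t4, 4):.2f}")
```

For these two runs the speedup is about 3.3 on 4 processes, a parallel efficiency of about 0.82.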
Many of these models could be run with a larger timestep, bringing running times on all platforms down (typical class presentations for the rotation and flattening of a spherical cluster are done with timesteps of 8 million years as opposed to 0.5 million years). However, even for larger timesteps, running models with on the order of 1,000 points is the practical classroom application limit of GalaxSee Windows.

Wall times per timestep for serial jobs were recorded for N=5,000, 50,000, and 500,000. The tree-based implementation in GalaxSeeHPC scales closer to N log N than the N squared scaling expected of a direct force calculation. The parallel scaling of GalaxSeeHPC with the tree-based force calculation method was consistent across problem sizes, scaling to speedups on the order of 10-15 on our cluster. Efficiency typically peaked once a few 8-core nodes were involved in the solution of the problem. For each of the problem sizes tested, parallel efficiency dropped to 50% at about 16 processes. All are shown in Figure 5.

3.3.2 Timing and Scaling of Large Scale Structure Calculations

Like tree-based methods, the PPPM method in GalaxSeeHPC scales as roughly N log(N). The compute time required for the force calculation is dominated by the mapping of points to a grid and the interpolation of forces on that grid back onto the points, combined with nearest neighbor direct force calculations. Speedup peaked at around 8 for models with N ranging from 5,000 to 500,000 on the cluster used in this study, with parallel efficiency dropping off somewhat faster for the PPPM methods compared to tree-based methods. Results are shown in Figure 5.

4. VISUALIZATION

4.1 Need for higher end hardware and software at large N

In addition to the computational challenges of increasing N in GalaxSee by many orders of magnitude, the resulting data also poses challenges in how it can be visualized, as the traditional method of filling in a pixel if there is a mass in the line of sight for that pixel quickly saturates at large N, even for very high resolution images.
Even in relatively low-density regions of the simulation, foreground objects can obscure more important details. Masking the image by only showing a subset of points can result in loss of detail for structures of interest.

This can impact both the type of hardware and software that is needed for students to work with large datasets. While modest computers with embedded video may be able to load and render larger datasets, such hardware can experience much lower frame rates when loading data for a new time or when attempting to re-render data for a different perspective (such as by rotating a rendered dataset in ParaView). Figure 6 shows the effect of not allowing for any opacity when drawing a large number of point masses, as well as the loss of resolution and structure that can occur from masking points.

Many visualization packages are available to students that allow for advanced features such as changing the opacity of points, volume rendering, and creating contours and slices of regular gridded data. ParaView [10] and VisIt [16] are two such examples that are available as open source, and will work with a variety of input data types, including methods of opening simple comma separated files. The images created for this paper were made using ParaView. ParaView is multi-platform, and has been designed to work in a distributed fashion for massive data sets. Developed by Kitware Inc. and Los Alamos National Laboratory, ParaView is also supported by Sandia National Laboratories and the Army Research Laboratory.
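The trade-off between saturation, masking, and opacity described in this section can be demonstrated without any graphics library. The sketch below is a toy rasterizer written for illustration, not part of GalaxSeeHPC or ParaView. The function names, the synthetic point set, and the simple opacity model (brightness 1 - (1 - alpha)^count) are all assumptions.

```python
import random


def render(points, size, alpha=None, keep_every=1):
    """Project 2D points in [0, 1)^2 onto a size x size grayscale image.

    alpha=None -> a pixel is fully lit if any point lands in it.
    alpha=a    -> each point adds opacity a: brightness 1 - (1 - a)^count.
    keep_every -> masking: draw only every k-th point.
    """
    counts = [[0] * size for _ in range(size)]
    for x, y in points[::keep_every]:
        counts[int(y * size)][int(x * size)] += 1
    if alpha is None:
        return [[1.0 if c else 0.0 for c in row] for row in counts]
    return [[1.0 - (1.0 - alpha) ** c for c in row] for row in counts]


def lit_fraction(image, threshold=0.99):
    """Fraction of pixels at or above the given brightness."""
    pixels = [p for row in image for p in row]
    return sum(p >= threshold for p in pixels) / len(pixels)


def clamp(value):
    """Keep a coordinate inside [0, 1) so it maps to a valid pixel."""
    return min(max(value, 0.0), 0.999)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: a dense clump sitting on a uniform background.
    pts = [(rng.random(), rng.random()) for _ in range(20_000)]
    pts += [(clamp(rng.gauss(0.5, 0.05)), clamp(rng.gauss(0.5, 0.05)))
            for _ in range(20_000)]
    print("no opacity :", lit_fraction(render(pts, 64)))
    print("masked 1/20:", lit_fraction(render(pts, 64, keep_every=20)))
    print("alpha 0.05 :", lit_fraction(render(pts, 64, alpha=0.05)))
```

With binary fill nearly every pixel is lit and the clump disappears into the background. Masking removes the saturation but also thins out the clump. Per-point opacity keeps every point while leaving only the densest pixels fully bright.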

Figure 5. Scaling properties of example problems. Top row shows serial performance of tree-based algorithm run in serial relative to direct force calculation, followed left to right by speedup and efficiency (ideal would be 1.0) of tree algorithm in parallel. Bottom row shows serial performance of PPPM relative to problem size followed by speedup and performance.

4.2 Use of CAVE for visualization

Additionally, a CAVE system was used with students to visualize the results of GalaxSeeHPC, using a simple package written in OpenGL with CAVELib. Rendering was limited to pixels for each mass with no lighting effects, and up to N=500,000 could be viewed with zero masking and a frame rate high enough for the user to walk through the image without noticeable lag. The CAVE system used was a three wall system with ART head tracking and a dedicated render node using separate NVidia Quadro cards for each wall.

Our initial use of the CAVE has focused on the feasibility of using it for education. Technically, we wanted to know whether there were easy methods of getting student data into the CAVE and whether it would provide an obstacle that interrupted class flow.

Figure 6. N=500,000. Shown at left is without any masking or opacity. At right masking is used, but no opacity is enabled to enhance visualization.

5. PEDAGOGICAL CONCERNS

5.1 Sample Lesson Plan

GalaxSeeHPC is meant to be a general purpose pedagogical tool around which a variety of lessons might be built, focusing on both topics in computational science education and topics in physics and astronomy. The following lesson plan is designed based on past use of GalaxSeeHPC with high school students. It assumes the use of a helper code, GalaxSeeUI, available on the Sourceforge site for GalaxSeeHPC, to generate input files for the investigation of spiral galaxy shapes.

Subject: Physics
Grade: 11-12
Lesson Length: 90 minutes (2 classes of 45 minutes)
Title: Galaxy Structure Simulations Using Computer Applications

Overview: Galaxies are large collections of stars rotating around a central point in space, while moving about in the universe. These bodies of stars tend to crash and collide with each other, and take on new and varying forms. Through Hubble, galaxies have gained classifications based on their structures as they form over time. This lesson will have the students learn how to classify galaxies by their structure, using Hubble's model and computer simulations of their own design.

Preparations and Materials: The teacher should become familiar with the GalaxSeeUI application, the GalaxSeeHPC application, and the ParaView application. (The GalaxSeeUI application is available on the Sourceforge site along with GalaxSeeHPC and can be used to generate input files for this lesson.) The teacher will need an internet browser in order to access this site: http://cosmictimes.gsfc.nasa.gov/teachers/guide/1929/guide/classifying_nebulae.html The teacher should have a text editor, such as Notepad, loaded along with the applications, and some way of displaying all of the applications on the monitor to the class. The students will need access to computers with text editing software in order to generate their initial conditions.

Objective:
Students will be able to distinguish the different galactic structures, using the tuning fork model and computer simulations.
Students will be able to apply their previous computer knowledge to generate input files for GalaxSeeUI and utilize ParaView.
Students will be able to compose an argument about their own observations and defend their point of view.
Students will be able to infer things about natural phenomena based on the activities conducted during this lesson.

Standards:
NSES.9-12.A: Science as inquiry.
  o Use technology and mathematics to improve investigations and communications.
  o Communicate and defend a scientific argument.
NSES.9-12.D:
  o Origin and evolution of the universe.
NSES.9-12.E:
  o Understanding about science and technology.
DoDD.Science.9:
  o Use of computational models.
  o Use careful systematic observation and data collection to obtain valid information.
  o Relate force, motion, energy, and power.

Procedure and Activities:

Day 1 (45 minutes)

1. The teacher will define the term galaxy.
   a. Galaxy - a system of stars, numbering in the millions to billions, that, along with gas and dust, are held together by gravitational attraction.
   b. An example that can be given is the Andromeda galaxy, the closest spiral galaxy to the Milky Way galaxy.
2. The teacher will define the types of galaxies:
   a. Elliptical - a galaxy generally having an elliptical shape and no obvious inner structure or spiral arms
   b. Spiral - a galaxy that exhibits a central nucleus from which many curved arms extend
   c. Bar Spiral - a galaxy that contains a central bar structure from which two large arms extend
   d. Irregular - a galaxy that cannot be labeled by the previous definitions
3. The teacher will utilize the Hubble Classification, or the Tuning Fork Diagram, to discuss the development of galactic structure over time. A version of this diagram can be found on: http://skyserver.sdss.org/dr1/en/proj/advanced/galaxies/tuningfork.asp
4. The teacher will show students a general input file of GalaxSeeUI in Notepad, and will explain to the students the proper way to input and save the data. All files that are being submitted to GalaxSeeUI are .n files and can be saved with this extension when saving and asked to name the file (Example: test.n).
5. The students will be paired into small, 3-4 person, groups to work on their own input files.
6. The students will utilize the computers to create an input file, using Notepad, following the teacher's example on how to set up the text file and save it with the proper extension.
7. The teacher will tell the students to finish what they are doing, and to return to their groups.
   The teacher will then have the students choose one member of their group to submit their file to GalaxSeeUI, and save the file to a folder to which the teacher has access. This folder should be a shared folder that the entire class can access, but that the teacher can control.
8. The teacher will show a variety of galaxy pictures to the students, and ask the students to make a classification of each galaxy's structure, as well as provide their reasoning for coming to that conclusion.

   Example images can be found on: http://hubblesite.org/gallery/album/galaxy/
9. The teacher will ask if there are any final questions or comments, and conclude the lesson. This time can also be used to aid the students with any errors that may arise.

Day 2 (45 minutes)

10. The teacher will show the students how to start ParaView, and how to configure ParaView to read in their data files. The teacher will then show the students how to play their animation, and how to download the images needed to examine the structure of the galaxies.
11. The teacher will have the students retrieve their data from either a flash drive the teacher controls or the folder used previously. The folder should have individual folders labeled with the names of the students in each group, and inside the folders should be the input file the students created and the output from GalaxSeeHPC.
12. The students will observe their galaxies, analyze the results they note, and make an educated conclusion on the structure of their galaxy.
13. The teacher will instruct the students to use ParaView to take a picture of their initial step and their final step.
14. The student groups will share their results with the class, having the students present a small summary of their results and make their final classification of their galaxy.
15. During the ending of the period, make references to stable and unstable initial conditions. Care must be taken to differentiate between a student's set of initial conditions and actual data. Possible wording would be to always refer to the students' simulations as "models" and never as "a galaxy."
   a. Stable - initial conditions such that the model does not exhibit overall change in structure or makeup as the simulation evolves. A stable simulation that additionally exhibits behavior similar to data is one in which the initial conditions are likely to correspond with real galaxies.
   b. Unstable - the simulation exhibits behavior that changes greatly during evolution, particularly changes in the size, rotational speed, and overall geometric makeup.
      This may be due to numerical instability (have students try reducing the timestep), or it may be due to initial conditions that are not physically likely.
16. The teacher will ask if there are any final questions or comments, and conclude the lesson. This time can also be used to aid the students with any errors that may arise anywhere during the lesson.

Extensions:

17. Show the students how to plot the velocity of their galaxies in ParaView as star color. Show them how the velocity curve of their galaxies plays a role in how stable their structure is.

5.2 Choice of Time-step

Currently none of the versions of GalaxSee (GUI-based, the original command-line MPI, or the latest GalaxSeeHPC release) allow for an adaptive timestep in solution. While this change is planned for the future, the current limitation makes it especially important that an appropriate time-step is used. Even with professional grade codes using higher order and/or adaptive integration schemes, great care must be taken with the choice of time-step. If students are not familiar with time-stepping methods, they should get some information on the drawbacks of a time-step that is either too large or too small. Any of the versions used with some form of visualization (the GUI-based versions have built-in visualization; GalaxSee-MPI and GalaxSeeHPC have the option of compiling X-based visualization into the program if supported by your platform) will show this clearly, with demonstrably wrong results and instabilities occurring with too large a time-step, and with visibly slower computation occurring with too small a time-step.

5.3 Choice of Integration Method

Integration methods available in GalaxSeeHPC mirror some of the more standard options used, as well as some options that are pedagogically easy to introduce yet not stable enough for professional work. The Euler method is included for pedagogical purposes, as it is often the first numerical integration method students learn, and the easiest to code.
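The effect of both the time-step and the integration method can be seen on a problem small enough to type in by hand. The following Python sketch is not taken from GalaxSeeHPC: it integrates a single test mass on a circular orbit around a fixed central mass, in hypothetical units where GM = 1, and reports how far the orbital energy drifts after one orbit with the Euler method and with fourth order Runge-Kutta.

```python
import math


def deriv(state):
    """Time derivative of (x, y, vx, vy) for a test mass around GM = 1."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -x / r3, -y / r3)


def euler_step(state, dt):
    """One first-order Euler step."""
    k1 = deriv(state)
    return tuple(s + dt * k for s, k in zip(state, k1))


def rk4_step(state, dt):
    """One classical fourth order Runge-Kutta step."""
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = deriv(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def energy(state):
    """Kinetic plus potential energy per unit mass."""
    x, y, vx, vy = state
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)


def energy_drift(step, dt, orbits=1):
    """Integrate a circular orbit and return |E_final - E_initial|."""
    state = (1.0, 0.0, 0.0, 1.0)  # unit circle, period 2*pi
    e0 = energy(state)
    for _ in range(int(round(orbits * 2 * math.pi / dt))):
        state = step(state, dt)
    return abs(energy(state) - e0)


if __name__ == "__main__":
    for dt in (0.1, 0.01):
        print(dt, energy_drift(euler_step, dt), energy_drift(rk4_step, dt))
```

An exact solution would conserve energy, so any drift is numerical error. Students can vary `dt` to see the Euler orbit spiral outward at large time-steps while the Runge-Kutta result stays close to the circle at the same step size.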
The so-called improved Euler or second order Runge-Kutta scheme, as well as the midpoint Euler method and leapfrog methods, are also allowed in the code, as these are often introduced in numerical analysis classes as incremental improvements to Euler's method. In practice, however, one would not want to run professional integration with these schemes. The fourth order Runge-Kutta algorithm is generally considered the simplest numerical integration scheme one would want to use for professional work, and is a standard method used across computational science disciplines. Additionally, predictor-corrector schemes, such as the Adams-Bashforth-Moulton scheme available in GalaxSeeHPC, attempt to use previous timesteps to better predict future behavior. For anything other than investigating the numerical impact of using lower order integration schemes, students should use either the fourth order Runge-Kutta or Adams-Bashforth-Moulton integrators.

5.4 Limitations of GalaxSee-MPI

The primary limitation of GalaxSee-MPI from a classroom perspective was the inability to use it to teach any concept beyond that for which it was originally intended. GalaxSee-MPI as first written was designed to show scaling of the parallelization of direct force calculation using MPI. However, all of the features of previous versions of GalaxSee that made it a useful tool for classroom exploration (the ability to easily modify input for new scenarios, the ability to design input files to meet your own problem, the ease of visualization) had been removed in making a command line version of the program. Moving the program to a command line version in an HPC environment, however, did allow for much larger values of N, which the visualization abilities of early versions of GalaxSee would not handle well anyway.

Additionally, over many years of using GalaxSee-MPI in faculty workshops with Physics faculty, many faculty expressed skepticism as to whether there would be benefit for their students in running N-Body simulations with a larger value of N, that is, whether there was anything the students would learn at large N that they would not learn at small N. Also, the lack of a feature to allow for periodic boundary conditions limited the types of situations that could be modeled.

From a technical perspective, the use of GalaxSee-MPI in new environments was often hampered by the choice of C++ as a language. While C++ is largely standard and widely adopted as a language, the C++ version of GalaxSee-MPI suffered from portability issues as it was deployed on different clustering platforms. The dependency on specific standard libraries often caused software to fail to run as expected, and different mpicxx executables, from one MPI implementation to another, often required minor code changes in order to deploy the software on a new platform.

5.5 Changes Made

The following feature comparison shows changes made in GalaxSeeHPC compared to previous GUI based and command line based versions.

Feature comparison. Columns: Feature; GUI versions; GalaxSee MPI; GalaxSee HPC. Features listed, with the cell annotations that survive in the source:
  Runs from input file
  Users can specify individual particle properties
  Problem scale
  Change integration method (Euler, Improved Euler, RK4, ABM): Choose from menu list; User specified
  Barnes-Hut
  PPPM
  Passive visualization: (with X11); (with X11)
  Interactive visualization: (with SDL)
  Command Line option
  MPI
  Write to snapshot files
  Additional output options
  Softened potential: Adaptive shield radius; Adaptive shield radius; Adaptive shield radius, fixed softened potential

5.6 Effect of Modifications

One concern in moving to GalaxSeeHPC was whether the removal of the GUI component would make exploration of science questions significantly more difficult for students using GalaxSee. In previous workshops with students, typical use was to use the Windows, Mac, or Java version of GalaxSee when exploring science questions and to use the GalaxSee-MPI version of the code when exploring problems with parallel efficiency and scaling.
Our first use of GalaxSeeHPC in an informal education setting in summer 2010 did show that the constant flow back and forth between windowed and command line environments slowed the pace of activities down, and when given the choice students tended to stick with the GUI driven tools. In summer 2011, we focused more specifically on using the command line tools, with more instruction on the use of the command line interface and activities that included visualization of results solved with larger N in a CAVE environment, which seemed to make for a more natural use of the command line driven HPC tools.

Since the move to C, we have seen significantly reduced issues with portability. The new version of the code has been tested on multiple platforms with both GNU and Intel compilers.

5.7 Visualization tools

Any effort to bring scalable supercomputing applications into the classroom will need problems of significant size to (a) require supercomputing resources, and (b) scale on those resources. This provides an additional concern for the educator in that large problems produce large sets of results, and visualization of those results will need to be part of the plan for implementing the use of such tools in the classroom. The use of common data formats is encouraged in order to be able to make the best possible use of open source visualization tools. Comma Separated Value (CSV) text files provide a low barrier for creation of files, and are readable by many visualization tools, but will typically require the configuration of many options within the tool to define how the CSV file should be interpreted. Other common data formats, such as NetCDF or HDF, are well supported by the open source community, and are standard input file formats for most visualization tools. However, they provide an additional challenge for implementation, as code libraries for those formats may have to be installed on the systems on which students are computing their results.
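As an illustration of how low the barrier for CSV output is, the following Python sketch writes one snapshot of particle data using only the standard library. The function name `write_snapshot_csv` and the column names are hypothetical and do not describe the file layout GalaxSeeHPC itself produces.

```python
import csv
import io


def write_snapshot_csv(stream, particles):
    """Write one snapshot as CSV with a header row.

    particles is an iterable of (x, y, z, mass) tuples.
    """
    writer = csv.writer(stream)
    writer.writerow(["x", "y", "z", "mass"])  # hypothetical column names
    for x, y, z, mass in particles:
        writer.writerow([f"{x:.6e}", f"{y:.6e}", f"{z:.6e}", f"{mass:.6e}"])


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0, 1.0), (1.5, -2.0, 0.25, 0.5)]
    buffer = io.StringIO()
    write_snapshot_csv(buffer, demo)
    print(buffer.getvalue())
```

In ParaView, a file like this is typically loaded as a table and then converted to points with the Table To Points filter by choosing which columns hold x, y, and z, which is the kind of per-tool configuration the paragraph above refers to.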
5.8 Storage limitations

Another concern for problems involving large N, particularly in a classroom situation in which many students will be running multiple sets of such models, is disk storage. For our N=500,000 models, 30 Mbytes per snapshot was typical, stored in NetCDF format. Keeping enough snapshots to create a smooth animation for N=500,000 typically required 3 Gbytes per simulation. Storage requirements were linear with N.

6. FUTURE WORK

6.1 CAVE Visualization

Our initial work in incorporating the CAVE into the visualization of GalaxSeeHPC has focused primarily on technical issues of how to get the data into the CAVE as well as the feasibility of incorporating a CAVE system into the flow of a class. Our general finding is that stereo immersive visualization, as it is inherently focused on one individual's point of view, is difficult to use in a large class setting, but it can be inspirational for students. We noticed a clear "wow factor" when bringing participants into the CAVE. It is easier to incorporate immersive visualization into individual student projects, as there is less of an issue with contention for the resource. Our initial work with students has used custom written software, and we are investigating whether we can replace this by using VisIt, for which a Conduit interface exists, or ParaView, which has been ported to other CAVE systems using FreeVR. We have not yet investigated whether participants learn differently in an immersive environment from a non-immersive environment, or from viewing 3-D data in other, non-immersive, stereo visualization systems. While CAVE systems are unlikely for typical classroom use, students may consider using non-immersive stereo rendering in ParaView through more readily available 3D monitors, TVs, or projectors.

7. ACKNOWLEDGEMENTS

The GalaxSeeHPC revisions were funded as a module project under the Blue Waters Petascale Education program. The cluster used for this project was funded by NSF award OCI-7790. The CAVE used in this project was funded by NSF award OCI-0959504.

8. REFERENCES

[1] Aarseth, S. 2003. Gravitational N-Body Simulations. Cambridge Monographs on Computational Physics.
[2] Barnes, J.E. and Hut, P. 1989. Error analysis of a tree code. Astrophysical Journal Supplement. 70, (Jun. 1989), 389-417.
[3] Christian, W. 2010. EJS CSM Textbook Chapter 5: Few-Body Problems. An Introduction to Computer Simulation Methods - Draft EJS edition.
[4] Efstathiou, G. and Eastwood, J.W. 1981. On the clustering of particles in an expanding universe. Monthly Notices of the Royal Astronomical Society. 194, (Feb. 1981), 503-525.
[5] GalaxSee Curriculum Resources: http://www.shodor.org/master/galaxsee/.
[6] GalaxSeeHPC: http://sourceforge.net/projects/galaxseehpc/.
[7] Heggie, D. and Hut, P. 2003. The Gravitational Million-Body Problem. Cambridge University Press.
[8] Joiner, D.A. et al. 2008. Supercomputer based laboratories and the evolution of the personal computer based laboratory. American Journal of Physics. 76, 4 (2008), 379.
[9] Mihos, C. et al. 1999. GalCrash: N-body Simulations on the Student Desktop. American Astronomical Society Meeting Abstracts (Dec. 1999), #101.04.
[10] ParaView - Open Source Scientific Visualization: http://www.paraview.org/. Accessed: 2012-04-25.
[11] Petascale: GalaxSeeHPC: 2011. http://www.shodor.org/petascale/materials/upmodules/nbody/.
[12] Rood, H.J. and Sastry, G.N. 1971. Tuning Fork Classification of Rich Clusters of Galaxies. Publications of the Astronomical Society of the Pacific. 83, (Jun. 1971), 313.
[13] SEDS Messier Database: http://messier.seds.org/.
[14] Springel, V. et al. 2005. Simulations of the formation, evolution and clustering of galaxies and quasars. Nature. 435, 7042 (Jun. 2005), 629–636.
[15] Springel, V. 2005. The cosmological simulation code GADGET-2. Monthly Notices of the Royal Astronomical Society. 364, (2005), 1105–1134.
[16] VisIt Visualization Tool: https://wci.llnl.gov/codes/visit/. Accessed: 2012-04-25.

Journal of Computational Science Education, ISSN 2153-4136, August 2014