First Code Improvement, Milestone E, code PARK

TITLE OF AGREEMENT: Numerical Simulations For Active Tectonic Processes: Increasing Interoperability And Performance

AGREEMENT NUMBER: JPL Task Plan No. 83-6791

TEXT OF MILESTONE: code PARK with 150,000 elements for 5000 time steps on 256 processors. Uses MPI for parallel earthquake code and parallel multipole library.

PROBLEM BEING SOLVED:
Compute the history of slip, slip velocity, and stress on a vertical strike-slip fault that results from using state-of-the-art rate and state frictional constitutive laws on the fault for a specific geographic setting at Parkfield, California. The boundary conditions are those appropriate for Parkfield, and the distribution of constitutive properties on the fault zone is as realistic as our ability to characterize the subsurface properties of the fault there allows. The methods developed in solving this problem can be generalized to other geologic settings in which the fault geometry and boundary conditions are not so simple and multiple faults are involved.

DESCRIPTION OF COMPUTER CODES USED:
The main program is a boundary element program that determines the stress on every element of the fault surface due to slip on every other element, using a Green's function approach. The fault constitutive law is used to determine the slip velocity for that stress, and this velocity multiplied by the time step gives the slip used to calculate the stress in the next time increment. This involves the forward time integration of coupled ordinary differential equations. The integration is done with a fifth-order Runge-Kutta scheme with adaptive step-size control. Because the time steps range over ten orders of magnitude, depending on whether the fault is slipping very slowly in the interseismic period or very fast during an earthquake, the adaptive step-size control is an essential element in the solution.
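
The role of adaptive step-size control can be illustrated with a small sketch. This is not the PARK integrator: in place of the fifth-order Runge-Kutta scheme it uses classical fourth-order Runge-Kutta with step doubling, a simpler stand-in, and a scalar test equation y' = -y instead of the coupled fault equations. The function names, the tolerance, and the step-growth limits are illustrative assumptions.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for a scalar ODE y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate_adaptive(f, t0, y0, t_end, h0=1e-3, tol=1e-8):
    """Integrate y' = f(t, y) from t0 to t_end with step-doubling error control.

    Each trial step is taken once with size h and once as two steps of h/2;
    the difference between the two results estimates the local error.
    Returns the final time, the final value, and the list of accepted steps.
    """
    t, y, h = t0, y0, h0
    accepted = []
    while t < t_end:
        h_try = min(h, t_end - t)          # never step past the end time
        is_last = h_try == t_end - t
        y_big = rk4_step(f, t, y, h_try)
        y_half = rk4_step(f, t, y, 0.5 * h_try)
        y_small = rk4_step(f, t + 0.5 * h_try, y_half, 0.5 * h_try)
        err = abs(y_small - y_big)
        allowed = tol * max(abs(y_small), 1.0)
        if err <= allowed:                 # accept the more accurate result
            t = t_end if is_last else t + h_try
            y = y_small
            accepted.append(h_try)
        # Local error scales as h**5, hence the exponent 1/5.
        factor = 0.9 * (allowed / err) ** 0.2 if err > 0.0 else 5.0
        h = h_try * min(5.0, max(0.2, factor))
    return t, y, accepted


if __name__ == "__main__":
    t, y, steps = integrate_adaptive(lambda t, y: -y, 0.0, 1.0, 5.0)
    print("final value:", y, "exact:", math.exp(-5.0))
    print("smallest and largest accepted step:", min(steps), max(steps))
```

The step size grows while the solution changes slowly and shrinks when the error estimate exceeds the tolerance, which is the behavior that matters when time scales differ by many orders of magnitude.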
The main program calls a variety of subroutines. One of them, the subroutine that calculates the derivatives used in the forward time integration, itself calls a Fast Multipole library that is suitable for such Green's function problems. The Multipole approach allows a number of computations to scale as N log N rather than N^2, as would otherwise be the case. The particular Fast Multipole approach being used allows determination of the degree of grouping of the remote cells based on an analytical approximation to the Green's function. In order to reduce computation time it also renumbers the elements so that those that are near in space are also near in memory.

The main program and most of its subroutines are written in Fortran 90. As of the First Code Improvement Milestone these programs have been converted to use MPI to run in parallel. The Fast Multipole library is also already written in parallel using MPI.

Documentation: Contained herein and in the subdirectories under the 1st_Code_Improv_Milestone directory in which this file is found.

SCALING ANALYSIS
In the directory scaling, found in the same 1st_Code_Improv_Milestone directory in which this file is found, are a number of files that show how the job scales with both number of elements and number of processors. Thirty-six scaling runs were done, covering all combinations of the number of elements used (712, 5292, 15000, and 150000) and the number of processors used (1, 2, 4, 8, 16, 32, 64, 128, and 256). All these scaling runs were run for 100 time steps, whereas the full 150000-element, 256-processor First Code Improvement Milestone run was done for 5000 time steps. The scaling directory contains a data table giving the walltime for all 36 scaling runs, as well as five plots showing the dependence of walltime, speedup, efficiency, and overhead on number of elements and number of processors.
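
As a reading aid for the plots, the sketch below shows one common way to derive speedup, efficiency, and overhead from a walltime table. The definitions used (speedup = T1/Tp, efficiency = speedup/p, overhead = p*Tp - T1) are standard textbook choices assumed here; this document does not state which definitions were used for the plots. The walltimes in the example are made-up placeholders, not values from the scaling table. The helper elements_per_processor reproduces the per-processor element counts quoted in the discussion of the 150000-element job.

```python
def scaling_metrics(walltimes):
    """Derive scaling metrics from walltimes.

    walltimes: dict mapping processor count -> walltime in seconds.
               It must contain an entry for 1 processor.
    Returns a dict mapping processor count -> (speedup, efficiency, overhead).
    """
    t1 = walltimes[1]
    metrics = {}
    for p, tp in sorted(walltimes.items()):
        speedup = t1 / tp            # how much faster than the serial run
        efficiency = speedup / p     # fraction of ideal linear speedup
        overhead = p * tp - t1       # extra processor-seconds (assumed definition)
        metrics[p] = (speedup, efficiency, overhead)
    return metrics


def elements_per_processor(n_elements, n_procs):
    """Whole number of elements per processor, rounded down."""
    return n_elements // n_procs


if __name__ == "__main__":
    # Hypothetical walltimes in seconds, for illustration only.
    example = {1: 100.0, 2: 50.0, 4: 30.0}
    for p, (s, e, o) in scaling_metrics(example).items():
        print(p, "processors: speedup", s, "efficiency", e, "overhead", o)
    for p in (8, 16, 64, 128, 256):
        print(p, "processors:", elements_per_processor(150000, p), "elements each")
```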
The scaling data show that not much speedup is gained by going from one to two processors. We need to examine why this is so, since understanding this behavior might allow us to gain nearly a factor of two in efficiency in future runs. For the largest job (150000 elements), the efficiency and overhead are nearly constant from 2 to 8 processors. Efficiency falls off between 8 and 16 processors and is constant from 16 to 32 processors. For 64, 128, and 256 processors efficiency falls off gradually. This falloff is presumably due to an insufficient number of elements per processor, the numbers being 2343, 1171, and 585, respectively, as the plots show. This effect is seen even more dramatically for the jobs with a smaller number of elements, because as the number of processors increases the number of elements per processor gets so small that a large amount of time is spent communicating between processors. The falloff between 8 and 16 processors on the 150000-element problem suggests that the optimum number of elements per processor may be about 20000.

LOCATION FOR CODES, DATA, ETC.:
Code and documentation can be found in two places: a web site (http://www.servogrid.org/slide/GEM/PARK/) and turing and chapman, two of the SGI Origin 3000s at NASA Ames. The web site is the only public location. The machine turing is the front end to the machine chapman, on which the First Code Improvement Milestone was run. For the purposes of NASA's verifying that the First Code Improvement run is as described in the Milestone_Certification_Data file, and repeating the run if desired, it may be easier to use the copies of the documentation, files, etc. that are located on turing or chapman, since no compression, tarring, or compilation is needed there.
In addition, because two copyrighted, but easily and inexpensively available, subroutines are used that cannot be posted on the public web site, it is also easier to verify the behavior on the NASA Ames machines, where these subroutines are available in the src-bin subdirectory of the /u/tullis/1st_Code_Improv_Milestone directory. Verification can also be done by the public or by NASA officials by using materials taken from the web site and by purchasing the subroutines if one does not already have them. The directory structure is the same on turing, on chapman, and at the web site, to make it easier to compare each to the others.

The appropriately named subdirectories under the 1st_Code_Improv_Milestone directory in which this file is found contain all the necessary material that describes the First Code Improvement Milestone and gives instructions that would allow one to duplicate it. Included in the "in" and "out" directories are all the materials from the First Code Improvement Milestone run with 150000 elements and 256 processors for 5000 time steps. For code testing purposes on one's own system it is useful to set the number of time steps in the prk.dat.150003 file to a number smaller than 5000 for the initial run; even 2 would be reasonable for the first run.

The materials in these directories include:
Milestone_Certification_Data.txt - a file that gives the time required for the First Code Improvement run and describes various parameters of the run.
README-setting_up_input_files.txt - a file that explains the input files, including how the elements are created from them.
README-Compile.txt - a file that tells how to create both the multipole library and the PARK fault files using the appropriate Makefiles.
in - a directory that contains the input files that were used in the First Code Improvement run.
out - a directory that contains the output files that were generated in the First Code Improvement run.
src-bin - a directory that contains the PARK and related fault application files used in the First Code Improvement run. The versions of this directory on turing and chapman also have the object files and the executable binary file (named park).
scaling - a directory that contains data and plots showing how execution time depends on number of elements and processors. One file is a table giving the walltime for all 36 scaling runs. One file is a Microsoft Word file with 5 embedded plots that show how 1) execution time depends on number of elements, and the dependence on number of processors of 2) execution time, 3) speedup, 4) efficiency, and 5) overhead. In addition, these 5 plots are also contained, one each, in 5 separate tiff files.
downloads - a directory that contains two unix-compressed tar files, PARK_Package_1st_Improv.tar.Z and PARK_Package_1st_Improv_NR.tar.Z, that allow one to generate the files needed for the First Code Improvement run. The file PARK_Package_1st_Improv_NR.tar.Z contains all the programs that are needed, including two copyrighted subroutines from the Numerical Recipes book. This version is only on turing, not on the public web site. It can be copied by NASA officials for verification of the program behavior, and it saves one from making the additions needed for the other version, PARK_Package_1st_Improv.tar.Z, which does not have the two Numerical Recipes subroutines. See either the README-src-bin.txt file in the src-bin directory or the header of the park.f file to learn what needs to be done to create these Numerical Recipes subroutines. Except for the presence or absence of these two subroutines, both of these tar files will create the Multipole library, the source files for the PARK fault application, and the input files for the First Code Improvement Milestone run.
t17-7 - a directory containing the Fast Multipole library.
This directory is not explicitly included in the directory structure on the web site, but is in the directory structure that will be created when the downloads are obtained. It is in the directory structure on turing and chapman.

The tar files were created on turing in the following way. The libsw.a and the mpmy_seq.o files were moved from the t17-7/Objfiles/IRIX64 directory to another temporary location. The object files, the *.mod files, and the executable file (park) were moved from the src-bin directory to another temporary location. For creating PARK_Package_1st_Improv.tar.Z, the numrec.f file that contains the two Numerical Recipes subroutines was also moved to another temporary location, whereas for creating PARK_Package_1st_Improv_NR.tar.Z this file was left in place. Then, while in the home directory, one of the following commands was issued to produce the version without or with the Numerical Recipes subroutines, respectively:

    tar -cvf PARK_Package_1st_Improv.tar 1st_Code_Improv_Milestone
or
    tar -cvf PARK_Package_1st_Improv_NR.tar 1st_Code_Improv_Milestone

Then the .tar files were compressed by issuing

    compress PARK_Package_1st_Improv.tar
and
    compress PARK_Package_1st_Improv_NR.tar

to produce the PARK_Package_1st_Improv.tar.Z and PARK_Package_1st_Improv_NR.tar.Z files.

SIGNIFICANCE OF ACHIEVING THE MILESTONES:
Achieving the First Code Improvement Milestone is significant because it opens the way to running problems of significant size, now that the earthquake code as well as the Fast Multipole library run in parallel under MPI. For the first time it presents to the scientific community fast parallel codes that allow simulations of the entire earthquake cycle in a 3D model that uses the most accurate description of fault friction, rate and state friction, and the quasi-dynamic radiation damping approximation to full elastodynamics. We now have the potential for greatly increasing the number of elements that can be included in the model over what could be done in the past.
This means that enough elements can now be used that it is possible to represent a reasonably sized fault with elements small enough to properly represent the behavior of a continuum. Larger numbers of elements also allow earthquakes with a large range of sizes to occur in the simulation. It will therefore be possible to study in the simulations in what situations small earthquakes occur in isolation and in what situations they may cascade or grow into larger ones. This could help us understand whether patterns of microseismicity might be used to help predict earthquakes.

The attainment of this milestone not only represents an advance in our computational ability to simulate earthquakes; it will also allow us to understand the earthquake process better by creating data sets that can be compared with data on real earthquakes. The attainment of the next milestone (Second Code Improvement) will involve increasing the efficiency of the code in other ways, now that the parallel implementation has been achieved, and this will allow even larger and more realistic simulations to be run.