Magnetic Extrapolation MPI on NGS

The NGS clusters can be accessed via a gsissh session through Globus. Once the grid certificate has been renewed, an NGS account requested, and the account activated, NGS will add the user to the mailing list and send instructions for logging into NGS for the first time and changing the password. Do this, then use the applet as described below.

NGS GSISSH Applet

  • http://www.grid-support.ac.uk/
  • Main Menu: Grid Utilities -> GSI-SSH Term (requires port 2222 outbound open; also requires Java JDK 1.5 installed locally)
  • Applet: File -> New Connection
    • Host to connect to: grid-data.rl.ac.uk
    • Grid Certificate / Proxy dialogue: click "Use certificate from browser"
    • Grid Authentication dialogue: select browser, click OK
    • (NOTE: the connection sometimes fails with a "Could not establish a connection to host. Error from GSS layer: GSS Exception: Operation Unauthorized" error - keep trying)

Loading modules on grid-data.rl.ac.uk

  • To see which modules are available: module avail
  • To add a module: module add srb ("srb" as example)
  • Note: adding "clusteruser" will add most modules in one go: module add clusteruser (see the example session after this list)
  • Text editor: use nano, e.g. nano myfile.txt
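
As an example, a first session might list what is available, load the full bundle, and confirm what is active. This is a minimal sketch; module list is the standard modules command for showing what is currently loaded:

    [ngs0438@grid-data ngs0438]$ module avail
    [ngs0438@grid-data ngs0438]$ module add clusteruser
    [ngs0438@grid-data ngs0438]$ module list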

Uploading/downloading source and data files to/from NGS (using the GSI-SSHTerm client)

  • Select 'SFTP Session' (Alt+B) from the 'Tools' drop-down menu on the GSI-SSHTerm client toolbar.
  • Check that a client pop-up window appears displaying the contents of the user's home directory on NGS.
  • Select 'Upload Files' (Ctrl-V) or 'Download' (Ctrl-D) as appropriate from the 'File' drop-down menu on the pop-up toolbar.

Uploading source and data files onto NGS (from MSSL)

  • Add files for transfer to NGS to the /disk/ftp/pub/group/username directory on the MSSL SAN, where group is the MSSL group to which you are assigned and username is your personal lab account.
  • At the gsissh session commandline type: 'ftp'
  • Type 'open ftp.mssl.ucl.ac.uk'
  • At the username prompt type 'anonymous'.
  • At the password prompt type your lab e-mail address. A successful login will place the user in the /disk/ftp/pub directory of the MSSL SAN.
  • cd to the directory where the transfer files are stored and use ftp 'get' commands as usual (see the example session after this list).
  • Close ftp session.
    Note: Files can only be pulled from the MSSL anonymous ftp server, not pushed.
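
A typical pull might look like the session below. The group/username path components and the filename are placeholders for the user's own MSSL group, lab account and tar file:

    [ngs0438@grid-data ngs0438]$ ftp
    ftp> open ftp.mssl.ucl.ac.uk
    (log in as anonymous, giving your lab e-mail address as the password)
    ftp> cd group/username
    ftp> get MagExtrap.tar.gz
    ftp> bye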

Installation of Magnetic Extrapolation code

  • Log into gsissh session
  • Upload latest magnetic extrapolation code (including libcfitsio.a library) to user account using one of the above transfer methods.
  • Modify Makefile pathnames.
  • Modify the Makefile executable command line to use the NGS MPI library instead of the libmpich.a library supplied with the code, as follows:
    mpicc -o relax2 $(OBJS) -I /usr/local/Cluster-Apps/mpich-1.2.5..10-gnu/include -L /usr/local/Cluster-Apps/mpich-1.2.5..10-gnu/lib -lmpich -L $(BASE) -lcfitsio -lm
    Note: The NGS Phase-2 clusters use 64-bit MPI libraries and do not support the mpich-1.2.5..10-gnu library, so the Makefile will need to be adjusted accordingly.
  • You should now be able to run make and create the executable relax2 (see the build sketch after this list).
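
A minimal build sequence might then look like the following. This is a sketch that assumes the clusteruser module bundle provides the mpicc wrapper and that the Makefile changes above have already been made:

    [ngs0438@grid-data src]$ module add clusteruser
    [ngs0438@grid-data src]$ make
    [ngs0438@grid-data src]$ ls -l relax2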

Execution

Submit using globus-job-submit (a batch job that returns the prompt to the user immediately; the other option is the interactive globus-job-run, but we have had no great luck with this).

  • (Wrong) submit job:
    [ngs0438@grid-data src]$ globus-job-submit localhost/jobmanager-pbs -np 4 -x '(job type=mpi)(environment=(NGSMODULES clusteruser gm mpich-gm pbs))' relax2 22 1000 ../test/LowLouCase2.fits LowLouCase2.grid_ini output.fits
  • result:
    https://grid-data.rl.ac.uk:64004/8931/1177713626/
  • get job status (not submitted, pending, running, done):
    [ngs0438@grid-data src]$ globus-job-status https://grid-data.rl.ac.uk:64004/8931/1177713626/
  • result:
    DONE
  • get results (here's where it goes wrong, or at least blank):
    [ngs0438@grid-data src]$ globus-job-get-output https://grid-data.rl.ac.uk:64004/8931/1177713626/ (ECA stopped: blank result from globus-job-get-output.)
  • Success! Correct execution:
    [ngs0438@grid-data ngs0438]$ globus-job-submit grid-data.rl.ac.uk/jobmanager-lsf -np 4 -x '&(jobtype=mpi)(environment=(NGSMODULES clusteruser:gm:mpich-gm:pbs))' /home/ngs0438/MagExtrap/src/relax2 22 1000 /home/ngs0438/MagExtrap/test/LowLouCase2.fits /home/ngs0438/MagExtrap/test/LowLouCase2.grid_ini /home/ngs0438/output.fits
  • cleanup job (terminates running jobs and removes any temporary files created):
    [ngs0438@grid-data src]$ globus-job-clean https://grid-data.rl.ac.uk:64004/8931/1177713626/

  • Note: If a job needs to be submitted frequently, it is more convenient to put the globus-job-submit command in a shell script (a sketch of such a wrapper is shown below). However, if the vi editor is used to change the number of CPUs in the script, check that the correct number of processors has been entered: occasionally a newly typed value is repeated, so that an invalid number of CPUs is assigned - for example, '4' is entered but '4444' appears in the text. Be aware that if an illegal number of CPUs is entered, the Globus job manager will not report the error; the job status will simply be returned as "PENDING" indefinitely.
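
The following sketch wraps the working submission above in a script; the NPROCS variable name and the script layout are illustrative, and only the CPU count and file paths normally need editing:

    #!/bin/sh
    # Illustrative wrapper around the working globus-job-submit call shown above.
    # NPROCS is an example variable name; check its value carefully after editing (see the note above).
    NPROCS=4
    globus-job-submit grid-data.rl.ac.uk/jobmanager-lsf -np $NPROCS \
        -x '&(jobtype=mpi)(environment=(NGSMODULES clusteruser:gm:mpich-gm:pbs))' \
        /home/ngs0438/MagExtrap/src/relax2 22 1000 \
        /home/ngs0438/MagExtrap/test/LowLouCase2.fits \
        /home/ngs0438/MagExtrap/test/LowLouCase2.grid_ini \
        /home/ngs0438/output.fits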

Error logs

Each "globus-" command submitted generates a log in user's home directory of format gram_job_mgr_XXXXX.log. Each log includes detailed steps for the job submitted and theoretically tells the user where it failed.

Available machines

Once the user has signed on to a gsissh session with a grid certificate, any machine can be accessed via globus using gsissh machine.name.domain without further authentication. For example, gsissh grid-compute.oesc.ox.ac.uk.

  • grid-compute.oesc.ox.ac.uk
  • grid-compute.leeds.ac.uk
  • grid-data.rl.ac.uk
  • grid-data.man.ac.uk

Tutorials, help pages, useful links

NGS MPI Experiments

The Wiegelmann Magnetic Extrapolation algorithm splits the computation into 2 distinct phases:

  1. Calculation of a Potential Field
  2. Optimization

The Potential Field calculation is an example of an "Embarrassingly Parallel" computation: the case where a dataset can be divided between several CPUs and each is allowed to work independently on its own subset of the data without the need for data transfer between processors.

By contrast, the optimization phase requires some data transfer between processors during each iteration.

Initial MPI experiments were carried out using 1, 2 and then 3 single-CPU MSSL lab computers to calculate the potential field part of the extrapolation. The set-up and results for these can be seen on the MagExtrapDeployment page of the Twiki.

The optimization phase of the code was then parallelized and, after successful testing across 2 CPUs on the MSSL network, the code was transferred to the NGS, where a larger number of processors could be utilised.

MPI timing experiments were carried out for combined potential field and optimization calculations (relax2 22 10000 ...) using the Low & Lou Test Case#2 and the Titov-Demoulin Test Case#2, both single-boundary (photospheric) datasets. Results are shown below.

Results of MPI timings

Table 1 - Computation times (LowLou Science Case #2 - single boundary only)

No of CPUs    Time (secs)
     1            850
     2            485
     4            345
     8            330

Table 2 - Computation times (TitovDemoulin Science Case #2 - single boundary only)

No of CPUs    Time (secs)
     1           29880
     5            7620

Table 1 shows timings for Low & Lou Case #2, an 80 x 80 x 72 pixel datacube. It demonstrates a law of diminishing returns, i.e. there is an upper limit to the number of processors that can be used effectively in computing the extrapolation. Data transfers between CPUs are necessary but are slow compared with the speed of CPU calculations and represent the main computational bottleneck. As the number of CPUs is increased, so is the volume of data traffic between CPUs, partially cancelling out the gains made from using more processors.
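
For reference, the speedups implied by Table 1, relative to the single-CPU time of 850 s, are: 2 CPUs: 850/485 ≈ 1.8; 4 CPUs: 850/345 ≈ 2.5; 8 CPUs: 850/330 ≈ 2.6. Doubling from 4 to 8 CPUs therefore buys almost nothing at this datacube size.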

In this example ... TBD

Data transfer delays can be mitigated somewhat by the use of a fast network. Communication between MSSL cpus is via a standard 100 Mb/sec Ethernet, while NGS uses the much faster Myrinet network which can achieve a throughput that is an order of magnitude higher.

Table 2 shows timings for Titov-Demoulin Case #2, a 150 x 250 x 100 datacube.
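
For reference, Table 2 implies a speedup of 29880/7620 ≈ 3.9 on 5 CPUs (a parallel efficiency of roughly 78%), noticeably better scaling than in Table 1, consistent with the larger datacube giving each CPU more computation per unit of inter-CPU communication.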

Hinode Datasets

Hinode datasets have been made available by the NLFFF workshop group to allow comparison testing of each of the algorithms. Each dataset is 320 x 320 x 256 and saved in IDL .sav format. The datasets are too large to be run on the standard NGS clusters (where each CPU has 4 GB of physical memory) - even when all double-precision arrays are converted to single precision - and will need to be deployed on the newer (Phase-2) cluster configurations. Phase-2 clusters feature more CPUs, each with at least 8 GB of memory.

Note: Phase-2 clusters require the code to be re-compiled for a 64-bit architecture and the job-submission syntax differs from that used on the older cluster configurations.

Hinode dataset description

The following describes how the Hinode datasets were created. Please note that the datasets were originally saved in IDL .sav format as pseudo Vector Magnetograms, but the data have been extracted into ASCII format and then converted into FITS in accordance with eSDO requirements. FITS files for 2006-12-12 (prior to an X-flare) are available in the Magnetic Extrapolation algorithm software package for running in MultiGrid-like mode.

The high resolution Hinode data is embedded in MDI data to increase the field-of-view, rebinned, and then a smaller subset of the resulting data extracted. There are some unavoidable artifacts at the boundary between the Hinode and MDI data which may be visible. The MDI magnetogram in which the Hinode data was embedded was supplemented with a potential field extrapolation so that the MDI data appears to be a vector magnetogram.

The Hinode data have been binned by a factor of 4 from the full resolution (a factor of 2 binning on the spacecraft and another factor of 2 during the data preparation). The final resolution is 0.634 arcsec per pixel. Since the Hinode data are not at disk center, the magnetograms were warped onto a heliographic grid so that they appear as they would had they been at disk center. Hence the Cartesian codes should correctly handle these data sets.

The resulting datasets are 320x320. The full vector field is provided everywhere, though the transverse field is potential for the area that comes from the MDI data. If you look carefully, you will see a boundary between the Hinode and MDI data since the measurements from the two instruments have very different spatial resolutions. Even after embedding in the MDI magnetograms, the datasets are not completely flux balanced; however, the walls of the supplied potential field do account for flux outside of the Hinode field-of-view.

-- MikeSmith - 19 Sep 2007
