eSDO 1311: Data Centre Implementation Plan

Deliverable eSDO-1311
E. Auden, R. Bogart, L. Culhane, A. Fludra, M. Smith, K. Tian, M. Thompson
Mullard Space Science Laboratory
31 March 2005

Introduction

The Solar Dynamics Observatory (SDO) will produce approximately 2 TB of raw data per day from the two instruments on board, the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI). Data from these instruments will be of interest to both the helioseismology and non-helioseismology solar communities in the US and the UK. Making SDO data readily available through the virtual observatory will give scientists access to AIA and HMI observations as soon as possible.

Raw data from the satellite will initially be downloaded to the NASA White Sands facility. The data will then be transferred to the Joint Science Operations Center (JSOC) data facility at Stanford University as well as to an off-site storage facility, in line with NASA's data security policy.

Data Products

US Data Centre

Four levels of data products will be produced by the JSOC pipeline. A full list of data products is still being developed by the JSOC, but the current assessment of data levels and products is given below:

  • Level 0 data products will be filtergrams (32 MB each) and telemetry data (~1.7 kB) from AIA and HMI. These low level products will be archived and made accessible through a tape library, but most will not be held online.
  • Level 0 data will be processed into level 1 data products, such as flatfield calibrations, low resolution summary images, full resolution NOAA active region image sets, and image sets of notable features and events. Most level 1 data products will be held online for 3 months, after which time they will be archived and made accessible through the tape library. Some level 1 products will be held online for longer than 3 months (to be determined).
  • Level 2 data, such as deconvolution products, temperature maps, irradiance curves, and field line models, will be generated on the fly from user requests. The majority of these products will not be archived or held online.
  • Level 3 products, such as internal rotation images, internal sound speed images, full disk velocity and sound speed maps, Carrington synoptics, and other custom products will also be generated on the fly and not held in an archive.

Level 0 and level 1 archived products will be stored internally in a proprietary format, with keywords and header information stored in ASCII files. Users will have a choice of export formats: FITS, JPEG, VOTable, HDF, and other formats to be determined. Although level 1 and some level 0 and level 2 products will be held in an online disk cache for limited periods, the time taken to retrieve products archived in the tape library will not be prohibitive. Searchable metadata will be stored in an Oracle database; although JSOC users and Co-Investigator groups will have access to this database, it is unlikely that grid users will be allowed to interact with it directly. Instead, a second, grid-searchable copy of the metadata will probably be held in either a second Oracle database instance or in an XML database such as eXist.
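
The internal layout described above (binary image files with keyword information in ASCII sidecar files) can be pictured with the minimal Java sketch below. The file names and keyword set are illustrative assumptions only, not the JSOC's actual internal format.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    /** Sketch only: a binary image payload plus an ASCII keyword sidecar. */
    public class SidecarWriter {
        public static void main(String[] args) throws IOException {
            // Hypothetical file names; the real naming scheme is internal to the JSOC.
            Path image = Path.of("hmi_20081001_000000.bin");
            Path keywords = Path.of("hmi_20081001_000000.kw");

            Files.write(image, new byte[1024]); // stand-in for a 32 MB filtergram payload
            Files.write(keywords, List.of(      // ASCII keyword/header sidecar
                    "TELESCOP = SDO/HMI",
                    "DATE-OBS = 2008-10-01T00:00:00",
                    "WAVELNTH = 6173"));
        }
    }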

UK Data Centre

The UK data centre will have two primary responsibilities: first, to facilitate access to low and high level data products for the UK solar community during periods of high data demand; second, to store a sample of AIA and HMI filtergrams over the lifetime of the SDO mission to enable visualisation of solar activity evolution during the period of SDO operation.

It is anticipated that the non-helioseismology community will experience periods of high data demand following solar events such as flares, coronal mass ejections, and coronal waves. To reduce the load that such events place on the US data centre and on transatlantic data transfers, the UK data centre will automatically request transfer of low level SDO data. In addition, a request will be placed for pipeline generation of on-the-fly high level data products, and these products will also be transferred to the UK data centre. These low and high level products will be stored in the UK data centre for approximately 10 days. When UK users request SDO data through the AstroGrid portal, their queries will be automatically redirected to the UK data centre first; if the UK data centre does not hold copies of the requested data, the queries will be forwarded to the US data centre. This process will be transparent to the user. If all relevant low and high level data products are stored uncompressed for the duration of a solar event, it is estimated that ~3 TB of storage will be required for every 24-hour period of coverage.
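
The redirection logic might look like the Java sketch below: try the UK cache first and fall back to the US data centre on a miss. All class and method names here are hypothetical; in practice the routing will be handled inside the AstroGrid portal rather than by user code.

    import java.util.Optional;

    /** Sketch only: transparent UK-first query redirection. */
    public class QueryRouter {

        /** A data centre that may or may not hold a copy of a product. */
        interface DataCentre {
            Optional<String> fetch(String productId);
        }

        private final DataCentre ukCache;   // ~10-day rolling cache of event data
        private final DataCentre usArchive; // authoritative JSOC archive

        QueryRouter(DataCentre ukCache, DataCentre usArchive) {
            this.ukCache = ukCache;
            this.usArchive = usArchive;
        }

        /** Try the UK cache first; fall back to the US data centre on a miss. */
        String resolve(String productId) {
            return ukCache.fetch(productId)
                          .orElseGet(() -> usArchive.fetch(productId).orElseThrow());
        }

        public static void main(String[] args) {
            DataCentre uk = id -> id.startsWith("event-")
                    ? Optional.of("UK copy of " + id) : Optional.empty();
            DataCentre us = id -> Optional.of("US copy of " + id);

            QueryRouter router = new QueryRouter(uk, us);
            System.out.println(router.resolve("event-20081012-aia")); // served from the UK cache
            System.out.println(router.resolve("synoptic-cr2077"));    // falls back to the US centre
        }
    }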

The second UK data centre activity will provide immediate access to a sample of AIA and HMI filtergrams over the lifetime of the SDO mission; this service will be available to both UK solar researchers and the larger grid community. One filtergram from each of AIA and HMI will be stored every 1000 seconds. If the filtergrams are stored in their 32 MB raw image format, the storage required is approximately 2 TB per year, or roughly 12.5 TB for the stated SDO mission duration of April 2008 to July 2014.
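
A back-of-envelope check of this estimate in Java, taking the mission duration of roughly 6.25 years from the dates above:

    /** Sketch only: storage estimate for the sampled filtergram archive. */
    public class SampleStorage {
        public static void main(String[] args) {
            double filtergramMB = 32.0;     // raw AIA/HMI filtergram size
            double cadenceSeconds = 1000.0; // one AIA + one HMI image per interval
            double imagesPerInterval = 2.0;

            double secondsPerYear = 365.25 * 24 * 3600;
            double mbPerYear = secondsPerYear / cadenceSeconds * imagesPerInterval * filtergramMB;
            double tbPerYear = mbPerYear / (1024 * 1024);

            double missionYears = 6.25;     // April 2008 to July 2014
            // Prints roughly 1.9 TB/year and 12.0 TB, consistent with the figures quoted above.
            System.out.printf("~%.1f TB/year, ~%.1f TB over the mission%n",
                              tbPerYear, tbPerYear * missionYears);
        }
    }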

Phase B of eSDO will address a number of technical solutions:

  • What triggers can be used to automatically request transfer of low level data and pipeline processing / transfer of high level data following solar events?
  • How concentrated will UK demand for low and high level data products be following solar events? Which solar events will trigger high demand?
  • Will periods of high data demand be experienced by the helioseismology community as well as the non-helioseismology community?
  • Can the UKLight point of access at UCL be harnessed to speed transfer of data through transatlantic pipelines?
  • Which compression algorithms should be used for the most efficient storage of scientifically useable data?
  • What visualisation plug-ins will be available from the UK grid community that could enhance access to the UK archive of SDO filtergrams?
  • Will data be most readily accessible through an add-on to the Solar-B storage facility at the Mullard Space Science Laboratory or through the Rutherford Appleton Laboratory large storage facility? (This question will also be addressed in deliverable eSDO-1321, Data Centre Grid Integration.)

Data Centre Software

DSA

Both the US and UK data centres will be accessible through the DataSet Access (DSA) software developed by the AstroGrid project. DSA is a set of Java libraries that allow databases to be queried through web services using the Astronomical Data Query Language (ADQL). Queries are made against observational metadata stored in a database. DSA can translate ADQL queries into different flavours of SQL for a number of relational databases, or into XQuery for XML databases. The DSA software can be downloaded as a WAR file and deployed with the Tomcat servlet container. While queries can be submitted directly to DSA through the software's JSP front end, the software is well suited to acting as a backend system accessed through its web service interface.
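
For illustration, submitting a query to DSA's web service interface might look like the sketch below, which posts an ADQL string over HTTP. The endpoint URL, content type, and table and column names are all placeholder assumptions rather than the actual DSA interface; the deployed service's own description should be consulted for the real details.

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /** Sketch only: posting an ADQL query to a hypothetical DSA endpoint. */
    public class DsaQuery {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Hypothetical table and column names for SDO metadata.
            String adql = "SELECT obs_id, date_obs FROM sdo_metadata "
                        + "WHERE instrument = 'AIA' AND date_obs > '2008-10-01'";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://datacentre.example.ac.uk/dsa/query")) // placeholder URL
                    .header("Content-Type", "text/plain")
                    .POST(HttpRequest.BodyPublishers.ofString(adql))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // typically a VOTable result set
        }
    }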

Databases

Once an instance of DSA has been registered with a virtual observatory (VO) such as AstroGrid, users can submit queries through that VO's portal. The DSA is configured to access a specific database during installation. In the case of the UK data centre, SDO metadata will most likely be stored in a MySQL database, while the data itself will most likely be held in a file system either added to the MSSL Solar-B storage facility or included in the RAL large storage facility. For the US data centre, it is likely that grid user queries will be directed to a secondary database at the JSOC facility; this database is likely to be a clone of the primary JSOC Oracle database or an instance of the eXist XML database. Data accessible to grid users will be stored at a Stanford University facility.
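
The kind of lookup that DSA would translate an ADQL query into for the UK MySQL database might resemble the JDBC sketch below. The connection details and the table and column names (sdo_metadata, instrument, date_obs, file_path) are illustrative assumptions, not the final schema; the MySQL Connector/J driver would need to be on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    /** Sketch only: metadata lookup against a hypothetical MySQL schema. */
    public class MetadataLookup {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/esdo"; // placeholder connection details
            try (Connection conn = DriverManager.getConnection(url, "esdo", "secret");
                 PreparedStatement stmt = conn.prepareStatement(
                         "SELECT file_path FROM sdo_metadata "
                       + "WHERE instrument = ? AND date_obs BETWEEN ? AND ?")) {
                stmt.setString(1, "HMI");
                stmt.setString(2, "2008-10-01 00:00:00");
                stmt.setString(3, "2008-10-02 00:00:00");
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("file_path")); // path into the file store
                    }
                }
            }
        }
    }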

Implementation Tasks

Demonstration level data centres are currently in development at both MSSL and Stanford University. The eSDO Phase A research period will be used to document experience of DSA configuration, grid user accessible database design, and VO integration, allowing a smooth deployment of production level data centres in Phase B. The Phase A data centres and documentation are being made accessible through the eSDO wiki at http://www.mssl.ucl.ac.uk/twiki/bin/view/SDO/DataCentre. The SDO and eSDO researchers primarily involved in data centre deployment are Elizabeth Auden and Mike Smith at MSSL, along with Rick Bogart and Karen Tian at Stanford University.

UK Data Centre

  • Completed implementation of data centre on test server: deliverable eSDO-2211, 22 December 2006
  • Completed integration of data centre with VO: deliverable eSDO-2212, 30 September 2007

  1. Establish data centre facility (proposed site: ATLAS)
  2. Configure MySQL database to hold metadata for cached SDO data products
  3. Install and configure DSA and CEA
  4. Register DSA and CEA with AstroGrid
  5. Enable data queries and data transfers

US Data Centre

  • Completed network latency tests: deliverable eSDO-2221, 31 March 2006
  • Completed AstroGrid / VSO interface: deliverable eSDO-2222, 30 September 2006
  • Completed support for AstroGrid modules installed at JSOC data centre: deliverable eSDO-2223, 30 September 2007

  1. Perform network latency tests to establish data transfer methods between Stanford and the UK (a minimal timing sketch follows this list)
  2. Complete AstroGrid / VSO interface
  3. Register AstroGrid / VSO interface with AstroGrid
  4. Help install / configure DSA if required
  5. Help install / configure CEA if required
  6. Help install / configure workflow engine if required
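
Task 1 could start from a probe as simple as the Java sketch below, which times the download of a sample file and reports the effective throughput. The URL is a placeholder; real tests would pull representative SDO files from the JSOC host, with and without the UKLight route.

    import java.io.InputStream;
    import java.net.URI;
    import java.net.URL;

    /** Sketch only: a crude transfer-rate probe for the Stanford-UK link. */
    public class TransferProbe {
        public static void main(String[] args) throws Exception {
            URL source = URI.create(
                    "http://jsoc.example.stanford.edu/sample/filtergram.fits").toURL(); // placeholder

            long start = System.nanoTime();
            long bytes = 0;
            try (InputStream in = source.openStream()) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytes += n;
                }
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d bytes in %.2f s (%.2f Mbit/s)%n",
                              bytes, seconds, bytes * 8 / seconds / 1e6);
        }
    }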

-- ElizabethAuden - 25 Mar 2005
