Overview of the ADIL System

The Library system has three main tasks to manage: processing data deposits, responding to search queries, and browsing and delivering data.

This document gives an overview of these tasks from a perspective behind the user interface. For an overview of the Library from the user's perspective, consult The NCSA Astronomy Digital Image Library (Plante, Crutcher, and Sharpe 1996, ADASS V conference paper).

Processing Data Deposits

Astronomers are encouraged to deposit their fully processed images (in FITS format) into the Library at the time the resulting publication goes to press--about the time they would be clearing the data off their local disk. From the author's perspective, depositing the data is a matter of filling out a deposit description form (as an HTML form) and uploading the files to the Library via anonymous FTP.

Figure 1 below illustrates what happens to the data as it makes its way onto the Library "shelves". When the user submits the deposit form, a CGI script sets up a directory in the FTP area to receive the new data and places in it a description file containing the information provided in the HTML form. The user is then given explicit instructions for uploading the data by FTP. In the near future, we plan to offer customized scripts (and later Java programs) that will automatically upload the data.
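The directory setup step can be sketched as follows. This is only an illustration in Python, not the Library's actual CGI script: the deposit identifier, the file name description.txt, the "key: value" layout, and the form field names are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def create_deposit_area(ftp_root, deposit_id, form_fields):
    """Create an upload directory and write a description file into it.

    deposit_id, 'description.txt', and the field names are hypothetical;
    the source text does not specify them.
    """
    deposit_dir = Path(ftp_root) / deposit_id
    # Refuse to reuse an existing directory so two deposits never mix.
    deposit_dir.mkdir(parents=True, exist_ok=False)

    description = deposit_dir / "description.txt"
    with description.open("w", encoding="utf-8") as fh:
        for key in sorted(form_fields):
            # One "key: value" line per submitted form field.
            fh.write(f"{key}: {form_fields[key]}\n")
    return deposit_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        area = create_deposit_area(
            root, "deposit-0001", {"object": "M51", "author": "A. Astronomer"}
        )
        print((area / "description.txt").read_text())
```

Creating the directory with exist_ok=False is a deliberate choice in this sketch: a repeated identifier raises an error rather than letting a second deposit write into the first one's upload area.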

[depositing flow diagram]

Fig. 1. Data Flowing into the Library. Authors use FTP and the Web to deposit data and related information into the Library. Metadata for a searchable database is extracted, and the data is moved to storage. (Dashed lines represent services under development; see text for details.)

When the upload is complete, the user notifies the Library by email, at which time the "Electronic Librarian"--a collection of Perl scripts--sets off to process the deposit. One of the main objectives of this step is to extract metadata to insert into a database. Some of this information, such as sky position or frequency, can usually be extracted from the FITS files themselves. Other information, such as object name or bibliographical references, comes from the author by way of the description file. Because author-supplied information can contain errors, a human librarian is also part of the processing stage to catch mistakes (e.g. author typos) and omitted information. With the metadata extracted, the deposit is essentially ready to be sent to the Library Shelves.
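To make the FITS part of this step concrete, here is a minimal sketch in Python (the Electronic Librarian itself is written in Perl, and its code is not shown here). It relies on the FITS convention of fixed-width 80-byte header cards and assumes that sky position and frequency are recorded through the usual CTYPEn/CRVALn axis keywords; the output field names (ra_deg, dec_deg, freq_hz) are hypothetical.

```python
import re

CARD = 80  # FITS header cards are fixed-width, 80 bytes each


def parse_fits_header(raw):
    """Parse raw FITS header bytes into a keyword -> value dict."""
    header = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii", errors="replace")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, or blank card: no value to read
        field = card[10:]
        quoted = re.match(r"\s*'((?:[^']|'')*)'", field)
        if quoted:
            # String value; a doubled quote inside it is an escaped quote.
            header[keyword] = quoted.group(1).replace("''", "'").rstrip()
            continue
        token = field.split("/", 1)[0].strip()  # drop the trailing comment
        if token in ("T", "F"):
            header[keyword] = token == "T"
            continue
        try:
            header[keyword] = int(token)
        except ValueError:
            try:
                header[keyword] = float(token.replace("D", "E"))
            except ValueError:
                header[keyword] = token
    return header


def extract_metadata(header):
    """Pick out object name, sky position, and frequency (assumed keywords)."""
    meta = {"object": header.get("OBJECT")}
    for axis in range(1, header.get("NAXIS", 0) + 1):
        ctype = str(header.get(f"CTYPE{axis}", ""))
        value = header.get(f"CRVAL{axis}")
        if ctype.startswith("RA"):
            meta["ra_deg"] = value
        elif ctype.startswith("DEC"):
            meta["dec_deg"] = value
        elif ctype.startswith("FREQ"):
            meta["freq_hz"] = value
    return meta


def _card(text):
    return text.ljust(CARD).encode("ascii")


# A made-up header used only to exercise the parser.
SAMPLE = b"".join(_card(c) for c in [
    "SIMPLE  =                    T / conforms to FITS",
    "NAXIS   =                    3",
    "OBJECT  = 'M51     '",
    "CTYPE1  = 'RA---SIN'",
    "CRVAL1  =            202.46958",
    "CTYPE2  = 'DEC--SIN'",
    "CRVAL2  =             47.19517",
    "CTYPE3  = 'FREQ    '",
    "CRVAL3  =       1.42040575E+09",
    "END",
])

if __name__ == "__main__":
    print(extract_metadata(parse_fits_header(SAMPLE)))
```

Fields that cannot be found this way are simply absent from the result, which is where the description file and the human librarian come in.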

The metadata is loaded into database tables to facilitate searching. Currently, the ADIL uses Postgres as its database. Some selected metadata, including the deposit abstract, is sent to the Astrophysics Data System (ADS) abstract archive.

The "Library Shelves" is a system combining 14 GB of disk cache attached to the Library server with the NCSA Mass Storage System (MSS). The cache holds recently accessed images while the MSS (a tape-based system of fast drives) serves as long-term storage. If a user requests a file that is not currently on disk, it is transparently retrieved from mass storage and placed in the cache for downloading.
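The cache-in-front-of-tape behavior can be illustrated with a small Python sketch. The text says only that the cache holds recently accessed images; the least-recently-used eviction policy, the class name ShelfCache, and the fetch callback below are assumptions made for the example, not a description of the actual implementation.

```python
from collections import OrderedDict


class ShelfCache:
    """Byte-limited cache in front of a slower store (hypothetical design)."""

    def __init__(self, capacity_bytes, fetch_from_mass_storage):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_mass_storage  # callable: name -> bytes
        self.files = OrderedDict()  # name -> bytes, least recently used first
        self.used = 0

    def get(self, name):
        if name in self.files:
            # Cache hit: mark the file as most recently used.
            self.files.move_to_end(name)
            return self.files[name]
        # Cache miss: retrieve from mass storage and keep a copy on disk.
        data = self.fetch(name)
        self.files[name] = data
        self.used += len(data)
        # Evict the least recently used files until we fit again,
        # always keeping the file that was just requested.
        while self.used > self.capacity and len(self.files) > 1:
            _, evicted = self.files.popitem(last=False)
            self.used -= len(evicted)
        return data


if __name__ == "__main__":
    tape = {"a.fits": b"x" * 6, "b.fits": b"y" * 6}
    cache = ShelfCache(10, tape.__getitem__)
    cache.get("a.fits")
    cache.get("b.fits")  # does not fit alongside a.fits, so a.fits is evicted
    print(list(cache.files))
```

From the caller's point of view a hit and a miss look the same apart from the delay, which is the sense in which retrieval is transparent.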

Responding to Search Queries

Figure 2 illustrates the process of searching for images in the Library. Users can query the Library's "Card Catalog" through a variety of search criteria (e.g. sky position, frequency, object name or type) entered into the Query Page (an HTML form). Through a CGI script, the query is sent through a query pre-processor to be converted to an SQL query to the Postgres database. The results of the query are then filtered through a post-processor for formatting and presentation to the user in the form of an HTML Results Page. One of the operations that takes place in the post-processing is a ranking of the results in terms of how well each matched item satisfies the search criteria.
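The pre-processor and ranking steps might look something like the following Python sketch. It uses an in-memory SQLite database as a stand-in for the Library's database, and the table layout, column names, form field names, and scoring rule (count how many criteria a row satisfies) are all assumptions for illustration; the actual ranking method is not described here.

```python
import sqlite3


def _has_freq_range(form):
    return form.get("freq_min") is not None and form.get("freq_max") is not None


def build_query(form):
    """Pre-processor: turn form fields into a parameterized SQL query."""
    clauses, params = [], []
    if form.get("object"):
        clauses.append("object = ?")
        params.append(form["object"])
    if _has_freq_range(form):
        clauses.append("freq_hz BETWEEN ? AND ?")
        params += [form["freq_min"], form["freq_max"]]
    # OR the criteria so partial matches are returned and can be ranked.
    where = " OR ".join(clauses) if clauses else "1"
    sql = ("SELECT codename, object, freq_hz FROM images "
           f"WHERE {where} ORDER BY codename")
    return sql, params


def rank(rows, form):
    """Post-processor: order rows by how many criteria each one satisfies."""
    def score(row):
        _, obj, freq = row
        hits = 0
        if form.get("object") and obj == form["object"]:
            hits += 1
        if _has_freq_range(form) and freq is not None:
            if form["freq_min"] <= freq <= form["freq_max"]:
                hits += 1
        return hits
    return sorted(rows, key=score, reverse=True)


def make_demo_db():
    """Build a tiny, made-up catalog for the example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE images (codename TEXT, object TEXT, freq_hz REAL)")
    db.executemany("INSERT INTO images VALUES (?, ?, ?)", [
        ("img-a", "M51", 1.42e9),
        ("img-b", "M51", 1.15e11),
        ("img-c", "M82", 1.15e11),
        ("img-d", "M31", 4.86e9),
    ])
    return db


if __name__ == "__main__":
    form = {"object": "M51", "freq_min": 1.0e11, "freq_max": 1.2e11}
    sql, params = build_query(form)
    for row in rank(make_demo_db().execute(sql, params).fetchall(), form):
        print(row)
```

Passing the user's values as query parameters rather than pasting them into the SQL string keeps form input from being interpreted as SQL.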

[search flow diagram]

Fig. 2. Searching the Card Catalog. A user searches the Library for images by entering criteria into the Query Page. The inputs are converted to an SQL query and sent to the Postgres database. Results are sent back to the user as an HTML document. (Dashed lines represent services under development; see text for details.)

Under development are techniques for combining searches of our local database with parallel searches of other data archives on the network. One important archive the ADIL is working with is the Astrophysics Data System (ADS) Abstract Service. The ADIL and ADS have been working together to set up a system of cross-linking that allows ADS users to locate ADIL images and ADIL users to locate ADS articles. These cross-links make possible complex cross-searching that incorporates criteria about both the observational details and the science behind the observations.

Browsing and Delivering Data

After completing a successful search, users often want to browse the results to further refine their search. When the exact desired images are found, they will want to download the images in full FITS format for further processing and analysis.

The key to connecting a user to an item of data in the Library is the data's unique identifier, referred to as a codename. ADIL codenames serve as "call numbers" for items in the Library collection. These codenames can be appended to URL bases and submitted to the Library as HTTP requests for direct access to specific Library items.
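As a small illustration of appending a codename to a URL base, consider the Python sketch below. The host name, the two URL bases, and the codename shown are placeholders invented for the example; they are not the Library's real addresses or codename format.

```python
from urllib.parse import quote


def item_url(url_base, codename):
    """Append a codename to a URL base, escaping any unsafe characters."""
    return url_base.rstrip("/") + "/" + quote(codename, safe="")


if __name__ == "__main__":
    # Hypothetical URL bases: one for preview pages, one for FITS downloads.
    PREVIEW_BASE = "http://library.example.org/preview/"
    FITS_BASE = "http://library.example.org/fits/"
    codename = "demo.01"  # placeholder, not a real ADIL codename

    print(item_url(PREVIEW_BASE, codename))
    print(item_url(FITS_BASE, codename))
```

The same codename combined with different bases addresses different views of the same item, which is what lets a single identifier act as a call number.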

[retrieval flow diagram]

Fig. 3. Retrieving Data. The user will usually want to browse or preview an image before downloading it in full FITS format. He or she might follow a link from a Query Results page, which sends the codename for the desired image to the Electronic Librarian, which uses it to find the image on the Library Shelves. Descriptive information and a GIF preview image are formatted into a Preview Page for browsing by the user.

The Results Page returned from a query contains a list of links to the matched images using URLs based on the images' codenames. Access to the images from a Results Page is illustrated in Fig. 3. When the user clicks on a link, the specified codename is sent to the "Electronic Librarian", which uses it to find the item on the Library's "shelves". If the item is a large image that has not been accessed recently, the E-Librarian may need to retrieve it from the NCSA Mass Storage System. Once the item is available, it is downloaded to the user.

The requested item might be a full FITS image file; however, most users first access an image's Preview Page. This page provides descriptive information and links to related data to help the user determine whether the whole image should be downloaded. An important part of the Preview Page is the preview image, a subsampled rendering in GIF format which downloads quickly.
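The idea of a subsampled preview can be sketched in a few lines of Python. This shows only the subsampling and a simple linear scaling of pixel values to the 0-255 range that an 8-bit image format needs; the scaling rule and function names are assumptions for the example, and writing the actual GIF file is left out.

```python
def subsample(image, step):
    """Keep every step-th pixel in both directions of a 2-D list of values."""
    return [row[::step] for row in image[::step]]


def scale_to_8bit(image):
    """Linearly map pixel values onto 0..255 (an assumed, simple scaling)."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant image
    return [[round(255 * (value - lo) / span) for value in row] for row in image]


if __name__ == "__main__":
    # A made-up 4x4 image whose values increase across and down the frame.
    image = [[float(x + 10 * y) for x in range(4)] for y in range(4)]
    preview = scale_to_8bit(subsample(image, 2))
    for row in preview:
        print(row)
```

Subsampling by a factor of n in each direction cuts the pixel count by roughly n squared, which is why the preview downloads so much faster than the full FITS file.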


Raymond L. Plante / rplante@ncsa.uiuc.edu
NCSA Astronomy Digital Image Library / imagelib@ncsa.uiuc.edu