International Geothermal Association

Information Committee

Proposed Database Standard for Conference Proceedings

Justification

As the world geothermal community grows closer together, and as Internet access becomes more widespread, it is now possible to make geothermal technical papers easily available to the world community.  Conference proceedings are the single most prevalent source of technical and scientific papers on geothermal energy, yet paper copies are not always easy to obtain.  Not all organizations may choose to make their conference proceedings available on the Internet, but for those that do, the International Geothermal Association proposes a “standard database format” so that conference papers can be found through a single standard search on one (or more) web sites.  If each organization compiles a database for each conference as it occurs (a task it is likely to do in any case), the databases can be merged into a worldwide encyclopedia of geothermal papers that is searchable on the Internet.  Acrobat (PDF) versions of the papers could be made downloadable from such a search.  The PDF files could be kept in more than one place: at the original organization’s web site, at the IGA web site, or both.  The proceedings of WGC2000 have already been set up on the web in this way and are accessed frequently.

Issues

There are several issues to consider in defining a database format that would be used by multiple organizations and in implementing it over the web:

1.      Software formats.  The standard should not be specific to a particular software package, but should be generic enough to be used in different packages.  This imposes some limitations; for example, some software does not allow text fields longer than 256 characters.  To preserve flexibility, multiline fields (typically the abstract of the paper) would therefore be saved as external plain-text files.  Software flexibility is needed not only to let organizations use the database programs they are familiar with (or even simple programs like Excel), but also to make it easier to share database files between organizations.  Databases might be used independently, merged into a single database at the IGA, or both.

2.      Flexible and fixed fields.  The standard would need to define a minimum set of fields that must be included (for example “author”, “title”, “filename”, “webaddr”).  These fields would need to have the same field names across all databases so that the databases can be merged easily.  Individual organizations could add fields of their own, for their own purposes; such nonstandard fields would be left out of a merged database.  Because some database programs cannot accommodate field names longer than 10 characters, and some do not allow field names with spaces, the standard field names are kept to 10 characters or fewer, with no spaces.

3.      Downloadable files.  The aim would be to have PDF (Acrobat) or HTML versions of the papers downloadable to people who searched for them on the web.  The PDF files could be stored at the IGA web site, or at other sites.  For this reason, the database should include information about the web addresses and directory names at which the files can be found.

4.      Archival conference proceedings.  The idea is to begin to accumulate conference papers as the conferences take place.  The possibility of scanning earlier conferences and adding them to the collection is being considered in several organizations – we need to allow for this possibility.

5.      Commercial aspects.  In the past, some organizations have charged for hard-copy proceedings and/or CDs.  Although IGA would probably not want to adopt a “pay-for-copies” model of web delivery, people downloading a single paper could be shown a link to a web site where they could order a complete paper or CD copy of that proceedings volume from the original organization.  It would also benefit IGA members to have commercially available papers appear in a search, even if the papers themselves cannot be downloaded free of charge.

6.      Legal aspects.  The International Geothermal Association would not take over copyright to any of the materials; copyright would remain with the original organizations.  Organizations would need to grant IGA permission to distribute the papers over the web.

7.      Future expansion.  In the future, it may be possible to expand the function of the encyclopedic database to include journal papers and research reports.  The format needs to be flexible enough to allow this.
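Items 1 and 2 imply two simple checks an organization could run before submitting a database: whether field names fit within the most restrictive software (10 characters, no spaces) and whether a text value is too long or multiline to stay in the database and must go into an external plain-text file. The sketch below is hypothetical; the function names and the exact 256-character threshold are assumptions taken from the limits mentioned above, not part of the standard.

```python
# Hypothetical pre-submission checks based on the software limits in items 1
# and 2: field names of at most 10 characters with no spaces, and text values
# of at most 256 characters on a single line.
MAX_NAME_LEN = 10
MAX_TEXT_LEN = 256


def field_name_problems(name: str) -> list[str]:
    """Return the reasons (if any) why a field name may not be portable."""
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"{name!r} is longer than {MAX_NAME_LEN} characters")
    if " " in name:
        problems.append(f"{name!r} contains a space")
    return problems


def needs_external_file(text: str) -> bool:
    """True if a value (typically an abstract) belongs in a separate .TXT file."""
    return len(text) > MAX_TEXT_LEN or "\n" in text


if __name__ == "__main__":
    for name in ["PAPERNUMBE", "PAPERNUMBER", "FILE NAME"]:
        print(name, field_name_problems(name) or "ok")
    print(needs_external_file("A short title"))  # False
    print(needs_external_file("x" * 300))        # True
```

This is why the standard field list uses the truncated name PAPERNUMBE: the full word would fail the 10-character check.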

Proposed Standard Fields

The following 17 fields and field names are proposed.  These 17 fields would be “fixed” fields that must be included in every standard database.  Organizations would be free to include other fields in addition to the standard ones.  For compatibility, the exact field names must be used for the standard fields.  The fields do not need to be in any particular order.  In the following table, example field contents from WGC2000 are shown for illustration.  All fields are text fields, except for YEAR, which would be a number field (to allow a search such as year > 1999).

| Field Name | Description | Example |
| --- | --- | --- |
| AUTHORS | Full list of authors | Enrique M. Lima Lobato and Julio Palma |
| TITLE | Title of paper | The Zunil-II geothermal field, Guatemala, Central America |
| CONFERENCE | Name of conference series | World Geothermal Congress |
| YEAR | Year of conference (strictly a number field) | 2000 |
| SESSION | Session the paper was in (may be blank) | 7 Case Histories |
| KEYWORDS | Keywords for indexing | Guatemala, Zunil, drilling |
| PAPERNUMBE | Paper number (no special format; local choice).  Note: the field name is PAPERNUMBE, not PAPERNUMBER. | R0002 |
| WEBSITE | Site where the paper is stored.  Should end with “/”. | http://iga.igg.cnr.it/ |
| WEBSITEDIR | Directory where the paper is stored.  Should end with “/”.  Case sensitive. | mirrorWGC00/ |
| FILENAME | Name of the file.  Case sensitive; should not contain spaces. | R0002.PDF |
| FILESIZE | Size of the PDF file | 700KB |
| LAST1 | Last name of the first author | Lima Lobato |
| ABSTRACTFL | Name of the abstract file.  Case sensitive. | R0002.TXT |
| ABSTRACTDR | Directory where the abstract file is stored.  Should end with “/”.  Case sensitive. | mirrorWGC00/ |
| ABSTRACTWW | Site where the abstract file is stored.  Should end with “/”. | http://iga.igg.cnr.it/ |
| LANGUAGE | Language in which the paper is written | English |
| COPYRIGHT | Copyright holder of the paper | International Geothermal Association |

Note that when the database search engine creates a link, the fields WEBSITE, WEBSITEDIR and FILENAME are concatenated to construct it (in the example above, the link would be http://iga.igg.cnr.it/mirrorWGC00/R0002.PDF).  It is therefore important to enter the slash characters (/) exactly as specified and to preserve the case of every name.
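The concatenation rule can be illustrated with a short sketch. It assumes a record is held as a Python dict keyed by the standard field names; the helper names are hypothetical. The abstract link is assumed to be built the same way from ABSTRACTWW, ABSTRACTDR and ABSTRACTFL.

```python
# Hypothetical sketch of link construction from a record (a dict keyed by the
# standard field names). Parts are joined exactly as stored: no case changes,
# because paths on the server are case sensitive.

def join_parts(site: str, directory: str, filename: str) -> str:
    """Concatenate site + directory + file name, checking the trailing slashes."""
    for label, part in (("site", site), ("directory", directory)):
        if not part.endswith("/"):
            raise ValueError(f"{label} {part!r} should end with '/'")
    return site + directory + filename


def paper_link(record: dict) -> str:
    return join_parts(record["WEBSITE"], record["WEBSITEDIR"], record["FILENAME"])


def abstract_link(record: dict) -> str:
    return join_parts(record["ABSTRACTWW"], record["ABSTRACTDR"], record["ABSTRACTFL"])


# The WGC2000 example values from the table above.
example = {
    "WEBSITE": "http://iga.igg.cnr.it/",
    "WEBSITEDIR": "mirrorWGC00/",
    "FILENAME": "R0002.PDF",
    "ABSTRACTWW": "http://iga.igg.cnr.it/",
    "ABSTRACTDR": "mirrorWGC00/",
    "ABSTRACTFL": "R0002.TXT",
}
print(paper_link(example))     # http://iga.igg.cnr.it/mirrorWGC00/R0002.PDF
print(abstract_link(example))  # http://iga.igg.cnr.it/mirrorWGC00/R0002.TXT
```

Rejecting a missing slash, rather than silently adding one, keeps the stored records themselves correct for any other software that reads them.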

Note that the CONFERENCE series name should not change from year to year.  For example, the search engine might look for all papers on a specific subject from a Stanford conference, across all years of that series.
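A minimal in-memory sketch shows why a constant series name and a numeric YEAR matter: both can then be compared directly. The function name, the keyword matching against TITLE and KEYWORDS, and the two "Example Workshop Series" records are hypothetical placeholders; only the first record comes from the WGC2000 example.

```python
# Hypothetical search over merged records. CONFERENCE is matched exactly
# (the series name is assumed constant across years) and YEAR is compared
# numerically, e.g. YEAR > 1999.

def search(records, conference=None, after_year=None, word=None):
    hits = []
    for r in records:
        if conference is not None and r["CONFERENCE"] != conference:
            continue
        if after_year is not None and r["YEAR"] <= after_year:
            continue
        if word is not None:
            # Assumed keyword match: case-insensitive substring of TITLE + KEYWORDS.
            text = (r["TITLE"] + " " + r["KEYWORDS"]).lower()
            if word.lower() not in text:
                continue
        hits.append(r)
    return hits


records = [
    {"CONFERENCE": "World Geothermal Congress", "YEAR": 2000,
     "TITLE": "The Zunil-II geothermal field, Guatemala, Central America",
     "KEYWORDS": "Guatemala, Zunil, drilling"},
    # Hypothetical placeholder records, for illustration only.
    {"CONFERENCE": "Example Workshop Series", "YEAR": 1998,
     "TITLE": "Placeholder paper on drilling", "KEYWORDS": "drilling"},
    {"CONFERENCE": "Example Workshop Series", "YEAR": 2003,
     "TITLE": "Placeholder paper on reinjection", "KEYWORDS": "reinjection"},
]

for r in search(records, after_year=1999, word="drilling"):
    print(r["YEAR"], r["TITLE"])
```

If YEAR were stored as text, the comparison would fail or give wrong results, which is why the standard makes it the one number field.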

A sample database, in Excel format, can be downloaded from: standard.xls

Procedural Issues

The IGA web site runs on a UNIX web server, so file names and directory names are case sensitive; they should also not contain spaces, which cause problems in web addresses.  In many cases it may be easiest to use only upper-case letters for file names and directory names.  When an organization completes a conference proceedings collection, there are several possible implementation scenarios, depending on whether the organization or IGA hosts the PDF files.
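Before files are sent or uploaded, names could be screened with a check like the hypothetical one below. It flags spaces and, optionally, lower-case letters, following the suggestion to use upper case only; the function name and the `upper_only` switch are assumptions for illustration.

```python
# Hypothetical check of file and directory names destined for a
# case-sensitive UNIX web server.

def name_problems(name: str, upper_only: bool = True) -> list[str]:
    """List reasons a file or directory name may cause trouble on the server."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if upper_only and name != name.upper():
        problems.append("contains lower-case letters")
    return problems


for n in ["R0002.PDF", "r0002 final.pdf", "PNOC/2003/"]:
    print(n, name_problems(n) or "ok")
```

Names such as the existing “mirrorWGC00/” directory are still valid; passing `upper_only=False` restricts the check to spaces only.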

1.      IGA-hosted.  The organizers would create a standard database and send it to IGA in DBF or Excel format.  The PDF and TXT files would be submitted on CD-R or other media.  The directory name on the IGA web site would typically have two levels, for example “PNOC/2003/” or “STANFORD/2003/”.  The WEBSITE field in the database records would be set to “http://iga.igg.cnr.it/”.  IGA would merge the new database into the encyclopedic database, so that all searches would access a single database.

2.      Locally hosted.  In this case the organizers would send just the database to IGA in DBF or Excel format, and host the PDF and TXT files on their own web site.

3.      Completely local.  It would also be possible for a local organization to run their own database web server and search engine, so that they could customize the appearance of the search, add fields etc.  In this case, the IGA web site search would make a link to the local web site search engine.
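The merge step in the IGA-hosted and locally hosted scenarios could look roughly like the sketch below. It assumes each submitted database has already been loaded (for example from a DBF or Excel export) as a list of dicts; loading is not shown, and the function name is hypothetical. Records must carry all 17 standard fields, nonstandard local fields are dropped, and YEAR is forced to a number.

```python
# Hypothetical sketch of merging submitted databases into the encyclopedic
# database. Each database is assumed to be a list of dicts, one per paper.

STANDARD_FIELDS = (
    "AUTHORS", "TITLE", "CONFERENCE", "YEAR", "SESSION", "KEYWORDS",
    "PAPERNUMBE", "WEBSITE", "WEBSITEDIR", "FILENAME", "FILESIZE", "LAST1",
    "ABSTRACTFL", "ABSTRACTDR", "ABSTRACTWW", "LANGUAGE", "COPYRIGHT",
)


def merge_databases(*databases):
    """Return one list holding only the 17 standard fields of every record."""
    merged = []
    for db in databases:
        for record in db:
            missing = [f for f in STANDARD_FIELDS if f not in record]
            if missing:
                raise ValueError(f"record lacks standard fields: {missing}")
            # Keep the standard fields only; local, nonstandard fields are dropped.
            row = {f: record[f] for f in STANDARD_FIELDS}
            # YEAR must be numeric so that searches like YEAR > 1999 work,
            # even if a spreadsheet export delivered it as text.
            row["YEAR"] = int(row["YEAR"])
            merged.append(row)
    return merged


record = dict.fromkeys(STANDARD_FIELDS, "")
record.update(YEAR="2000", CONFERENCE="World Geothermal Congress",
              LOCALNOTE="local use only")
merged = merge_databases([record])
print(sorted(merged[0]) == sorted(STANDARD_FIELDS))  # True
print(merged[0]["YEAR"])  # 2000
```

Failing loudly on a missing standard field, rather than filling it with a blank, leaves the decision about incomplete records to the people preparing the database.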

For More Information

Contact IGA Information Committee member Roland N. Horne at horne@stanford.edu