Introduction. The Data and Information Management System (DIMS) at the LEF-LTER is a centralized system that, since its inception in 1989, has made data, metadata, and research results available in a timely manner to the LEF-LTER scientific community (including students and other scientists), to policy makers, and to the general public. Data Management (DM) at the LEF-LTER is directed by Eda Melendez, assisted by one data entry technician and one or two students. Data Management at Luquillo has profited greatly from the assistance of John Porter, a member of the National Advisory Committee.
Historical Perspective. Since the beginning of LEF-LTER 1 (1989), our Data Management goals have evolved from a centralized repository for the data gathered by our investigators to a system that organizes, shares, and centralizes the data, the metadata, and two data set catalogs on a Local Area Network (LAN) server for use by the scientific community. One catalog lists all the LTER data (LTERDBAS) and the other lists legacy data (LEFDSET1). The latter initially listed around 80 data sets compiled from data gathered before LTER 1, which were inherited by the site's host, the Center for Environmental Ecosystem Research (CEER) of the University of Puerto Rico.
In the process of recovering hundreds of data sets, most of them available only on paper, the site's Data Manager became aware of the issues that have shaped the philosophy of the LEF-LTER DIMS ever since LTER 1: 1) to document the data and store the documentation files with the data, so that the data remain useful to scientists working in other places and at other times; 2) to save the data on magnetic media that can be backed up and transferred to new media as technology changes; and 3) to maintain an on-line, centralized list, or database catalog, of the data from which users can retrieve data set identifiers by searching (querying) with keywords.
The design and preparation of computer-generated documentation forms, which were filled out for all new data sets filed in the LEF-LTER DIMS after 1991, started at the beginning of LTER 1. Guidelines for filling out the documentation forms were established and published in all our annual reports from 1991 to 1994, and finally on our web site in 1995 (www.ites.upr.edu/sunceer/datamng/division.htm). Most investigators fill out the documentation forms on their own, but the Data Manager is available to assist them. When she does, the documentation is more complete, and DM benefits from the knowledge gained through interaction with the researcher. LEF-LTER data sets catalogued before 1991 were not documented and mostly belong to short-term studies sponsored by our program during that period. Many of these data were on paper and had not been entered into computer files (25). All data sets catalogued in the DIMS after 1991 had to be filed along with their documentation forms; otherwise they were incorporated into the system and catalogued unofficially in an on-line catalog used internally by the LEF-LTER DIMS only. The data sets in both catalogs account for the 215 data sets reported in our last proposal. Almost all the undocumented data sets are legacy data from students and investigators who left the program at the beginning of LTER 1. This experience led us to become one of the first sites to prepare a data management policy (available from the hyperlink given earlier in this paragraph) when the National Science Foundation required one.
Since LTER 1, the catalogs, as well as the filed data sets and their metadata text files, have been placed on the PC-LAN's Novell server, whose file system was then organized by project, with each project's files accessible exclusively to that project's research staff. Everyone with access to the LAN had access to the on-line LTER and legacy catalogs. The centralized system provided access to Paradox, a relational database management system (RDBMS), which accessed the catalog database and all the filed data sets. The computer catalog contains all the fields in the documentation forms except the methods, abstracts, and publications, which are kept in separate computer files. By 2000 (the end of LTER 2), the LTERDBAS catalog contained 103 data sets, 34 completed and 69 ongoing. The investigators had access to centralized licenses for the software then used to work with their data files: Lotus 1-2-3, Quattro PRO, and Paradox. Later in this period, a workstation held two statistical packages, SAS and Sigma Stat, along with Sigma Plot. Students and investigators from all the projects were given user names and passwords to access their data. Data not owned by any single project, such as the EVFS meteorological data, were made accessible to all users as read-only files to preserve data integrity.
Until 1995, an updated copy of the Paradox file containing the catalog was maintained on a standalone computer at El Verde Field Station (EVFS), which had no LAN until the end of LTER 2. Visiting researchers and students had access to the on-line LTER catalog, documentation forms, and guidelines on this computer. On-line request forms were also kept on that computer so that users could submit data requests to DM. If the requested data had not yet been released, the Data Manager would ask one of the site's principal investigators for permission to release them, and a notification of the request would be sent to the investigator in charge. The Data Manager would release the data only after receiving written consent from the principal investigators. This practice is still in place.
During LTER 1, DM was in charge of entering 14 data sets, many of which contain more than one data file. Around the end of this period and the beginning of LTER 2, we entered 16 data sets. During LTER 1 we used only Paradox and QPRO for data entry; by LTER 2 we also used Excel. Since then, the Data Manager has worked with each project's investigator, technicians, and/or students from the start of the project.
Since LTER 1, the Data Manager has collaborated in designing the data file structures of the principal projects and has made suggestions regarding the software to be used for data entry and manipulation. In some cases, the Data Management staff trains the students and/or technicians entering the data (e.g., the Luquillo Forest Dynamic Plot, or LFDP, project). Periodic meetings with the investigator in charge and the rest of the staff are held to check the data. Up to three sets of computer-generated data reports are delivered to the investigator for quality control. Data Management minimizes errors in the data entry and archiving processes, but it is up to the investigator in charge to ensure the scientific quality of the data gathered. Once data quality has been assured, one copy is kept on the server and two separate backup copies are saved on other magnetic media. The latter have evolved from 5¼" diskettes to 3½" diskettes and, finally, to Zip disks. The Novell server and the workstations holding the data are also backed up periodically.
Since LTER 1, investigators who enter their own data have filed their data files, along with the metadata, one to two years after the start of their project, and they update the files periodically. An updated copy of each data file, usually in Quattro PRO, Excel, or ASCII format, is kept on the centralized system. Two backups of each data file are kept on different media for recovery in case of disaster.
Present. During LEF-LTER 2, two major technical developments, Windows and the Internet, turned the centralized DIMS into a combination of centralized and distributed systems. Although the RDBMS evolved with these developments, the first Windows programs interfaced poorly with Novell servers and did not provide Internet tools. During this time we continued to centralize our data sets on the Novell server, but access was limited by the incompatibility of the two environments.
Since the middle of LTER 2, the Data Manager has been in charge of developing and maintaining our Web site. The site was first designed to share data and metadata with the rest of the scientific community and the general public. As a result, the number of requests handled directly by DM dropped substantially (Fig. 1), freeing DM time for other tasks. The use of the Web site at LEF-LTER is still evolving. Forms are being developed so that investigators can enter their metadata remotely. The Web has become the way DM interacts with the investigators and the rest of the scientific community, and it is the publishing medium for DM reports and survey results.
By the end of LTER 2, the Windows versions of the Paradox RDBMS interfaced smoothly with both Novell and the Internet. Paradox could also import and export other widely used file formats such as QPRO, Excel, and Access. Data file portability, a major problem before then, ceased to be an issue, and DM can now accept data files in almost any file format. By then, almost all the investigators in the program had migrated to Microsoft products, mostly using Excel for data entry (see survey at http://www.ites.upr.edu/sunceer/datamng/LTERSurveys/LUQInvestigatorsSoftware-ALLITESumm.htm). Because transferability is no longer an issue, DM continues to use Paradox to generate and access the database catalog.
The data and metadata are available on the Novell server as ASCII files, to ease their transfer for analysis; the files are also stored there in their original format. All files are read-only, to preserve the integrity of the data and the results of quality control. When DM is responsible for entering a data set, changes to its files are made only by the DM data entry person, following the established protocols for changing data. Investigators who enter their own data send their updates to DM, which then updates the files on the centralized system.
Data Management performs some data analysis, but most of the local scientific community's needs require manipulating the structure of the data rather than analyzing them. Meeting most of these needs requires many software resources and sometimes a third-generation programming language such as Fortran. These resources are provided by three computers on our LAN: the Novell server holding the centralized data files; the workstation providing higher-level programs such as Paradox and Excel for manipulating the data; and the UNIX-based Sun SPARCstation, which hosts our Web site and the Fortran compiler. This setup resembles what LTER Network Information Managers call a "centributed system", in which a centralized engine is used to access distributed databases.
The server file system is now organized by the catalog's unique record numbers, and the path to the subdirectory holding each data set's data and metadata is stored in the catalog. The catalog also stores the hyperlink that gives direct access to the metadata and data on the Web site. The data sets can be queried by category, investigator's keywords, LTER Core Area, principal investigators, project titles, dissemination, starting and/or ending period, etc. The resulting table links directly to each metadata file on the Web, which in turn links to the data. Paradox scripts generate Web pages on the fly containing four lists of data sets: by category, by LTER Core Area, by principal investigator, and by record number.
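The catalog design described above (one unique record number per data set, which also names its subdirectory; a stored metadata hyperlink; multi-field queries; and grouped list pages) can be sketched as follows. This is a minimal Python illustration, not the Paradox catalog or scripts themselves: the field names, the `datasets/NNNN` directory layout, the example URLs, and the two sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Optional, Set


@dataclass
class CatalogRecord:
    """One catalog entry (hypothetical fields modeled on the description above)."""
    record_number: int          # unique record number; also names the subdirectory
    title: str
    category: str
    core_area: str              # LTER Core Area
    investigators: List[str]
    keywords: Set[str]
    start_year: int
    end_year: Optional[int]     # None means the data set is ongoing
    metadata_url: str           # hyperlink to the metadata page on the Web site

    @property
    def directory(self) -> str:
        # Hypothetical server layout: one subdirectory per record number.
        return f"datasets/{self.record_number:04d}"


def query(catalog: List[CatalogRecord], keyword: Optional[str] = None,
          category: Optional[str] = None, core_area: Optional[str] = None,
          investigator: Optional[str] = None,
          active_in: Optional[int] = None) -> List[CatalogRecord]:
    """Return records matching every criterion given, ordered by record number."""
    hits = []
    for rec in catalog:
        if keyword and keyword.lower() not in {k.lower() for k in rec.keywords}:
            continue
        if category and rec.category != category:
            continue
        if core_area and rec.core_area != core_area:
            continue
        if investigator and investigator not in rec.investigators:
            continue
        if active_in is not None:
            # Ongoing data sets (no end year) count as active from their start onward.
            end = rec.end_year if rec.end_year is not None else float("inf")
            if not rec.start_year <= active_in <= end:
                continue
        hits.append(rec)
    return sorted(hits, key=lambda r: r.record_number)


def list_by(catalog: List[CatalogRecord], attr: str) -> Dict[object, List[int]]:
    """Group record numbers by one attribute, e.g. 'category' or 'investigators'."""
    groups: Dict[object, List[int]] = defaultdict(list)
    for rec in catalog:
        value = getattr(rec, attr)
        keys = value if isinstance(value, list) else [value]
        for key in keys:
            groups[key].append(rec.record_number)
    return {key: sorted(nums) for key, nums in sorted(groups.items())}


def render_list_page(title: str, catalog: List[CatalogRecord], attr: str) -> str:
    """Build a simple HTML page listing data sets grouped by one attribute."""
    by_number = {rec.record_number: rec for rec in catalog}
    parts = [f"<h1>{escape(title)}</h1>"]
    for key, numbers in list_by(catalog, attr).items():
        parts.append(f"<h2>{escape(str(key))}</h2><ul>")
        for n in numbers:
            rec = by_number[n]
            parts.append(f'<li><a href="{escape(rec.metadata_url)}">'
                         f"{n}: {escape(rec.title)}</a></li>")
        parts.append("</ul>")
    return "\n".join(parts)


# Two hypothetical records, for illustration only.
catalog = [
    CatalogRecord(1, "Rainfall (hypothetical)", "Climate", "Disturbance",
                  ["Investigator A"], {"rainfall", "meteorology"}, 1989, None,
                  "http://example.org/metadata/0001.htm"),
    CatalogRecord(2, "Litterfall (hypothetical)", "Vegetation", "Organic matter",
                  ["Investigator B", "Investigator A"], {"litterfall"}, 1991, 1995,
                  "http://example.org/metadata/0002.htm"),
]
```

Calling `render_list_page("By category", catalog, "category")` produces one of the grouped list pages; passing `"core_area"`, `"investigators"`, or `"record_number"` yields the other three. Keeping the record number as both the catalog key and the directory name means the catalog alone is enough to locate any data set on disk.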
DM now has access to rewritable compact disc (CD-RW) media for archiving the final versions of the data sets. The data sets will be archived with their metadata, using the same directory structure as on our Intranet and Internet servers.
Between the Present and the Future. What follows lies between the present and the future (!?): on the one hand, it should have been in place by now; on the other, it has already started to happen.
MISSING :
The current Novell server does not meet the requirements for developing such a system. For this purpose, DM has requested the purchase of two Windows 2000-based servers. The main one will be installed at the administrative offices at the ITES and the other at the EVFS. Both will run Windows 2000 Server (5 clients), and the one at the ITES will also host Microsoft SQL Server 2000 and our Web site. SQL Server can administer databases located on any Windows-based computer, including other servers. The server at EVFS will hold all the data downloaded automatically via wireless technology at the station, and SQL Server will be able to publish these real-time data on our Web site on the fly. Triggers and scripts can be used to update the data on the Web site automatically as soon as they are updated on the "centributed" system. Most importantly, a set of queries can be developed to help investigators merge data files on the fly and, ultimately, to ask questions that can be answered from the information held on the system.
Our Data Manager has been preparing for this transition. At the beginning of this year she took two courses on administering MS SQL Server and on creating and administering its databases. Training in these and other new resources will continue. Achieving this goal will require substantial work by DM and will depend on cooperation and communication between the scientific community and the Data Manager. The DM Committee has met monthly since August 2000, and these meetings have been the determining factor in advancing the development of our Data Management and Information System.
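The two planned mechanisms, a trigger that reacts to newly stored data and a saved query that merges data files on a shared key, can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in here: Microsoft SQL Server 2000 uses its own trigger and view syntax, and the table names, columns, the `publish_queue` table (a hypothetical hook for a Web-publishing job), and the sample values are all assumptions made for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two data 'files', a trigger, and a merge view."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE rainfall   (obs_date TEXT PRIMARY KEY, rain_mm REAL);
        CREATE TABLE streamflow (obs_date TEXT PRIMARY KEY, flow_m3s REAL);
        CREATE TABLE publish_queue (table_name TEXT, obs_date TEXT);

        -- Trigger: record each new rainfall row so a (hypothetical) publishing
        -- job knows which Web page content to refresh.
        CREATE TRIGGER rainfall_added AFTER INSERT ON rainfall
        BEGIN
            INSERT INTO publish_queue (table_name, obs_date)
            VALUES ('rainfall', NEW.obs_date);
        END;

        -- Saved query (view) that merges the two data files on their common date;
        -- LEFT JOIN keeps rainfall days that have no streamflow record.
        CREATE VIEW rain_and_flow AS
            SELECT r.obs_date, r.rain_mm, s.flow_m3s
            FROM rainfall AS r
            LEFT JOIN streamflow AS s ON s.obs_date = r.obs_date;
    """)
    return con


con = build_demo_db()
# Hypothetical values, for illustration only.
con.executemany("INSERT INTO rainfall VALUES (?, ?)",
                [("2001-01-01", 12.5), ("2001-01-02", 0.0)])
con.executemany("INSERT INTO streamflow VALUES (?, ?)", [("2001-01-01", 0.8)])

rows = con.execute("SELECT * FROM rain_and_flow ORDER BY obs_date").fetchall()
queued = con.execute("SELECT COUNT(*) FROM publish_queue").fetchone()[0]
```

Because the merge lives in a view, investigators query `rain_and_flow` like any other table, and it always reflects the latest data in both source tables without copying them.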