Overview of the process of reviewing Information Management at Luquillo. I think we should take new steps to meet the reviewer's expectations for Data Management (DM) at our site.
To resolve the crucial DM issues raised by the reviewer, we must take care of two things:
(1) Publish all data and metadata on the Web. Although we are not completely there yet, data were published on the Web at a constant rate for three months; the rate then slowed for a while because data and metadata had to be entered from paper sources. In these cases, publishing metadata and data on the Web takes longer than when the information reaches the site's Data Manager already in computer format.
(2) Convince the reviewers that DM provides a common framework for our scientific community to do analysis and research. This is the main conceptual issue of the review. First, I had to define this term, because the reviewer who raised it was neither consistent nor clear in his/her criticism, and my fellow Information Managers (IM) do not use the term. Sometimes the reviewer refers to this concept as the need for a computer (hardware and software) framework, and at other times he/she seems to mean a database-driven Web site (a Web site in which the data and other pertinent information are stored, edited, and retrieved from a table on a Web server).
I first prepared a document where I defined this common data management framework:
(http://.www.ites.upr.clu/sunceer/datamng/LTERSurveys/ANDLUQcommonframe-version1.htm)
Revising the definition of a common data management framework. The idea of a common framework for the management, analysis, and synthesis of data has gone through a process of refinement. My first definition of this concept was incomplete. This became obvious to me after I participated in a workshop at the LTER Network Office in Albuquerque in April 2001, held to collaborate on the revision of an NCEAS product for the KDI project to develop metadata standards for ecological data. Conversations with my peers led me to the idea of developing a higher-level platform that will serve as a basis for presenting the kind of research done at the LEF LTER and as the interface for other investigators who visit us (on our Intranet as well as on our Web site).
A new definition. A Common IM Framework will not be complete if it does not incorporate the scientific conceptual element into it. It is not a coincidence that both the scientific and the DM issues pointed out by the reviewer included the phrase "common ... framework". I generated a second version of the document that can be accessed at:
(http://.www.ites.upar.clu/sunceer/datamng/LTERSurveys/ANDLUQcommonframe-version2.htm)
Michener et al. (Ecological Applications, 1997) say that once a scientist recognizes the need for specific data, he or she needs to address the questions of why those data sets were, or in this case should be, collected, and why they are "fit" for their particular use. The research context that gave rise to the need for these data should be documented at two levels. The first is a broader context that applies to data sets belonging to "spin-off projects" (e.g., climate, primary productivity, decomposition, regeneration) that resulted from a more comprehensive project like the LTER. The second is the level of the data set itself: if a data set emerged from a project that stands on its own as a research context, its description need only be given at the level of the data set, describing the research origin of that specific data set. In every case, the research context at the level of the data set should be documented for all data sets.
I relate Waide's and Lugo's comments at the Jan 2001 meeting on the need for a project list to the need for a list of Research Activities (I call these Projects, or a higher research context) that are fully described and linked to the data sets they generate. This, I believe, should be the platform of our common data management framework.
There is a need to cluster the data sets into their research context considering the possibility of a hierarchy within each cluster.
Example of a first class of data set cluster in the LEF LTER. A good example is the Luquillo Forest Dynamics Plot project. The first level of research is the project as it was conceived when the first proposal was undertaken. From there, several sub-projects developed, and these generated several sets of data bases that constitute the second level of the hierarchy. The studies of animal populations (birds, snails, frogs, etc.) represent a cluster within this second level of the hierarchy. (See Table 2).
Example of a second class of data set cluster in the LEF LTER. A different example is Reagan's lizard studies. He clustered all his data sets (LTERDBAS # 1-6, 24) into a single project (LTER Lizard Abundance Survey), which can be described as a whole. Each individual data set is then conceived to answer a specific scientific question within the project, and for each a data set abstract and methodology are given.
Example of a third class of data set cluster in the LEF LTER. The first two examples might not represent every data set in our catalog. A data set could have been conceived for a specific purpose, so that the project and the data base merge into one. In this last case, a project description is unnecessary, and it is enough to provide the user with a data set abstract and methods (Michener et al., 1997). The project title and description can then be left blank in the metadata, and the data set can be reported in our common framework as an independent study. I think a good example of this last case is LTERDBAS #88: Fungi of the Greater Antilles. Although the investigator gives both a project description and an abstract, the two merge into one level of research context. A close examination of all the other data sets in the catalog shows that this data set constitutes a cluster all by itself.
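The three classes of cluster described above can be sketched as a small data structure. This is only an illustration, assuming Python; the names `DataSet`, `Cluster`, and `cluster_class` are hypothetical and not part of any LEF LTER system, the lizard data set titles are placeholders, and only a few of the LFDP data sets are listed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSet:
    number: int  # catalog number of the data set
    title: str


@dataclass
class Cluster:
    """A research context: a project, or a sub-project within one."""
    title: str
    data_sets: List[DataSet] = field(default_factory=list)
    sub_clusters: List["Cluster"] = field(default_factory=list)

    def all_data_sets(self) -> List[DataSet]:
        """Collect data sets from this cluster and every level below it."""
        found = list(self.data_sets)
        for sub in self.sub_clusters:
            found.extend(sub.all_data_sets())
        return found


def cluster_class(cluster: Cluster) -> str:
    """Label a cluster with one of the three classes described in the text."""
    if cluster.sub_clusters:
        return "hierarchical project"
    if len(cluster.data_sets) > 1:
        return "single project, several data sets"
    return "independent study"


# First class: a project with sub-projects (only a few data sets shown).
lfdp = Cluster(
    "Luquillo Forest Dynamics Plot",
    data_sets=[DataSet(46, "Topography of Hurricane Recovery Plot (El Verde)")],
    sub_clusters=[
        Cluster("Bird Population Dynamics", [DataSet(23, "Grid points bird counts")]),
        Cluster("Herbivory", [
            DataSet(99, "Herbivory of eight common species at El Verde from 1994 to 1996"),
            DataSet(102, "Leaf miners (Acrocercops sp.) larvae performance on young leaves of Manilkara bidentata"),
            DataSet(103, "Effect of plant density and light availability on leaf damage in Manilkara bidentata"),
        ]),
    ],
)

# Second class: one project described as a whole (titles are placeholders).
lizards = Cluster(
    "LTER Lizard Abundance Survey",
    [DataSet(n, f"Lizard data set {n} (placeholder title)") for n in (1, 2, 3, 4, 5, 6, 24)],
)

# Third class: project and data set merge into one.
fungi = Cluster("Fungi of the Greater Antilles",
                [DataSet(88, "Fungi of the Greater Antilles")])

for c in (lfdp, lizards, fungi):
    print(c.title, "->", cluster_class(c), "with", len(c.all_data_sets()), "data set(s)")
```

Keeping the hierarchy in the cluster itself, rather than in each data set, means a sub-project can be moved or renamed without touching the data set records.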
Metadata Project elements as clusters of data sets. In the documentation forms of the LEF LTER (LUQ LTER), the documentation elements that cover these two levels of research context are the Project Title and Project Description, for the broader context, and the Methods and Abstract, for the lower level of context, i.e. the data set. Unfortunately, not all data sets that should have a project description have one. This element was incorporated into the LEF LTER metadata forms last year (see the draft Web version of these metadata forms). In other data sets that fall into the first two cases described above, the description given is incomplete or lacks this broader scientific context. We can correct this by clustering the data sets at a higher level. Because the term "project" lacks a formal definition, this exercise might generate only new titles for some of the data sets. The IM at a site should reflect, support, and enhance the research component of the site. A full description of the core projects as a gateway to the data sets (on the Intranet and the Internet) will show the site's research endeavours as well as its intellectual goals. This is the core thought of the second IM review issue (2 above). An IM system will still be deficient if it does not provide this overview of the research activities at the site.
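Finding the data sets that lack a project description could be done with a simple check over the metadata records. The sketch below assumes Python and a metadata record held as a dictionary; the field names, the `missing_elements` function, and the sample record contents are hypothetical and do not reproduce the actual LEF LTER forms.

```python
from typing import Dict, List

# The documentation elements named in the text, at the two levels of context.
PROJECT_FIELDS = ("project_title", "project_description")
DATA_SET_FIELDS = ("abstract", "methods")


def missing_elements(record: Dict[str, str], independent: bool = False) -> List[str]:
    """Return the names of documentation elements that are blank or absent.

    An independent study (project and data set merge into one) only
    needs the data-set-level elements, so its project fields may be blank.
    """
    required = DATA_SET_FIELDS if independent else PROJECT_FIELDS + DATA_SET_FIELDS
    return [name for name in required if not (record.get(name) or "").strip()]


# Hypothetical record for a data set that belongs to a larger project.
clustered_record = {
    "project_title": "Luquillo Forest Dynamics Plot",
    "project_description": "",
    "abstract": "placeholder abstract",
    "methods": "placeholder methods",
}

# Hypothetical record for an independent study with blank project fields.
independent_record = {
    "project_title": "",
    "project_description": "",
    "abstract": "placeholder abstract",
    "methods": "placeholder methods",
}

print(missing_elements(clustered_record))
print(missing_elements(independent_record, independent=True))
```

Whether a data set counts as an independent study is passed in explicitly here, since that judgment belongs to the scientists and is not something the check can infer.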
Conclusion. What am I suggesting here? First, I think the concept of what a project is should be clarified for the group of scientists at the site. I know that scientists, unlike mathematicians for instance, tend not to pay much attention to definitions, but the process of trying to come up with a definition is worthwhile in itself. Agreements as well as disagreements come up in such an exercise, and knowledge is the ultimate product of the process.
Maybe the site wants to develop a new list of research activities (I call them projects), or just cluster the existing ones under project titles that reflect the components of a unified research purpose. Sometimes the difference between projects is only a location or a timeframe. A site might adopt a clustering already done when separating all data sets into useful pre-defined Categories. I think that should be a decision made by the group of scientists at the site and not by the IM. The main thing is that this list must really reflect what individual scientists and/or groups of scientists have been doing to answer their scientific questions at the site, and must lead to the list of data bases they originated.
Table 3 shows the list of Projects we first associated with each data set. This list could be the basis for the site's Web site. It should also be incorporated into the metadata of the corresponding data sets.
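A project list that leads to the data sets could be generated from the catalog itself. The following is a minimal sketch, assuming Python and a flat list of (project, data set number, data set title) rows; the `project_index` function is hypothetical, and the sample rows are a small selection copied from Table 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (project or cluster, data set number, data set title)
Row = Tuple[str, int, str]


def project_index(rows: Iterable[Row]) -> Dict[str, List[Tuple[int, str]]]:
    """Group data sets under their project, sorted by project and number."""
    index = defaultdict(list)
    for project, number, title in rows:
        index[project].append((number, title))
    return {project: sorted(items) for project, items in sorted(index.items())}


rows = [
    ("Herbivory", 103, "Effect of plant density and light availability on leaf damage in Manilkara bidentata"),
    ("Luquillo Forest Dynamics Plot (LFDP)", 74, "Big grid analysis matrices"),
    ("Bird Population Dynamics", 23, "Grid points bird counts"),
    ("Herbivory", 99, "Herbivory of eight common species at El Verde from 1994 to 1996"),
]

# Print a plain-text index: each project followed by its data sets.
for project, data_sets in project_index(rows).items():
    print(project)
    for number, title in data_sets:
        print(f"  {number}: {title}")
```

The same grouped structure could feed both the Web page listing and the project fields written back into each data set's metadata.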
Table 2. An Example of a Research Context with hierarchy: the LFDP Project Clusters
| Cluster | Data set number | Data set title | Project title | Location |
| --- | --- | --- | --- | --- |
| Bird Population Dynamics | 23.00 | Grid points bird counts | Long-term population dynamics of birds in the tabonuco forest | El Verde 9 Ha. grid, Bisley grid (watersheds 1 and 2), Silver plots at Bisley, Grid points at the LFDP |
| Canopy Arthropods | 96.00 | Canopy invertebrate responses to Hurricane Hugo | Canopy invertebrate responses to Hurricane Hugo | Big Grid - grid points 5.5, 8.3, 10.3 (Block 3), 7.6 (Block 6), 11.7, 11.11 (Block 5); Odum Trail - 10 and 50 m, above MRCE plot (Block 2); Sonadora Trail - 10 and 40 m S and W of bridge (Block 1) |
| Community Ecology of Land Snails | 41.00 | Habitat selection/Caracolus caracolla and other snails | Habitat Selection by Snails | El Verde Field Station, Big Grid, Luquillo Experimental Forest |
| Herbivory | 99.00 | Herbivory of eight common species at El Verde from 1994 to 1996 | Student Thesis Data | Several paths along and across the Hurricane Recovery Plot (Big Grid) at El Verde |
| Herbivory | 102.00 | Leaf miners (Acrocercops sp.) larvae performance on young leaves of Manilkara bidentata | LTER | Hurricane Recovery Plot (Big Grid) at El Verde |
| Herbivory | 103.00 | Effect of plant density and light availability on leaf damage in Manilkara bidentata | Student Thesis Data | Hurricane Recovery Plot (Big Grid) at El Verde |
| Luquillo Forest Dynamics Plot (LFDP) | 46.00 | Topography of Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N & 217503.6 E in the PR planar coordinate system |
| Luquillo Forest Dynamics Plot (LFDP) | 47.00 | Tree measurements in Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N & 217503.6 E in the PR planar coordinate system |
| Luquillo Forest Dynamics Plot (LFDP) | 48.00 | Tree conditions in the Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N & 217503.6 E in the PR planar coordinate system |
| Luquillo Forest Dynamics Plot (LFDP) | 57.00 | Hurricane damage in the Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the Grid is 54394.1 N & 217503.6 E in the P.R. planar coordinate sys. |
| Luquillo Forest Dynamics Plot (LFDP) | 59.00 | Mortality of trees after Hurricane Hugo in the Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Station past the Sonadora River. The SW corner of the grid is 54394.1 N and 217503.6 E in the Puerto Rican planar coordinate system |
| Luquillo Forest Dynamics Plot (LFDP) | 60.00 | Location of trees in the Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N and 217503.6 E in the PR planar coord. system |
| Luquillo Forest Dynamics Plot (LFDP) | 61.00 | Uncommon Trees in the Hurricane Recovery Plot (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N and 217503.6 E in the PR Planar coord. system |
| Luquillo Forest Dynamics Plot (LFDP) | 62.00 | Canopy height in the Hurricane Recovery Plot (El Verde) (Canopy height) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N and 217503.6 E in the PR Planar coord. system. |
| Luquillo Forest Dynamics Plot (LFDP) | 74.00 | Big grid analysis matrices | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N and 217503.6 E in the Puerto Rico Planar coordinate system. |
| Luquillo Forest Dynamics Plot (LFDP) | 85.00 | Forest regeneration after hurricane (El Verde) | Luquillo Forest Dynamics Plot | North of El Verde Field Station past the Sonadora River. The southwest corner of the grid is 54394.1 N & 217503.6 E in the PR planar coordinate system |
| Phenology | 88.00 | Phenologies of the Tabonuco Forest trees and shrubs | LEF LTER | El Verde-Hurricane Recovery Plot and Bisley Watersheds at the Luquillo Experimental Forest |