ARTICLE IN PRESS The International Information & Library Review (2004) 36, 291–295
The International Information & Library Review www.elsevier.com/locate/iilr
Research and usage of collection level metadata in Chinese digital libraries
Wang Xin
Library of Chinese Academy of Sciences, 33 Beisihuan Xilu, Zhongguancun, Beijing 100080, China
Summary
The World Wide Web and other information technologies have deeply changed our information environment and influenced how we index and access the information we need. The library is a good example of this revolution. Instead of a building that holds books, the library is evolving into an electronic portal to digital materials from across the globe. Facing so many distributed and heterogeneous digital resources, layered metadata are efficient but sophisticated tools that make it easier to find the best information resources. In recent years many types of item-level metadata with which people are familiar have appeared. In contrast to item-level metadata, collection level metadata has not been broadly used, especially in China. After explaining what a collection is and why collection level metadata should be created, we introduce the research and usage of collection level metadata in China and analyze the problems and limitations of the metadata being used. Some suggestions as to its use are also given.
© 2004 Elsevier Ltd. All rights reserved.
E-mail address: [email protected] (W. Xin).
1057-2317/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.iilr.2003.10.014
Introduction
With the technical development of digital library interoperability, collection level metadata is being studied by an increasing number of agents. Many things can be defined as collections, such as the holdings of libraries, websites, databases, catalogs, search engines and subject gateways. Most of the collections just mentioned are aggregations of digital items, but physical collections are not excluded; library holdings are a good example. Collections are not defined by their size. A search engine, such as Google, contains numerous web pages and data, whereas a person’s contact information may include only one or two items. Collection level metadata describes a collection as a whole instead of describing the specific properties of the digital objects or sub-collections within it. Some collection level metadata also includes a description of functionality and access interface, which makes collections homogeneous.
Initially, collections were described by unstructured textual documents, which help people understand the collections. Because such descriptions are unstructured, however, it is difficult to organize and locate the description files. To avoid these difficulties, several project initiatives have created structured and standardized collection description schemes. The XML specification and ontology description languages promise more opportunities for automatic interoperation by software agents.
Some collection level metadata outside China
Earlier collection level metadata, such as Dublin Core (Dublin Core Metadata Initiative, 2004a) and ISAD(G) (International Council on Archives, 2000), are similar to the descriptive metadata at the digital object level. The UKOLN Simple Collection Description Scheme (Powell, 1999) has 23 elements, 12 of which are taken from DC. Its elements are divided into two groups: collection elements, used to describe a collection, and service elements, used to access a collection. This metadata scheme is also descriptive, but some issues arose when it was used in the RIDING and Agora projects, e.g. the use of standard controlled lists in some scheme fields (Brack, Palmer, & Robinson, 2000). The RSLP Collection Description Project (Powell, Heaney, & Dempsey, 2000) has made two main contributions. First, the project developed a model for arbitrary collections. Second, its collection description metadata schema has been implemented using RDF. A collection description is composed of three informational categories: the description of the collection, the description of how the collection is accessed, and the terms associated with access to the collection. RDF Site Summary (RSS) (Swartz, 2000) is a lightweight collection level metadata schema used to describe and syndicate news and news-like content on all sorts of websites. The RSS schema includes classes such as channel, image, item and text input, and properties such as items, title, link, description and name. RSS is a very popular tool for information aggregation. The Z39.50 Profile for Access to Digital Collections (Library of Congress, 1996) and FEDORA (Staples, Wayland, & Payette, 2003) also contribute many good ideas to the development of collection level metadata, especially in the modeling aspect. In recent years, UDDI (Bellwood et al., 2003) has been applied in the field of e-commerce; it uses BusinessEntity, BusinessService and bindingTemplate to describe business collections.
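To make the RSS classes and properties named above more concrete, the following is a minimal sketch in Python of how a channel (the collection level part) and its items might be serialized as RDF/XML. It uses only the standard library and the published RSS 1.0 and RDF namespace URIs. The function name build_rss_channel, the feed title and every example.org URL are assumptions made for illustration and do not come from any project discussed here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs published for RDF and RSS 1.0.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rss", RSS_NS)


def build_rss_channel(about, title, link, description, items):
    """Return an rdf:RDF element holding one channel and its items.

    `items` is a list of (url, title) pairs. This helper is hypothetical.
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")

    # Collection level part: the channel describes the feed as a whole.
    channel = ET.SubElement(
        rdf, f"{{{RSS_NS}}}channel", {f"{{{RDF_NS}}}about": about}
    )
    ET.SubElement(channel, f"{{{RSS_NS}}}title").text = title
    ET.SubElement(channel, f"{{{RSS_NS}}}link").text = link
    ET.SubElement(channel, f"{{{RSS_NS}}}description").text = description

    # The `items` property lists the members of the collection in order.
    items_el = ET.SubElement(channel, f"{{{RSS_NS}}}items")
    seq = ET.SubElement(items_el, f"{{{RDF_NS}}}Seq")
    for url, _ in items:
        ET.SubElement(seq, f"{{{RDF_NS}}}li", {f"{{{RDF_NS}}}resource": url})

    # Item level part: one `item` element per member of the collection.
    for url, item_title in items:
        item = ET.SubElement(
            rdf, f"{{{RSS_NS}}}item", {f"{{{RDF_NS}}}about": url}
        )
        ET.SubElement(item, f"{{{RSS_NS}}}title").text = item_title
        ET.SubElement(item, f"{{{RSS_NS}}}link").text = url

    return rdf


# Hypothetical feed with two news items.
feed = build_rss_channel(
    about="http://example.org/news.rdf",
    title="Example library news",
    link="http://example.org/",
    description="News items published by a hypothetical library portal.",
    items=[
        ("http://example.org/news/1", "New database released"),
        ("http://example.org/news/2", "Portal maintenance notice"),
    ],
)
rss_xml = ET.tostring(feed, encoding="unicode")
```

The sketch keeps the collection description (the channel) separate from the item descriptions, which is the distinction between collection level and item level metadata drawn in the Introduction.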
A review of existing practices on collection level metadata in China
In recent years, there has been rapid progress in the development of digital libraries in China. The importance of aggregating heterogeneous digital collections and making them interoperate in distributed environments has been recognized. Metadata is an efficient solution for content aggregation. Nonetheless, the standardization of collection level metadata is still at a developing stage. We discuss some research and practices concerning collection level metadata in China below.
Practice 1: Chinese Biodiversity Information System (CBIS)
The Chinese Biodiversity Information System (CBIS) is sponsored by the Biodiversity Committee of the Chinese Academy of Sciences. CBIS is a distributed system that integrates information about biodiversity conservation and sustainable use of biological resources. It consists of one center system, five disciplinary division information systems and more than 30 data source information systems. The center system is located in the Institute of Botany of the CAS, and the five disciplinary divisions include the Botanical Division, the Zoological Division, the Microbiological Division, the Hydrobiological Division and the Marine Biological Division (Chinese Academy of Sciences, 2000). More than 40 basic databases in the field of biodiversity have been constructed under this project. Most of them can be accessed through the Internet. All of these databases have uniform collection level descriptions as follows (Table 1).
Table 1 Collection level description for CBIS databases.
General metadata: Database author; Database title (Chinese); Database title (English); Database description; Database keywords; Subject; Data coverage; Data use; Publication data
Metadata for database administration: Database administration agent; Database sponsored person; Database contact person; Database status; Update frequency; Latest update
Database storage: Hardware environment; Software environment; Storage format; Records numbers; File name; File size
Access metadata: Access protocol; Access location; Contact person; Access control; Usage control; Other information
Project metadata: Project name; Project type; Sponsored department; Execution department; Project start; Project end; Project manager; Information gathering method; Information gathering start; Information gathering end; …; Quality control
Database description metadata: Content description; Aim; Relationship with other databases
Metadata for metadata administration: Metadata creator; Metadata creation date; Metadata revision date; Metadata status; Metadata assessor; References

Practice 2: portals of the Chinese National Science Digital Library (CSDL)
The Chinese National Science Digital Library (CSDL) is an active digital library project sponsored by the Chinese Academy of Sciences that includes one central portal (Chinese National Science Digital Library, 2004) and several subject portals, such as a chemistry portal (Chinese Science Digital Library, 2004), a life sciences subject information portal (Chinese Science Digital Library, 2003a), a library information portal (Chinese Science Digital Library, 2003b), a mathematics and physics portal (Chinese Science Digital Library, 2003c) and a resources and environment portal (Chinese Science Digital Library, 2003d). The vocabulary used by the CSDL portals follows the format and content of the DCMI vocabulary (Dublin Core Metadata Initiative, 2004b) and covers the following types of Internet resources: databases/data sets, software, journals, books, libraries, discussion groups, news, meetings/events, organizations, search engines/gateways, multimedia, etc. All these types of resources, or aggregates of different resources, can be treated as collections. The metadata used here are mainly based on DC, and involve the Dublin Core Metadata Element Set (DCMES), Dublin Core Qualifiers (DCQ) and the DCMI Type Vocabulary (DCMIType). Table 2 lists the metadata element set of the library information portal as an example. Along with DC and DCQ, this portal also references the namespaces of the Administrative Dublin Core elements (AC), the Dublin Core elements, qualifiers and schemes for CORC resource records (Core DC) and the library information gateway type vocabulary (LIG).
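The obligation and repetition rules of an element set such as the one in Table 2 lend themselves to automatic checking. The sketch below, in Python, shows one way such a check might look. It covers only a subset of the unqualified elements in Table 2, and several things in it are assumptions rather than part of the CSDL specification: the prefixed names such as dc:title, the reading of an element that is not marked Repeatable as allowing at most one value, and the names Rule, RULES and validate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    mandatory: bool   # the element must occur at least once
    repeatable: bool  # the element may occur more than once


# Subset of the Table 2 elements; the "dc:" prefix is notation chosen here.
RULES = {
    "dc:title": Rule(mandatory=True, repeatable=False),
    "dc:identifier": Rule(mandatory=True, repeatable=True),
    "dc:creator": Rule(mandatory=True, repeatable=False),
    "dc:language": Rule(mandatory=True, repeatable=False),
    "dc:publisher": Rule(mandatory=False, repeatable=False),
    "dc:contributor": Rule(mandatory=False, repeatable=False),
    "dc:type": Rule(mandatory=True, repeatable=True),
}


def validate(record):
    """Return a list of problems found in `record`.

    `record` maps element names to lists of values. An empty result means
    the record satisfies every rule in RULES.
    """
    problems = []
    for element, rule in RULES.items():
        values = record.get(element, [])
        if rule.mandatory and not values:
            problems.append(f"{element}: mandatory element is missing")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"{element}: element is not repeatable")
    for element in record:
        if element not in RULES:
            problems.append(f"{element}: not in the element set")
    return problems


# Hypothetical record: no creator, and two titles.
example_record = {
    "dc:title": ["Library information gateway", "LIG"],
    "dc:identifier": ["http://example.org/collection/1"],
    "dc:language": ["chi"],
    "dc:type": ["database"],
}
example_problems = validate(example_record)
```

For the hypothetical record above, the check reports the missing creator and the repeated title, which is the kind of feedback a cataloguing tool could give before a record is stored.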
Practice 3: Chinese Digital Library Standardization Project (CDLS)
The Chinese Digital Library Standardization Project (National Science and Technology Library, 2003) is being sponsored by the Ministry of Science and Technology of the People’s Republic of China from October 2002 to October 2004. More than 20 libraries and research institutes, such as the Library of Chinese Academy of Sciences, the National Library of China, the China Academic Library and Information System, the Tsinghua University Library, and the Peking University Library, have taken part. One research team in this project specializes in collection level metadata. Building on a study of existing collection level metadata such as the UKOLN Simple Collection Description, RSLP Collection Description, the Z39.50 Profile for Access to Digital Collections and RDF/RSS, and especially of WSDL and UDDI, which relate to collection level description in electronic business, the team aims to realize an open collection metadata solution suited to disclosing and discovering vast numbers of heterogeneous digital resources.
Some issues in current Chinese projects
Analyzing the collection level metadata used in the CBIS and CSDL projects, we conclude that these two metadata sets are both useful for helping potential users discover, evaluate and access collections of interest and for helping collection administrators manage their collections.
Table 2 Metadata element set of the CSDL library information portal.
Elements          Qualifiers                Obligation and repetition  Encoding rules
(DC) title                                  Mandatory
                  (DCQ) alternative         Mandatory, Repeatable
(DC) identifier                             Mandatory, Repeatable      URI
(DC) creator                                Mandatory
(DC) language                               Mandatory                  ISO 639-2
(DC) subject      (Core DC) class           Mandatory, Repeatable      Chinese library classification (CLC)
                  (Core DC) topical         Mandatory, Repeatable      Chinese classification and subject sheet
(DC) publisher                              Optional
(DC) contributor                            Optional
(DC) description  (DCQ) table of contents   Mandatory, Repeatable
                  (DCQ) abstract            Mandatory, Repeatable
(DC) date         (DCQ) created             Optional                   W3C-DTF, reference http://www.w3c.org/TR/NOTE-datetime
                  (DCQ) modified            Optional
(DC) format       (DCQ) medium              Optional                   IMT (type/subtype)
(DC) relation     (DCQ) has part of         Mandatory, Repeatable      URI
                  (DCQ) is part of          Mandatory, Repeatable
                  (DCQ) has version         Mandatory, Repeatable
(DC) type                                   Mandatory, Repeatable      Library information gateway type vocabulary
Nevertheless, there are still many problems with current Chinese practice regarding collection level metadata.
First, more attention should be paid to the standardization of collection level metadata. Different metadata schemas have been applied to resource collections in each project, and the resulting collection descriptions are stored in databases created by the project teams. Users may access a project's database to get collection description records, but these records are not available through the services offered by other projects because the collection level metadata formats differ. This hinders integration and interoperability among different collections. The ongoing work of the CDLS aims to form an open national standard for collection level metadata and to create a method for matching across different metadata schemas.
Second, the basis of conversion and matching between two different metadata schemas is shared semantics, which lets the two sides understand each other’s metadata records. The creator of a collection level metadata schema should first consider reusing elements from commonly accepted metadata schemas through namespace technology. The CSDL project mentioned above is a good example of this.
The third problem comes with the use of namespace technology. At this time few Chinese projects encode their collection level description records in XML. It is very important to make these description records available in machine-readable forms that other software tools can use (Johnston, 2002). A metadata registry, comparable to a UDDI registry in the e-commerce field, is also needed to publish and locate collection level metadata.
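As an illustration of the second and third points, the sketch below shows in Python how a locally defined collection description might be mapped onto shared Dublin Core elements and then encoded in XML with a namespace declaration. The Dublin Core namespace URI is the published one. Everything else is an assumption made for the example: the crosswalk from CBIS-style field names to DC elements, the unqualified root element collection, the function name to_dc_xml and the record values. None of it is a format defined by CBIS, CSDL or CDLS.

```python
import xml.etree.ElementTree as ET

# Published namespace URI of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk from local field names to shared DC elements.
CROSSWALK = {
    "Database title (English)": "title",
    "Database author": "creator",
    "Database description": "description",
    "Database keywords": "subject",
    "Access location": "identifier",
}


def to_dc_xml(local_record):
    """Encode the mappable fields of `local_record` as namespaced XML.

    Fields without an entry in CROSSWALK are skipped, because no shared
    semantics have been agreed for them in this sketch.
    """
    root = ET.Element("collection")  # placeholder root, not a standard
    for field, value in local_record.items():
        dc_name = CROSSWALK.get(field)
        if dc_name is None:
            continue
        ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical local record using field names of the kind listed in Table 1.
local_record = {
    "Database title (English)": "Example plant species database",
    "Database author": "Example Institute",
    "Database keywords": "biodiversity",
    "Hardware environment": "not mapped in this sketch",
    "Access location": "http://example.org/db/plants",
}
dc_xml = to_dc_xml(local_record)
```

Because the output reuses the Dublin Core namespace, a tool from another project can interpret the title, creator, subject and identifier without knowing the local schema; the unmapped field shows what is lost when no shared element exists.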
References
Bellwood, T., et al. (2003). Universal description discovery & integration (UDDI) Version 3.0. Retrieved June 17, 2004, from http://uddi.org/pubs/UDDI-V3.00-Published-20020719.htm
Brack, E. V., Palmer, D., & Robinson, B. (2000). Collection level description—the RIDING and Agora experience. Retrieved June 17, 2004, from http://www.dlib.org/dlib/september00/brack/09brack.html#ref4
Chinese Academy of Sciences (2000). Chinese biodiversity information system. Retrieved June 17, 2004, from http://cbis.brim.ac.cn/
Chinese National Science Digital Library (2004). Central portal of CSDL. Retrieved June 17, 2004, from http://www.csdl.ac.cn
Chinese Science Digital Library (2003a). The life sciences subject information portal. Retrieved June 17, 2004, from http://biomed.csdl.ac.cn/
Chinese Science Digital Library (2003b). Library information gateway. Retrieved June 17, 2004, from http://tsg.csdl.ac.cn/
Chinese Science Digital Library (2003c). Phymath portal. Retrieved June 17, 2004, from http://phymath.csdl.ac.cn/SPT–Home.php?LANG_LANG=Ch
Chinese Science Digital Library (2003d). The resources and environment science information portal. Retrieved June 17, 2004, from http://resip.csdl.ac.cn/
Chinese Science Digital Library (2004). The chemical information network. Retrieved June 17, 2004, from http://chin.csdl.ac.cn/
Dublin Core Metadata Initiative (2004a). Dublin core metadata initiative: making it easier to find information. Retrieved June 17, 2004, from http://purl.org/dc
Dublin Core Metadata Initiative (2004b). DCMI type vocabulary. Retrieved June 17, 2004, from http://dublincore.org/documents/dcmi-type-vocabulary/
International Council on Archives (2000). ISAD (G) (General http://www.ica.org/biblio/cds/isad_g_2e.pdf
Johnston, P. (2002). Creating reusable collection-level descriptions. Retrieved June 17, 2004, from http://www.ukoln.ac.uk/cd-focus/guides/gp1/
Library of Congress (1996). Z39.50 profile for access to digital collections. Retrieved June 17, 2004, from http://lcweb.loc.gov/z3950/agency/profiles/collections.html
National Science and Technology Library (2003). Chinese digital library standards (CDLS). Retrieved June 17, 2004, from http://cdls.nstl.gov.cn/cdls2/w3c
Powell, A. (Ed.), (1999). Simple collection description. Retrieved June 17, 2004, from http://www.ukoln.ac.uk/metadata/cld/simple/
Powell, A., Heaney, M., & Dempsey, L. (2000). RSLP collection description. Retrieved June 17, 2004, from http://mirrored.ukoln.ac.uk/lis-journals/dlib/dlib/dlib/september00/powell/09powell.html
Staples, T., Wayland, R., & Payette, S. (2003). The fedora project: an open-source digital object repository management system. Retrieved June 17, 2004, from http://www.dlib.org/dlib/april03/staples/04staples.html
Swartz, A. (2000). RDF site summary (RSS) 1.0. Retrieved June 17, 2004, from http://web.resource.org/rss/1.0/