A database is information stored on a computer in such a way that it can be:
- searched through to find certain details
- displayed on the computer screen or printed onto paper in various styles
- sorted so that it can be ordered in different ways, such as author, title, subject and date of publication.
A database is made up of ‘records’, where all the information about an item such as a publication or organisation is stored. Records are equivalent to the cards in a card catalogue system. Each record is made up of ‘fields’, where information about different aspects of the item is stored – for example, the author or keyword, or the name of an organisation or its telephone number. Fields can be repeated to accommodate more than one author or keyword, or divided into ‘subfields’ to accommodate titles and subtitles, or the publisher and place of publication.
A database is faster and more flexible than a manual system. For example, searching a card catalogue is limited by the number cards that can be produced, and the way the information is presented can only be changed by re-writing or re-typing the cards.
The types of database most often used in resource centres are bibliographic databases and mailing list or ‘contacts’ databases.
A bibliographic database is like an electronic card catalogue. Each record contains details of materials, similar to the cards in a catalogue. Each field contains information about one aspect of a material, such as the author or title.
A mailing list database is like an electronic address book. Each record contains information about individuals or organisations. Each field contains information about one aspect of the individual or organisation, such as their name, profession, organisation type, or address. A mailing list database can be used in various ways. For example, a resource centre membership database could contain information about members of the resource centre, including contact details and their areas of interest. A network database could contain information about resource centres and organisations that can be contacted for information on other subjects.
6.8.1 Standard database structures
It is important to have an initial outline of a database structure before selecting the database software, as the structure and use of the database may affect which program will be most suitable.
A database structure or format defines the fields, their names or numbers, whether they are repeated or subdivided, and often the format of the information in the fields.
It is possible either to use a standard database structure, or to develop one (see Section 6.8.2). There are several advantages to using a standard database structure. It saves the effort of designing a structure, it enables records to be shared with other organisations using the same structure, and it means that there will be good supporting materials. Sharing records with other resource centres can prevent duplication of effort and provide easy access to information about materials in the other resource centres. Therefore, if there is any possibility that the resource centre will want to share records with other resource centres, serious consideration should be given to using a standard structure. Otherwise a lot of time and effort will be needed to alter the structure or convert records.
Standard structures are designed to cover all standard needs. They include the option of adding new fields if required. Standard formats may include fields that are not required. These need not be used, or could be used later if requirements change.
Many standard database structures are available. The main international standard is MARC, but it is too complex for most resource centres. The Common Communication Format (CCF) is a more straightforward structure, and is suitable for storing information about people and organisations, and bibliographic information.
Some structures have been designed for specific subject areas. For example, the MIMBIS Manual is designed for development information. The World Health Organization (WHO) has designed a structure for the African Index Medicus (AIM). This structure or an adapted version of it is used by WHO country offices and health-related organisations such as ministries of health, and medical and health libraries. Healthlink Worldwide has developed database structures for contacts and mailing list functions, which are used by partner organisations as the basis for their own contacts databases and newsletter mailing lists. Both the AIM and Healthlink Worldwide structures are based on CCF.
6.8.2 How to design a database
If you need only a simple database or a very specialised database, you could create your own database structure, or adapt a standard structure. If you are designing or adapting a database structure, make sure that it relates to the systems analysis and objectives (see Section 6.2.1), and remember the tip: Keep It Simple.
The initial database design should be produced on paper, noting down each issue, and gradually building up a list of fields, content and format.
There are two main aspects to consider:
- the content of the records – the data to be entered into each field
- the structure of the records – the way in which the record is divided into fields, subfields, and repeatable fields.
You will need to decide what to include in the database, and how the database will be used. Then you can work out the content and structure of the records.
For example, you might decide to include books, periodical articles and audiovisual materials in the same database, or you might decide to have a separate database for, say, audiovisual materials, depending on the size of the collection. If the collection is small, a single database for all materials would be appropriate, enabling all types of material on a particular topic to be located with one search.
Once you have decided what to include in the database, you will need to consider how much detail is required. This depends how the database will be used. For example, a database that will only be used for searching materials in the resource centre will only need to contain basic bibliographic information (author, title, place of publication, publisher, date of publication, keywords and classification number). For example:
WORLD HEALTH ORGANIZATION
Implementation of global malaria control strategy : report of a WHO Study Group.
Geneva : WHO, 1993
Keywords: malaria / prevention and control / health information systems / programme management
Location: HC7.2 WOR
A database that will be used to develop a resource list will need to contain additional information, including an abstract of each material and addresses of distributors. For example:
WORLD HEALTH ORGANIZATION
Implementation of global malaria control strategy : report of a WHO study group on the implementation of the global plan of action for malaria control, 1993-2000
(WHO Technical Report Series 839)
Geneva : WHO, 1993
57 pages
Keywords: malaria / prevention and control / health information systems / programme management
Location: HC7.2 WOR
Abstract: The report provides guidance for the implementation of the Global Malaria Strategy, and recommendations for the development of epidemiological and health information systems to assist in the planning, monitoring and evaluation of malaria control programmes. Emphasises the need for early diagnosis and treatment and the importance of community involvement.
Available from: WHO Distribution and Sales, CH-1211 Geneva 27, Switzerland.
Once the content and purpose of the database have been decided, it will be possible to work out how many fields will be needed, and what the field structure should be, including what fields need to be subdivided or repeated. The field structure will depend how the information is to be used. A database that will be used to develop a resource list could have fields for both abstracts and distributors, but an ordinary catalogue database would not need these. For the resource centre database, the distributor information could all be in one field. However, for a mailing list database each line of the address would either have to be in a separate repeat of the field or in a separate field, in order to print mailing labels and add the address to letters.
In a bibliographic database, an author’s name is entered surname first, then first name, as this is the order recognised for catalogues and bibliographies. The field for names would be repeatable to include up to three authors. In a mailing list database, the name is entered either first name first, then surname (as this is the way that a letter or envelope is addressed) or put into two separate fields. You can work out the number of fields required by writing down the list of fields, and assigning numbers to them (or names, depending on the software).
The structure is easier to remember if related fields are grouped together as follows:
- resource centre information, such as accession number and location
- title information, such as title and edition
- responsibility for the item, such as author, editor or meeting name
- descriptive information, such as where published, who published by, date published, number of pages and number of illustrations
- subject and content information, such as keywords, geographical coverage and abstract
- acquisition information, such as price, whether donation or purchase, and where obtained from.
You will need to know which fields will be used for searching. Fields that will be searched will need to be indexed. An index is similar to the index at the back of a book, and provides much more access to information on the records than a manual catalogue.
Before finalising the structure, you should test it to ensure that the database can do what it is intended to do. This can only be done after the software is selected and installed. However, the initial design process can help to decide which is the most appropriate software.
Once the software is selected, you can test the structure by entering about 20 sample records (at least five records of each material type that the database will need to handle) and noting how easy it is to enter records for the various types of material. You can use these records to carry out functions that the database will be used for, such as searching, sorting and printing.
6.8.3 Data entry guidelines
Once the database has been designed, guidelines need to be written on how to enter data into each individual field, to ensure that data is entered correctly. The guidelines should cover, for example, what to enter, and how to format and punctuate entries. It is useful to write the guidelines as a table.
6.8.4 Database management
Database management is something that can easily be forgotten, but is an important aspect of having a database. Although data may be entered by any member of staff, the database should be managed by one person. The database manager is responsible for the accuracy and consistency of the data entered. Database management also includes backing up the database onto a floppy disk or zip disk to prevent loss of data, deleting unwanted files, and developing the database to meet changing needs.
TIP: Backing up
The database should be backed up every time a significant amount of data has been entered or edited. A significant amount is the amount that you would not want to have to re-enter or re-edit if the data were lost. Back-ups should be made on a rotational basis using at least two sets of disks. It is important to have rotational back-ups, in case one back-up becomes corrupt, or if there is a fault in the database before the back-up is made.
Carefully label each back-up disk with the name of the database, the date and its number in the set, for example, ‘Disk 1 of 3’. Store disks in a cool, dry place out of direct sunlight. Keep disks away from magnetic and electrical equipment such as telephones, televisions, videos and bags with magnetic clasps. It is sensible to make two back-ups and keep one copy at home, or in another building, to protect them from fire or corruption.
6.8.5 How to choose database software
The differences between different types of database (or text retrieval) software are becoming more blurred as technology improves and new features are added to software programs. This section does not discuss specific programs, but provides guidelines for choosing software. A detailed look at three programs that are often used by libraries and resource centres is given in Section 6.10.
Before you consider what database software to use, you must be clear about what you need a database for (see Section 6.8.2). Think about the long-term needs of the resource centre, and how adaptable the program is, as it is more expensive in terms of both money and time to change to a different program later. Also consider what is happening in your area or field of work. If a particular program is widely used in your geographical area, or is used by a network that you belong to, and it is affordable, it is also worth considering. However, do not decide anything until you have considered all other aspects of database software discussed in this section.
You can identify a suitable program in two stages. First, take four main aspects:
- variable length fields
- repeatable fields
- methods of searching
- flexibility in design and development
Then, if the program looks suitable, you can look in more detail at the program’s functions.
1. Main aspects
If the program is suitable in terms of the first two of these aspects, consider the next two.
Variable length fields are important for bibliographic information. Authors’ names and titles, for example, are never a set length, so fields containing this information must be able to adjust in size. Fixed-length fields might be too small, so that data is lost, or too large, resulting in wasted disk space, which can slow down the operation of the database.
Repeatable fields are needed to accommodate multiple authors or keywords. Programs without the facility of repeating fields have either to place all repeats in one field, which causes problems for searching and sorting, or to create a linked database for these fields, which is unnecessarily complicated.
Searching is important, because improved retrieval of information is one of the main reasons for using databases in resource centres. The program needs to be able to cater for different levels of searching, from complex searches carried out by resource centre staff, to assisted searches, or simple searches for resource centre users. The program needs to be capable of retrieving information on known details such as authors or titles; finding materials on a particular subject area; finding individual words or phrases; and searching all fields at once, or individual fields, as required.
Flexibility means the ability to design databases to fit the resource centre’s needs, and to make changes as required, such as adding and removing fields, or changing field names and record structures. It is important that continuing to develop the database is straightforward, and can be undertaken by resource centre staff, or other staff in the organisation. Otherwise the services will suffer, and the support required may not be affordable.
2. More detailed assessment
Once you have assessed the four main aspects of a program, you can carry out a more detailed assessment of the program’s functions, including:
- how easy it is to set up and develop
- field identification
- data entry assistance
- editing
- indexes
- searching
- sorting
- display, print and download formats
- user interface
- languages
- data protection
- importing and exporting
- hardware requirements
- database size
- software documentation
- help function
- support services
- user groups
- cost factors
How easy to set up and develop - Designing and developing a database requires a basic knowledge of computers, and an understanding of how to organise information. Think about who will design and set up the database. This will depend on the program selected, and the staff skills available within the organisation, or funding for external help. If the right type of program is purchased, very little outside help will be required. However, if the database is not designed by resource centre staff, make sure that resource centre staff are fully consulted during this process, to ensure that the database meets all the requirements of those searching and those managing the data.
When selecting a program, consider whether the program’s design process is straightforward and requires few technical skills, or whether it is more complicated. Consider how easy it will be to modify the database to meet future needs, such as adding new fields, or to solve any technical problems, for example, if the database stops working properly.
If you employ a consultant, make sure that the consultancy includes providing good written guidelines and training in how to use the database. Also ensure that any training covers the skills required to adapt and develop the database for future resource centre needs.
Field identification - When designing the database, think about how you will label or number fields, and how you will use repeatable fields and subfields. Systems for labelling or numbering fields may be important for sharing information with other systems (see Section 6.8.3).
Data entry assistance - Data entry needs to be as straightforward as possible. Records can be entered more quickly and accurately if the program includes facilities for ensuring that data is in the right fields and the right format, and is spelt correctly. Look for the following features:
- Setting up different data entry screens for each material type can help prevent data entry mistakes, as most fields on these screens should contain some data, and there is less confusion between fields.
- Record templates/record skeletons are useful if you need to enter a lot of similar material. Regularly required data can be entered only once and used for any number of records, without having to be re-entered. The required data can be entered into any number of fields to build up the template/skeleton. To keep down the amount to data to be entered, it is useful to design different templates/skeletons for different types of material, or particular sources of information.
- Default values are useful if particular words or phrases are regularly required in a specific field, such as the type of material or language. Only one default value can be entered for each field, so templates/skeletons are a more useful feature than default values.
- Field templates/input masks are useful if the data needs to be in a specified format, such as upper case, or a fixed number or pattern of characters, or a specific date format. The pattern of the data is provided in coded form, and data not fitting the pattern will not be accepted by the program.
- Substitution lists are useful for standard information, such as organisation names and types of material. Once a list is set up, the abbreviation is typed and the rest of the word or phrase appears automatically. This saves time, and reduces the chance of errors, including spelling errors.
- Pick lists are words and phrases that may be pasted from a list, including data from previous entries or the index.
- Validation/value lists are like pick lists, but can only be used for fields where the content is controlled by a set of words or terms, such as subject, keyword, geographic fields, or type of material.
- Spell checking is increasingly available with database programs. It can be useful, especially for records with abstracts. However, it cannot replace record checking and other quality control processes.
- Context-specific data entry help is very useful, as it provides a quick way of checking the format to be used when entering information into each field. Help messages are entered by the designer and can be updated as needed.
- Control over which fields must contain data (mandatory fields) is a useful way of making sure that important information, such as the title or subject, is always entered.
Editing - Alterations to the data in the database need to take up as little time as possible, so the options offered by the software are very important. The ability to copy and paste data within and across records is essential. Options to edit specified fields in a record range, the results of a search, or the entire database are also important. A ‘find and replace’ function is useful, but needs to be used carefully, as this function is not field-specific.
Indexes - Computer indexes are similar to the indexes at the back of books, in that they help to locate records containing particular information, or written by a particular author. Browsing an index is similar to browsing the shelves of a resource centre. If something of interest is found, the record(s) can be viewed on the screen.
The type of indexing affects how easy it is to retrieve information. It is therefore important to check what types of indexing are offered - word, term/phrase, specific terms/phrases within a field. Word indexing is useful for finding individual words that may appear in different fields of a database. Term/phrase indexing is useful for indexing and retrieving keywords, and full titles. The ability to index selected words in an abstract is helpful, as indexing every word in an abstract can waste disk space and slow down the program.
Indexing is most useful where more than one type can be used at the same time. For example, word indexing combined with field/phrase indexing for the title field allows access to particular words in the title, as well as a complete title.
Searching - There are several ways in which data can be searched:
- Simple searching means browsing the index, and selecting and combining terms from the index.
- Boolean searching means combining search terms using AND, OR, or NOT. This is sometimes provided in the simplified format of broader/narrower searching.
- Phrase searching (searching for a string of words), and proximity or adjacency of term searching (searching for words that are near each other) are important when looking for important words in the title, or for documents that cannot be well described by the use of subject keywords.
- Truncation searching is useful for finding words that have similar word stems, such as singular and plural terms, or words with similar meanings, such as computer, computing, computerised.
- Synonym and soundalike searching is useful for finding words that have similar meanings, or words that sound alike, such as words spelt in different ways.
- Cross reference searching is useful when a relevant term is found, as it enables the entire database to be searched immediately for other records containing that term.
- Cross database searching (searching more than one database at the same time) can be very useful, for example if there are separate databases for different types of materials, different resource centre sites, or for the catalogue and ordering systems. It can help to speed up the search process, or avoid duplicating materials or wasting time re-creating a record.
- Saved searches are a search strategy that can be saved and re-run at intervals, to retrieve any new records of interest to a specific user, or in a specific subject area.
Sorting - It is useful to be able to sort data (for example, by date, subject, author or title) when carrying out searches, producing bibliographies, or viewing records on-screen. It is important to consider how many levels of sorting are available, and whether complex sorting is offered. For example, if there is no author, the record may need to be sorted by title instead.
Display, print and download formats - It is important to look at how easy it is to create a print format, and whether it is possible to format text, so that an attractive printout can be produced without having to use a word processor. It is useful to be able to save records and reports as ‘rich text format’ (RTF), so that formatted data can be imported directly into documents such as bibliographies and newsletters. It is also useful to be able to save data in HTML (hypertext mark-up language) format, for placing records as text on a website. Check also to see how easy it is to do mail merge and label printing.
User interface - Does the program look pleasant on the screen? Is it easy to understand? Can searches be carried out without using written guidelines? However, even if the program is easy to use, it is a good idea to provide users with written guidelines, as these can also be used to tell them what the different databases contain.
Languages - It is important that staff and users can understand the language used in the user interface as well as the data that is entered into the database. Check what languages the program is available in, and, if needed, whether data can be entered in non-Roman scripts, such as Arabic or Hindi.
Data protection - If resource centre users and other staff are to search the database themselves, it is advisable to have some protection for the data. This could take the form of a search-only version of the software, or a password system.
Search-only versions allow users to search, sort and print. They are useful for resource centre users, and for making the database available to other resource centres and libraries. The other way to prevent database structures and settings from being accidentally altered is by using passwords. Passwords can be set in such a way that different functions are available to different users. For example, resource centre users could have a password that allows only searching, sorting and printing; resource centre staff could have a password that also allows them to enter data; and the database manager could have a password that also allows changes to be made to the database structure.
Importing and exporting - You may not need to exchange data between the database and other software to start with. However, it might become important later on to exchange data with other resource centres, or transfer data to a new program. It is useful to choose software that can import or export data in several popular formats.
Hardware requirements - Different software programs require different computer specifications. Some require more than the minimum stated specification to function well. To find out what specification you really need, speak to users of the software, not just the suppliers. Work out what capacity hardware (hard disk, memory, and processor speed) you need to run the software. Do you already have this capacity, or do you have a computer that can be upgraded, or will you need to obtain a new computer?
Database size - There is usually a top limit to the size of a database file, or the number of records that can be stored, and the maximum number of fields that can be created. Consider whether these are sufficient for the resource centre’s needs. This will depend on the purpose of the database (see Section 6.8.2).
Software documentation (user manual) - Is the manual easy to use? Is it written in non-technical language? Are there any additional materials to help resource centre staff to use and understand the software?
Help function - Is on-line help available with the program by clicking on the help menu? This is important, as commercial manuals to support the software are usually expensive, and may not be available.
Support services - Check that the software supplier is stable, both financially and in relation to their trading history, i.e. they are not going to disappear overnight. It is also worth considering their technical experience, and whether they can provide training, troubleshooting services, and software updates.
Find out:
- who provides and supports the software locally
- what experience they have
- what services they offer
- how many other users they are serving.
Also find out how often the software is updated. How much does it cost to upgrade? Can existing databases be easily incorporated into the new upgrades? Look out for help services that are accessible via e-mail and the Internet.
User groups - Technical support contracts can be expensive, so it is good to keep in touch with any user groups. Ask your software supplier, staff from related organisations, and members of local networks for details of software support groups.
Cost factors - It is essential to consider the cost of:
- initial software purchase and upgrades
- vendor/consultant support
- training
- time required to become familiar with the software.
Make sure that you budget for adequate initial training and follow-up training in more advanced features. Remember, the more complex the software, the more time will be required for staff to become familiar with it. The familiarisation time must be taken into consideration, as less time will be available for other tasks during this period. Weigh up the cost of initial and follow-up training against the workload of the resource centre staff, and the importance of providing up-to-date information services.