Wednesday, August 25, 2010

Notes Dump

Module 5 notes from 5.4 on
• 24.8.10 Full-text searching of a document (in Internet Explorer: Ctrl+F) shows every instance of the use of the word/phrase, eg. information resource, but then there might be shortenings (info resource), alternatives/synonyms (resource material), variations (organisation or organization) and flat-out misspellings too. Then what? Example using “warwick” gave one first name, the name of a framework and the name of a university. [[NB text about “container” system of organizing metadata elements, from an NLA staff paper]] Useful for editing purposes, can also skim through web documents looking for keywords … searching is at document level, not over a group of items collected around a subject.
• 5.5 searching of texts in resource descriptions: searching in surrogate records, particularly in the abstracts, gives a bit more control over keyword/subjects (moving up the spectrum here)
• 5.6 searching on title keywords: primitive, unstable as titles (only) not always directly related to subject, Hider p.155-6: info retrieval systems like OPACs now search commonly on keywords in surrogate records from a range of fields: title, author, subject etc – improvements from not limiting keyword search to only title/author, adding more terms to surrogate record and searching full-text in digital form – gives rise to Boolean operations, limiting and truncation, automatic switching for synonyms and auto spellcheck
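A toy sketch of what truncation and automatic synonym switching might look like under the hood (all names, the synonym ring and the sample keywords are invented for illustration):

```python
import re

# Hypothetical synonym ring: any variant maps to one canonical search key
SYNONYMS = {"info": "information", "organisation": "organization"}

def normalise(term):
    """Map a term to its canonical form via the synonym ring."""
    return SYNONYMS.get(term.lower(), term.lower())

def truncation_match(pattern, words):
    """Match 'librar*' against 'library', 'libraries', etc.; exact match without '*'."""
    stem = normalise(pattern.rstrip("*"))
    suffix = r"\w*" if pattern.endswith("*") else ""
    regex = re.compile("^" + re.escape(stem) + suffix + "$")
    return [w for w in words if regex.match(normalise(w))]

record_keywords = ["Information", "resources", "libraries", "organisation"]
print(truncation_match("librar*", record_keywords))  # -> ['libraries']
print(truncation_match("info", record_keywords))     # -> ['Information']
```

The synonym switch is what lets a search on “info” still reach a record indexed under “Information”, and the truncation wildcard is what sweeps up plural/variant endings.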
• 25.8.10 Going into 5.7 and confused about authority control’s place around bib description vs. using controlled vocab for subject access … hang on a mo, going back to read Ch.5 and then into Ch.8
o “Authority control” is the “process of ensuring that the names … titles … and subjects (of documents) in an info.retrieval system are used in a consistent form and style” (Hider p.86) – has 3 aspects: uniqueness, standardization and linkages, in practice
 Distinguishing names and titles, using the same form in indexing every time
 Showing relationships, eg. Same author, 3 variations of name
 Documenting decisions, ie. noting the AACR2 rules used in decision-making
o So … authority control allows for better searching because the “authorized” name/title and linkages have been used by cataloguer/indexer, done once & can be used again (ie. an “authority list” of all “authority files” (one file per surrogate record item, I assume) has to be kept and maintained: expense!). Not having auth.file means that effort of sorting out the mess shifts to users and repeats endlessly (p.87). Obviously, in OPACs – machine checks any given name against authority file and connects user to full record of resource no matter what version of the name they used (user not even aware of auth.file itself & doesn’t have to be) and greatly reduces possibility of incomplete search
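A minimal sketch of that machine check against an authority file; the structure and all entries are invented for illustration:

```python
# Hypothetical authority file: every known variant of a name points to the
# one authorised heading, so any version the user types reaches the same record.
AUTHORITY_FILE = {
    "clemens, samuel": "Twain, Mark, 1835-1910",
    "twain, mark": "Twain, Mark, 1835-1910",
    "mark twain": "Twain, Mark, 1835-1910",
}

def authorised_form(name):
    """Return the authorised heading for any variant, or the name unchanged."""
    return AUTHORITY_FILE.get(name.strip().lower(), name)

# The user never sees the authority file itself; every variant lands on
# one heading, which is why incomplete searches become much less likely.
print(authorised_form("Clemens, Samuel"))  # -> Twain, Mark, 1835-1910
```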
o Problems: maintenance and quality of authority files drops (expensive in time and money) and exacerbated by exchange of bib records between libraries (L of C, Libraries Australia, etc) – clients must contribute as well as take; too much take, not enough contribute … quality runs down. Smaller libraries end up living with consequences: less effective info.retrieval system. (p.91) Big players merging their vast authority files (BL, LAust, French/German national libraries, L of C, OCLC)
• Back to Ch.8: authority control for subject access (ie. Ensuring subjects … “used in consistent form and style” (p.86) Subject access to info.resource in 2ways: bib classif.scheme and “mechanisms based on alphabet.arrgmt (p.132) – mechanisms ie. Controlled vocabs like lists of subject headings (eg. SCISSHL) and thesauri
• Subject headings list = list of terms (alphabetically ordered) selected for ability to indicate subjects and authorized for use to provide subject access to bib records. (Circular definition?) – official, approved, the list you should choose terms from, spans wide range of subjects suitable for large public/academic libraries; each subject heading is a “term”: broad, often pre-coordinated (put together beforehand) and inverted (eg. Cookery, Singaporean), some reference structure linking related terms (imprecise at times), maintained by big players etc.
• LCSH Library of Congress Subject Headings; MeSH Medical Subject Headings; Sears List of Subject Headings; SCISSHL Schools Catalogue Information Service Subject Headings List --- several very major to big SH lists, lots of other technical ones; historically big problems with Australianising the content of American-biased SH lists (railways not railroads) leading to many Aust attempts like LASH (List of Australian Subject Headings) 1981.
• LCSH endeavouring mid 80s on to become more user friendly to worldwide users (p.140) with cooperative efforts for libraries to contribute new subject heading and to revise/clean up the older lists (cult.bias, sexism, racism etc) p.140-2 but still generally problematic
• SCISSHL 1985 to present: developed for use in Aust and now NZ school libraries together with SCIS’s database of catalogue records (on MARC21), online version const.updated; augments ScOT (the Schools Online Thesaurus, rumours of being dumped soon?) to improve subject access to info.resources online; users contribute proposals of new subject headings online; “aimed at reading age of 10 years” (p.145) Just as well!
• THESAURI: basically a thesaurus shares all the qualities of a SH list, but gets more specialized over a limited scope of subject areas (eg. Medicine), reminds of GP going on to specialize in oncology – has all GPness but goes deeper into certain area.
o “Much more likely to authorize single concept subject terms (post-coordinated) and combined ones (pre-coordinated)” (p.146) – ie. recognizes finer degree of detail that a user might have as search term; close/precise definition of relationships between terms; provides subject access to stuff in/outside lib.catalogues (ie online); similar aims to SH list but different emphases (Hider, 2008, p.147)
o “descriptors” = terms authorised for use, are usually singular in a thesaurus; use of facets and sub-facets (p.149) to get fine detail (eg. Paper – end-products – purpose – blotting), not kidding about the fine detail
o Many thesauri for diff subjects/fields of endeavour (eg. Business), often built/maintained in-house
o Work for multi-lingual applications (ie. Japanese/English & English/Japanese dictionary basically two thesauri heavily cross-referenced against each other)
o Most useful online, developed 60s onwards “to provide a controlled.vocab for indexing and searching online databases” (p.148); most major players have thesauri online eg. APAIS, NASA, UNESCO, ERIC, APT, ACE
o Can build thesaurus in-house but then must maintain, better to use published versions
o Above points mean thesauri VERY useful tools online for combined “access to a wide range of information resources covering different subject areas” (A,G,B 2000 in Hider p.150); esp. Multi-lingual uses
o RELATIONSHIPS shown through notation:
 SN scope note – when to use descriptor, defines scope
 UF used for – terms the descriptor is used in preference to
 USE use – descriptor to use instead of the term
 BT broader term – terms that are more general than descriptor
 RT related term – terms on the same hierarchical level as descriptor
 NT narrower term – terms more specific than descriptor
 TT top term – the name of the broadest class to which the specific concept belongs (p.147-8)
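The notation above amounts to one record per descriptor; a hypothetical entry as a plain data structure (the sample terms are invented, only the field names follow the SN/UF/BT/NT/RT/TT convention):

```python
# One thesaurus entry; field names follow standard thesaurus notation,
# the sample blotting-paper terms are invented for illustration.
entry = {
    "descriptor": "Blotting paper",
    "SN": "Use for absorbent papers intended to dry ink",  # scope note
    "UF": ["Ink-blotting paper"],   # non-preferred terms pointing here
    "BT": ["Paper"],                # broader terms
    "NT": [],                       # narrower terms
    "RT": ["Ink"],                  # related terms
    "TT": "Paper products",         # top term of the hierarchy
}

def cross_references(entry):
    """All terms a searcher could follow to or from this descriptor."""
    return entry["UF"] + entry["BT"] + entry["NT"] + entry["RT"]

print(cross_references(entry))  # -> ['Ink-blotting paper', 'Paper', 'Ink']
```

The point of the structure is the navigation it allows: from any term a user knows, the UF/BT/NT/RT links walk them to the authorised descriptor and its neighbours.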
• SH lists and thesauri used for assigning subject headings and terms to records, one method of improving effectiveness of lib.retrieval systems (p154) – obviously, the more terms in record, the better ability of search to use relationships to find suitable stuff! Controlled.vocab approach “complemented” (p.155) by natural language approaches: keyword searching/records enhancement/automatic indexing
o More terms for user to search on = improved recall for subject searches (p.157) – some terms come from controlled.vocab and more can be added from nat.language – taking terms from document itself (like info from contents page) increases currency (closest to terms used by user) -- records enhancement has to be targeted because obviously it is expensive to add extra info to every record
Reference Used:
Hider, P. (2008). Organising Knowledge in a Global Society. Wagga Wagga: Centre for Information Studies, Charles Sturt University.

Monday, August 23, 2010

Notes Dump

Module 5 Notes up to 5.3
• Been looking at this for a week and nothing’s sticking, so am reviewing and writing stuff down, dangblastit, as the slow and steady is the only thing that seems to work
• Subject access tools spectrum from low to high:
o Low vocab control (derived indexing)
 Full text searching
 Searching of text in surrogate record
o Automatic indexing & natural language processing (midway point)
o High vocab control (assigned indexing)
 Controlled vocabularies
 Classification schemes
• Text-based retrieval used by all these tools (up to classif.schemes like DDC which use notation to represent concepts in text) and surrogate records usually built using several subject access tools – gives more access points
• The “spectrum” isn’t orderly at all! In low vocab control text-search software makes internet search engines possible, and in the high voc.con specialist indexers can choose subject headings from lists (like Lib.ofC, many others etc) making surrogate almost hand-crafted (and expensive) – lots of tension between the methods and camps; Hider advocates middle road making the best of whatever tools seem right, as low and high strengths/weaknesses complement each other (figure 6.1 p.103 of Hider)
• Good subject access to information = intellectual access to the contents (p.99), but “hard to get right” – traditions of controlled indexing + web-based text searching gives rise to the “middle way” (p.100)
• Middle path in Hider (p.100-102)
o Determine what the information resource is about
o Select terms from an indexing language that represent the statement of aboutness
o Assign terms, or symbols representing terms, from a controlled vocabulary
o Determine other attributes of the info resource to search on
• SCISSHL is a controlled vocabulary (or “indexing language”) for choosing subject headings --- SCISSHL = SCIS Subject Heading List
• Controlled vocabulary – two types: alphabetical indexing languages or classification schemes
o Alphabetical listings of subjects – no relationships
o Classif.schemes - organize relationships and give notation (numbers or number/letters) to group subject matter together (eg. Puddings get DDC of .864 under the .86 Desserts) -- can see why classif.schemes like DDC or LCC are good for placing and finding the item on the shelf.
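The grouping-by-notation idea can be sketched as prefix matching: a number sits under every class whose notation it extends. A toy example (DDC-style numbers; only the .86 Desserts / .864 Puddings pair comes from the notes above, the labels are approximate):

```python
# Sample DDC-style numbers; labels approximate, for illustration only.
numbers = {
    "641.8": "Cooking specific kinds of dishes",
    "641.86": "Desserts",
    "641.864": "Puddings",
}

def broader_classes(notation):
    """All classes whose notation is a proper prefix of this one."""
    return [n for n in numbers if notation.startswith(n) and n != notation]

# Puddings sit under Desserts, which sit under the broader cooking class.
print(sorted(broader_classes("641.864")))  # -> ['641.8', '641.86']
```

This is exactly why notation works for shelving too: sorting the numbers as strings automatically shelves narrower subjects next to their broader class.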
• Bib classif.schemes proliferate for many purposes (specialist libraries etc) so there are the big general schemes meant to cover “all of documented knowledge” (p.109) --- DDC, LCC, UDC --- and special schemes for “more limited” fields like maps or law or music.
• Ideal features of clss.sch notation: uniqueness (represents only one concept), simplicity (easy to comprehend (ha!!)), brevity and hospitality (ability to expand to cater for newcomers in info-subjects).
• Clss.sch are enumerative (count their subjects out), hierarchical (sep subjects into primary and subordinate areas) and/or faceted (subject broken down into single concepts which are notated so a combination of symbols can build up to represent quite complex larger subjects) (see p.110-1)
• “schedules” are the actual listings of subject concepts in a scheme, plus there must be rules for selection and an alphabetical index of concepts (p.110)
• Classif.schemes 2 purposes in library: to provide a location for an info resource and to provide access by subject to info resources – tension as big players (LCC/DDC) lean more to “mark and park” location function (p.112) while subject access function marginalized. Yet effective, deep subject access is what users need! WebOPAC: use of bib classif. To improve searching of documents/surrogate records before walking to stacks.
• The big schemes successful with support behind them to maintain and update: eg DDC owned by OCLC – good organizing schemes have to evolve with info boom, therefore will die out without big-institution support behind them (p114).
• Hider’s Chapter Six and keywords noted in module:
o Subject access: Cutter’s objectives: “to enable a person to find a book of which … the subject is known … to show what the library has … on a given subject … to assist in the choice of a book … as to its character” (in Hider P.10) --- subject access provides user with “intellectual access” to make effective choices of info resources, so it follows that access points provided have to be provided through a sensible process of identification, selection and recording in surrogate records.
o Indexing language: which system the cataloguer/indexer chooses to use
o Controlled vocabulary: systems developed and backed by major library players as formalized ways to award subject terms to resources; words/phrases from list selected and authorized by cataloguer/indexer
o Derived indexing: you get the terms from looking at the resource itself
o Assigned indexing: you get the terms by using a formal system like DDC
o Exhaustivity: the idea of getting every last possible subject term listed
o Literary warrant: the subject already exists in the literature, so put it on the lists; doesn’t anticipate subjects that don’t yet exist; web environment has changed the nature of what “literature” is now “published”
o Pre-coordination: combining terms at indexing stage into one (or choosing to index a heap of single concepts)
o Post-coordination: combining terms at searching stage into many (searcher knows the subject is complex, so tries various terms in search)
o Natural language: what is used there by author (and searchers) ; not limited to a controlled vocabulary
o Free indexing language: whatever terms occur to the cat/indexer, on and beyond the resource itself; what seems good at the time!! No use of controlled vocabs.
o Classification scheme: arranges subject concepts in lists/notations which when applied to resources gives them a place on the shelf and subject access points so that searchers can find them.
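Post-coordination (combining single-concept terms at search time) can be pictured as an AND over the terms assigned to each record; a toy sketch with invented records and terms:

```python
# Each record indexed with single-concept terms (post-coordinate indexing).
index = {
    "rec1": {"cookery", "singapore"},
    "rec2": {"cookery", "france"},
    "rec3": {"singapore", "history"},
}

def and_search(*terms):
    """Combine single terms at the searching stage: records matching ALL terms."""
    return sorted(r for r, subjects in index.items() if set(terms) <= subjects)

print(and_search("cookery", "singapore"))  # -> ['rec1']
```

Pre-coordination would instead assign the already-combined heading (eg. “Cookery, Singaporean”) at indexing time, so the combining work is done once by the indexer rather than repeatedly by each searcher.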
• Classification schemes have their strengths and weaknesses:
o Great for finding place on shelf, but taken to extremes, can only care about location rather than subject access
o Have to be updated and maintained to cope with rise of new subject concepts, thus expensive; need big players (L of C, OCLC etc) to keep alive, new editions moving to online access
o Flexibility/expandability dependent on structure of schedules and rules, which affects inclusive and comprehensive nature
o Notation meets unique/simple/brief/hospitable requirements to various degrees of success (eg. DDC decimals sometimes not far enough on hospitality)
o Cultural/nationalistic bias a problem (LCC)

Thursday, August 5, 2010

Notes Dump

Module 3 Summary Notes
Milstead, J., & Feldman, S. (1999). Metadata: Cataloging by any other name … [Electronic version]. Online, 23(1), 25-31.

This is the EBSCOhost metadata on the above article; it has 9 headings, not very sure if all of these would map to the Dublin Core element set, will have to have a closer look:

Authors: Milstead, Jessica; Feldman, Susan
Source: Online; Jan/Feb99, Vol. 23 Issue 1, p24, 7p
Document Type:
Subject Terms:
Abstract: Provides an overview of projects for standardizing electronic resources. Information on metadata; Need for metadata; Creating metadata; Search engines and metadata; How metadata affects searching; Problems in the development of metadata.
Full Text Word Count:
Accession Number:
Database: Library, Information Science & Technology Abstracts with Full Text

And what is “metadata”? Various definitions seem to share a general consensus:
• “Metadata is data about data. It describes the attributes and contents of an original document or work” (Milstead and Feldman, 1999 p.26)
• “The simplest definition of metadata is " structured data about data." Metadata is descriptive information about an object or resource whether it be physical or electronic.” (
• “Metadata is loosely defined as data about data. Metadata is a concept that applies mainly to electronically archived or presented data and is used to describe the a) definition, b) structure and c) administration of data files with all contents in context to ease the use of the captured and archived data for further use. For example, a web page may include metadata specifying what language it's written in, what tools were used to create it, where to go for more on the subject and so on.” (

Dublin Core Metadata Initiative (DCMI)
• Recent group (Ohio) 1997ish building a web standard to cope with electronic information packages / features simplicity, semantic interoperability, international consensus and extendibility ie: easy to use, moves well between languages, international agreements to use and ability to extend usefulness to just about any sort of info package (physical or digital)
• 15 elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights
• 15 elements cover data to depth not always seen in brief/medium level AACR surrogates, and allow for electronic means of tracking, copyright issues etc
• There is structure, but not a rigid format for what elements MUST be there and in what order. Ie: DCMI is moving away from the ISBD rules (much as RDA developed out of AACR2, and said to be divorcing from the oldest formatting rules)
• FAQ section at Very handy.
• Metadata for a resource often created by its authors at time of release onto web / implications: others apart from librarians’ profession interested in the keeping and retrieval of data, but many clashes (too many cooks)
• Example notes 6 elements, so … how many elements do you need before a resource is absolutely uniquely identified? I suspect most online resources may not need all 15 elements in the metadata entry for uniqueness to be established. However, because of software used to create and read the data and other electronic issues, metadata about electronic resources needs to be very, very informative – concept of “linked data” where interconnectivity of resource is made through its metadata to similar resources. A metadata surrogate is obviously so very much more powerful in its scope (information about information) than a “simple” MARC21 file!!
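The 15-element set above can be sketched as a simple record where nothing is mandatory and order is free; the sample values are drawn loosely from the Milstead & Feldman article details, purely for illustration:

```python
# The 15 Dublin Core elements as keys; sample values are illustrative only.
DC_ELEMENTS = ["title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"]

record = {el: None for el in DC_ELEMENTS}
record.update({
    "title": "Metadata: cataloging by any other name ...",
    "creator": "Milstead, Jessica; Feldman, Susan",
    "date": "1999",
    "language": "en",
})

# No element is mandatory and none repeats a fixed order: only fill what
# you know, which is the flexibility DCMI offers over rigid formats.
filled = [el for el, v in record.items() if v is not None]
print(filled)  # -> ['title', 'creator', 'date', 'language']
```

Which bears out the suspicion above: a resource can be adequately (even uniquely) described well short of all 15 elements.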

Notes Dump

Module 2 summary notes
• Information agencies (lib/mus/arch) use range of info retrieval tools
• Info agency now functions as gateway to info not owned by that agency (also info centre) but available through it
• Info ret tools are also organizing tools / info ret and org both by library staff and by users
• Traditional tools developed over centuries: lib catalogues, classification schemes like DDC LCC UDC, controlled vocab (thes and sub heading lists) – through to information age where full text search software, image and sound ret tools, and search engines are used
• Taylor article (2004) chapter 3 covers history and development of catalogues into cat rules and standards, Panizzi, Dewey, Cutter, western development of AACR etc – basically what we recognize as a resource description has elements in a certain format/order which is created in accordance with a standard like AACR – has international and computer-system “interoperability”
• Emphasis: resource descriptions (and the info ret systems that make use of them) have functional aim of uniquely identifying info resources, providing access by title/names, and providing subject access. The point that all this careful analysis and application of tools comes down to is the creation of an entry where the item (info package) is ABSOLUTELY UNIQUELY identified. Why so needed? Trees too easily lost in forest if not somehow marked for retrieval. Another thought: what if the entry (surrogate) itself is lost? Then the resource goes unused, big effort to recatalogue etc – scenario even worse for digital resources, can’t find it again, prob never.
• History of SCIS / use of cat standard AACR with some interpretation plus MARC21 coding to create very large database / exchange of bib. Description of resources with most schools across country
• Downloaded SCIS documents: overview of SCIS subject headings, descriptive cataloguing etc etc. / interesting reading … takes quite a lot of to&fro therefore very much worth printing out into loose-leaf form, still sifting through it …

Wednesday, August 4, 2010

Notes Dump

ETL505 Module 1 summarising notes / page no.s quote the Hider textbook unless otherwise etc:
• Info resource has a description made for it – gets used as catalogue record so that users can find and use info res, starting off with very little info about it to getting to see the whole thing
• Many prof. “describers” + much analysis work to make description effective
• This is “bibliographic organization” (bib control) and big questions are:
o Why needed?
o How to do it?
o Effective org operation?
o Failure results in?
o Best ways to org info are?
o Costs are?
o Org new info (“digital objects”) on web has new challenges for old standards of info org

• Personal Experiences as info user: heavy use of schlib in teaching work / can use several catalogues from sch, uni and state / can persevere as searcher / enjoy personal org of info and things, have to recog leap to large formal org needing to org huge info, standards and learning to use them of pragmatic value: accept the info agency will be much bigger than you and the small personal way you like to org things, must be some Babel tongue or couldn’t connect with other people and info, also, who wants all the hard work of re-organising? Standards there to HELP you.

• Info “agency”: libraries, museums, archives, all sorts of online groups and depositories / their purposes for use and access fundamentally same: aim to provide info to people / merging of agencies historically sep in 17th to 20th C.

• “Access” to resource via its described elements – title author subject publisher etc.

• P.16 most lib use keyword searching + subject heading + classification scheme to provide access

• P.16-17 Three 3 indexing methods
o Controlled indexing language scheme / alphabetical or classif sche like DDC or LCC
 Use authority list to select from, leading to ? of can auth list really cover all knowledge – wot about info explosion on internet? Auth lists need constant updating to be current and useful
o Derived indexing
 Taking subject terms “naturally” from the text (esp title + abstract)
o Free indexing
 Use of any term that seems useful! Tower of Babel problems multiply.

• P.18-19 Distinctions between most common info retrieval systems in libraries

o Bibliography: comprehensive list on a certain topic, highly specialized for task and no location given
o Catalogue: holdings in collections accessible from that location only
o Index: highly specific to one subject/collection/back-of-book
o Add to this mix the INTERNET, and all that is accessible there, and distinctions above break down
o Modern OPAC Online Public Access Catalogue
 replaces old card cat 70s and previous centuries
 MARC (MAchine-Readable Cataloguing) format (data format for computerization of records) makes OPAC possible and powerful
 current, easy to use, simultaneous use, compact, easily duplicated
 had 30 yrs to compete and merge with web – rise of web-based search engines, federated searches through many databases, gateway to other systems, some full-text files avail online
 hence OPAC becoming WebPAC
o Main points Hider (2008) making p21-23 about effective info ret system issues:
 Standards for org info
 Participation in agreements for metadata exchange
 Participation in cooperatives to gain max benefit from md exchange
 Info currency essential to global info environment – all info ret sys have inputs/outputs, this dataflow lifeblood of current info and web itself – global and connected (one body metaphor)
 Data inputs p.23

• Use widely-understood standards, min effort to create, disting res sufficiently (tell them apart), accurate and consistent
 Data outputs

• Users readily understand, close match to requests, user-friendly, rapid response, simultaneous use by many
 Implications for information services / agencies

• Librarian must catalogue very accurately, keeping users in mind and consider sensible placement of resource where it’ll be most used

• Users should find relatively easy to learn, use, access metadata (often without direct 1-to-1 help)
 Bib record sources p. 23-24

• Mixture of in-house creation and importing, depending on info agency situation (assume school lib sources mostly imported through SCIS etc)
 Exchange as driving aim
• Describe it all and you control it all – done once and never re-done / ? of accuracy (who checks the work?) ? of ownership (who keeps/sells/what costs?)
• International level of cooperation (often sharing and support beyond what governments can!!)
o Blurring between boundaries of libraries, museums, archives
 See Rayward (1995) conference speech
 Blurring especially in what/why to collect, what/how to record and make accessible (eg. Digital formats and uses like the online site of the Australian National Museum in Canberra)
 Discussion questions on this blurring: what info is each dealing with? How diff is it? How does it change what info we might want to org and ret?

• My thoughts: doesn’t it come down to the info-wants&needs of the user in personalized situation and tasks? Obviously some portion of info choices and management have to be made by lib/mus/arch with the user in mind, otherwise vast collections would go unused and unvisited. Wasted. Some portion has to be about collecting for posterity, some portion has to be about “hey, look what we’ve got for you!” – obviously, again, info org and needs of personal and institutional have to balance.

• Made forum posting (mod 1) about response to Hider p.21-23: Main points are about standards for organizing info like resource descriptions and catalogue records, agreeing to and participating in metadata exchange globally that maximize benefits about cost, accuracy of description and ultimate accessibility of resources info packages. Standards aren’t up to Golden Key level, and may never be, but efforts have to continue because of the information explosion – knowledge is socially valuable, so any info “wandering free” feels like a loss. Addendum 4.8.10: also thought that standards have to be upheld and develop usefully (like AACR2 becoming RDA) because there is too much already invested in the org of info across many info agencies across the centuries! Can’t walk away from all that, or risk its careless loss.

• Read Helen Rowling’s forum posting (Module 1 13.7.10 9.53pm), what follows here are some of her thoughts, elaborated a little by me:
o So what are libraries FOR? Traditionally physical place to physically go to physically get hands on physical reading matter. Collection (local or state or private) was 100% in-house. Now, with web, digital travel, digital access, and digital communication mix heavily with the physical. Libraries “go virtual” and still let you flip through the paper, library users now used to that mix of phys/dig.
o So what are museums FOR? To collect things, especially to do with humanity and society (contextually always relates back to us, we are our favourite subject!) / those who don’t remember, understand or respect their past are doomed to repeat it, combined with my great-grandchildren will be able to see this
o So what are archives FOR? To keep org paperwork of companies, governments, wealthy individuals etc / historical context has to be applied, papers and other things only very raw info data / the devil is in the detail, if you can find him.

• Taylor’s 2004 article deals with the human need to organize / “we need to organize because we need to retrieve” (2004, p.23) / we org info – Stoll’s order: data, info, knowledge, understanding, wisdom – Taylor says und and wis intertwined / p.4 “information package” the term used for “text” / archives often have very unique collections / internet disorganized but “parts of it will be brought under organizational control” p.16 / how to record, archive, preserve digital data, documents and objects from the web? We don’t know yet. DCore development in progress and we can keep “at least some” for posterity but not all p.8 / trad attitudes to collections privilege books and items – possession/ownership of a copy is control of it at a quite absolute level; in comparison digital objects like webpages are quite ephemeral and not so easy to bring under possessive controls.

• Organising Tools
o Library catalogue
o Archive finding tool
o Museum register
o Index – essentially a set of resource descriptions – small data entries or larger records per access field
o Alice Ferguson (CSU Library DE section, 90% work matching info with requests) – spoken recording
 “major importance of organized information” through lib catalogue – if can’t find res, can’t use it / has to be functional “reliable process” giving “accurate info” and relevant for usefulness value of material / compares reliability of lib cat with internet search engines: web dominates modern searching habits but “ability to pay has nothing to do with authority or relevance” / lib cat built to strict commercial standards, uses auth files etc therefore lib cat services have “value added nature” making lib cat an essential tool to find reliable source of info, recreation and knowledge

• Objectives for an effective, efficient catalogue
o Cutter’s objectives 1904
o Modern interp of Cutter / what an info retrieval source should be able to do:
 Find info resource in any format (with author, title or subject search)
 Show what resources (in any format) can be offered for user selection
 Assist in choice of info resource which meets specific info needs of user
 Deliver or provide access to copy of resource to user
o These objectives the ideal, not all info ret sources can do all of above / modern times – access to digital copy more common plus also still physical resources to go to
• Module notes: “today’s library provides a gateway to information stored all over the world”

• In school library “organization of bibliographic info that users of libraries … need in order to find and select the info resources that allow them to acquire the knowledge they seek” (Harvey and Hider 2008 preface) -- keep in mind that all school library users are learners, so prime usefulness of library resources is in encouraging learning and development of knowledge and skills / every user should get learning benefit from interaction with school OPAC, TLs etc.

• Accuracy, efficiency and user-friendliness of school lib catalogue due to TL input and maintenance / how many users bypass OPAC and go straight to TL? / why? I think via Kuhlthau’s ISP that user needs to formulate clear ideas and keywords first. Catalogue won’t do this magically for users (kids and teachers too!!!), so web search engines appear much more attractive and helpful (even when they’re clearly v. difficult to use well) / TL has to work on getting users to understand and use value-added catalogue!!