
(Photo: Cambodia4kids.org Beth Kanter/Flickr/CC by 2.0)

How Do You Say, ‘Read Me?’ Or Choosing Keywords to Retrieve Information

October 21, 2014


Consider two titles for this article. Title #1, “How Do You Say, ‘Read Me?’” and Title #2, “Choosing Keywords to Retrieve Information.” Which would invite more readers? Title #1 may be more alluring, but #2 more accurately describes what you are reading now. Before journals attached keywords to articles, their titles alone had to attract readers. In B.C. years (before computers), readers scanned tables of contents for interesting articles. Today, they search the Internet for articles that match their research interests. How much do supplementary keywords aid their search?

There is a difference between viewing a publication or table of contents open before your eyes and searching for information stored in archives. Seeing an intriguing title in a table of contents, we turn to it, impulsively responding to the title’s allure. Title #1 might elicit that response. When searching for information from invisible sources, we behave more deliberately. Then we need descriptive keywords, such as those in Title #2. In these modern times, we assume that attaching relevant keywords to either title would facilitate the article’s retrieval by those interested in the role of keywords in scholarly research. In olden times, titles alone provided keywords for computer searches.

Yes, computers existed before the Internet, and they were sometimes used to retrieve textual information in article titles. Here is how it worked: The titles were recorded on punchcards and fed into a mainframe computer. The computer then generated an alphabetical index to all their “key” words, defined as all words not on a list of “stop” words—such as prepositions, articles, and conjunctions. The result was dubbed a Key-Word-In-Context (KWIC) index. The printed index alphabetized the keywords in a central column surrounded by other words in the title that preceded and followed them. Hence, the name, Key-Word-In-Context.
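The procedure just described can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of the original mainframe program: the stop list, the function name kwic_index, and the 30-character column width are all assumptions, and the two sample titles are simply the two candidate titles for this article.

```python
import re

# Assumed stop list. The article says only that prepositions, articles,
# and conjunctions were excluded; this particular list is illustrative.
STOP_WORDS = {"A", "AN", "AND", "BY", "FOR", "IN", "OF", "ON", "OR", "THE", "TO"}


def kwic_index(titles, stop_words=STOP_WORDS, width=30):
    """Return one formatted line per keyword occurrence, sorted by keyword."""
    entries = []
    for title in titles:
        # Uppercase everything, as the early indexes did, and split into words.
        words = re.findall(r"[A-Z0-9'-]+", title.upper())
        for i, word in enumerate(words):
            if word in stop_words:
                continue  # stop words never become index entries
            before = " ".join(words[:i])
            after = " ".join(words[i + 1:])
            entries.append((word, before, after))
    entries.sort()  # alphabetical by keyword, then by preceding context
    lines = []
    for word, before, after in entries:
        # Right-align the preceding words so keywords form a central column.
        lines.append(f"{before[-width:]:>{width}}  {word} {after[:width]}".rstrip())
    return lines


titles = [
    "How Do You Say, Read Me?",
    "Choosing Keywords to Retrieve Information",
]
for line in kwic_index(titles):
    print(line)
```

Each title yields one line for every word that is not on the stop list, which is why a KWIC index has many more entries than articles.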

The figure below reproduces a portion of the KWIC index to 2,614 articles published in the American Political Science Review from 1906 to 1963. The first published use of a KWIC index appears to have been in September 1958 for an IBM bibliography, “Literature on Information Retrieval and Machine Translation.” In 1963, there was a KWIC index to transliterated titles from original Russian and social science sources. Updated KWIC indexes to the APSR were published in 1968 and 1996. Other early applications are mentioned elsewhere. So much for history.

Portion of the 1964 KWIC Index to the American Political Science Review

In a KWIC index, each article appears as many times as the keywords it contains. The 2,614 articles in the 1964 APSR index generated 10,089 alphabetical entries. Back then, computers only recognized uppercase text. (Why that occurred requires an arcane discussion of binary bits and is better avoided.) Note that some titles (e.g., ABSENT VOTERS) were occasionally tagged with supplementary keywords (e.g., LEGISLATION) to provide additional handles for retrieval. Although many more titles in the APSR cried out for more description, keywords were assigned sparingly.

Over 50 years ago, R.A. Kennedy advised authors how to write informative titles. His advice deserves retelling now:

Consider the title as a one-sentence abstract. Without attempting to summarize the content of your paper, make the title reflect the subject as definitively and concisely as possible. …

Provide sufficient context, but only enough, to clarify the relationships between the selected technical keywords. Remove words which tell the reader little or nothing. Use short connectives like of, for, the, on, . . . rather freely. Use conventional phrases like Introduction to or Analysis of or Status of only where they are important in indicating the nature and level of the paper.

Even if all contemporary authors followed Kennedy’s advice and wrote informative titles, supplementing the titles with keywords can still improve search and retrieval. Unfortunately, authors often choose keywords that do not help. Evidence for this claim comes from my study of 2,944 keywords attached to the titles of 700 articles published over the first 20 years of the journal Party Politics from 1995 through 2014. Here are some excerpts from that analysis:

Studying the keywords reveals a good deal about topics that concerned authors over the last 20 years, but the keywords could have been more informative. . . .

Nearly 35 percent of all keywords were ‘one and done’ – idiosyncratic terms mentioned only once. . . .

The most frequent keyword was political parties, which appeared 85 times and accounted for nearly 3 percent of all keywords. . . .

Given that most of the articles in Party Politics deal with political parties, the keyword political parties does not distinguish articles from one another so lacks value for searching within the journal.

Moreover, author-supplied keywords are notably inconsistent in form and phrasing. For example, authors supplied the terms parties, party, political parties, and political party as keywords. One would need to search for all four variations to retrieve all the articles. Moreover, some authors embedded the term in other phrases (e.g., right-wing parties), further complicating retrieval.
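The retrieval problem caused by inconsistent keywords can be illustrated with a short Python sketch. It is not the method used in the Party Politics analysis. The mapping table, the choice of "political parties" as the canonical form, the function names, and the six sample keywords are all hypothetical. The sketch folds the four variants named above into one term, counts frequencies, and reports the share of keywords that occur only once.

```python
from collections import Counter

# Hypothetical mapping from variant forms to one canonical term.
# The four variants are those named in the text; the canonical form is assumed.
CANONICAL = {
    "parties": "political parties",
    "party": "political parties",
    "political party": "political parties",
    "political parties": "political parties",
}


def normalize(keyword):
    """Lowercase, collapse whitespace, and map known variants to one term."""
    key = " ".join(keyword.lower().split())
    return CANONICAL.get(key, key)


def keyword_stats(keywords):
    """Return (frequency counts, share of all keywords that occur only once)."""
    if not keywords:
        return Counter(), 0.0
    counts = Counter(normalize(k) for k in keywords)
    singletons = sum(1 for n in counts.values() if n == 1)
    return counts, singletons / len(keywords)


# Made-up sample, not data from the journal.
sample = ["Parties", "party", "Political Parties", "political party",
          "party system", "fragmentation"]
counts, one_and_done = keyword_stats(sample)
print(counts.most_common(1))
print(round(one_and_done, 2))
```

An exact-match table like this one does not catch terms embedded in longer phrases (e.g., right-wing parties), which would need separate handling.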

My analysis concluded that keywords serve three different values when attached to journal articles. They provide (1) face value, (2) internal retrieval value, and (3) external retrieval value. Proper choice of keywords depends on which values one tries to serve.

Face value

Just by looking at the first page of an article, busy readers can scan its keywords to determine whether they want to read it. Thus keywords have ‘face value’ in attracting readers’ interest. Unlike titles, keywords need not make a coherent statement. Authors can advertise whatever they regard as their articles’ salient features. . . . However, these specialized terms – specialized because they appeared so rarely – do not serve readers who search through keywords to retrieve articles that fit research interests. They need keywords with either internal or external retrieval value, depending on whether they search internally within Party Politics or externally across other journals.

Internal retrieval value

Given that most of the articles in Party Politics deal with political parties, the keyword political parties does not distinguish articles from one another so lacks value for searching within the journal. A better argument might be made for party system, which suggests special treatment to parties’ interactions, not generally covered in articles about individual parties. Nevertheless, a case can be made for substituting other keywords that describe party system properties – such as interparty competition, fragmentation, or ideological polarization – and for avoiding party system altogether.

External retrieval value

Suppose one did not want to limit the search to Party Politics but wished to search across a set of different journals, for example, all the journals published by SAGE. Then political parties becomes a useful keyword, as demonstrated using the search function over all SAGE publications. On July 30, 2014, a search for political parties among article keywords using SAGE’s ‘advanced search’ capability turned up 269 articles, which included the 89 (85 for political parties alone and four for the term juxtaposed with others) in Party Politics alone. Ironically, the keyword political parties may be more useful when linked to articles in other SAGE journals than to those in Party Politics.

That keyword analysis was my last act as co-editor of Party Politics before retiring as one of its founding editors. On my way out, I proposed that we subject authors’ keywords to a standard procedure in library science. Libraries exercise ‘authority control,’ assigning a single, distinctive name for each topic. Henceforth, authors of accepted manuscripts will select up to four keywords from an authority list of keywords, organized under a series of headings. They can also offer a fifth keyword not on the list, or choose a fifth from the list. Letting authors choose a ‘wild card’ term adds ‘face value’ to their article’s profile. Choosing all other terms from the authority list implements the keywords’ ‘internal information retrieval’ value, favoring searches conducted within Party Politics instead of across different journals.
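A rule of this kind is easy to check mechanically. The Python sketch below is one possible reading of it, not the journal's actual procedure. The headings and most of the terms in AUTHORITY_LIST are invented for illustration (only interparty competition, fragmentation, and ideological polarization come from the discussion above), and check_keywords is a hypothetical helper. It assumes an article may carry at most five keywords, at most one of which is off the list.

```python
# Hypothetical authority list. The journal's real list and headings are not
# given in the text; these entries are for illustration only.
AUTHORITY_LIST = {
    "party systems": {"interparty competition", "fragmentation",
                      "ideological polarization"},
    "party organization": {"party membership", "candidate selection"},
}


def check_keywords(keywords, authority=AUTHORITY_LIST):
    """Return (ok, wild_cards) for an author's proposed keywords.

    Assumed rule: one to five distinct keywords, with at most one
    'wild card' that does not appear on the authority list.
    """
    allowed = set().union(*authority.values())
    cleaned = [" ".join(k.lower().split()) for k in keywords]
    wild_cards = [k for k in cleaned if k not in allowed]
    ok = (
        1 <= len(cleaned) <= 5
        and len(set(cleaned)) == len(cleaned)  # no repeated keywords
        and len(wild_cards) <= 1               # only one off-list term
    )
    return ok, wild_cards


# 'cartel party' stands in for an author's wild-card term.
print(check_keywords(["fragmentation", "party membership",
                      "candidate selection", "ideological polarization",
                      "cartel party"]))
```

Whether an author may submit a wild card with fewer than four list terms is not stated above; the sketch permits it.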

Kenneth Janda

Kenneth Janda is the Payson S. Wild Professor Emeritus at Northwestern University. He has been co-editor of the international journal Party Politics since 1995, and in 2000 he won the Samuel J. Eldersveld Career Achievement Award from the American Political Science Association (APSA) for his research on political parties. In 2005, he was co-winner of the Best Instructional Software Award from the APSA, an award he also received in 1992. Janda’s books include Political Parties: A Cross National Survey (1980) and, with co-author Robert Harmel, Parties and Their Environments: Limits to Reform? (1982).
