Semantic Embed: Part 3

Librarians are all about categorizing things. So is RDF. Are they a good match? That’s what I was thinking about at my third New York Semantic Web Meetup event…

The Librarian and RDF (Barbara McGlamery)

Barbara McGlamery, the first librarian of the evening, is actually an ex-librarian and, recently, an ex-ontologist for Time Inc. (she just left Time for Martha Stewart). She talked about the application of the Semantic Web to Time’s online content, which basically follows an ontology-lite approach that consists of 1) setting up ontologies to define some rules and properties, 2) importing them into taxonomies, where resources (like ‘Will Smith’) are described, and 3) creating ‘navigational taxonomies’ so that editors and other people can access the information in whatever ways they want (for example, by using alternate names). Whenever an editor publishes a new article, he or she manually tags it with all the relevant resources, which makes it possible for machines to make basic inferences (like noticing that you were reading an article about Will Smith and recommending articles to you about movies Will Smith starred in, based on its awareness of the RDF triple: ‘Will Smith is a leadPerformerIn Hancock’). Which sounds great, except that the inferencing part didn’t work that well. McGlamery explained that the information just ended up being too heavy, which meant that inferencing was slow and couldn’t be very complex.
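To make the Will Smith example a bit more concrete, here is a minimal sketch in Python of that kind of tag-plus-triple recommendation. It is only an illustration, not Time’s actual system: the article titles, the second triple, and the function names `related_resources` and `recommend` are all made up for the example, and the triples are plain tuples rather than real RDF.

```python
# Toy triple store: (subject, predicate, object).
# Only the Will Smith / Hancock triple comes from the talk;
# the second one is an illustrative assumption.
TRIPLES = {
    ("Will Smith", "leadPerformerIn", "Hancock"),
    ("Will Smith", "leadPerformerIn", "I Am Legend"),
}

# Hypothetical editor tags: article title -> resources tagged on it.
ARTICLE_TAGS = {
    "Will Smith profile": {"Will Smith"},
    "Hancock review": {"Hancock"},
    "I Am Legend box office": {"I Am Legend"},
    "Election roundup": {"Politics"},
}


def related_resources(resource, predicate="leadPerformerIn"):
    """Return every object linked to `resource` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == resource and p == predicate}


def recommend(current_article):
    """Recommend other articles tagged with resources inferred from this one."""
    inferred = set()
    for resource in ARTICLE_TAGS[current_article]:
        inferred |= related_resources(resource)
    return sorted(
        title
        for title, tags in ARTICLE_TAGS.items()
        if title != current_article and tags & inferred
    )


print(recommend("Will Smith profile"))
# ['Hancock review', 'I Am Legend box office']
```

Even in this toy version, every recommendation means scanning the triples and then the tagged articles, which hints at why inferencing over a large, heavily tagged archive could get slow.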

I thought Time’s attempt to plunge into the Semantic Web was admirable (they were apparently very early adopters of the technology), but I couldn’t quite understand their reasons for it until it became clear that it was just-another-Old-Media story. Sure, Time was adopting innovative technology, but it was for decidedly non-innovative ends: as another means of control over their content. After her talk, I asked McGlamery why Time had even bothered with all this Semantic Web inferencing for their article-recommendation feature – why not just recommend articles that were popular with other readers like you? That’s how most recommendation engines work. McGlamery’s reply was that Time is a hundred-year-old company and therefore favors the ‘curatorial’ approach over the crowdsourcing one, which I think explains why ontologies looked so good to them. I’ve talked before about how ontologies are in some sense a form of control – though I think they can be used for great things, especially in the news business. The question is just whether Time is going about using them in the right way…

…which is a question I won’t try to answer. Instead, I’ll briefly describe:

Jon Phipps’ Rant about RDA

Actually, this story isn’t much different from the first one. Jon Phipps’ rant was also about old control-systems adjusting (or failing to adjust) to the new landscape of data and metadata. RDA stands for ‘Resource Description and Access’ and at this point consists of 1300+ pages intended to represent the collective wisdom of generations of catalogers. Phipps still thinks cataloging is worth doing (especially the informal kind that everyone does when they tag a photo in Flickr or bookmark a site on Delicious), but was mostly frustrated by the inflexibility of legions of catalogers in transitioning from their old rules to new ones.

Quote of the evening (from an audience member): “You can’t even get people to use Excel in the public library system. RDF? Forget about it.”