Semantic Embed: Part 2

This is my second posting on an event by the New York Semantic Web Meetup, which covers all aspects of the W3C-recommended Semantic Web from technology to business. An offshoot Meetup, which will focus more on natural language processing, computational linguistics, and machine learning, is supposed to start having meetings in January, and I plan to be there. See my first Meetup post here.

Semantic Web Programming – the book (John Hebeler)
The first slide in John Hebeler’s presentation last night had just one sentence: “Our ability to create information far exceeds our ability to manage it,” which is actually the best and most succinct argument for the Semantic Web that I’ve heard thus far. Hebeler made his point more visceral by asking us to guess how many files there were on his MacBook (the answer is over a million, about twice as many as most of us guessed). Imagining that many files on every computer hooked up to the Internet (there were over 1.5 billion Internet users as of June 30) is already overwhelming. And the bigger this mass of information gets, the stronger its pull toward entropy and the more we lose control. It’s something that should scare us, Hebeler said, because all that information is only as useful to us as our tools to sort through it; if we can’t find what we want, it’s the same as having lost it.

Luckily, Hebeler sees our salvation in the Semantic Web – or more specifically, in a highly flexible knowledge base that can handle both complex and simple types of data – and he’s co-authored the book to guide us there. The approach looks pretty easy to use: I’m not much of a programmer, but even I could follow the examples, all of which are demonstrated using Java code in the book. For integrating data from, say, Facebook and Gmail, which represent it in totally different formats, Hebeler gave us seven basic steps, or areas of code:

1) Knowledge-base creation

2) How to query it – just a simple search

3) Setting up your ontologies

4) Ontology/instance alignment – combine two ontologies, for example by teaching your program that what one ontology calls an “individual” is the same thing as what the other calls a “person,” or that “Kathryn” is equivalent to “Kate”

5) Reasoner – your program won’t incorporate its new understanding of equivalencies until you apply the reasoner

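6) OWL restrictions – let you apply constraints, for example by defining a class in terms of the property values its members must have

7) Rules – let you add if-then rules that derive new facts from the ones already in the knowledge base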
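
To make steps 1, 2, 4 and 5 a little more concrete, here is a minimal sketch in plain Java. It is not taken from the book and does not use a real Semantic Web library; the class name AlignmentSketch, the fb: and gm: prefixes, the email address, and the toy reasoner are all assumptions made up for illustration. It stores facts as subject/predicate/object triples, states that “Individual” and “Person” are equivalent and that “Kathryn” and “Kate” are the same, and shows that a query only sees those equivalences after the reasoner has run.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a hand-rolled triple store, not the book's code or a real OWL reasoner.
class AlignmentSketch {
    static final String TYPE = "rdf:type";
    static final String EQUIVALENT_CLASS = "owl:equivalentClass";
    static final String SAME_AS = "owl:sameAs";

    // A triple is (subject, predicate, object); List.of gives value equality.
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    // Step 1: a toy knowledge base with facts from two hypothetical sources.
    static Set<List<String>> sampleKb() {
        Set<List<String>> kb = new LinkedHashSet<>();
        kb.add(triple("fb:Kathryn", TYPE, "fb:Individual"));
        kb.add(triple("gm:Kate", TYPE, "gm:Person"));
        kb.add(triple("gm:Kate", "gm:email", "kate@example.org"));
        // Step 4: alignment statements linking the two vocabularies.
        kb.add(triple("fb:Individual", EQUIVALENT_CLASS, "gm:Person"));
        kb.add(triple("fb:Kathryn", SAME_AS, "gm:Kate"));
        return kb;
    }

    // Step 2: a simple pattern query; null acts as a wildcard.
    static List<List<String>> query(Set<List<String>> kb, String s, String p, String o) {
        List<List<String>> hits = new ArrayList<>();
        for (List<String> t : kb) {
            boolean match = (s == null || s.equals(t.get(0)))
                    && (p == null || p.equals(t.get(1)))
                    && (o == null || o.equals(t.get(2)));
            if (match) hits.add(t);
        }
        return hits;
    }

    // Step 5: a toy reasoner that applies the alignment statements
    // repeatedly until no new facts appear. The input set is not modified.
    static Set<List<String>> reason(Set<List<String>> kb) {
        Set<List<String>> facts = new LinkedHashSet<>(kb);
        boolean changed = true;
        while (changed) {
            List<List<String>> derived = new ArrayList<>();
            for (List<String> a : facts) {
                boolean classLink = a.get(1).equals(EQUIVALENT_CLASS);
                boolean sameLink = a.get(1).equals(SAME_AS);
                if (!classLink && !sameLink) continue;
                for (List<String> b : facts) {
                    // Only ordinary facts are propagated, not the alignment statements themselves.
                    if (b.get(1).equals(EQUIVALENT_CLASS) || b.get(1).equals(SAME_AS)) continue;
                    if (classLink && b.get(1).equals(TYPE)) {
                        // Members of one class are also members of the equivalent class.
                        if (b.get(2).equals(a.get(0))) derived.add(triple(b.get(0), TYPE, a.get(2)));
                        if (b.get(2).equals(a.get(2))) derived.add(triple(b.get(0), TYPE, a.get(0)));
                    }
                    if (sameLink) {
                        // Facts about one individual also hold for the individual it is the same as.
                        if (b.get(0).equals(a.get(0))) derived.add(triple(a.get(2), b.get(1), b.get(2)));
                        if (b.get(0).equals(a.get(2))) derived.add(triple(a.get(0), b.get(1), b.get(2)));
                    }
                }
            }
            changed = facts.addAll(derived);
        }
        return facts;
    }

    public static void main(String[] args) {
        Set<List<String>> kb = sampleKb();
        System.out.println("People before reasoning: " + query(kb, null, TYPE, "gm:Person").size());
        Set<List<String>> inferred = reason(kb);
        System.out.println("People after reasoning: " + query(inferred, null, TYPE, "gm:Person").size());
        System.out.println("Kathryn's email: " + query(inferred, "fb:Kathryn", "gm:email", null));
    }
}
```

Before the reasoner runs, a search for people finds only the Gmail entry. Afterwards it finds both, and the email address recorded for “Kate” is also attached to “Kathryn”. A real toolkit would handle far more of OWL than these two hand-written inferences.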

He and the other co-authors also maintain a website where they field questions and add updates about the book.

Lucene (Otis Gospodnetic)

The Lucene presentation by Otis Gospodnetic was aimed primarily at programmers who might want to use the Lucene software for indexing and searching text. Lucene is actually just one piece of Apache Lucene, an Apache Software Foundation open-source project that includes other sub-projects like Nutch (a framework for building web crawlers) and Solr (a search server). All of it, of course, is free, and since I’m not expert enough to vouch for any of it, I’d suggest checking out the Apache Lucene website, where everything is available for download.
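
For readers wondering what “indexing and searching text” means in practice, the core idea can be sketched with a toy inverted index: a map from each word to the documents that contain it. The sketch below is plain Java and deliberately does not use Lucene’s API; the class name TinyIndex, its methods, and the sample documents are assumptions invented for illustration, and the real library does far more (analysis, ranking, on-disk storage).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not Lucene's API.
class TinyIndex {
    // Maps each lowercased word to the sorted IDs of the documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexing: split the text into words and record the document ID under each word.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: look up a single word; returns a read-only view.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return Collections.emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(hits);
    }

    // Documents that contain every one of the given words (an AND query).
    SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            if (result == null) {
                result = new TreeSet<>(search(term));
            } else {
                result.retainAll(search(term));
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "Lucene indexes text");
        index.add(2, "Solr is a search server");
        index.add(3, "Nutch crawls the web; Lucene indexes the text");
        System.out.println(index.search("lucene"));
        System.out.println(index.searchAll("lucene", "web"));
    }
}
```

The lookup is fast because the work of scanning the text happens once, at indexing time; a search is then just a map lookup plus, for multi-word queries, an intersection of ID sets.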