\documentclass[conference]{IEEEtran}
\usepackage[pdftex]{graphicx}
\usepackage[pdftitle={Wiki Semantic Enrichment},
            pdfcreator={Chris Murphy},
            pdfauthor={Chris Murphy, Robin Stewart, Andrew Correa},
            pdfsubject={A Final Project Proposal for 6.830 Database Systems},
            pdfkeywords={wiki,semantic,user interface design},
            colorlinks=true,
            pdftex]{hyperref}
\usepackage{cite}

\begin{document}

\title{Wiki Semantic Enrichment\\through UI Design and Content Analysis}

\author{\IEEEauthorblockN{Chris Murphy}
\IEEEauthorblockA{chrismurf@gmail.com}
\and
\IEEEauthorblockN{Robin Stewart}
\IEEEauthorblockA{stewart@csail.mit.edu}
\and
\IEEEauthorblockN{Andrew Correa}
\IEEEauthorblockA{acorrea@csail.mit.edu}
}

\IEEEspecialpapernotice{A Final Project Proposal for 6.830 Database Systems}

\maketitle

\begin{abstract}
\boldmath
This project proposes to enrich ``Wiki'' systems with semantic data about their content, enabling querying and other high-level interpretation tasks. Ideally, the work will build on existing systems as well as on current ``Semantic'' and ``Structured'' Wiki research.
\end{abstract}

\section{Introduction}
Ward Cunningham, creator of the first Wiki, describes the Wiki as ``the simplest online database that could possibly work''~\cite{wikipedia}.  This ``simplest'' of online database models has spread rapidly, and there are now numerous open source and commercial Wiki offerings.  Users of these websites are presented with a simple interface for creating freeform textual documents, often with simple formatting and markup commands available to them.  Unfortunately, the resulting documents are often \textit{too} simple: they are typically semantically poor, and allow for little post-processing, querying, or aggregation.

\section{Current Work}
More recently, two primary approaches to solving this problem have emerged.  The first, currently in use by Wikipedia, is the informal and individual specification of additional topic-appropriate information for an article through an information or navigation box.  For an article about a state, a navigation box may link to every other state.  For an article about an animal, an information box may provide scientific information such as the genus or the endangered status of the species.  In some cases this data is provided in an \href{http://microformats.org/}{HTML microformat} to allow for client-side aggregation with tools like \href{http://www.kaply.com/weblog/operator/}{Operator}.

The second approach, pursued by a number of Semantic Wikis, allows for formal specification of ontologies or informal `folksonomy' specification.  Examples of projects pursuing this approach include OntoWiki (\url{http://ontowiki.net/Projects/OntoWiki}), SweetWiki (\url{http://argentera.inria.fr/wiki/}) and the Semantic MediaWiki (\url{http://en.wikipedia.org/wiki/Semantic_MediaWiki}).  These offerings are often very strict in their interpretation of ``semantics'', owing to their roots in the ``Semantic Web'' and ``Knowledge Representation'' research communities.

In addition, the \href{http://wiki.dbpedia.org/}{DBPedia} project seeks to capitalize on Wikipedia's huge knowledge base.  The project has developed a set of tools to map data from Wikipedia into semantically meaningful data.  Results so far are impressive, allowing for queries like ``Find things near the Eiffel Tower'', though such queries must be written in SPARQL.

\begin{figure}
\centering
\includegraphics[width=2.5in]{wikibook.jpg}
\caption{The book that started it all.}
\label{fig:wikiway}
\end{figure}

OntoWiki is essentially a thin presentation layer over underlying RDF triples, and seems to lose the informality that is the hallmark of a wiki.  SweetWiki allows informal folksonomy creation and editing with a powerful interface, but requires knowledge of SPARQL (a semantic web query language) and other semantic web concepts before it can be used to its full potential.  The Semantic MediaWiki provides an easy way of inserting semantic information through the use of embedded tags (e.g.\ Germany is a \texttt{[[country]]} that has \texttt{[[has capital::Berlin]]}) and by adding semantic meaning to Wikipedia-style info boxes.  Semantic MediaWiki's approach has great promise from a content-creation standpoint, and offers a few different query sources.
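
As a rough illustration of the embedded-tag idea (a sketch based only on the Germany example above, not on Semantic MediaWiki's documentation), an annotation of the form \texttt{[[property::value]]} placed on a page can be read as a single statement about that page, with the page itself as the subject:

\begin{verbatim}
Page:       Germany
Wiki text:  [[has capital::Berlin]]
Statement:  (Germany, has capital, Berlin)
\end{verbatim}

Statements of this kind are what would make a page queryable, for example by asking for every page that carries a \texttt{has capital} value.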

\section{Project Proposal}
\textbf{Preliminary ideas (so far only Chris's thoughts; comments welcome).}\\
We should probably build on Semantic MediaWiki (SMW), since it is the most promising system available at the moment, unless we would rather create something unique from scratch.  DBPedia also looks very promising and is worth a closer look.  Candidate directions include:

\begin{itemize}
\item A Wikipedia query interface building on existing ``info box'' and related work.
\item A better user interface for SMW.
\item Automatically defining microformats from user-selected ``similar'' documents.
\item Converting tables to queryable objects.
\item Making it easy to link external data.
\item Adding external (SQL, SPARQL, REST) query interfaces.
\end{itemize}

This list is not exhaustive; please investigate and suggest other directions.

\bibliographystyle{IEEEtran}
\bibliography{proposal}
\end{document}
