Archive for November, 2008

The semantic desktop and research papers

I’ve been following the idea of a semantic desktop for a few years now, waiting for someone to implement a framework that enables a user to actually do something that’s useful.  I think that time has come.  It seems as if KDE has managed to integrate the Networked Environment for Personalized, Ontology-based Management of Unified Knowledge (Nepomuk) into their new 4.0 release, and while it’s far from perfect (not least because of the hideous name), it seems at least to be a usable solution and manages to give us a glimpse of the power of the semantic desktop.

So, what’s a semantic desktop and why is it cool?  First, we have to understand why current file managers aren’t cool.  The file/folder hierarchy has been around since the first graphical user interface and, for the most part, it has handled the task of giving users an easy-to-understand view of the filesystem.  Of course, it’s only a metaphor: “files” and “folders” are actually scattered all over the disk.  The interface presents information in a hierarchical and linear fashion, which is not even close to how we think about and organise information, and this is where we can start to see the system breaking down.  The metaphor has worked reasonably well until now.  What’s changed?

When I only had a few thousand files on my computer, it was pretty simple to put them into folders and more or less remember where they were.  Over time, I had to start using dates or descriptions in file and folder names to give me more information about what their contents were.  Again, this served me well until I began my master’s thesis and had to start working with large numbers of large documents.  Now, not only did I need to know where I could find certain information (e.g. what folder a file was in), I also needed to know “deeper” information, such as author, publication and, perhaps most importantly, what ideas were in that document (remember, that “document” could include videos and audio files).  The default search application could only index the name and type of the file, so if my document name wasn’t descriptive enough (i.e. it didn’t include the author, title and main idea), it’d sometimes take ages to find what I was looking for just by searching.

This was partly solved with Desktop Search, which indexed not only the document location, name and type, but also all the text within the document (if it was supported).  Now we could search by keywords within documents.  Awesome.  Except sometimes ideas are not articulated using the same words across documents.  Or the ideas are related but not the same.  Or you could spell the word/s incorrectly and now your keywords don’t match the keywords in the database.  Besides, Desktop Search couldn’t index the text within an image or the ideas within a video.  So it was a great temporary solution but still not good enough.

It’s a big problem, especially for me.  I can name a file using author and title, and if I have a good memory (which I don’t), I can maybe remember the gist of the ideas in the document with a few keywords included in the name of the file.  However, try doing that with a 150 page white paper or thesis that contains many different ideas or themes.  It gets worse.  Suppose I have multiple documents, all with different main themes but related subthemes (for example, contradictions of the same idea), or with the same ideas framed in different ways.  Suppose those documents actually deal with different topics in general but each comes to a similar conclusion, and I’ve filed them according to the main idea in different folders.  Now I have to remember not only the author, title and main theme of the document, but also the subthemes and their relationships to other documents, in different folders, by different authors, with different titles.  What if there are multiple ideas relating to multiple other documents, as there often are?

As you can see, once you start dealing with large numbers of large documents, multiple themes or ideas and different relationships between all of these things, the file/folder metaphor breaks down pretty quickly.  So, what’s the solution?

The semantic desktop is an idea that has been around for a while but has taken a long time to bear fruit (I’m not sure why, although possibly because there’s not enough demand or because technology limited development).  It suggests that with the huge proliferation of content we store locally (photos, emails, music, text documents and everything else we hoard), finding information and remembering the relationships between that information is going to become increasingly difficult.  A commonly given example is trying to remember who emailed you an image that you want to show someone, when you can’t remember where it is or what it’s called.  It’s the same idea as trying to find that article by that author who had that great idea (we find it easier to recall ideas, rather than specific information like author and location).

So, the semantic desktop is a framework that exists as a data layer within the operating system that “remembers” not only the relationships between objects on your computer (for example, the email address and name of the person who sent a photo) but can also store any metadata you ascribe to it.  Metadata is data about data, so the date information that’s encoded into your photo or the ID3 tag you apply to an MP3 is all metadata.  What if we could ascribe metadata to articles?

We can.  The good people at KDE (there may be others, although I’m not familiar with them) have built the Nepomuk framework into KDE 4.0 (another article here) and it seems to be working OK, although right now it’s quite limited.  At this point you can only apply tags to a document, provide a description and rate it.  While that doesn’t sound terribly exciting, think of the possibilities.  Now I can use textual tags to loosely describe the main themes or ideas within a document (multiple tags are possible, which means describing multiple ideas), and use the description to highlight the key features of the article, along with any other information that might be useful.  The rating system could be used to define the strength of an article: for example, newspaper articles might get one star, while systematic reviews could be given four stars.

Now it’s possible to search through hundreds of documents in multiple folders (or all thrown together in the same folder) by themes or ideas (tags) and quickly establish which of the documents dealing with those themes contain the key points (description) I want to review, as well as determine the strength of the article (ratings).
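To make that concrete, here’s a minimal sketch in Python of what searching by tag, description and rating could look like.  It is not Nepomuk’s actual API: the Document class, the find function, the file paths and the tags are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A file plus the three kinds of metadata described above (hypothetical model)."""
    path: str
    tags: set = field(default_factory=set)
    description: str = ""
    rating: int = 0  # number of stars


# Invented sample data: three documents filed in three different folders.
LIBRARY = [
    Document("folder_a/review.pdf", {"engagement", "assessment"},
             "Systematic review comparing assessment methods", 4),
    Document("folder_b/clipping.pdf", {"engagement"},
             "Newspaper article, mostly anecdotal", 1),
    Document("folder_c/notes.odt", {"assessment"},
             "My own notes on rubric design", 3),
]


def find(library, tag, min_rating=0):
    """Return documents carrying `tag`, strongest first, whatever folder they are in."""
    hits = [d for d in library if tag in d.tags and d.rating >= min_rating]
    return sorted(hits, key=lambda d: d.rating, reverse=True)


for doc in find(LIBRARY, "engagement"):
    print(doc.rating, doc.path, "-", doc.description)
```

The folder a document lives in plays no part in the search, which is the whole point: the theme (tag) finds the documents, the description tells me which ones are worth opening, and the rating puts the strongest ones first.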

This is just the beginning of the potential that Nepomuk will bring to the desktop.  It’ll also create a system that will allow people to decide what information (and ideas) to share across distributed environments.  So for example, researchers on the same team can each have access to everyone’s information dealing with that project but not everyone’s personal data.  Sounds pretty cool to me.

Note: Nepomuk and KDE only run on Linux at this point.  However, Qt4 can be compiled to run on Windows, so the coolness could theoretically be coming to you soon.

On my way to the HESS conference in Grahamstown

I thought I’d share some photos I’ve taken on my way to Grahamstown for the Higher Education as a Social Space conference.  I went about half of the distance on some of the smaller roads, going over the Du Toits Kloof pass and through Worcester, Robertson and Montagu to get to the Tradouw pass, and joined the N2 from there.

I’ve been in Sedgefield for 2 days now, mostly going through the conference abstracts (all 150 of them) to get an idea of which presentations I’d like to see, as well as working on my own presentation.  I think it’s going to be a great conference, which unfortunately means that I’ll probably spend most of the holidays trying to get my head around all the cool things everyone is doing.

Just in case you’re worried I’m working too hard, here are a few pictures from the road trip, as well as from the beach down the road.

Weapon of mass distraction

The title of this post is taken from the text in an article from Time magazine, called “The Off-line American”, about John McCain’s low level of IT literacy and its potential implications for his campaign and presidency if elected.

What I found more interesting though, was the suggestion that for all the potential of the Internet to provide a vast information resource, there’s often an inability for the average user to manage that information.  With too much content to efficiently find what you’re looking for, does this make the resource worthless?

The author mentions a study by Microsoft and the University of Illinois, which “found that it takes, on average, 16 min. 33 sec” for someone to get back to work after being interrupted by an email.  That’s roughly one hour of productivity lost for every 4 emails received (assuming that the person is 1) notified when an email arrives, and 2) opens and reads the message).
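The arithmetic is easy to check.  Here’s a quick sketch in Python, where the only number taken from the article is the 16 min. 33 sec average; the function name is my own, and I’m assuming the interruptions don’t overlap.

```python
# Average recovery time quoted in the article: 16 min. 33 sec, in seconds.
RECOVERY_SECONDS = 16 * 60 + 33


def lost_time(emails):
    """Return (minutes, seconds) lost to `emails` separate interruptions."""
    return divmod(emails * RECOVERY_SECONDS, 60)


minutes, seconds = lost_time(4)
print(f"{minutes} min {seconds} sec")  # 66 min 12 sec
```

So four emails actually cost a little more than an hour.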

The article goes on to mention the Information Overload Research Group, founded by Microsoft, Google and IBM, who are trying to find a solution to the problem.

Reference extract: credibility in academic searching?

I’ve spoken before about the need to teach students how to search, not just by typing keywords into Google, but by being able to validate the search results in terms of credibility.  Reference extract is a new project seeking to do just that: provide credible search results by using librarians (of the human variety) to provide links to credible articles online.

Context, as well as credibility, is important in search.  The example given in the project proposal involves an 8 year old asking for information about black holes.  Google won’t be able to select contextually relevant information, but a librarian will because the librarian is aware of the needs of the user.  This is obviously a simplified example but illustrates the point that semantics and meaning matter in search, and keywords aren’t a particularly useful means of figuring that out.

The project is only in a planning stage but I’m quite excited to see where it goes.

Here’s a link to the home page:
http://referencextract.org/

…and the project proposal:
http://referencextract.org/?page_id=3&page=2

Think different

Here’s a great post by a student asking the education system to change the way it views students.  I thought it was pretty inspiring:

http://students2oh.org/2008/08/07/think-different/

Laptops in class

A few days ago I wrote about employing technology in classrooms and how we need to make sure that it’s appropriate technology and not being used just because we can.  I felt at the time that it probably wasn’t a good idea for students to have their own machines in front of them because of the many distractions present online.

Today I came across an article that discusses the scenario (i.e. laptops in classrooms) from both perspectives, and offers some insight into the issue.  I’m intrigued at the possibility that laptops and internet connectivity may bring some advantage to the classroom.

The one point mentioned in the article that resonates strongly with me is the use of the word “engagement”.  I’ve often felt that students in my classes aren’t actively engaged with the content and recently I’ve started to think about options in terms of encouraging that process.  Managing the expectations of both staff and students is also a powerful factor that’s often left to chance.

I guess it comes back to the point I made in the first article.  It’s not enough to throw technology at learning / teaching and expect it to solve the problem (if there’s even a problem to solve?).  The use of appropriate technology needs to be integrated into the curriculum if it’s to make any positive impact.

Here’s the link to the article:
http://bwatwood.edublogs.org/2008/11/16/students-and-laptops-in-the-classroom/

And to a site related to discussions about the use of laptops in classrooms:
http://cte-laptop.wetpaint.com/?t=anon

The Tower and the Cloud

Just a quick pointer to what I think is going to be a great read.  “The Tower and the Cloud” is a new publication by EDUCAUSE, which looks at the impact of cloud computing on higher education.  The book is divided into broad sections, each containing several chapters, with each chapter written by a different author who is a prominent figure in the field of e-learning.

I’m particularly keen on the section, Open Information, Open Content, Open Source, containing the following chapters (I’ve linked to the downloadable chapters):

The book is available as a free download, as well as a paid-for hardcopy that can be shipped internationally, and is published under a Creative Commons license.  I’m really looking forward to reading this.

Note: EDUCAUSE is a “…nonprofit association whose mission is to advance higher education by promoting the intelligent use of information technology”.

Other books available from EDUCAUSE include:

LyX: separating content and style through document processing

It’s been a while since I posted anything here, mainly because I haven’t read anything interesting in that time, which is mainly because we’ve spent the past month or so gearing up for undergraduate exams.  Now that exams are effectively over, we’re marking…sigh.  Together with the exams, our department is on a writing workshop in the hope that by the end of the year we’ll each have a peer-reviewed article ready for publication.  While this is a great way to bite the bullet and get something out, it does take away time from the more interesting task of finding and blogging about cool stuff.

So it’s the weekend, I have a huge pile of scripts to mark and an article to complete for review on Monday…and here I am, working on this post.  But it’s work-related, so I don’t feel bad.  The reason it’s work-related is because I’ve recently started using a document processor for writing articles, called LyX.  A document processor differs from a word processor (like OpenOffice) in that it attempts to separate the process of writing from the process of typesetting, or formatting.

This separation of content and style is hardly a new concept but has been increasingly evident in the whole Web 2.0 hype that makes use of the idea that content wrapped in meaningful XML tags can be syndicated in almost any form and presented in almost any format.  In the early days of the web, it was also being addressed in the argument against HTML tags that described the formatting of content, rather than its structure.  CSS is what allowed that separation to take place, but not to the degree that XML does.  While this isn’t really the place for that discussion, I just wanted to highlight the point that the separation of content and formatting has been an issue since we started using computers to write documents (here’s a great video by Michael Wesch that demonstrates this idea really well).

The earliest word processors gave everyone the power to format content, which could be argued is a good thing because choice is important, right?  While the ability to decide text colour, font size, page margins and the thousand other options present in a word processor may be great for that letter to your mom, it’s almost meaningless when it comes to academic writing, the formatting of which is already determined by either your institution or publisher.  So when I write, why should I have to bother with formatting?

This is where LyX comes in.  By separating the writing process from the typesetting process, LyX gives the writer the ability to concentrate on writing, rather than mucking about with trying to figure out how to insert and keep track of in-text citations and all the other soul-destroying aspects of computer-based academic writing.  It also allows you to output your document in any of the major formats you require.  For example, my institution uses the APA style of document formatting, so when I’m done writing, I literally press a button that outputs my work to a PDF document, already formatted for publication.
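The idea isn’t specific to LyX.  Here’s a toy sketch of it in Python, with invented names and content throughout: the article is stored once as plain structured data, and each “renderer” decides how it should look.  This is only an illustration of the principle, not how LyX works internally.

```python
from html import escape

# The content: written once, with no formatting decisions in it.
ARTICLE = {
    "title": "Separating content and style",
    "paragraphs": [
        "Content is written once.",
        "Each renderer decides how it looks.",
    ],
}


def to_html(article):
    """Render the article as HTML, for the web."""
    parts = [f"<h1>{escape(article['title'])}</h1>"]
    parts += [f"<p>{escape(p)}</p>" for p in article["paragraphs"]]
    return "\n".join(parts)


def to_text(article):
    """Render the same article as plain text with an underlined title."""
    title = article["title"]
    lines = [title.upper(), "=" * len(title), ""]
    lines += article["paragraphs"]
    return "\n".join(lines)


print(to_html(ARTICLE))
print()
print(to_text(ARTICLE))
```

Changing how the article looks means changing a renderer, never the article itself, which is the same reason I don’t have to touch my writing when the required document style changes.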

This post has gotten incredibly long, so I’ll end with a few links to more information if you’re interested in checking it out.  A word of warning though, if you’re not used to the idea that content and style are fundamentally different, there’s a steep learning curve when switching to something like LyX.

Technology in the classroom: can we make it work?

I’ve been trying to think how to use technology to enhance both my teaching and my students’ learning and it’s proving more difficult than I’d initially thought.  I’d like to think that laptops and internet access in every classroom would give students real-time access to related content while they engage in meaningful discussion, but this will never happen.  Their Facebook profile and IM conversations are far more interesting than the “Pathology of stroke” or “Justice in access to healthcare”.  And that makes sense in a bizarre kind of way.  Even while they (or their parents) pay vast sums in tuition fees to have the privilege of attending university, most students (in my very limited experience) see studying as inherently boring.

Some studies in American classrooms have all but proven that the distraction of the Internet in class is too strong for students to ignore and that most of the lesson is spent checking email, catching up with friends and even shopping.  Now, after that initial foray into “embracing” technology, it seems as if there’s a move towards banning laptops altogether.

This is the kind of about-turn I’d like to avoid.  E-learning, which I have no doubt will be a revolution in education, is not about technology for its own sake.  Just because it’s possible to have Internet access in class, does it mean that we should?  Rather, teachers must use technology in a way that makes the most of its advantages while minimising its disadvantages.  Just because I put the course reader online doesn’t make it “e-learning”, and neither does having a student blog.  The technology in itself doesn’t enhance learning in any way, but how you use it can have powerful implications.

I’ve been toying with the idea of using a wiki to manage a course, whereby any change to the course content, test schedule or mark availability can be syndicated through RSS to all the students in the class.  Students will have to, as a course requirement, both add to and edit course content (obviously moderated), which can also then be tracked.  I think that this may be one way to encourage them to actively engage with the content, as well as introduce concepts like peer review, referencing and drafting, which may also improve their reading and writing skills (another huge problem).  The point though, will be to make the learning outcomes apparent from the beginning, so that students know what’s expected of them.  Merely creating a wiki and telling students to “Go forth and create content” isn’t enough.
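For anyone wondering what “syndicated through RSS” means in practice, here’s a rough sketch in Python that turns a list of wiki changes into an RSS 2.0 feed.  The course name, URLs and change entries are made up, and most wiki software would generate a recent-changes feed for you, so treat this as an illustration of the idea rather than something I’d actually need to write.

```python
import xml.etree.ElementTree as ET

# Hypothetical recent changes to a course wiki (all values invented).
CHANGES = [
    {"title": "Test schedule updated",
     "link": "http://wiki.example.org/course/tests",
     "description": "Test 2 has been moved."},
    {"title": "Marks available",
     "link": "http://wiki.example.org/course/marks",
     "description": "Marks for assignment 1 are now online."},
]


def build_feed(course_title, course_link, changes):
    """Build an RSS 2.0 document with one <item> per wiki change."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = course_title
    ET.SubElement(channel, "link").text = course_link
    ET.SubElement(channel, "description").text = "Recent changes to the course wiki"
    for change in changes:
        item = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            ET.SubElement(item, key).text = change[key]
    return ET.tostring(rss, encoding="unicode")


feed = build_feed("Example course", "http://wiki.example.org/course", CHANGES)
print(feed)
```

Students subscribe to the feed once in their reader of choice, and every change to the schedule or content then comes to them, instead of them having to go and check the wiki.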

I think that technology will fundamentally change the way we teach and how students learn, but not just by throwing technology at the problem.  The trick is to figure out how to use technology to facilitate deep learning by getting students to actively engage with the content.  A bad teacher will continue to teach badly, no matter how much “technology” they use.

Link to the article that inspired this post:
http://www.britannica.com/blogs/2008/10/why-i-ban-laptops-in-my-classroom/

6 minute walk test and pulmonary hypertension

The 6 minute walk test is a common outcome measure of endurance used by physiotherapists and students.  I’m not sure of the full significance of this article, but it seems to indicate that the test may not be a useful indicator of outcome in patients with mild pulmonary hypertension.

Just thought I’d put this out there.  Here’s the link to the article:
http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(08)61725-0/fulltext