How machine learning is building the Semantic Web

Tom Mitchell, who leads the machine learning department at Carnegie Mellon, has been applying machine learning technology to help develop the Semantic Web. In a recent interview, Mitchell spoke about applying machine learning analysis to MRI brain scans to determine in real time what a person is thinking.

Unfortunately, his comments on the Semantic Web didn't make it into the final cut of that story. Here's what he had to say on the subject:

In what way is your research in machine learning helping with the advancement of the Semantic Web?

We're beginning to understand how to train systems to analyze text and get more of the real meaning out of it.

We have a project that learns to extract information from the Web. I begin by saying "Here are the categories and here are the relations between the categories I'm interested in." Then for each category I'm required to give a dozen examples.

Yahoo provided us the computing facilities to run this thing. We run on a cluster of 100 high-end computers with a lot of memory. The machine learning algorithms we developed work better the more Web pages you give them. So we recently upgraded from a collection of 200 million Web pages to 500 million.

Starting with just that information, the algorithm is able to learn the patterns of text that indicate something is a company or a CEO. It learns millions of text patterns that indicate these different categories and relationships. We let it learn for about four or five days and it comes back with 100,000 extracted beliefs. The accuracy is about 95%.
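
The loop Mitchell describes (seed examples, learned text patterns, newly extracted beliefs) can be illustrated with a toy sketch. This is not the project's actual code: the function names, the three-word context window, the sample sentences, and the company names are all assumptions made for illustration. A real system would work over hundreds of millions of pages and would need safeguards against learning misleading patterns.

```python
from collections import Counter

# Hypothetical toy corpus standing in for Web pages.
SENTENCES = [
    "Acme is a company based in Ohio",
    "Globex is a company that makes widgets",
    "Analysts say Initech is a company to watch",
    "Paris is a city in France",
]


def learn_patterns(sentences, known, width=3, min_count=2):
    """Collect the `width` words that follow a known example.

    A context is kept as a pattern only if it follows known examples
    at least `min_count` times.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = tuple(words[i + 1:i + 1 + width])
            if word in known and len(context) == width:
                counts[context] += 1
    return {context for context, n in counts.items() if n >= min_count}


def extract_candidates(sentences, patterns, width=3):
    """Return every single word that is followed by a learned pattern."""
    found = set()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if tuple(words[i + 1:i + 1 + width]) in patterns:
                found.add(word)
    return found


def bootstrap(sentences, seeds, rounds=2):
    """Alternate between learning patterns and extracting new examples."""
    known = set(seeds)
    for _ in range(rounds):
        patterns = learn_patterns(sentences, known)
        known |= extract_candidates(sentences, patterns)
    return known


companies = bootstrap(SENTENCES, {"Acme", "Globex"})
```

With the two seed companies, the sketch learns the context "is a company" and uses it to pick up "Initech", while "Paris" is left out because "is a city" never follows a known company. Handling only single-word names and exact word matches is a deliberate simplification.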

How will that change the Web?

Over the next decade we're going to see significant advances in the ability of computers to read content on the Web. When you realize how big the Web is, you realize that it's going to make a very large difference in what we can count on computers to do for us.