Joon2.0

Wednesday, May 03, 2006

Semantic Web Ontologies: What Works and What Doesn't

Peter Norvig: (Mr. Norvig is director of search quality at Google.) [There are] four individual challenges. First is a chicken-and-egg problem: How do we build this information, because what's the point of building the tools unless you've got the information, and what's the point of putting the information in there unless you have tools? A friend of mine just asked if I could send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go. The next problem is competing ontologies. Everybody's got a different way to look at it. You have some tools to address it. We'll see how far that will scale. Then the Cyc problem, which is a problem of background knowledge, and the spam problem. That's something I have to face every day. As you get out of the lab and into the real world, there are people who have a monetary advantage to try to defeat you.

So, the chicken-and-egg problem. That's "What interesting information is in these kinds of semantic technologies, and where is the other information?" It turns out most of the interesting information is still in text. What we concentrate on is how do you get it out of text. Here's an example of a little demo called IO Knot. You can type a natural language question, and it pulls out documents from text and pulls out semantic entities. And you see, it's not quite perfect: it couldn't quite resolve the spelling problem. But this is all automated, so there's no work in putting this information into the right place.

In general, it seems like semantic technology is good for defining schemas, but then what goes into the schemas? There's a lot of work to get it there. Here's another example. This is a Google News page from last night, and what we've done here is apply clustering technology to put the news stories together in categories, so you see the top story there about Blair, and there're 658 related stories that we've clustered together. Now imagine what it would be like if instead of using our algorithms we relied on the news suppliers to put in all the right metadata and label their stories the way they wanted to. "Is my story a story that's going to be buried on page 20, or is it a top story? I'll put my metadata in. Are the people I'm talking about terrorists or freedom fighters? What's the definition of patriot? What's the definition of marriage?"

Just defining these kinds of ontologies, when you're talking about these kinds of political questions rather than about part numbers, becomes a political statement. People get killed over less than this. These are places where ontologies are not going to work. There's going to be arguments over them. And you've got to fall back on some other kinds of approaches. The best place where ontologies will work is when you have an oligarchy of consumers who can force the providers to play the game. Something like the auto parts industry, where the auto manufacturers can get together and say, "Everybody who wants to sell to us do this." They can do that because there's only a couple of them. In other industries, if there's one major player, then they don't want to play the game because they don't want everybody else to catch up. And if there's too many minor players, then it's hard for them to get together.

Semantic technologies are good for essentially breaking up information into chunks. But essentially you get down to the part that's in between the angle brackets. And one of our founders, Sergey Brin, was quoted as saying, "Putting angle brackets around things is not a technology by itself." The problem is what goes into the angle brackets. You can say, "Well, my database has a person name field, and your database has a first name field and a last name field, and we'll have a concatenation between them to match them up." But it doesn't always work that smoothly. Here's an example of a couple days' worth of queries at Google for which we've spelling-corrected all to one canonical form. It's one of our more popular queries, and there were something like 4,000 different spelling variations over the course of a week. Somebody's got to do that kind of canonicalization. So the problem of understanding content hasn't gone away; it's just been forced down to smaller pieces between angle brackets. So there's a problem of spelling correction; there's a problem of transliteration from another alphabet such as Arabic into a Roman alphabet; there's a problem of abbreviations, HP versus Hewlett Packard versus Hewlett-Packard, and so on. And there's a problem with identical names: Michael Jordan the basketball player, the CEO, and the Berkeley professor.

And now we get to this problem of background knowledge. The Cyc project went about trying to define all the knowledge that was in a dictionary, a Dublin Core type of thing, and then found what we need was the stuff that wasn't in the dictionary or encyclopedia. Lenat and Guha said there's this vast storehouse of general knowledge that you rarely talk about, common-sense things like, "Water flows downhill" and "Living things get diseases." I thought we could launch a big project to try to do this kind of thing. Then I decided to simplify a little: just put quote marks around it and type it in. So I typed "water flows downhill" and I got 1,200 hits. [That first hit] says, "lesson plan by Emily, kindergarten teacher." It actually explains why water flows downhill, and it's the kind of thing that you don't find in an encyclopedia. The conclusion here is Lenat was 99.999993% right, because only 1,200 out of those 4.3 billion cases actually talked about water flowing downhill. But that's enough, and you can go on from there. You can use the web to do voting, so you say this pump goes uphill and that only happens 275 times, so the downhill wins, 1,200 to 275.

Essentially what we're doing here is using the power of masses of untrained people who you aren't paying to do all your work for you, as opposed to trying to get trained people to use a well-defined formalism and write text in that formalism. Let's just use the stuff that's already out there. I'm all for this idea of harvesting this "unskilled labor" and trying to put it to use using statistical techniques over masses of large data and filtering through that yourself, rather than trying to closely define it on your own.

The last issue is the spam issue. When you're in the lab and you're defining your ontology, everything looks nice and neat. But then you unleash it on the world, and you find out how devious some people are. This is an example; it looks like two pages here. This is actually one page. On the left is the page as Googlebot sees it, and on the right is a page as any other user agent sees it. This website, when it sees Googlebot.com, serves up the page that it thinks will most convince us to match against it, and then when a regular user comes, it shows the page that it wants to show.

What this indicates is, one, we've got a lot of work to do to deal with this kind of thing, but also you can't trust the metadata. You can't trust what people are going to say. In general, search engines have turned away from metadata, and they try to hone in more on what's exactly perceivable to the user. For the most part we throw away the meta tags, unless there's a good reason to believe them, because they tend to be more deceptive than they are helpful. And the more there's a marketplace in which people can make money off of this deception, the more it's going to happen. Humans are very good at detecting this kind of spam, and machines aren't necessarily that good. So if more of the information flows between machines, this is something you're going to have to look out for more and more.
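
To make the Googlebot example above a little more concrete, here is a minimal sketch of how one might flag a page that shows a crawler something very different from what it shows a regular visitor. This is not Google's method, which the talk does not describe. The function names, the three-word shingle size, and the 0.5 threshold are all assumptions made up for illustration, and the two page texts are invented. The idea is simply to compare the two versions by the overlap of their word sequences (Jaccard similarity) and treat a low overlap as suspicious.

```python
import re


def word_shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: use the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_view: str, browser_view: str,
                  threshold: float = 0.5) -> bool:
    """Flag a page whose two versions share too few shingles.

    The 0.5 threshold is an arbitrary illustrative value, not a tuned one.
    """
    similarity = jaccard(word_shingles(crawler_view),
                         word_shingles(browser_view))
    return similarity < threshold


# Hypothetical page texts, invented for this example.
crawler_view = "cheap flights cheap hotels best deals on cheap flights and hotels"
browser_view = "welcome to our casino play online poker and win big today"

print(looks_cloaked(crawler_view, browser_view))  # True: no shared shingles
print(looks_cloaked(crawler_view, crawler_view))  # False: identical text
```

A real check would have to fetch the page under different user agents and cope with legitimate differences such as ads, timestamps, and personalization, so a single fixed threshold like this one would likely be too crude in practice.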
