Joon2.0

Monday, May 29, 2006




MS-Google-Yahoo: a three-way fight to the death


[The Hankyoreh] The US internet industry has entered a "Warring States" era of mergers, alliances, competition, and mutual checks. The software giant Microsoft (MS), the fearsome newcomer Google, and the long-standing internet power Yahoo are locked in a fight to the death over who will be the true ruler of the online world.
Mergers, alliances, competition, and checks abound in a "season of pairing off." Who will be the last one standing? All eyes are on whether the Google whirlwind can last.
The season of pairing off: secure your allies. The battle that had been unfolding beneath the surface broke into the open on the 26th with two major deals. That day Google, the No. 1 search company, abruptly announced an "alliance" with Dell, the No. 1 PC maker. Google will install, on the roughly 20 million PCs Dell produces each year, a toolbar that puts internet features on the desktop and steers users online, and it will also preload its e-mail and hard-drive search software as default programs. The strategy is to make clicking on Internet Explorer unnecessary and pull users toward Google from the very first stage of PC use.
With complaints circulating that MS's soon-to-be-released Internet Explorer 7 makes it hard to reach search sites other than MSN and thus hurts competitors, Google has in effect struck first. The New York Times reported talk that Google will pay Dell $1 billion over three years for the deal.
Yahoo, which has been overshadowed by Google's ascent, announced a tie-up the same day with eBay, the No. 1 internet shopping company. Yahoo will supply eBay with graphical advertising and search content, and eBay will let Yahoo use its electronic payment service. By sharing content and linking services, the two aim to cover each other's weaknesses and generate synergy.
Earlier, Google beat out MS for a partnership with America Online (AOL), taking a $1 billion equity stake. MS, for its part, won the contest to supply Amazon.com's search engine. Meanwhile, rumors that MS will acquire Yahoo's search business and that Yahoo and eBay will merge refuse to die down despite denials from the parties involved.
Who wins the three-way war? Behind the wave of shifting alliances among internet companies that had each been minding their own business lies the three-way contest for supremacy among Google, Yahoo, and MS. Google in particular, closing in on a 50% share of the search market, has grown strong enough to shake the position of MS, the reigning king of the information-technology industry. The Financial Times observed that "behind all the (merger and acquisition) talk is the rise of Google."
The no-holds-barred fight has grown fiercer as MS, which rose on software development, stakes its future on the internet business, and as Google encroaches on MS's territory by developing and distributing software of its own. Google and MS have also entered a race to win a partnership with MySpace, the community site that Rupert Murdoch's News Corp acquired last year. A deal with MySpace, which has more than 80 million members, is seen as an opportunity to expand search services and advertising. With the search-advertising market, which ties ads to search results, expected to reach $6.9 billion this year, the competition to dominate the online world and sweep up that advertising is heating up.
In a memo circulated to employees late last year, MS executives admitted, "We knew search would become very important, but we let Google seize a commanding position." Accordingly, MS said earlier this month that it would raise its internet-business investment for the new fiscal year beginning in July from $1 billion to $1.6 billion. Chairman Bill Gates boasted that new products tied to Windows would beat back the competition in search.
But despite the struggles of MS and Yahoo, the Google whirlwind shows no sign of subsiding. Data from the market researcher comScore Networks show Google's share of the search market jumping from 36.5% in April last year to 43.1% a year later. Over the same period Yahoo fell 2.7 percentage points to 28.0%, and MSN dropped 3.2 percentage points to 12.9%. An intriguing sidelight is that several Google executives came from Netscape, which was driven out of the market by Internet Explorer.


Against this backdrop, the Google-Dell and Yahoo-eBay alliances have left MS even more anxious. MS recently tried to take an equity stake in Yahoo's search division, but Yahoo chief executive Terry Semel rebuffed the approach, saying that "selling off part of search makes about as much sense as selling one of your arms."
The Wall Street Journal pointed out that the maturing of the internet-using population is another reason these companies are scrambling all the more desperately to expand market share. By Lee Bon-young, ebon@hani.co.kr

Wednesday, May 03, 2006

Semantic Web Ontologies: What Works and What Doesn't

Peter Norvig: (Mr. Norvig is director of search quality at Google.) [There are] four individual challenges. First is a chicken-and-egg problem: How do we build this information? Because what's the point of building the tools unless you've got the information, and what's the point of putting the information in there unless you have tools. A friend of mine just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go. The next problem is competing ontologies. Everybody's got a different way to look at it. You have some tools to address it. We'll see how far that will scale. Then the Cyc problem, which is a problem of background knowledge, and the spam problem. That's something I have to face every day. As you get out of the lab and into the real world, there are people who have a monetary advantage to try to defeat you.

So, the chicken-and-egg problem. That's "What interesting information is in these kinds of semantic technologies, and where is the other information?" It turns out most of the interesting information is still in text. What we concentrate on is how do you get it out of text. Here's an example of a little demo called IO Knot. You can type a natural language question, and it pulls out documents from text and pulls out semantic entities. And you see, it's not quite perfect—couldn't quite resolve the spelling problem. But this is all automated, so there's no work in putting this information into the right place.

In general, it seems like semantic technology is good for defining schemas, but then what goes into the schemas? There's a lot of work to get it there. Here's another example. This is a Google News page from last night, and what we've done here is apply clustering technology to put the news stories together in categories, so you see the top story there about Blair, and there're 658 related stories that we've clustered together. Now imagine what it would be like if instead of using our algorithms we relied on the news suppliers to put in all the right metadata and label their stories the way they wanted to. "Is my story a story that's going to be buried on page 20, or is it a top story? I'll put my metadata in. Are the people I'm talking about terrorists or freedom fighters? What's the definition of patriot? What's the definition of marriage?"

Just defining these kinds of ontologies when you're talking about these kinds of political questions rather than about part numbers; this becomes a political statement. People get killed over less than this. These are places where ontologies are not going to work. There's going to be arguments over them. And you've got to fall back on some other kinds of approaches. The best place where ontologies will work is when you have an oligarchy of consumers who can force the providers to play the game. Something like the auto parts industry, where the auto manufacturers can get together and say, "Everybody who wants to sell to us do this." They can do that because there's only a couple of them. In other industries, if there's one major player, then they don't want to play the game because they don't want everybody else to catch up. And if there's too many minor players, then it's hard for them to get together.

Semantic technologies are good for essentially breaking up information into chunks. But essentially you get down to the part that's in between the angle brackets. And one of our founders, Sergey Brin, was quoted as saying, "Putting angle brackets around things is not a technology by itself." The problem is what goes into the angle brackets. You can say, "Well, my database has a person name field, and your database has a first name field and a last name field, and we'll have a concatenation between them to match them up." But it doesn't always work that smoothly. Here's an example of a couple days' worth of queries at Google for which we've spelling-corrected all to one canonical form. It's one of our more popular queries, and there were something like 4,000 different spelling variations over the course of a week. Somebody's got to do that kind of canonicalization. So the problem of understanding content hasn't gone away; it's just been forced down to smaller pieces between angle brackets. So there's a problem of spelling correction; there's a problem of transliteration from another alphabet such as Arabic into a Roman alphabet; there's a problem of abbreviations, HP versus Hewlett Packard versus Hewlett-Packard, and so on. And there's a problem with identical names: Michael Jordan the basketball player, the CEO, and the Berkeley professor.
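To make the canonicalization point concrete, here is a toy sketch in Python (the canonical list and the similarity cutoff are invented; this is nothing like Google's actual pipeline) that collapses misspellings with fuzzy matching. Notice that an abbreviation such as "HP" falls straight through, which is exactly why this cleanup takes real work:

```python
# Hypothetical sketch (not Google's pipeline): collapse spelling variants of a
# query to one canonical form using fuzzy string matching from the standard library.
import difflib

CANONICAL = ["britney spears", "hewlett-packard", "michael jordan"]  # invented list

def canonicalize(query: str) -> str:
    """Return the closest known canonical form, or the query itself if none is close."""
    q = query.lower().strip()
    match = difflib.get_close_matches(q, CANONICAL, n=1, cutoff=0.75)
    return match[0] if match else q

if __name__ == "__main__":
    for variant in ["brittany spears", "britny spears", "hewlet packard", "HP"]:
        # Edit distance catches misspellings, but "HP" needs an abbreviation table.
        print(f"{variant!r} -> {canonicalize(variant)!r}")
```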
And now we get to this problem of background knowledge. The Cyc project went about trying to define all the knowledge that was in a dictionary, a Dublin Core type of thing, and then found what we need was the stuff that wasn't in the dictionary or encyclopedia. Lenat and Guha said there's this vast storehouse of general knowledge that you rarely talk about, common-sense things like "Water flows downhill" and "Living things get diseases." I thought we could launch a big project to try to do this kind of thing. Then I decided to simplify a little—just put quote marks around it and type it in. So I typed "water flows downhill" and I got 1,200 hits. [That first hit] says, "lesson plan by Emily, kindergarten teacher." It actually explains why water flows downhill, and it's the kind of thing that you don't find in an encyclopedia. The conclusion here is Lenat was 99.999993% right, because only 1,200 out of those 4.3 billion cases actually talked about water flowing downhill. But that's enough, and you can go on from there. You can use the web to do voting, so you say this pump goes uphill, and that only happens 275 times, so the downhill wins, 1,200 to 275.

Essentially what we're doing here is using the power of masses of untrained people who you aren't paying to do all your work for you, as opposed to trying to get trained people to use a well-defined formalism and write text in that formalism; let's just use the stuff that's already out there. I'm all for this idea of harvesting this "unskilled labor" and trying to put it to use using statistical techniques over masses of large data and filtering through that yourself, rather than trying to closely define it on your own.

The last issue is the spam issue. When you're in the lab and you're defining your ontology, everything looks nice and neat. But then you unleash it on the world, and you find out how devious some people are. This is an example; it looks like two pages here. This is actually one page. On the left is the page as Googlebot sees it, and on the right is the page as any other user agent sees it. This website—when it sees Googlebot, it serves up the page that it thinks will most convince us to match against it, and then when a regular user comes, it shows the page that it wants to show.

What this indicates is, one, we've got a lot of work to do to deal with this kind of thing, but also you can't trust the metadata. You can't trust what people are going to say. In general, search engines have turned away from metadata, and they try to home in more on what's exactly perceivable to the user. For the most part we throw away the meta tags, unless there's a good reason to believe them, because they tend to be more deceptive than they are helpful. And the more there's a marketplace in which people can make money off of this deception, the more it's going to happen. Humans are very good at detecting this kind of spam, and machines aren't necessarily that good. So if more of the information flows between machines, this is something you're going to have to look out for more and more.
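The cloaking example lends itself to a small check of our own: fetch one URL twice, once with a crawler-style User-Agent and once with a browser-style one, and compare what comes back. This is only a sketch under obvious assumptions (the URL is a placeholder, and a real detector has to tolerate benign differences such as rotating ads or timestamps):

```python
# Minimal sketch of the cloaking check described above: request the same page
# as a crawler and as a browser and see whether the server returns different content.
import hashlib
import urllib.request

def fetch(url: str, user_agent: str) -> bytes:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

def looks_cloaked(url: str) -> bool:
    as_bot = fetch(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
    as_person = fetch(url, "Mozilla/5.0 (Windows; U) Gecko/2006")
    # Hash the bodies; identical pages hash identically. Real detectors must
    # ignore ads, timestamps, and other harmless differences.
    return hashlib.sha1(as_bot).hexdigest() != hashlib.sha1(as_person).hexdigest()

if __name__ == "__main__":
    print(looks_cloaked("http://example.com/"))  # placeholder URL
```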

How Google beat Amazon and Ebay to the Semantic Web

Friday, July 26, 2002
August 2009: How Google beat Amazon and Ebay to the Semantic Web
By Paul Ford
A work of fiction. A Semantic Web scenario. A short feature from a business magazine published in 2009.
Please note that this story was written in 2002.

It's hard to believe Google - which is now the world's largest single online marketplace - came on the scene only a little more than 8 years ago, back in the days when Amazon and Ebay reigned supreme. So how did Google become the world's single largest marketplace?
Well, the short answer is “the Semantic Web” (whatever that is - more in a moment). While Amazon and Ebay continue to have average quarterly profits of $1 billion and $1.8 billion, respectively, and are successes by any measure, the $17 billion per annum Google Marketplace is clearly the most impressive success story of what used to be called, pre-crash, “The New Economy.”
Amazon and Ebay both worked as virtual marketplaces: they outsourced as much inventory as possible (in Ebay's case, of course, that was all the inventory, but Amazon also kept as little stock on hand as it could). Then, through a variety of methods, each brought together buyers and sellers, taking a cut of every transaction.
For Amazon, that meant selling new items, or allowing thousands of users to sell them used. For Ebay, it meant bringing together auctioneers and auction buyers. Once you got everything started, this approach was extremely profitable. It was fast. It was managed by phone calls, emails, and database applications. It worked.
Enter Google. By 2002, it was the search engine, and its ad sales were picking up. At the same time, the concept of the “Semantic Web,” which had been around since 1998 or so, was gaining a little traction, and the attention of an increasing circle of people.
So what's the Semantic Web? At its heart, it's just a way to describe things in a way that a computer can “understand.” Of course, what's going on is not understanding, but logic, like you learn in high school:
If A is a friend of B, then B is a friend of A.
Jim has a friend named Paul.
Therefore, Paul has a friend named Jim.
Using a markup language called RDF (an acronym that's here to stay, so you might as well learn it - it stands for Resource Description Framework), you could put logical statements like these on the Internet, "spiders" could collect them, and the statements could be searched, analyzed, and processed. What makes this different from regular search is that the statements can be combined. So if I find a statement on Jim's web site that says "Jim is a friend of Paul" and someone does a search for Paul's friends, even if Paul's web site doesn't have a mention of Jim on it, we know Jim considers himself a friend of Paul.
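To see what "combining statements" buys you, here is a minimal sketch in Python (the names are the story's; real RDF toolchains are far more involved): starting from triples harvested from different sites, the symmetry rule produces the fact about Paul that no single site states.

```python
# Toy sketch of the inference described above: statements gathered from many
# sites are combined, and the symmetry rule fills in facts no single site states.
facts = {("Jim", "isFriendOf", "Paul")}        # found on Jim's site
facts |= {("Kara", "isFriendOf", "Scott")}     # found on Kara's site

def apply_symmetry(triples):
    """If A is a friend of B, then B is a friend of A."""
    inferred = {(o, p, s) for (s, p, o) in triples if p == "isFriendOf"}
    return triples | inferred

def friends_of(person, triples):
    return {s for (s, p, o) in triples if p == "isFriendOf" and o == person}

closed = apply_symmetry(facts)
print(friends_of("Paul", closed))   # {'Jim'}, even though Paul's site never mentions Jim
```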
Other things we might know for sure? That Car Seller A is selling Miatas for 10% less than Car Seller B. That Jan Hammer played keyboards on the Mahavishnu Orchestra's albums in the 1970s. That dogs have paws. That your specific model of computer requires a new motherboard and a faster bus before it can be upgraded to a Pentium 18. The Semantic Web isn't about pages and links, it's about relationships between things - whether one thing is a part of another, or how much a thing costs, or when it happened.
The Semweb was originally supposed to give the web the “smarts” it lacked - and much of the early work on it was in things like calendaring and scheduling, and in expressing relationships between people. By late 2003, when Google began to seriously experiment with the Semweb (after two years of experiments at their research labs), it was still a slow-growing technology that almost no one understood and very few people used, except for academics with backgrounds in logic, computer science, or artificial intelligence. The learning curve was as steep as a cliff, and there wasn't a great incentive for new coders to climb it and survey the world from their new vantage.
The Semweb, it was promised, would make it much easier to schedule a dentist's appointment, update your computer, check the train schedule, and coordinate shipments of car parts. It would make searching for things easier. All great stuff, stuff to make millions of dollars from, perhaps. But not exactly sexy to the people who write the checks, especially after they'd been burnt 95 times over by the dot-com bust. All they saw was the web - the same web that had lined a few pockets and emptied a few million - with the word "semantic" in front of it.
. . . . .
Semantics vs. Syntax, Fight at 9
The semantics of something is the meaning of it. Nebulous stuff, but in the world of AI, the goal has long been getting semantics out of syntax. See, the trillion dollar question is, when you have a whole lot of stuff arranged syntactically, in a given structure that the computer can chew up, how do you then get meaning out of it? How does syntax become semantics? Human brains are really good at this, but computers are dreadful. They're whizzes at syntax. You can tell them anything, if you tell it in a structured way, but they can't make sense of it; they keep deciding that "The flesh is willing but the spirit is weak" in English translates to "The meat is full of stars but the vodka is made of pinking shears" or suchlike in Russian.
So the guess has always been that you need a whole lot of syntactically stable statements in order to come up with anything interesting. In fact, you need a whole brain's worth - millions. Now, no one has proved this approach works at all, and the #1 advocate for this approach was a man named Doug Lenat of the CYC corporation, who somehow ended up on President Ashcroft's post-coup blacklist as a dangerous intellectual and hasn't been seen since. But the basic, overarching idea with the Semweb was - and still is, really - to throw together so much syntax from so many people that there's a chance to generate meaning out of it all.
As you know, computers still aren't listening to us as well as we'd like, but in the meantime the Semweb technology matured, and all of a sudden centralized databases - and Amazon and Ebay were prime examples of centralized databases with millions of items each - could be spread out through the entire web. Everyone could own their little piece of the database, their own part of the puzzle. It was easy to publish the stuff. But the problem was that there was no good way to bring it all together. And it was hard to create RDF files, even for some programmers - so we're back to that steep learning curve.
That all changed - surprisingly slowly - in late 2004, when, with little fanfare, Google introduced three services, Google Marketplace Search, Google Personal Agent, and Google Verification Service, and a software product, Google Marketplace Manager.
. . . . .
Google Marketplace Search
Marketplace Search is a search feature built on top of the Google Semantic Search feature, and it's likely nearly everyone reading will have used it at least once. You simply enter:
sell:martin guitar
to see a list of people buying Martin-brand acoustic guitars, and
buy:martin guitar
to see a list of sellers. Google asked for, and remembered, your postal code, and you could use easy sort controls inside the page to organize the resulting list of guitars by price, condition, model number, new/used, and proximity. The pages drew from Google's “classic,” non-Semantic-Web search tools, long considered the best on the Web, to link to information on Martin models and buyer's guides, as well as from Google's Usenet News archive. Links to sites like Epinions filled in the gaps.
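As a rough illustration of how such a buy:/sell: query might be interpreted, here is a toy sketch; the listing records and sort keys are invented, and this is in no way Google's implementation:

```python
# Hypothetical sketch of the buy:/sell: query convention described above.
LISTINGS = [
    {"intent": "sell", "item": "martin guitar", "price": 900, "zip": "11231"},
    {"intent": "buy",  "item": "martin guitar", "price": 700, "zip": "10001"},
]

def marketplace_search(query: str, sort_by: str = "price"):
    intent, _, item = query.partition(":")
    # "buy:x" shows sellers and "sell:x" shows buyers, so flip the intent we match.
    wanted = "sell" if intent == "buy" else "buy"
    hits = [l for l in LISTINGS if l["intent"] == wanted and l["item"] == item.strip()]
    return sorted(hits, key=lambda l: l[sort_by])

print(marketplace_search("buy:martin guitar"))   # lists people selling Martin guitars
```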
So where did Google Marketplace Search get its information? The same way Google got all of its information - by crawling through the entire web and indexing what it found. Except now it was looking for RDDL files, which pointed to RDF files, which contained logical statements like these:
(Scott Rahin) lives in Zip Code (11231). (Scott Rahin) has the email address (ford@ftrain.com). (Scott Rahin) has a (Martin Guitar). [Scott's] (Martin Guitar) is a model (245). [Scott's] (Martin Guitar) can be seen at (http://ftrain.com/picture/martin.jpg). [Scott's] (Martin Guitar) costs ($900). [Scott's] (Martin Guitar) is in condition (Good). [Scott's] (Martin Guitar) can be described as “Well cared for, and played rarely (sadly!). Beautiful, mellow sound and a spare set of strings. I'll be glad to show it to anyone who wants to stop by, or deliver it anywhere within the NYC area.”
What's important to understand is that the things in parentheses and brackets above are not just words, they're pointers. (Scott Rahin) is a pointer to http://ftrain.com/people/Scott. (Martin Acoustic Guitar) is a pointer to a URL that in turn refers to a special knowledge database that has other logical statements, like these:
(Martin Guitar) is an (Acoustic Guitar). (Acoustic Guitar) is a (Guitar). (Guitar) is an (Instrument).
Which means that if someone searches for guitar, or acoustic guitar, all Martin Guitars can be included in the search. And that means that Scott can simply say he has a Martin, or a Martin guitar, and the computers figure the rest out for him.
Actually, I just lied to you - it doesn't work exactly that way, and there's a lot of trickery with the pointers, and even the verb phrases are pointers, but rather than spout out a few dozen ugly terms like namespaces, URIs, prefixes, serialization, PURLs, and the like, we'll skip that part and just focus on the essential fact: everything on the Semantic Web describes something that has a URL. Or a URI. Or something like that. What that really means is that RDF is data about web data - or metadata. Sometimes RDF describes other RDF. So do you see how you take all those syntactic statements and hope to build a semantic web, one that can figure things out for itself? Combining the statements like that? Do you? Come on now, really? Yeah, well no one does.
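Keeping the author's caveat in mind, a toy version of the taxonomy lookup looks something like this (tiny in-memory dictionaries stand in for the distributed RDF the story describes):

```python
# Toy sketch of the taxonomy lookup described above: a search for "Guitar"
# matches anything whose declared type chains up to Guitar.
ISA = {"Martin Guitar": "Acoustic Guitar", "Acoustic Guitar": "Guitar", "Guitar": "Instrument"}
ITEMS = [{"owner": "Scott Rahin", "type": "Martin Guitar", "price": 900}]

def is_a(item_type: str, category: str) -> bool:
    while item_type is not None:
        if item_type == category:
            return True
        item_type = ISA.get(item_type)   # walk up the hierarchy
    return False

def search(category: str):
    return [it for it in ITEMS if is_a(it["type"], category)]

print(search("Guitar"))      # Scott's Martin shows up
print(search("Instrument"))  # ...and here too
```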
So Google connects everyone by spidering RDF and indexing it. Of course, connecting anonymous buyers and sellers isn't enough. There needs to be accountability. Enter the "Web Accountability and Rating Framework." There were a lot of competing frameworks for accountability, but this one was finally certified by the World Wide Web Consortium (before the nuclear accident at MIT) and by ECMA, and it's now the standard. How does it work? Well:
On Kara Dobbs's site, we find this statement:
[Kara Dobbs] says (Scott Rahin) is (Trustworthy).
On James Drevin's site, we find this statement:
[James Drevin] says (Scott Rahin) is (Trustworthy).
And so forth. Fine - but how do you know how to trust any of these people in the first place? Stay with me:
On Citibank's site:
[Citibank] says (Scott Rahin) is (Trustworthy).
On Mastercard's site:
[Mastercard] says (Scott Rahin) is (Trustworthy).
And inside Google:
[Google Verification Service] says (Scott Rahin) is (Trustworthy).
and if
[Citibank] says (Kara Dobbs, etc) is (Trustworthy).
then you start to see how it can all fit together, and you can actually get a pretty good sense of whether someone is the least bit dishonest or not. Now, this raises a billion problems about accountability and the nature of truth and human behavior and so forth, but we don't have the requisite 30 trillion pages, so just accept that it works for now. And that a lot of other stuff in this ilk is coming down the pike, like:
[The United States Social Security Administration] says (Pete Jefferson) was born in (1992).
Which means that Pete Jefferson can download smutty videos and “adult” video games from the Internet, since he's 19 and has a Social Security number. That's what the Safe Access for Minors bill says should happen, anyway. And don't forget the civil liberty ramifications of statements like these:
[The Sheriff's Department of Dallas, Texas] says (Martin Chalbarinstik) is a (Repeat Sexual Offender).
[The Sheriff's Department of Dallas, Texas] says (Dave Trebuchet) has (Bounced Checks).
[The Green Party, USA] says (Susan Petershaw) is a (Member).
Databases are powerful, and as much as they bring together data, they can intrude on privacy, but rather than giving the author permission to become a frothing mess lamenting the total destruction of our civil liberties at the hand of cruel machines, let's move on.
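Setting the civil-liberties questions aside, the mechanics of the trust statements quoted above reduce to weighted counting. A minimal sketch, with issuer weights invented purely for illustration:

```python
# Toy sketch of aggregating the "X says Y is Trustworthy" statements quoted above:
# each assertion counts, weighted by how much we trust the issuer.
ASSERTIONS = [
    ("Kara Dobbs", "Scott Rahin"), ("James Drevin", "Scott Rahin"),
    ("Citibank", "Scott Rahin"), ("Mastercard", "Scott Rahin"),
    ("Google Verification Service", "Scott Rahin"),
    ("Citibank", "Kara Dobbs"),
]
ISSUER_WEIGHT = {"Citibank": 3.0, "Mastercard": 3.0, "Google Verification Service": 3.0}

def trust_score(person: str) -> float:
    # Unknown issuers (ordinary individuals) get a small default weight.
    return sum(ISSUER_WEIGHT.get(issuer, 0.5)
               for issuer, subject in ASSERTIONS if subject == person)

print(trust_score("Scott Rahin"))  # 10.0: three institutions plus two individuals
print(trust_score("Kara Dobbs"))   # 3.0: one bank vouches for her
```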
Anyway, when you think about it, you can see why Google was a natural to put it all together. Google already searched the entire Web. Google already had a distributed framework with thousands of independent machines. Google already looked for the links between pages, the way they fit together, in order to build its index. Google's search engine solved equations with millions of variables. Semantic Web content, in RDF, was just another search problem, another set of equations. The major problem was getting the information in the first place. And figuring out what to do with it. And making a profit from all that work. And keeping it updated....
. . . . .
Google Marketplace Manager
Well, first you need the information. Asking people to simply throw it on a server seemed like chaos - so enter Google Marketplace Manager, a small piece of software for Windows, Unix, and Macintosh (this is before Apple bought Spain and renamed it the Different-thinking Capitalist Republic of Information). The Marketplace Manager, or MM, looked like a regular spreadsheet and allowed you to list information about yourself, what you wanted to sell, what you wanted to buy, and so forth. MM was essentially a "logical statement editor" disguised as a spreadsheet. People entered their names, addresses, and other relevant information about themselves, then they entered what they were selling, and MM saved RDF-formatted files to the server of their choice - and sent a "ping" to Google which told the search engine to update its index.
When it came out, the MM was a little bit magical. Let's say you wanted to sell a book. You entered “Book” in the category and MM queried the Open Product Taxonomy, then came back and asked you to identify whether it was a hardcover book, softcover, used, new, collectible, and so forth. The Open Product Taxonomy is a structured thesaurus, essentially, of product types, and it's quickly becoming the absolute standard for representing products for sale.
Then you enter an ISBN number from the back of the book, hit return, and the MM automatically fills in the author, copyright, number of pages, and a field for notes - it just queries a server for the RDF, gets it, chews it up, and gives it to you. If you were a small publishing house, you could list your catalog. If you had a first edition Grapes of Wrath you could describe it and give it a lowest acceptable price, and it'd appear in Google Auctions. Most of the smarts in the MM were actually on the server, as Google interpreted what was entered and adapted the spreadsheet around it. If you entered car, it asked for color. If you entered wine, it asked for vintage, vineyard, number of bottles. Then, when someone searched for 1998 Merlot, your bottle was high on the list.
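A toy sketch of that category-driven behavior, with invented field lists and a stand-in ISBN record (in the story the smarts live on Google's servers and the Open Product Taxonomy):

```python
# Hypothetical sketch of the category-driven form filling described above.
EXTRA_FIELDS = {"book": ["binding", "condition"], "car": ["color"],
                "wine": ["vintage", "vineyard", "bottles"]}
ISBN_DB = {"0000000000": {"title": "The Grapes of Wrath", "author": "John Steinbeck"}}  # placeholder ISBN

def new_listing(category: str, **known):
    listing = {"category": category, **known}
    # Ask for whatever extra fields this category needs.
    listing["needs"] = [f for f in EXTRA_FIELDS.get(category, []) if f not in known]
    # If an ISBN was supplied, pull the bibliographic record and merge it in.
    if "isbn" in known:
        listing.update(ISBN_DB.get(known["isbn"], {}))
    return listing

print(new_listing("book", isbn="0000000000", price=12))
print(new_listing("wine", price=30))   # comes back asking for vintage, vineyard, bottles
```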
You could also buy advertisements on Google right through the Manager for high-volume or big ticket items, and track how those advertisements were doing; it all updated and refreshed in a nice table. You could see the same data on the Web at any time, but the MM was sweet and fast and optimized. When you bought something, it was listed in your “purchases” column, organized by type of purchase - easy to print out for your accountant, nice for your records.
So, as we've said, Google allowed you to search for buyers and sellers, and then, using a service shamelessly copied from the then-ubiquitous PayPal, handled the transaction for a 1.75% charge. Sure, people could send checks or contact one another and avoid the 1.75%, but for most items that was your best bet - fast and cheap. 1.75% plus advertising and a global reach, and you can count on millions flowing smoothly through your accounts.
Amazon and Ebay - remember them? - doubtless saw the new product and realized they were in a bind. They would have to “cannibalize their own business” in order to go the Google path - give up their databases to the vagaries of the Web. So, in classic big-company style, they hedged their bets and did nothing.
Despite their inaction, before long all manner of competing services popped up, spidering the same data as Google and offering a cheaper transaction rate. But Google had the brand and the trust, and the profits.
It took 2 years for over a million individuals to accept and begin using the new, Semweb-based shopping. During that time, Google had about $300 million in volume - for a net of $4.5 million on transactions. But, just as Ebay and Amazon had once compelled consumers to bring their business to the web, the word-of-mouth began to work its magic. Since it was easy to search for things to buy, and easy to download the MM and get started, the number of people actively looking through Google Marketplace grew to 10 million by 2006.
. . . . .
Google Personal Agent
Now, search is not enough. You need service. You need the computer to help you. So Google also rolled out the Personal Agent - a small piece of software that, in essence, simply queried Google on a regular basis and sent you email when it found what you were looking for on the Semweb.
Want cheap phone rates? Ask the agent. Want to know when Wholand, the Who-based theme park, opens outside of London? Ask the agent. Or when your wife updates her web-based calendar, or when the price of MSFT goes up three bucks, or when stories about Ghanaian politics hit the wire. You could even program it to negotiate for you - if it found a first-edition Paterson in good condition for less than $2000, offer $500 below the asking price and work up from there. It's between you and the seller, anonymously, perhaps even tax-free if you have the right account number, no one takes a cut. Not using it to buy items began to be considered backwards. Just as the regular Google search negotiated the logical propositions of the Semweb, the Personal Agent did the same - it just did it every few minutes, and on its own, according to pre-set rules.
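Stripped of everything interesting, the agent described above is a polling loop plus a negotiation rule. A minimal sketch, with the marketplace query stubbed out and the watch list invented:

```python
# Toy sketch of the Personal Agent loop described above: poll the marketplace
# on a schedule, and when a watched item appears under its ceiling price,
# open with a fixed amount below the asking price.
import time

WATCH = {"item": "first-edition Paterson", "max_price": 2000, "opening_discount": 500}

def search_marketplace(item):
    """Stand-in for a real query against the marketplace index."""
    return [{"item": item, "asking": 1800, "seller": "example seller"}]

def run_agent(poll_seconds=300, cycles=1):
    for _ in range(cycles):
        for hit in search_marketplace(WATCH["item"]):
            if hit["asking"] <= WATCH["max_price"]:
                offer = hit["asking"] - WATCH["opening_discount"]
                print(f"Found {hit['item']} at ${hit['asking']}; opening offer ${offer}")
        time.sleep(poll_seconds)

run_agent(poll_seconds=0, cycles=1)
```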
. . . . .
Google Verification Service
Finally, Google realized they could grab a cut on the “Web of Trust” idea by offering their own verification and rating service, $15 a year to answer a questionnaire, have your credit checked, and fill in some bank account information. But people signed up, because Google was the marketplace; the Google seal of approval meant more than the government's.
. . . . .
A Jury of Your Peer-to-Peers
Since all the information was already in RDF format, Google's own strategy came back to bite it. Free clones of Google Marketplace Manager began to appear, and other search engines began to aggregate without the 1.75% cut, trying to find other revenue models. The Peer-to-Peer model, long the favorite of MP3 and OGG traders, came back to include real-time sales data aggregation, spread over hundreds of thousands of volunteer machines - the same model used by Google, but decentralized among individuals. Amazon and Ebay began automatically including RDF-spidered data on their sites, fitting it right in with existing auctions and items for sale, taking whatever cuts they could find or force out of the situation.
In 2006, Citibank introduced Drop Box Accounts for $100/month, then $30, then $15, and $5/month for checking account holders. The Drop Box account is identified by a single number, and can only receive deposits, which can then be transferred into a checking or savings account. They were even URL-addressable, and hosted using the Finance Transfer Protocol. Simply point your browser to account://382882-2838292-29-1939 and enter the amount you want to deposit. There's no risk in giving out a secure drop box number, and no fee for deposits. Banks held the account information of depositors in federally supervised escrow accounts. Suddenly everyone could simply publish their bank account number and sell their goods without any middleman at all.
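The deposit-only semantics and the account:// addressing reduce to something like the following toy sketch (the class and the URI handling are illustrations only; the story's "Finance Transfer Protocol" is, of course, fictional):

```python
# Toy sketch of the deposit-only Drop Box idea described above.
from urllib.parse import urlparse

class DropBoxAccount:
    """Can receive deposits, never pay out; withdrawals go through the owner's bank."""
    def __init__(self, number: str):
        self.number, self.balance = number, 0.0
    def deposit(self, amount: float):
        if amount <= 0:
            raise ValueError("deposits only")
        self.balance += amount

def account_from_uri(uri: str) -> DropBoxAccount:
    parsed = urlparse(uri)
    assert parsed.scheme == "account"
    return DropBoxAccount(parsed.netloc)

box = account_from_uri("account://382882-2838292-29-1939")
box.deposit(900.0)          # buyer pays for the Martin guitar
print(box.number, box.balance)
```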
Feeling the pressure, and concerned, just as the music companies had been years before, that their lead would slip to the peer-to-peer market, Google dropped its fees to 1%, allowed MM users to use Drop Box accounts, and began to charge $25 a year for the MM software and service for sellers, while still making it free for buyers. After a nervous few months, Google found that the majority of users who sold more than 10 items per year - the volume users - were glad to buy a working product with a brand name behind it; the peer-to-peer networks were considered less trustworthy, and the connection to Google advertising was a draw. Google also realized that they could offer Drop Box accounts of their own, and tie them to stock and money-market trading accounts, which opened a can of worms that we'll skip over here. If you're interested, you can read The Dragon in the Chicken Coop, by Tom Rawley.
Google's financials can, of course, be automatically inserted into your MM stock ticker; right now they're trading at 25,000 times earnings, heralding news of the “New New New New Economy.” You'll get no such heralding here; while they've pulled it off once, the competition is fierce. Google was the dream company for a little less than the last decade, but they're finally slowing down, and it's high time for a new batch of graduate students too itchy to finish their Ph.D.'s to get on the ball. And I'm sure they will.
. . . . .
A Semantically Terrifying Future?
The cultural future of the Semantic Web is a tricky one. Privacy is a huge concern, but too much privacy is unnerving. Remember those taxonomies? Well, a group of people out of the Cayman Islands came up with a “ghost taxonomy” - a thesaurus that seemed to be a listing of interconnected yacht parts for a specific brand of yacht, but in truth the yacht-building company never existed except on paper - it was a front for a money-laundering organization with ties to arms and drug smuggling. When someone said “rigging” they meant high powered automatic rifles. Sailcloth was cocaine. And an engine was weapons-grade plutonium.
So, you're a small African republic in the midst of a revolution, with a megalomaniac leader, an expatriate Russian scientist in your employ, and $6 billion in heroin profits in your bank account, and you need to buy some weapons-grade plutonium. Who does it for you? Google Personal Agent, your web-based pal, ostensibly buying a new engine for your yacht - a little pricey at $18 million, sure. But you're selling aluminum coffeemakers through the Home Products Unlimited (Barbados) Ghost Taxonomy - or nearly pure heroin, you might say - so you'll make up the difference.
Suddenly one of the biggest problems of being a criminal mastermind - finding a seller who won't sell you out - is gone. With so many sellers, you can even bargain. Selling plutonium is as smooth and easy and anonymous (now that you can get Free Republic of Christian Ghana Drop Boxes) as selling that Martin guitar. Couldn't happen? Some people say it can, which explains the Mandatory Metadata Review bill on its way through Congress right now, where all RDF must be referenced to a public taxonomy approved by a special review board. Like the people say, may you live in interesting times. Which people? Look it up on Google.

. . . . .
See also: Robot Exclusion Protocol, Google Search, 12:35 AM, Internet Culture Review, and Speculation: ReichOS, in which Hitler learns about computers.

Why MS wants to buy a stake in Yahoo

[Moneytoday, reporter Kim Yu-rim] A plan is in the works under which Microsoft (MS) would acquire part of Yahoo's equity and the two companies would form a strategic alliance.
The leading option would have MS hand its MSN online network operations over to Yahoo in exchange for a stake of less than 50% in Yahoo.
The Wall Street Journal reported on the 3rd (local time) that a strategic alliance between the two companies has sat on the negotiating table as a live possibility for several years, but that MS shareholders have recently begun actively pressing chief executive Steve Ballmer to pursue a tie-up with Yahoo.
◆ MS: "We will be the counterweight to Google"
MS, the software giant, has designated the internet business, and above all search advertising that combines a search engine with ads, as its next-generation strategic business. The aim is to position itself as the counterweight to Google.
At a presentation of its growth strategy on the 27th of last month, MS said it plans to spend $2 billion more than originally budgeted in the next fiscal year beginning this July. It also plans to formally launch adCenter, an operation devoted exclusively to search-engine research and development and its online advertising system.
But MSN Search's market share falls far short of those ambitions.
According to the research firm NetRatings, MSN Search's share in March was 10.9%, down from 14.2% in the same period a year earlier. Google and Yahoo saw their shares rise to 49% and 22.5%, respectively.
For that reason, any search engine with a large user base is a standing object of MS's interest. Last year it pursued a partnership with Time Warner's AOL internet unit, but withdrew the plan when Google agreed to invest in AOL for a 5% stake.
On the 8th, MS recruited Steve Berkowitz of the search engine Ask.com as a vice president of MSN. Berkowitz, a master negotiator credited with some 40 merger deals large and small, is being watched as the executive who would lead a search-engine acquisition.
Google is a threat to Yahoo as well. That is another reason MS shareholders are pressing CEO Steve Ballmer for a Yahoo tie-up.
Analysts are generally positive about an MS-Yahoo alliance. Walter Price of RCM Capital Management said, "Trying to catch Google with MS's own search engine is nothing but a waste of money and time. An alliance with Yahoo could instead be the alternative."
◆ MS's sprawling management: "no focus"
MS's weakness in the internet business has long been a source of shareholder discontent. It also stings MS's pride that the much smaller Google pulls off an internet business that MS, with all its organization and market power, cannot.
That is why MS has lately been fixating on copying Google, like a patient with a "Google obsession."
MarketWatch commented that "Google has already staked out its own territory" and that "rather than obsessing over what business Google enters, the wise course is to focus on MS's own strengths."
The bigger problem is that, absorbed in new business models, MS has begun to stumble more and more often even in the areas where it is strong.
MarketWatch judged that "the delayed launch of Vista, which ought to rank first among all of MS's plans, will deal a heavy blow to its credibility and earnings."
Many argue that it would be wise to abandon network properties such as MSN Messenger if they cannot turn a profit. Beyond that, the Office 2007 release is also failing to meet the market's expectations, and the Xbox video game console has fared worse than expected.
Analysts voice concern that "MS, which used to lead change, is now not only lagging behind it but also faltering in choosing and focusing its future businesses."
MS's share price is hovering around its level of 1999, and on the 3rd, the day of the Wall Street Journal report, it closed down 33 cents (1.4%) at $23.66.
Reporter Kim Yu-rim, kyr@

Eight reasons MS can't do well

[edaily, reporter Kim Hyun-dong] Microsoft (MS), the world's largest software company, is beset by troubles inside and out.
Externally it is struggling against the challenge of the upstart Google; internally it is losing the market's trust as the launch of the much-anticipated Windows Vista slips. Last weekend its share price tumbled on quarterly results that fell short of expectations.

MarketWatch columnist John Dvorak, in a column on the 3rd titled "MS's Unease," laid out eight reasons MS is bound to struggle.

1. The failure of Windows Vista: The release of Windows Vista, which had been expected to stimulate demand across information technology (IT), is now not expected until early next year. Even if Vista appears early next year, it is likely to amount to little more than a revision of Windows XP. The failure of Vista, MS's top priority, will register as an enormous disappointment.

2. A disappointing Office 2007: The Office programs account for a third of MS's total revenue. Seen in that light, Office 2007, which offers nothing new, is a disappointment. On top of that, it is slated to ship in seven different versions, which will only add to the confusion.

3. MSN, which "should have been abandoned ten years ago": MS should have given up on MSN a decade ago. MS should be a company that buys advertising, not one that sells it. MS is a software company, not a media company.

4. MSN search, the money pit: The search business means nothing for MS.

5. The haphazard Xbox 360 supply: The Xbox 360 has ample potential to be a competitive game console. But MS failed to anticipate the delay of Sony's PlayStation 3 and could not ship enough units. It is a case study in how little planning and business sense MS has.

6. Touchpad PCs: A few years ago MS chairman Bill Gates predicted that touchpad-style PCs would become the mainstream. And where do things stand now?

7. The .NET project that lost its edge: MS's ambitious .NET project proved unable to mount any response to the open-source movement.

8. The obsession with Google: Because of Google's success, MS is now obsessed with what Google is doing instead of concentrating on its own projects. The talk that the giant MS will move to buy a stake in Yahoo likewise stems from that obsession. ☞Related article: MS had pursued a stake in Yahoo to check Google - WSJ

Tuesday, May 02, 2006

What does 600 million won mean in our time?

Our world, living in the "People's Republic of South Korea"...

[From a Moneytoday column by Bong Jun-ho]

600 million won is a sum a salaried worker earning 30 million won a year would have to save for 20 years, without spending a penny, to accumulate. Arithmetically, someone who starts working at twenty-seven would be in his mid-forties before he could hope to lay hands on that much money. 600 million won is the cost of six years of study at a prestigious private university in the United States. It is also 60% of the "make 1 billion won" goal that young people set for themselves these days. And 600 million won is this era's threshold for a high-end home.

Most real-estate policies that come with a number draw their bar at 600 million won. 600 million won.... What does it mean? Why 600 million? I ordered a coffee, blinked, counted off the digits with a ballpoint pen, and thought about it for a long while. The price of one house, 600 million won? The price of two houses... the price of 10 pyeong (about 33 square meters) of land in Gangnam, the price of ten large sedans... Some 70% of apartments in the Gangnam area cost more than 600 million won, and roughly 20% of apartments in Seoul do as well. The number and the ratio keep growing every month.

① 600 million won: the threshold for the comprehensive real estate tax
The comprehensive real estate tax levied this year applies to homes worth more than 600 million won. Anyone who owns a home with an official assessed value of 600 million won or more is classified as wealthy and must pay the comprehensive real estate tax at the end of every year. It is a super-sized tax without precedent. With every passing year, owners of real estate above a certain price will be weighed down by taxes that swell enormously, and as official property valuations rise each year the surge in assessments will continue. With the threshold fixed at 600 million won per home, if the climb in housing prices does not break, the day will soon come when millions of people are subject to the tax.

② 600 million won: the capital-gains exemption line
The gains an owner of a home under 600 million won makes by buying well remain untaxed. The owner must hold only that one home and observe the basic three-year holding rule. In Seoul, Gwacheon, and the five new towns there is one more condition to satisfy: two years of actual residence. If, for example, you buy a house for 190 million won and it is worth 580 million won three years later, the difference is not taxable. Homes above 600 million won are classified as high-end and are not exempt from the capital gains tax.

From 2007 every apartment will be taxed on its actual transaction price; through 2006, properties that meet certain conditions, such as lying outside speculation zones, are taxed on the standard assessed price, which usually runs at 70-90% of the actual price. Even a single home under 600 million won incurs capital gains tax of 50% of the gain if held less than one year and 40% if held between one and two years; beyond two years, a rate of 9-36% applies depending on the size of the gain.

③ 600 million won: the threshold for negotiated brokerage fees
Real-estate brokerage fees also become a real burden when the transaction amount is large. From 200 million up to 600 million won the fee is 0.4%; above 600 million won, the broker and the client are to negotiate a rate between 0.2% and 0.9%. In practice the broker wants to charge 0.9% and the buyer or seller wants to pay 0.2%. Property above 600 million won, classified as high-end, is where the rate inevitably varies with the person and the quality of service. The rule that leaves fees on homes above 600 million won to "negotiation" often ends up embarrassing both sides or sowing the seeds of a dispute.

④ Loan restrictions on apartments over 600 million won
The March 30 measures announced restrictions on lending against apartments over 600 million won in designated speculation zones. The so-called DTI (Debt To Income) rule, a total debt-repayment ratio that links mortgage lending to annual income, applies to apartments of 600 million won or more. In effect, the policy restricts loans to anyone trying to buy a home above 600 million won, and thereby limits the chance to move up by borrowing other people's capital.

The measure is having some effect on the market. But is it lawful to draw a 600-million-won bar across a person's own property and restrict lending against it? Such a rule runs squarely against an advancing, modern economic system, and whether it is constitutional is yet another question that can only be settled by close examination. Before that, though, what deserves scrutiny is the standard for a "high-end home" and the legitimacy of the 600-million-won figure itself.

⑤ Reverse mortgages: homes of 600 million won or less
A reverse mortgage loan is a long-term home-backed loan in which the owner pledges the house to a financial institution and receives a set amount in pension-style payments. Homes worth 600 million won or less qualify. A person aged 65 or older can pledge a home under 600 million won and draw money for 15 or 20 years. It is a modern form of lending that lets you live in your own home and draw money from the bank until you die. But it, too, applies only to homes of 600 million won or less. A home above 600 million won is deemed a rich person's home and does not qualify.

The owner of a 600-million-won home can borrow roughly 300 million won against it, and on a 15-year plan would receive about 1.8 million won a month.

The 600-million-won home value here is measured by the standard tax valuation, and when the government finalized its plan to promote reverse mortgages it used the phrase "available for mid- and low-priced homes of the middle and working classes, 600 million won or less." The "middle class," it seems, is a remarkably broad category.


◆ The philosophy of 600 million won

Abroad there are houses worth 60 billion won, and plenty of people who own six or more homes. The rich hold astronomical sums, and who owns how many luxury homes is not news. In Korea, though, 600 million won now feels as if it has become the line dividing the working class from the middle class. 600 million won has been the legal threshold for a high-end home for more than a decade, yet regardless of inflation or the surge in housing prices it is, as time passes, being used as a standard that levels everything downward.

That the logic of a capitalist state, the value of goods, and the basic principles of real-estate investment must now be re-argued over a house makes today's social climate look like the ideological era of some fifty years ago. Save, compound, roll it over; the young man becomes a father and then a grandfather. The house bought with 100 million won of seed money earned from wages and business income plus a 100-million-won loan has, through round after round of inflation, become a 600-million-won house.

All money must be invested in productive activities sanctioned by the government and society; you must not buy a house expecting it to appreciate; you should buy only the house you need, only as large as you need; you must pay whatever taxes you are told to pay; and if prices rise and you end up holding a home worth more than 600 million won, you become the target of every kind of regulation and heavier taxation.... How are we to take today's policies, which can only be the way of living the state urges upon its citizens?

Most people passing on the street do not know what 600 million won signifies, nor do they buy homes expecting prices to soar. They simply feel the weight of taxes that suddenly multiply, of policies that divide the middle class from the working class, of the government's excessive regulation of real estate, and of media market reports that yet another neighborhood has jumped.

Monday, May 01, 2006

Google's Meaning Layer

How Google builds systems to enable the extraction of meaning from distributed information.

by OWT


Yes, Amazon does have a very good system for extracting meaning from seemingly disconnected information, but that is because the system is laid out to allow that to happen.

With Google, things are laid out differently. Advances in search technology allow more meaning to be picked out of the stream of information, and the bigger the system, the better the predictions. So where does Google get its meaning? From an analysis of connections, for one. For this we need to take a closer look at what Google already knows and might know in the future.

First, they know what I search for, especially if I use the personalised search they are offering at google.com/ig. That is a lot of meaning right there if you aggregate it. Beyond that, they know which links I click, and if the destination site has Google AdSense on it, they know that I ended up there, how long I spent, and whether I went on to further sites. If those further sites also have Google AdSense, they can track my entire path. Because of the login I have at google.com, any cookie they set cannot really be deleted consistently, since it will always be reset with the right ID when I log back in.

If I click on an AdWords link, either on Google or on a Network site, then they know what that click was worth, and as some AdWords users use their tracking system to integrate further steps along a sign-up or sales process into the AdWords system, they potentially know that I bought something and might even know what it was worth for the advertiser.

If I search for Sony, TV, HDTV and similar items and then click an HDTV ad that was linked to keywords like HDTV, Buy, Big, Crisp, they know I am looking for an HDTV with a big screen, and if I buy it they know I have it. They also know through which shop, since that name is attached to the AdWords account.
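A toy sketch of that kind of aggregation, with an invented event log and keyword lists (not anything Google has published):

```python
# Hypothetical sketch: pool a user's recent queries and ad clicks and score
# candidate interests by keyword overlap. Events and interest lists are invented.
EVENTS = [
    {"type": "search", "terms": ["sony", "tv"]},
    {"type": "search", "terms": ["hdtv"]},
    {"type": "ad_click", "terms": ["hdtv", "buy", "big", "crisp"], "advertiser": "example-shop"},
]
INTERESTS = {"large-screen HDTV": {"hdtv", "tv", "big", "crisp"}, "laptop": {"notebook", "laptop"}}

def infer_interests(events):
    seen = {t for e in events for t in e["terms"]}
    scores = {name: len(seen & kw) for name, kw in INTERESTS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(infer_interests(EVENTS))  # large-screen HDTV scores highest; the ad click names the shop
```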

When Google Base launches, they will add a whole layer of meta-information to the mix, since they seem to have set up the service so that you must attach meta-information such as: this is a house I am selling for 350,000 EUR in Cologne, Germany, with 350 square meters of ground.

Once they grow their own WiFi network, which is starting in San Francisco, they will know exactly what I surf to. The same is true of the Google Accelerator, which has just relaunched. Through Google Mail they know whom I talk to, and possibly what those people search for and do. Aggregating all this lets them map my community of practice and my general interests.

Yes, it is not as clean and easy as it is for Amazon, but because of that diversity the information is far richer, if analyzed correctly. And that is why they need the huge number of servers they have, and why they have developed their own internal API and programming language that exist for one purpose only: the analysis of huge amounts of data distributed among thousands of servers.
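The pattern such systems follow is the familiar map/shuffle/reduce split; presumably Google's internal tooling works along these lines, though that is an assumption on my part. A minimal single-machine sketch with invented log lines:

```python
# Minimal single-machine sketch of the map/shuffle/reduce pattern used for
# "analysis of huge amounts of data distributed among thousands of servers".
from collections import defaultdict

LOG_SHARDS = [
    ["user1 search hdtv", "user2 search flowers"],   # would live on server A
    ["user1 click hdtv-ad", "user1 search sony"],    # would live on server B
]

def map_phase(shard):
    # Emit (user, 1) for every event in this shard.
    return [(line.split()[0], 1) for line in shard]

def reduce_phase(pairs):
    totals = defaultdict(int)
    for user, count in pairs:
        totals[user] += count
    return dict(totals)

mapped = [pair for shard in LOG_SHARDS for pair in map_phase(shard)]  # "shuffle" is just a concat here
print(reduce_phase(mapped))   # {'user1': 3, 'user2': 1}
```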

