Daily Search Engine Tip - September 7, 2010

We have researched the internet (using search engines of course!) to find articles pertaining to search engines. Below is today's selected article. It is intended to help you get up-to-speed on what is going on in the search engine industry and keep you abreast of any tips that will help increase your site's searchability.



Title: I'M FEELING LUCKY
Author: Sergey Brin
Source: Internet Magazine (Feb, 2000)

The American search engine Google has caused a stir in the UK. Despite minuscule press coverage, it's built a loyal following through word of mouth thanks to the speed and accuracy of its results. Bill Thompson talks to the company's president and co-founder.

When the Web was young and freshly minted, back in 1995, the few hundred thousand people who had access to the Internet from work or school were already wrestling with the problem of how to find the information they wanted.

There were around 100,000 Web sites in 1995. On 15 December, news started to spread over email and via the 'What's New' section on Mosaic that Digital had opened its search engine for business. Alta Vista was an instant success and rapidly replaced Lycos and Infoseek as the main search engine of dedicated browsers.

Few sites have created such a stir since, because the Web has grown so much, with a huge number and variety of browsers. A site or service now gets only occasional press coverage and has to build a user base by word of mouth. Strategic deals are put in place so the site appears on other people's portals, then everyone seems to be using it.

Google has followed this model closely, having been developed over the past four years by two US mathematicians who turned their work in statistics and data mining into a real product. Google (www.google.com) went from beta to full release in September 1999. In November it signed a deal with VirginNet (www.virgin.net), its first in the UK, and is now the core search tool for the ISP's 150,000 users.

Google is now getting on with its mission to take over the search engine space from the feature-rich, bloated and increasingly cluttered first generation services. The co-founder and president of Google, Sergey Brin, is on leave from Stanford University, where he is studying for a PhD in computer science. He has also received a fellowship from the National Science Foundation in the US.

Where did the Google search engine come from?

More than four years ago, Larry Page (co-founder of Google) started downloading from the Web at Stanford, just for fun, and to see what could be done with it. I was doing research in data mining at the time, and I said 'That's great data -- let's work on that'. We ended up studying the data of the entire Web. We asked questions like 'What can you learn from the data?' 'What can you mine from it?' It turned out that by mining the entire Web, you can do a much better job of searching.

What makes Google so good?

The key is to take advantage of the content you're searching -- this lets you do a much better job.

How do you update the results?

There are two different methods. One is query dependent and the other is query independent. The query independent part is where we can afford to do more computation, because we can do it once for every update to our index, rather than once for every query. We compute the relative importance of every Web page.

You also look at links...

What we look at is who links to who. Just counting links to a page doesn't work well at all. It's important to find out who links to a page and how important they are.

And you've transcribed these into a mathematical equation...

There's a large mathematical analysis in which the entire link structure of the Web gets transcribed into a huge equation. Last time we did it was with an equation of 400 million unknown variables, which showed the ranks of all the pages, and three billion terms.
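
The idea of one unknown per page, with each page's importance depending on the importance of the pages that link to it, can be illustrated with a small iterative solver. This is only a minimal sketch in the spirit of the published PageRank formulation, not Google's production method: the function name `link_rank`, the damping value of 0.85, the iteration count and the four-page toy graph are all assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively estimate one importance score per page.

    links maps each page to the list of pages it links to.
    The damping value is an illustrative assumption.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current importance among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy Web of four pages.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],
}
ranks = link_rank(toy_web)
```

In this toy graph "home" ends up with the highest score because the other three pages all link to it, while "orphan", which nothing links to, keeps only the baseline share. That matches the point made in the interview: what matters is who links to a page and how important they are, not just the number of links.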

What comes out when you solve that equation?

What we get is a number for every Web page, and that's the query independent component of our ranking. Then we have the query dependent part. When Google analyses a Web page it looks at all the content on the page. It differentiates between things like headings and different font sizes, so it's designed to work with hypertext and where the text is located. Google also looks at the text of nearby pages on the Web, which really gives you an added sense of what the pages are about.
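
A rough way to picture how the two parts might fit together is to weight query-term matches by where they appear on the page and then scale the result by the page's query-independent number. The sketch below is an illustration only: the field names, the weights in `FIELD_WEIGHTS`, the choice to multiply the two scores and the two sample pages are all assumptions, not details given in the interview.

```python
# Hypothetical weights: how much a query-term match counts in each part of a page.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def text_score(query, page_fields):
    """Query-dependent part: weighted count of query-term matches per field."""
    terms = query.lower().split()
    score = 0.0
    for field, text in page_fields.items():
        words = text.lower().split()
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in terms:
            score += weight * words.count(term)
    return score


def combined_score(query, page_fields, importance):
    """Scale the query-dependent score by the query-independent importance."""
    return text_score(query, page_fields) * importance


# Two made-up pages: (fields, query-independent importance).
pages = {
    "a": ({"title": "shipping rates", "body": "we ship freight worldwide"}, 0.10),
    "b": ({"title": "my blog", "body": "notes on shipping and shipping costs"}, 0.40),
}
results = sorted(
    pages,
    key=lambda name: combined_score("shipping", pages[name][0], pages[name][1]),
    reverse=True,
)
```

Here page "a" has the stronger text match, because the term appears in its title, but page "b" ranks first because its higher importance outweighs that. A real system would use far richer signals, including the text of nearby pages mentioned above.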

Google was founded in 1998 to build on your work -- how big is the company now?

Our company has around 60 people -- close to 40 of these are in engineering and research, and over a dozen have PhDs. We have a small, separate research group of three people, but it's growing and doing long term research. This is something we take very seriously. There's a lot of room for growth in searching technology.

Is that because existing search engines are failing?

The existing search engines are great at letting you find information in seconds. But it can be improved, as Google does it, and there's huge headroom in this technology to do an even better job in the years to come.

What would a 'better job' be?

Let me start with where Google is today. Instead of applying traditional data retrieval techniques, which were designed to work on small collections of similar documents, Google was designed to work with the Web, which means analysing hypertext, where there are different kinds of document, such as spam. I think it does a good job of dealing with that.

Can you give an example?

When you search for something obscure, Google gets the right answer because it uses more information to figure out the results. And when you search for broad subjects it still gives you the right results. It'll give you a company home page if you search for a company.

Can search engines cope with the size of the Web?

I think search engines need to get a lot smarter about looking at the meaning of your search terms. When you say 'I need to ship something from London to the US', a search engine should be able to find things like shipping companies and airfreight. As a browser, you don't know the exact terminology that search engines are using and you might not be able to think of the right thing to type. I think search engines can still come up with them.

What would be the perfect search engine?

The perfect search engine would have superhuman intelligence, or at least human intelligence, and complete knowledge of all information, but that's many years away.

You've deliberately made Google's home page simple. Why did you avoid portal content?

When people come to Google, all they want to do is search. And that's our product. That's all we do. Being efficient about the search includes all aspects of the user interface, including how long it takes you to find the search box and type your terms.

You've been critical of portals like Excite and Yahoo! Why?

If you compare using Google to going to the Excite home page and trying to find the search box you'll see the difference. We work hard to make Google work fast -- it returns results in something like 0.2 seconds for most searches. Compared to this, the time it takes you to find the search box is significant. There's valuable information which you can put on portals, which aggregate information, and that's what our partner companies, such as Virgin, do well.

So you just provide your partners with a back-end search engine?

That's half our business. The other half is the Google.com site, which is where people go when they just want to search and nothing else.

How can you index dynamic content (which is seen as a serious problem)?

That's an interesting area. That's the missing piece right now. We're addressing it in a way because we gain information from nearby pages on the Web, so we can sort of point you at the right resource.

Is it a problem?

There are lots of good databases out there, such as WhitePages.com [the people finder]. If you're looking for a person, but they haven't got much content about them on the Web, you can still find their phone number and address. You definitely want to be able to incorporate this kind of data source easily, but it's not clear how to do that. It's a social problem as much as a technology problem -- how can you get people to do that? Balancing motivations properly would help -- if you're pointing people at their site more often, it's a good thing for them.

How will XML change the way you index and search sites?

There's the company perspective and my personal perspective on this one. I think XML is evil because it's anti-technology. A good technology reduces the amount of work people need to do, it improves something in the world -- it makes it better and it doesn't have significant negative consequences. The technology should serve the people. The people shouldn't be serving the technology. The problem with XML, which is shared by a lot of modern operating systems, is you do have to serve the technology. Your computer doesn't understand what you write too well.

So the solution is to put it all into XML, which I think is outrageous.

So what's the answer?

Natural language has evolved to be an efficient way of communicating complex ideas. The downside is computers don't understand it too well, but I think that's a problem for computers. It's outrageous to expect millions of content authors to do all this extra work in XML. Even if some of it's done by the software, they still have to sort out things like addresses. That's my objection.

Won't understanding the logical structure of documents help?

XML doesn't solve a lot of problems. For instance, it doesn't solve the problem of which Web sites search engines trust. It doesn't solve the problem of what to do about completely new areas or things that people haven't bothered to convert to XML. People search for some obscure things. I don't think those problems go away. The technology of search engines can be improved in less time than it'll take people to rewrite and speak Esperanto -- or XML, which is far worse.

So what's the official Google view on XML?

Right now there's not really any significant amount of XML on the Web, so there's no point supporting it -- but we will when it takes off. As PDF has become more popular, we've started to support the format. PDF content is important and we're not going to miss out on it. I don't think that'll be the case for XML.

How many pages do you index?

The current index serves results of over 250 million pages. Of those, about 130 million are fully text indexed and we know about the other pages because we have information about their neighbourhood.

What proportion of the Web is that?

I think it's probably zero per cent! Don't forget that there are lots of infinite sites. People have things like their calendar online, with one Web page for every day of the year. It's a complex number to count.

How many Web sites are there?

A study from NEC, which was done by sampling IP addresses, put the total at 800 million pages. It's hard to say exactly what this number means. It's better to be bigger, as long as you're able to rank things properly. And it's a good thing to know you've done a comprehensive search. It's something we aim to increase.

What spider do you use?

We have all our own software to do this. These days we can crawl as much as a thousand pages per second. There are various constraints -- some sites aren't always happy to be crawled at a thousand pages per second. There are various constraints in terms of how we intersperse accesses between sites and there are all kinds of challenges there.
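
One simple way to intersperse accesses between sites is to queue URLs per host and take one from each host in turn, so that no single site receives a burst of requests. The sketch below shows only that ordering step and makes no network requests. The function name `interleave_by_host` and the example URLs are hypothetical, and this is not a description of Google's crawler.

```python
from collections import deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Order URLs round-robin by host so no single site is hit in a burst."""
    queues = {}
    for url in urls:
        host = urlsplit(url).netloc
        queues.setdefault(host, deque()).append(url)

    ordered = []
    hosts = deque(queues)
    while hosts:
        host = hosts.popleft()
        ordered.append(queues[host].popleft())
        if queues[host]:
            hosts.append(host)  # more pages left for this host: back of the line
    return ordered


# Made-up frontier of pages waiting to be crawled.
frontier = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
    "http://b.example/1",
    "http://c.example/1",
]
schedule = interleave_by_host(frontier)
```

Round-robin ordering alone does not help once only one host has pages left, as the tail of this schedule shows. A practical crawler would presumably also enforce a minimum delay between requests to the same host.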

What else do you do?

As well as the spider, there's index building, searching the index and searching itself. Most of the computing power is used for the latter. In total we use around 2400 computers. It's a large-scale operation. We use PCs, commodity parts and everything runs on top of Linux.

Why do you use Linux?

It's cost effective and it gives us a huge amount of power. We pull in a lot more computation power per search than other sites. We're able to achieve that by using this cheaper platform. To a certain extent, our search engine rivals are more concerned about whether they're going to merge with Disney or Westinghouse. They're not really search engines for the most part. Searching is a small component of a company whose focus is much broader.

Is that your unique selling point?

We specialise in searching and we do it well. The early entrants have now diversified to the point where their core business is no longer searching, so they can't expect to be as good at searching as they used to be.

Teaming up with Virgin Net

According to Alex Dale, publisher of Virgin Net: "The attraction of Google to us, and vice versa, is it can slot in as a best of breed application. We like the clean design -- it's an uncluttered page, so it's quick at finding the information or service you want. It's a good fit. We gave Google 18,000 URLs that were UK sites, but none of them had .co.uk in the URL, which is quite a lot.

"Google is easily the best search engine compared to the likes of Autonomy, Lycos, Excite and Muscal. It's the one search engine that closely fits what we're trying to do commercially and in terms of branding. The fact that Google is a focused search service and not a portal is important.

"As sites get more specialised, you won't get the quality and depth in one site as you would have got a year or two ago. We're specialising at Virgin Net -- we spend most of our time developing services that are entertainment and leisure orientated as opposed to news, health and education. Part of that is to provide a gateway to the rest of the internet, whether it's entertainment and leisure or broader subjects -- Google helps us to do that."

COPYRIGHT 2000 EMAP Media Ltd. in association with The Gale Group and LookSmart. COPYRIGHT 2000 Gale Group
