INTERNET SURFING
What is Internet?
The immense growth of information and technology within a short period of time has prompted many to utilise it as a medium of communication. This idea has now been put into practical use by means of the Internet, a communication technology that influences us on a large scale. It is often called simply the Net.
The Internet is a network of networks interconnected globally. It links an incredible number of participants, connected computers and software programs, and a massive quantity of information spread all around the world. The World Wide Web, the system of linked pages that runs on top of the Internet, is the part most people use to reach that information, which is why "the Internet" and "the Web" are often used interchangeably.
Search Engine
The Web is potentially a terrific place to get information on almost any topic. Doing research without leaving your desk sounds like a great idea, but all too often you end up wasting precious time chasing down useless URLs. Almost everyone agrees that there should be a better way! But for now we're stuck with making the best use of the search tools that already exist on the Web. As there is a massive quantity of information spread all around the world, we need some directory or index to find it. Search engines provide this service. With a search engine, we can look for information on almost any topic: we type the keywords to be searched, the engine looks for matching words in its index, and it returns a list of hyperlinks to websites containing information related to those keywords.
How to Use a Search Engine
• Avoid common words. They return long lists of mostly useless links.
• Be precise. Use the most distinctive keywords you can.
• Use special characters (+ , etc.) on engines that support them. When + is used between two words, the engine looks for pages where both words appear. When a comma is used between two words, it looks for pages where either of the words appears.
It's important to give some thought to your search strategy. Are you just beginning to amass knowledge on a fairly broad subject? Or do you have a specific objective in mind--like finding out everything you can about carpal tunnel syndrome, or the e-mail address of your old college roommate?
If you're more interested in broad, general information, the first place to go is to a Web Directory. If you're after narrow, specific information, a Web search engine is probably a better choice.
Searching by Means of Subject Directories
Think of a library card catalogue. In the old card files, and even in today's computer terminal library catalogues, you find information by searching on either the author, the title, or the subject. You usually choose the subject option when you want to cover a broad range of information.
Example: You'd like to create your own home page on the Web, but you don't know how to write HTML, you've never created a graphic file, and you're not sure how you'd post a page on the Web even if you knew how to write one. In short, you need a lot of information on a rather broad topic--Web publishing.
Your best bet is not a search engine, but a Web directory like the Open Directory Project, Google Directory or Yahoo. A directory is a subject-tree style catalogue that organizes the Web into major topics, including Arts, Business and Economy, Computers and Internet, Education, Entertainment, Government, Health, News, Recreation, Reference, Regional, Science, Social Science, Society and Culture. Under each of these topics is a list of subtopics, and under each of those is another list, and another, and so on, moving from the more general to the more specific.
Example: To find out about Web page publishing from Yahoo, select the Computers and Internet topic, under which you find a subtopic on the World Wide Web. Click on that and you find another list of subtopics, several of which are pertinent to your search: Web Page Authoring, CGI Scripting, Java, HTML, Page Design, Tutorials. Selecting any of these subtopics eventually takes you to Web pages that have been posted precisely for the purpose of giving you the information you need.
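The subject tree described above can be modelled as nested topics, searched by walking from the general to the specific. The following Python sketch assumes a tiny, made-up slice of a directory (only the topic names come from the Yahoo example); `DIRECTORY` and `find_path` are hypothetical names, not part of any real directory's software.

```python
from typing import Optional

# Hypothetical, heavily trimmed subject tree: each topic maps to its subtopics.
# Topic names come from the Yahoo example; the structure is illustrative only.
DIRECTORY = {
    "Computers and Internet": {
        "World Wide Web": {
            "Web Page Authoring": {},
            "CGI Scripting": {},
            "Java": {},
            "HTML": {},
            "Page Design": {},
            "Tutorials": {},
        },
    },
    "Health": {},
    "Science": {},
}


def find_path(tree: dict, target: str, path: tuple = ()) -> Optional[tuple]:
    """Depth-first walk from general to specific; return the chain of topics ending at target."""
    for topic, subtopics in tree.items():
        here = path + (topic,)
        if topic == target:
            return here
        found = find_path(subtopics, target, here)
        if found is not None:
            return found
    return None  # the topic is not in this directory


print(" > ".join(find_path(DIRECTORY, "HTML")))  # Computers and Internet > World Wide Web > HTML
```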
If you are clear about the topic of your query, start with a Web directory rather than a search engine. Directories probably won't give you anywhere near as many references as a search engine will, but they are more likely to be on topic.
Web directories usually come equipped with their own keyword search engines that allow you to search through their indices for the information you need.
Important note: Search engines and Web directories are being integrated in interesting ways. For example, if you use the Google search engine and one of the results happens to be found in Google's Directory (which is based on the dmoz directory), Google will offer you a link to that section of the directory. Meanwhile, if you conduct your search in the Google directory, Google will order the results according to PageRank, which is Google's all-important measure of “link popularity.”
Searching by Means of Search Engines
This is where things start to get complicated.
Search engines are trickier than they look! You'll discover this the first time you enter a query on C++, the programming language. At least some Web search engines will essentially say, "Huh?"
C++ is not a word. It's a letter followed by two characters that might, depending on the index, be regarded merely as punctuation. Many text search engines have trouble handling input of this type. Many don't deal too well with numbers, either. So much for "007," "R2D2," or "Catch-22."
Important Note: This problem is no longer as bad as it used to be. I'm now finding relevant hits for C++ on a majority of search engine sites.
Here's another example of a text string search engines hate: To be or not to be. Just about anyone who finished junior high school will be able to tell you where the phrase comes from and (possibly!) what it means. But some search engines choke because all the words in the phrase are stop words, i.e., unimportant words too short and too common to be considered relevant strings on which to search. However, if you enclose the query in quotation marks, forcing the search engine to find the words "to be or not to be" in that precise order, most search engines can recognize the phrase as a famous quotation from Hamlet.
Let's take a less obvious example. Suppose you're a fan of murder mysteries and you want to search the Web for the home pages of all your favorite authors in that genre. If you simply enter the words "mystery" and "writer," most search engines will return hyperlinks to all Web documents that contain the word "mystery" or the word "writer." This will probably include hundreds, or even thousands, of URLs, most of which will have no relevance to your search. If you enter the words as a phrase in quotation marks ("mystery writer"), however, you stand a better chance of getting some good hits.
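A minimal Python sketch of the difference, assuming a toy tokenizer and three made-up documents: an unquoted two-word query that matches pages containing either word, versus a quoted phrase that requires the words together and in order. The function names are hypothetical and do not reflect how any particular engine is implemented.

```python
import re


def tokenize(text: str) -> list:
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any_word(doc: str, words: list) -> bool:
    """Unquoted query, OR-style: match if ANY of the words appears."""
    tokens = set(tokenize(doc))
    return any(word in tokens for word in words)


def matches_phrase(doc: str, phrase: str) -> bool:
    """Quoted query: the words must appear together, in this exact order."""
    tokens, target = tokenize(doc), tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


docs = [
    "Home page of a popular mystery writer",
    "A writer's guide to grant proposals",
    "The mystery of the missing socks",
]

word_hits = [d for d in docs if matches_any_word(d, ["mystery", "writer"])]
phrase_hits = [d for d in docs if matches_phrase(d, "mystery writer")]
print(len(word_hits), len(phrase_hits))  # 3 1
```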
Keyword Searching
This is the most common form of text search on the Web. Most search engines do their text query and retrieval using keywords.
What is a keyword, exactly? It can simply be any word on a webpage. For example, I used the word "simply" in the previous sentence, making it one of the keywords for this particular webpage in some search engine's index. However, since the word "simply" has nothing to do with the subject of this webpage (i.e., how search engines work), it is not a very useful keyword. Useful keywords and key phrases for this page would be "search," "search engines," "search engine methods," "how search engines work," "search engine tutorials," etc. Those keywords would actually tell a user something about the subject and content of this page.
Unless the author of the Web document specifies the keywords for his document (by using meta tags), it's up to the search engine to determine them. Essentially, this means that search engines pull out and index words that appear to be significant. Since search engines are software programs, not rational human beings, they work according to rules established by their creators for what words are usually important in a broad range of documents. The title of a page, for example, usually gives useful information about the subject of the page. Words that are mentioned towards the beginning of a document are given more weight by most search engines. The same goes for words that are repeated several times throughout the document.
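To make these rules concrete, here is a hedged Python sketch of how a program might pick out significant words from a page by favouring title words, words near the beginning, and repeated words. The stop-word list, the weights and the 50-word "early" window are all assumptions chosen for illustration; real engines do not publish their rules.

```python
import re
from collections import Counter

# Hypothetical rule set; real engines keep their weights private and change them often.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "of", "to", "in", "how"}
TITLE_WEIGHT = 3.0   # words in the title count extra
EARLY_WINDOW = 50    # the first 50 body words count as "near the beginning"
EARLY_BONUS = 1.0    # extra credit for an early occurrence


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def significant_words(title: str, body: str, top_n: int = 5) -> list:
    scores = Counter()
    for word in tokenize(title):
        if word not in STOP_WORDS:
            scores[word] += TITLE_WEIGHT
    for position, word in enumerate(tokenize(body)):
        if word in STOP_WORDS:
            continue
        scores[word] += 1.0              # every repetition adds weight
        if position < EARLY_WINDOW:
            scores[word] += EARLY_BONUS  # early words weigh more
    return [word for word, _ in scores.most_common(top_n)]


title = "How Search Engines Work"
body = "Search engines build an index. Each search engine crawls pages and ranks them."
print(significant_words(title, body, top_n=3))  # ['search', 'engines', 'work']
```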
Some search engines index every word on every page. Others index only part of the document. Full-text indexing systems generally pick up every word in the text except commonly occurring stop words such as "a," "an," "the," "is," "and," "or," and "www." Some search engines distinguish upper case from lower case; others store all words without reference to capitalization.
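A full-text index is commonly stored as an inverted index: a map from each word to the documents that contain it. This Python sketch assumes a small stop-word list taken from the paragraph above and shows how case folding changes whether a search can tell Oracle (the company) from oracle (the common noun); `build_index` is a hypothetical helper.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}


def build_index(docs: dict, case_sensitive: bool = False) -> dict:
    """Map each indexed word to the set of document ids containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[A-Za-z0-9]+", text):
            if word.lower() in STOP_WORDS:
                continue                    # stop words are never indexed
            key = word if case_sensitive else word.lower()
            index[key].add(doc_id)
    return dict(index)                      # plain dict: lookups never create entries


docs = {1: "The Oracle database", 2: "Ask the oracle at Delphi"}

folded = build_index(docs)
exact = build_index(docs, case_sensitive=True)
print(folded["oracle"])                  # {1, 2}
print(exact["Oracle"], exact["oracle"])  # {1} {2}
```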
The Problem With Keyword Searching
Keyword searches have a tough time distinguishing between words that are spelled the same way but mean something different (e.g., a hard stone, a hard exam, and the hard drive on your computer). This often results in hits that are completely irrelevant to our query. Some search engines also have trouble with so-called stemming, i.e., if you enter the word "big," should they return a hit on the word "bigger"? What about singular and plural words? What about verb tenses that differ from the word you entered by only an "s" or an "ed"?
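Stemming is easiest to see with a deliberately naive suffix stripper. The Python sketch below uses a made-up suffix list: it conflates search, searches, searched and searching, but it leaves "bigger" alone and wrongly cuts "pages" down to "pag". Mistakes like these are why real systems use more careful algorithms, such as the Porter stemmer.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical list, tried in this order


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["search", "searches", "searched", "searching", "bigger", "pages"]:
    print(w, "->", naive_stem(w))
```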
Search engines also cannot return hits on words that mean the same thing but are not actually entered in your query. A query on heart disease would not return a document that used the word "cardiac" instead of "heart."
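Matching synonyms requires the engine to expand the query with words the user did not type. As a minimal sketch, assuming a hand-built synonym table (`SYNONYMS` is hypothetical, and building a good one is the hard part), each query term becomes a group of acceptable words, and a document must contain at least one word from every group.

```python
# Hypothetical synonym table.
SYNONYMS = {"heart": {"cardiac", "coronary"}}


def expand(terms: list) -> list:
    """Turn each query term into a group: the term plus any known synonyms."""
    return [{term} | SYNONYMS.get(term, set()) for term in terms]


def matches(doc_words: set, groups: list) -> bool:
    """A document matches if every group has at least one word in the document."""
    return all(group & doc_words for group in groups)


doc = set("risk factors for cardiac disease".split())
query = ["heart", "disease"]

print(matches(doc, [{t} for t in query]))  # False: plain keyword match
print(matches(doc, expand(query)))        # True: "cardiac" stands in for "heart"
```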
Refining Search
Most sites offer two different types of searches--"basic" and "refined" or "advanced." In a "basic" search, you just enter a keyword without sifting through any pulldown menus of additional options. Depending on the engine, though, "basic" searches can be quite complex.
Advanced search refining options differ from one search engine to another, but some of the possibilities include the ability to search on more than one word, to give more weight to one search term than you give to another, and to exclude words that might be likely to muddy the results. You might also be able to search on proper names, on phrases, and on words that are found within a certain proximity to other search terms.
Some search engines also allow you to specify what form you'd like your results to appear in, and whether you wish to restrict your search to certain areas of the Internet or to specific parts of Web documents (e.g., the title or URL).
Many, but not all, search engines allow us to use so-called Boolean operators to refine our search. These are the logical terms AND, OR and NOT, plus the so-called proximal locators NEAR and FOLLOWED BY.
Boolean AND means that all the terms you specify must appear in the documents, e.g., "heart" AND "attack."
Boolean OR means that at least one of the terms must appear in the documents, e.g., bronchitis AND (acute OR chronic).
Boolean NOT means that the term following NOT must not appear in the documents, e.g., bronchitis NOT chronic.
Some search engines use the characters + and - instead of Boolean operators to include and exclude terms.
NEAR means that the terms you enter should be within a certain number of words of each other. FOLLOWED BY means that one term must directly follow the other. ADJ, for adjacent, serves the same function.
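These operators can be sketched over a positional index, which records where each word occurs in each document so that NEAR and FOLLOWED BY can compare positions. This is a simplified Python illustration with hypothetical function names and a default NEAR window of five words that we chose ourselves; engines that support these operators each define their own syntax and distances. The + and - prefixes mentioned above correspond to the AND and NOT cases.

```python
import re


def build_positions(docs: dict) -> dict:
    """Positional index: word -> {doc_id: [positions of the word in that doc]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


def docs_with(index, word):
    return set(index.get(word, {}))


def and_query(index, a, b):
    return docs_with(index, a) & docs_with(index, b)


def or_query(index, a, b):
    return docs_with(index, a) | docs_with(index, b)


def not_query(index, a, b):
    """a NOT b: documents containing a but not b."""
    return docs_with(index, a) - docs_with(index, b)


def near(index, a, b, window=5):
    """Both terms present, at most `window` word positions apart."""
    return {d for d in and_query(index, a, b)
            if any(abs(i - j) <= window for i in index[a][d] for j in index[b][d])}


def followed_by(index, a, b):
    """b appears immediately after a (ADJ in some engines)."""
    return {d for d in and_query(index, a, b)
            if any(i + 1 in index[b][d] for i in index[a][d])}


docs = {
    1: "Chronic bronchitis is a long-term condition.",
    2: "Acute bronchitis usually clears up in weeks.",
    3: "A heart attack needs urgent care.",
    4: "Heart disease can lead to an attack of angina.",
}
index = build_positions(docs)

print(and_query(index, "heart", "attack"))        # {3, 4}
print(or_query(index, "acute", "chronic"))        # {1, 2}
print(not_query(index, "bronchitis", "chronic"))  # {2}
print(near(index, "heart", "attack", window=3))   # {3}
print(followed_by(index, "heart", "attack"))      # {3}
```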
Phrases: The ability to query on phrases is very important in a search engine. Engines that allow it usually require that you enclose the phrase in quotation marks, e.g., “Institute of Chartered Accountants of India”.
Capitalization: This is essential for searching on proper names of people, companies or products. Unfortunately, many words in English are used both as proper and common nouns--Bill, bill, Gates, gates, Oracle, oracle, Lotus, lotus, Digital, digital--the list is endless.
Each search engine has its own methods of refining queries. The best way to learn them is to read the help files on the search engine sites and practice!
Relevancy Rankings
Most of the search engines return results with confidence or relevancy rankings. In other words, they list the hits according to how closely they think the results match the query. However, these lists often leave users shaking their heads in confusion, since, to the user, the results may seem completely irrelevant.
Why does this happen? Basically it's because search engine technology has not yet reached the point where humans and computers understand each other well enough to communicate clearly.
Most search engines use search term frequency as a primary way of determining whether a document is relevant. If you're researching diabetes and the word "diabetes" appears multiple times in a Web document, it's reasonable to assume that the document will contain useful information. Therefore, a document that repeats the word "diabetes" over and over is likely to turn up near the top of your list.
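A minimal Python sketch of pure term-frequency ranking, assuming three made-up documents: each page scores one point per occurrence of a query term. It also shows the weakness raised in the next paragraph, since a page that merely repeats the word outranks a page that actually says something about it.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_frequency(query: str, docs: dict) -> list:
    """Score each document by how often the query terms occur in it; best first."""
    terms = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:                                       # drop documents with no match
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, then by id
    return [doc_id for _, doc_id in scored]


docs = {
    "overview": "Diabetes affects blood sugar. Type 1 and type 2 diabetes differ.",
    "spam": "Diabetes diabetes diabetes diabetes diabetes!",
    "recipes": "Low-sugar desserts for the holidays.",
}
print(rank_by_frequency("diabetes", docs))  # ['spam', 'overview']
```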
If the keyword is a common one, or if it has multiple other meanings, you could end up with a lot of irrelevant hits. And if your keyword is a subject about which you desire information, you don't need to see it repeated over and over--it's the information about that word that you're interested in, not the word itself.
Some search engines consider both the frequency and the positioning of keywords to determine relevancy, reasoning that if the keywords appear early in the document, or in the headers, this increases the likelihood that the document is on target. For example, one method is to rank hits according to how many times your keywords appear and in which fields they appear (e.g., in headers, titles or plain text). Another method is to determine which documents are most frequently linked to by other documents on the Web. The reasoning here is that if other folks consider certain pages important, you should, too.
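The link-based method can be sketched in its simplest form as counting how many distinct pages link to each page. The link graph below is invented (using reserved .example domains). Google's PageRank, mentioned earlier, is more refined: it also weights each link by the importance of the page it comes from. This Python sketch shows only the plain count.

```python
from collections import Counter

# Hypothetical link graph: each page -> the pages it links to.
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}


def inbound_link_counts(links: dict) -> Counter:
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # a page votes for a target only once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


print(inbound_link_counts(LINKS).most_common(2))  # [('c.example', 3), ('a.example', 2)]
```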
If you use the advanced query form on AltaVista, you can assign relevance weights to your query terms before conducting a search. Although this takes some practice, it can return more relevant links.
As far as the user is concerned, relevancy ranking is critical, and becomes more so as the sheer volume of information on the Web grows. Most of us don't have the time to sift through scores of hits to determine which hyperlinks we should actually explore. The more clearly relevant the results are, the more we're likely to value the search engine.
Meta Tags
Some search engines are now indexing Web documents by their meta tags. What this means is that the Web page author can have some influence over which keywords are used to index the document, and even over the description of the document that appears when it comes up as a search engine hit.
This is obviously very important if you are trying to draw people to your website based on how your site ranks in search engines hit lists.
There is no perfect way to ensure that you'll receive a high ranking. Even if you do get a great ranking, there's no assurance that you'll keep it for long. For example, at one period a page from the Spider's Apprentice was the number-one-ranked result on AltaVista for the phrase "how search engines work." A few months later, however, it had dropped lower in the listings.
There is a lot of conflicting information out there on meta-tagging. If you're confused, it may be because different search engines look at meta tags in different ways. Some rely heavily on meta tags, others don't use them at all. The general opinion seems to be that meta tags are less useful than they were a few years ago, largely because of the high rate of spamdexing (web authors using false and misleading keywords in the meta tags).
Note: Google, currently the most popular search engine, does not index the keyword meta tag. Be aware of this if you are optimizing your webpages for the Google engine.
It seems to be generally agreed that the "title" and the "description" meta tags are important to write effectively, since several major search engines use them in their indices. Use relevant keywords in your title, and vary the titles on the different pages that make up your website, in order to target as many keywords as possible. As for the "description" meta tag, some search engines will use it as their short summary of your URL, so make sure your description is one that will entice surfers to your site.
Note: The "description" meta tag is generally held to be the most valuable, and the most likely to be indexed, so pay special attention to this one.
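To see what a search engine that reads these tags might extract, here is a hedged Python sketch using the standard library's `html.parser` module. The sample page is made up, and `MetaExtractor` is a hypothetical class; real crawlers handle far more edge cases (character sets, malformed markup, duplicate tags).

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser already lower-cases tag and attribute names
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # text may arrive in several chunks


PAGE = """<html><head>
<title>How Search Engines Work</title>
<meta name="description" content="A plain-English guide to how search engines find and rank pages.">
<meta name="keywords" content="search engines, web search, meta tags">
</head><body><h1>How Search Engines Work</h1></body></html>"""

parser = MetaExtractor()
parser.feed(PAGE)
parser.close()
print(parser.title)
print(parser.meta["description"])
```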
In the keyword tag, list a few synonyms for keywords, or foreign translations of keywords (if you anticipate traffic from foreign surfers). Make sure the keywords refer to, or are directly related to, the subject or material on the page. Do NOT use false or misleading keywords in an attempt to gain a higher ranking for your pages.
The "keyword" meta tag has been abused by some webmasters. For example, a recent ploy has been to put such words as "sex" or "mp3" into keyword meta tags, in hopes of luring searchers to one's website by using popular keywords.
The search engines are aware of such deceptive tactics, and have devised various methods to circumvent them, so be careful. Use keywords that are appropriate to your subject, and make sure they appear in the top paragraphs of actual text on your webpage. Many search engine algorithms score the words that appear towards the top of your document more highly than the words that appear towards the bottom. Words that appear in HTML header tags (H1, H2, H3, etc.) are also given more weight by some search engines. It sometimes helps to give your page a file name that makes use of one of your prime keywords, and to include keywords in the "alt" attributes of your image tags.
One thing you should not do is use some other company's trademarks in your meta tags. Some website owners have been sued for trademark violations because they've used other company names in the meta tags. I have, in fact, testified as an expert witness in such cases. You do not want the expense of being sued!
Remember that all the major search engines have slightly different policies. If you're designing a website and meta-tagging your documents, we recommend that you take the time to check out what the major search engines say in their help files about how they each use meta tags. You might want to optimize your meta tags for the search engines you believe are sending the most traffic to your site.
Concept-Based Searching (the following information is outdated, but might be of historical interest to researchers)
Excite used to be the best-known general-purpose search engine site on the Web that relied on concept-based searching. It is now effectively extinct.
Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say. In the best circumstances, a concept-based search returns hits on documents that are "about" the subject/theme you're exploring, even if the words in the document don't precisely match the words you enter into the query.
How did this method work? There are various methods of building clustering systems, some of which are highly complex, relying on sophisticated linguistic and artificial intelligence theory that we won't even attempt to go into here. Excite used a numerical approach: its software determined meaning by calculating the frequency with which certain important words appeared. When several words or phrases that were tagged to signal a particular concept appeared close to each other in a text, the search engine concluded, by statistical analysis, that the piece was "about" a certain subject.
For example, the word heart, when used in the medical/health context, would be likely to appear with such words as coronary, artery, lung, stroke, cholesterol, pump, blood, attack, and arteriosclerosis. If the word heart appears in a document with other words such as flowers, candy, love, passion, and valentine, a very different context is established, and a concept-oriented search engine returns hits on the subject of romance.
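A rough Python sketch of the idea, assuming two hand-made concept vocabularies taken from the heart example above. It simply counts which vocabulary's words co-occur with "heart" anywhere in the text, a much cruder rule than the proximity-based statistics described for Excite, whose actual method is not reproduced here. `CONCEPTS` and `guess_concept` are hypothetical names.

```python
import re

# Hypothetical concept vocabularies, taken from the heart example in the text.
CONCEPTS = {
    "medicine": {"coronary", "artery", "lung", "stroke", "cholesterol",
                 "pump", "blood", "attack", "arteriosclerosis"},
    "romance": {"flowers", "candy", "love", "passion", "valentine"},
}


def guess_concept(text: str, anchor: str = "heart"):
    """Return the concept whose words co-occur most with `anchor`, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    if anchor not in words:
        return None
    scores = {name: sum(w in vocab for w in words) for name, vocab in CONCEPTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no concept words: no decision


print(guess_concept("Cholesterol can narrow a coronary artery and starve the heart of blood."))
print(guess_concept("Send flowers and candy to show your heart is full of love."))
```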
Finding People on the Web
The web has made it much easier to get information about people, including old friends and classmates, old boyfriends/girlfriends, ancestors, celebrities, politicians, public figures, criminals, and even your next-door neighbor. There are various opinions about this new flow of personal information. Most of us seem quite pleased to be able to get the information we need, but we're not necessarily happy if others can get the goods on us!
What follows are a few tips about finding people via the web. It can be harder than you think. Although many people access the internet on a daily basis now, they often use screen names that are known only to their friends. Other people, particularly women, tend to change their last name one or more times during the course of their lives.
Who is relatively easy to find on the web?
• People who have become famous.
• People who post on the internet under their real names.
• Academics and other people who publish articles and speak at conferences. Professors also tend to have personal web pages.
• Self-employed consultants and other people who own their own businesses.
• Senior management of companies, especially public companies.
• People active in community organizations.
• Anybody with a career that causes them to get written about, cited in articles, or quoted in the press.
Step 1: Enter The Name
Although this will often be a waste of time, you might as well begin with the quick and easy type of search: type the full name of the person you're seeking into a search engine. When you do this, the most likely outcome is that you will get lots of hits on people who are NOT the person you're seeking. Many, many people have the same first and last names. If the names are unusual, though, you might get lucky. Ditto if the person you seek already has a notable web presence, with lots of webpages citing him or her for some achievement.
You have a slightly higher chance of getting good results if you enter the first and last name as a phrase, surrounded by quotation marks. The middle name usually isn't important, unless the person typically uses her middle name. If the person typically uses his initials instead of the first and middle name, make sure you search as a phrase when looking him up.
Warning: Entering names will frequently bring up many hits on genealogical records. Instead of getting info on a living person, you'll find yourself staring at data about someone who lived and died a hundred years ago. Although it's great that so much genealogical data is available via the web, these webpages can hopelessly muddy your chances of finding a living person using only her name.
Step 2: Enter The Domain Name
Many search engines will list a name that also appears as a web domain name among their top results. So if you suspect your friend may be active on the web, you can also try a search using his first and last names run together as one word. Most people's domain names tend to use both first and last name: e.g., firstnamelastname.com. On rare occasions, you might find that your friend has registered a domain using only her last name. Example: if you do a Google search on "Monash," the top two hits will be monash.edu in Australia and monash.com, which is the domain name of this website. This site is owned by Curt Monash, who registered his last name as a domain name many years ago.
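Generating the candidate domain names to try is simple string handling. This Python sketch assumes the patterns above (first and last name run together, or the last name alone) and a hypothetical short list of top-level domains; it only produces strings to search for and says nothing about which domains are actually registered.

```python
import re


def domain_candidates(first: str, last: str, tlds=("com", "net", "org")) -> list:
    """Build firstnamelastname.tld and lastname.tld guesses (hypothetical TLD list)."""
    first = re.sub(r"[^a-z0-9]", "", first.lower())  # drop spaces, hyphens, apostrophes
    last = re.sub(r"[^a-z0-9]", "", last.lower())
    labels = [first + last, last]
    return [f"{label}.{tld}" for label in labels for tld in tlds]


print(domain_candidates("Curt", "Monash"))
# ['curtmonash.com', 'curtmonash.net', 'curtmonash.org', 'monash.com', 'monash.net', 'monash.org']
```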
Public figures, web geeks, and small business owners are more likely to have registered their names as web domains than the average person-on-the-street. Maintaining a domain costs money, and running a website requires knowledge of web design and programming.
Step 3: Refine Your Search
Remember that search engines are simply software programs that cannot anticipate your needs. To a search engine, a name is just a collection of letters. All it cares about (usually) is matching those letters with all the other identical arrays of letters in its database. For example, entering the name "James Johnson" in Google returned 7,250,000 hits at the time of writing.
Therefore, in most cases, you will need to provide the search engine with more information. How can you narrow the search? It often helps to envision the result you're looking for. If you could find a page on the web that mentioned the person you're looking for, what would it say? If you think the person might be mentioned in a webpage that also refers to her hometown, add that, if you know it. If the person is interested in a particular career or activity, use that activity as one of your search terms. For me, a search on "Linda Barlow" and "novelist" brings up pages that rule out most of the zillion other "Linda Barlows" in the world.
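As a small illustration, the hypothetical helper below assembles a refined query: the name in quotation marks as a phrase, plus extra terms such as a profession or hometown, with multi-word terms quoted as well. It follows the quoting convention described in this document; exact syntax varies from engine to engine.

```python
def build_query(name: str, *extra_terms: str) -> str:
    """Quote the name as a phrase, then add narrowing terms (quoting multi-word ones)."""
    parts = [f'"{name}"']
    for term in extra_terms:
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


print(build_query("Linda Barlow", "novelist"))                 # "Linda Barlow" novelist
print(build_query("James Johnson", "Des Moines", "engineer"))  # "James Johnson" "Des Moines" engineer
```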
If you happen to know where the person works, or even just what his profession is, try using the business or the profession as another keyword. Most businesses have websites, although not all employees are listed on such sites. But if your friend owns his or her own business, it probably has a website. If your friend is one of the executives of a public company, he or she may be listed in the company's securities filings (such as SEC filings in the US), press releases or corporate reports.
Did you and your friend attend the same school or college? Try to get information through the school's or college's website. If your friend is not listed anywhere on these sites, try the various class reunion websites, like classmates.com.
Is your friend a member of a professional organization that has a web presence? Has he or she written a book, an article, or been cited in one? More and more books and articles are published to the web every month.
When The Info is Correct, but Over-abundant
If you are looking for a celebrity, a public figure, or someone who is extremely active on the internet, Steps 1 and 2 above are usually enough to find real information about that person. In fact, you'll probably find yourself confronted with far too much information. You'll need a way to winnow it down.
In the case of celebrities, a simple search on their names is likely to produce more results than you want. You're likely to find fan sites, which can be excellent resources, but beware of the ones that offer nude pictures. Generally, the offers of clothed pictures are legitimate and the offers of nude ones are fakes, come-ons to try to get you to buy a subscription to a porn site. (In case you haven't learned it already, the many varieties of sexual content on the internet are rarely offered free of charge).
To narrow it down, put in both the celebrity's name and the name of a movie or song or book or TV show they're associated with. Multiple titles are an even better way to find their sites. This works particularly well for authors.
If the celebrity is an athlete in a major US team sport, several big sports sites keep pages on every single player, including links to up-to-date news. These include Yahoo, ESPN, Sportsline, as well as the sites for leagues such as the NBA and NFL.
For-a-Fee Searches
If the person you're looking for is careful enough about his privacy to have removed his personal info from various websites and databases, you may find it difficult to get any information about him. You can pay to access special databases at sites like peoplefind.com and peoplesearch.com.
Anything that is a matter of public record is probably recorded in an electronic database. Not all such databases are web-accessible, and those that are usually charge a fee. What is available is often determined by individual state laws or policies. What you can find from the state of Iowa might be quite different from what you can find from the state of New York.
Information that is likely to be contained in public records includes birth and death certificates, marriage certificates, divorce judgments (sometimes), home purchase and sales information, professional credentials verification, court and legal proceedings (not always), arrest records, bankruptcy filings, and other events that are recorded by public officials, state and federal divisions of vital statistics, and other public entities.
Don't Forget the Phone Book
Telephone books (white pages and yellow pages) are widely available now on the web. This means that if you know someone's name and what town they live in, you can access their address, phone number, even their age. There are also databases ("reverse look-up") that allow you to type in a phone number and get the name and address of the person who owns the phone.
If you know the address of the person you are seeking, you can easily get a map of his town, street, and neighborhood on the many web map sites. Some maps are precise enough to show the exact location of his home.
Try Yahoo! PeopleSearch, which offers basic phone book style look-up and links you to a site that can execute background checks (for a fee).
Privacy Issues
The same things that you can find out about other people, other people can find out about you.
Here's a list of some of the databases someone might access when researching you:
• Real estate sales and ownership
• Postal service change-of-address records
• Telephone books, past and present
• Automobile registration records
• Other vehicle registration, i.e., boats, private airplanes
• Subscription lists for magazines and periodicals
• National marketing databases
• National email directories
• Voter registration lists
• Bankruptcy filings
• Court proceedings/judgments
• Professional licenses
• SEC filings
• Tax lien records
• Credit bureau records
• Whois domain name registration records
• Social Security Death Index
If you are concerned about your privacy, you can ask to have your personal information removed from web databases. It is difficult to remove all trace of yourself, though. Some events and transactions are legitimately matters of public record, and more of these public records are becoming available every year via the web.
What is Internet?
The immense growth of information and technology within a short period of time has prompted many to utilise it as a medium of communication. This idea has now been put into pragmatic usage by means of Internet. Internet is a new communication technology that influences us on a large scale. It is also called as net or web.
Internet is a network of networks interconnected globally. It consists of an incredible number of participants, connected computers, software programs and a massive quantity of information spread all around the world. This is why it is called World Wide Web.
Search Engine
The Web is potentially a terrific place to get information on almost any topic. Doing research without leaving your desk sounds like a great idea, but all too often you end up wasting precious time chasing down useless URLs. Almost everyone agrees that there should be a better way! But for now we're stuck with making the best use of the search tools that already exist on the Web. As there is a massive quantity of information spread all around the world, we require some directory or index to find those information. Search engines provide us this service. Through a search engine, we can search any information. We type the keywords to be searched and the engine searches for a matching word and gives a list of hyperlinks to the websites containing some information relating to that keyword.
How to use Search Engine
• Avoid using common words. It will provide a list of links which will be useless.
• Be more precise. Use the most uncommon keyword.
• Use special characters (+ , etc.). When + is used between two words, it searches for the pages where both the words come together. When , is used between two words, it searches for the pages where any of the words appears.
It's important to give some thought to your search strategy. Are you just beginning to amass knowledge on a fairly broad subject? Or do you have a specific objective in mind--like finding out everything you can about carpal tunnel syndrome, or the e-mail address of your old college roommate?
If you're more interested in broad, general information, the first place to go is to a Web Directory. If you're after narrow, specific information, a Web search engine is probably a better choice.
Searching by Means of Subject Directories
Think of a library card catalogue. In the old card files, and even in today's computerized library catalogues, you find information by searching on the author, the title, or the subject. You usually choose the subject option when you want to cover a broad range of information.
Example: You'd like to create your own home page on the Web, but you don't know how to write HTML, you've never created a graphic file, and you're not sure how you'd post a page on the Web even if you knew how to write one. In short, you need a lot of information on a rather broad topic--Web publishing.
Your best bet is not a search engine, but a Web directory like the Open Directory Project, Google Directory or Yahoo. A directory is a subject-tree style catalogue that organizes the Web into major topics, including Arts, Business and Economy, Computers and Internet, Education, Entertainment, Government, Health, News, Recreation, Reference, Regional, Science, Social Science, Society and Culture. Under each of these topics is a list of subtopics, and under each of those is another list, and another, and so on, moving from the more general to the more specific.
Example: To find out about Web page publishing from Yahoo, select the Computers and Internet topic, under which you find a subtopic on the World Wide Web. Click on that and you find another list of subtopics, several of which are pertinent to your search: Web Page Authoring, CGI Scripting, Java, HTML, Page Design, Tutorials. Selecting any of these subtopics eventually takes you to Web pages that have been posted precisely for the purpose of giving you the information you need.
If you are clear about the topic of your query, start with a Web directory rather than a search engine. Directories probably won't give you anywhere near as many references as a search engine will, but they are more likely to be on topic.
Web directories usually come equipped with their own keyword search engines that allow you to search through their indices for the information you need.
Important note: Search engines and Web directories are being integrated in interesting ways. For example, if you use the Google search engine and one of the results happens to be listed in Google's Directory (which is based on the dmoz directory), Google will offer you a link to that section of the directory. Meanwhile, if you conduct your search in the Google directory, Google will order the results according to PageRank, Google's all-important measure of “link popularity.”
Searching by Means of Search Engines
This is where things start to get complicated.
Search engines are trickier than they look! You'll discover this the first time you enter a query on C++, the programming language. Some Web search engines will essentially say, "Huh?"
C++ is not a word. It's a letter followed by two characters that might, depending on the index, be regarded merely as punctuation. Many text search engines have trouble handling input of this type. Many don't deal too well with numbers, either. So much for "007," "R2D2," or "Catch-22."
Important Note: This problem is no longer as bad as it used to be. I'm now finding relevant hits for C++ on a majority of search engine sites.
Here's another example of a text string search engines hate: To be or not to be. Just about anyone who finished junior high school will be able to tell you where the phrase comes from and (possibly!) what it means. But some search engines choke because all the words in the phrase are stop words, i.e., unimportant words too short and too common to be considered relevant strings on which to search. However, if you enclose the query in quotation marks, forcing the search engine to find the words "to be or not to be" in that precise order, most search engines can recognize the phrase as a famous quotation from Hamlet.
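Here is a minimal Python sketch of why stop words sink this query and how quoting rescues it. The stop-word list is hypothetical, chosen for the example (real engines each used their own), and `keyword_terms` and `contains_phrase` are illustrative helpers, not any search engine's actual code.

```python
# Hypothetical stop-word list for illustration; every engine used its own.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "not", "to", "be"}


def keyword_terms(query: str) -> list[str]:
    """Lowercase the query, split on spaces, and drop stop words."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` occur consecutively and in order in `text`."""
    words = [w.strip('.,;:!?"\'') for w in text.lower().split()]
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


# Without quotes, every word is a stop word, so nothing is left to search for.
print(keyword_terms("To be or not to be"))  # → []

# With quotes, the engine matches the exact word sequence instead.
print(contains_phrase("Hamlet asks: To be, or not to be?", "to be or not to be"))  # → True
print(contains_phrase("Not to be confused, or to be sure", "to be or not to be"))  # → False
```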
Let's take a less obvious example. Suppose you're a fan of murder mysteries and you want to search the Web for the home pages of all your favorite authors in that genre. If you simply enter the words "mystery" and "writer," most search engines will return hyperlinks to all Web documents that contain the word "mystery" or the word "writer." This will probably include hundreds, or even thousands, of URLs, most of which will have no relevance to your search. If you enter the words as a phrase in quotation marks, however, you stand a better chance of getting some good hits.
Keyword Searching
This is the most common form of text search on the Web. Most search engines do their text query and retrieval using keywords.
What is a keyword, exactly? It can simply be any word on a webpage. For example, I used the word "simply" in the previous sentence, making it one of the keywords for this particular webpage in some search engine's index. However, since the word "simply" has nothing to do with the subject of this webpage (i.e., how search engines work), it is not a very useful keyword. Useful keywords and key phrases for this page would be "search," "search engines," "search engine methods," "how search engines work," "search engine tutorials," etc. Those keywords would actually tell a user something about the subject and content of this page.
Unless the author of the Web document specifies the keywords for his document (by using meta tags), it's up to the search engine to determine them. Essentially, this means that search engines pull out and index words that appear to be significant. Since search engines are software programs, not rational human beings, they work according to rules established by their creators for what words are usually important in a broad range of documents. The title of a page, for example, usually gives useful information about the subject of the page. Words that are mentioned towards the beginning of a document are given more weight by most search engines. The same goes for words that are repeated several times throughout the document.
Some search engines index every word on every page. Others index only part of the document. Full-text indexing systems generally pick up every word in the text except commonly occurring stop words such as "a," "an," "the," "is," "and," "or," and "www." Some search engines distinguish upper case from lower case; others store all words without reference to capitalization.
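A full-text index is commonly built as an inverted index, which maps each word to the pages that contain it. The sketch below assumes the stop words listed above and a simple `\w+` tokenizer; `build_index` is a hypothetical name. It shows both stop-word removal and the case-folding choice. Note that the tokenizer cuts "C++" down to "C", which is the punctuation trouble described earlier.

```python
import re
from collections import defaultdict

# Stop words taken from the examples in the paragraph above.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}


def build_index(pages: dict[str, str], fold_case: bool = True) -> dict[str, set[str]]:
    """Build an inverted index: word -> set of page IDs containing that word."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # \w+ splits on punctuation, so "C++" is indexed only as "C".
        for word in re.findall(r"\w+", text):
            if word.lower() in STOP_WORDS:
                continue  # stop words are never indexed, so they can't be searched
            key = word.lower() if fold_case else word
            index[key].add(page_id)
    return dict(index)


pages = {
    "p1": "C++ tutorial for the beginner",
    "p2": "Pay the bill online",
    "p3": "Bill Gates and Microsoft",
}

folded = build_index(pages)                  # case-insensitive engine
exact = build_index(pages, fold_case=False)  # case-sensitive engine

print(sorted(folded["bill"]))  # → ['p2', 'p3']
print(sorted(exact["Bill"]))   # → ['p3']
print("the" in folded)         # → False
print(sorted(folded["c"]))     # → ['p1']
```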
The Problem With Keyword Searching
Keyword searches have a tough time distinguishing between words that are spelled the same way but mean something different (e.g., a hard stone, a hard exam, and the hard drive on your computer). This often results in hits that are completely irrelevant to your query. Some search engines also have trouble with so-called stemming, i.e., if you enter the word "big," should they return a hit on the word "bigger"? What about singular and plural words? What about verb tenses that differ from the word you entered by only an "s" or an "ed"?
Search engines also cannot return hits on keywords that mean the same thing but do not actually appear in your query. A query on heart disease would not return a document that used the word "cardiac" instead of "heart."
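The stemming question can be made concrete with a toy suffix stripper. This is a deliberately crude sketch, not the Porter stemmer or any engine's real algorithm; the suffix list and the doubled-consonant rule are assumptions made for the example.

```python
def crude_stem(word: str) -> str:
    """Toy suffix stripper: NOT a real stemming algorithm such as Porter's."""
    word = word.lower()
    for suffix in ("ing", "ed", "er", "s"):
        # Keep at least three letters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo a doubled final consonant left behind: "bigg" -> "big".
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiouls":
        word = word[:-1]
    return word


for w in ["big", "bigger", "walks", "walked", "walking", "water"]:
    print(w, "->", crude_stem(w))
```

The same rules that usefully merge "bigger" with "big" also turn "water" into "wat", which is why engines have to be careful about how aggressively they stem.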
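One way around this is query expansion: before searching, each term is widened to include known synonyms. The sketch below assumes a small hand-written synonym table (`SYNONYMS` is hypothetical); every query term must still match, but any synonym may stand in for it.

```python
# Hypothetical synonym table; a real engine would need a far larger one.
SYNONYMS = {
    "heart": {"cardiac", "coronary"},
    "disease": {"disorder", "illness"},
}


def expand_query(terms: list[str]) -> list[set[str]]:
    """Replace each query term with the set of words that may stand in for it."""
    return [{t} | SYNONYMS.get(t, set()) for t in terms]


def matches(document: str, groups: list[set[str]]) -> bool:
    """AND across the groups, OR within each group."""
    words = set(document.lower().split())
    return all(group & words for group in groups)


doc = "Risk factors for cardiac disease in adults"
print(matches(doc, [{"heart"}, {"disease"}]))            # → False (literal terms)
print(matches(doc, expand_query(["heart", "disease"])))  # → True (with synonyms)
```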
Refining Search
Most sites offer two different types of searches--"basic" and "refined" or "advanced." In a "basic" search, you just enter a keyword without sifting through any pulldown menus of additional options. Depending on the engine, though, "basic" searches can be quite complex.
Advanced search refining options differ from one search engine to another, but some of the possibilities include the ability to search on more than one word, to give more weight to one search term than you give to another, and to exclude words that might be likely to muddy the results. You might also be able to search on proper names, on phrases, and on words that are found within a certain proximity to other search terms.
Some search engines also allow you to specify what form you'd like your results to appear in, and whether you wish to restrict your search to certain fields on the internet or to specific parts of Web documents (e.g., the title or URL).
Many, but not all, search engines allow you to use so-called Boolean operators to refine your search. These are the logical terms AND, OR, NOT, and the so-called proximal locators, NEAR and FOLLOWED BY.
Boolean AND means that all the terms you specify must appear in the documents, e.g., "heart" AND "attack."
Boolean OR means that at least one of the terms must appear in the documents, e.g., bronchitis AND (acute OR chronic).
Boolean NOT means that the term following NOT must not appear in the documents, e.g., "heart" NOT "valentine."
Some search engines use the characters + and - instead of Boolean operators to include and exclude terms.
NEAR means that the terms you enter should be within a certain number of words of each other. FOLLOWED BY means that one term must directly follow the other. ADJ, for adjacent, serves the same function.
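These operators can be sketched as simple checks on a tokenized document. This minimal illustration assumes a document is just a list of lowercase words; the helper names are hypothetical, and a real engine evaluates such operators against its index rather than by scanning each page.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip('.,;:!?"\'()').lower() for w in text.split()]


def has_all(words, *terms):   # AND (written +term on some engines)
    return all(t in words for t in terms)


def has_any(words, *terms):   # OR
    return any(t in words for t in terms)


def has_none(words, *terms):  # NOT (written -term on some engines)
    return not any(t in words for t in terms)


def positions(words, term):
    return [i for i, w in enumerate(words) if w == term]


def near(words, a, b, within):
    """NEAR: some occurrence of a is at most `within` words from some b."""
    return any(abs(i - j) <= within
               for i in positions(words, a) for j in positions(words, b))


def followed_by(words, a, b):
    """FOLLOWED BY / ADJ: b appears immediately after a."""
    return any(words[i + 1:i + 2] == [b] for i in positions(words, a))


doc = tokenize("Chronic bronchitis and a heart attack can both cause chest pain.")

print(has_all(doc, "heart", "attack"))                                  # → True
print(has_all(doc, "bronchitis") and has_any(doc, "acute", "chronic"))  # → True
print(has_all(doc, "heart") and has_none(doc, "valentine"))             # → True
print(near(doc, "heart", "pain", 6), near(doc, "heart", "pain", 5))     # → True False
print(followed_by(doc, "heart", "attack"), followed_by(doc, "attack", "heart"))  # → True False
```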
Phrases: The ability to query on phrases is very important in a search engine. Those that allow it usually require that you enclose the phrase in quotation marks, e.g., “Institute of Chartered Accountants of India.”
Capitalization: On search engines that distinguish case, this is essential for searching on proper names of people, companies or products. Unfortunately, many words in English are used both as proper and common nouns: Bill, bill, Gates, gates, Oracle, oracle, Lotus, lotus, Digital, digital. The list is endless.
All the search engines have different methods of refining queries. The best way to learn them is to read the help files on the search engine sites and practice!
Relevancy Rankings
Most search engines return results with confidence or relevancy rankings. In other words, they list the hits according to how closely they think the results match the query. However, these lists often leave users shaking their heads in confusion, since, to the user, the results may seem completely irrelevant.
Why does this happen? Basically it's because search engine technology has not yet reached the point where humans and computers understand each other well enough to communicate clearly.
Most search engines use search term frequency as a primary way of determining whether a document is relevant. If you're researching diabetes and the word "diabetes" appears multiple times in a Web document, it's reasonable to assume that the document will contain useful information. Therefore, a document that repeats the word "diabetes" over and over is likely to turn up near the top of your list.
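Here is a minimal sketch of ranking by term frequency, assuming a plain word count is the only signal (real engines combine many). The document texts are invented for the example. The page that merely repeats the keyword outranks the genuinely useful guide, which is the weakness described in the next paragraph.

```python
from collections import Counter


def tf_score(text: str, term: str) -> int:
    """Count how many times `term` appears in `text` (case-insensitive)."""
    words = [w.strip('.,;:!?"\'').lower() for w in text.split()]
    return Counter(words)[term.lower()]


def rank_by_frequency(docs: dict[str, str], term: str) -> list[str]:
    """Return IDs of documents containing `term`, highest frequency first."""
    scores = {doc_id: tf_score(text, term) for doc_id, text in docs.items()}
    hits = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda doc_id: scores[doc_id], reverse=True)


docs = {
    "guide": "Diabetes guide: managing diabetes with diet. Diabetes and exercise.",
    "news": "Health news roundup, including a note on diabetes.",
    "spam": "diabetes diabetes diabetes diabetes diabetes",
    "other": "Gardening tips for spring.",
}
print(rank_by_frequency(docs, "diabetes"))  # → ['spam', 'guide', 'news']
```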
If the keyword is a common one, or if it has multiple other meanings, you could end up with a lot of irrelevant hits. And if your keyword is a subject about which you desire information, you don't need to see it repeated over and over--it's the information about that word that you're interested in, not the word itself.
Some search engines consider both the frequency and the positioning of keywords to determine relevancy, reasoning that if the keywords appear early in the document, or in the headers, this increases the likelihood that the document is on target. For example, one method is to rank hits according to how many times your keywords appear and in which fields they appear (e.g., in headers, titles or plain text). Another method is to determine which documents are most frequently linked to by other documents on the Web. The reasoning here is that if other folks consider certain pages important, you should, too.
If you use the advanced query form on AltaVista, you can assign relevance weights to your query terms before conducting a search. Although this takes some practice, it can return more relevant links.
As far as the user is concerned, relevancy ranking is critical, and becomes more so as the sheer volume of information on the Web grows. Most of us don't have the time to sift through scores of hits to determine which hyperlinks we should actually explore. The more clearly relevant the results are, the more likely we are to value the search engine.
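The two methods can be sketched together. The field weights and `LINK_WEIGHT` below are invented for illustration, and counting inbound links is only the simplest form of link popularity; Google's PageRank refines it by also weighting each link by the importance of the page it comes from. Page "b" repeats "heart" more often, but page "a" wins because its hits are in the title and header and more pages link to it.

```python
from collections import Counter

# Hypothetical weights: a hit in the title counts three times a hit in body text.
FIELD_WEIGHTS = {"title": 3.0, "header": 2.0, "body": 1.0}
LINK_WEIGHT = 0.5  # hypothetical trade-off between the two signals


def field_score(page: dict[str, str], term: str) -> float:
    """Weighted count of `term` across the page's title, header and body."""
    term = term.lower()
    return sum(weight * page.get(field, "").lower().split().count(term)
               for field, weight in FIELD_WEIGHTS.items())


def inbound_links(links: dict[str, list[str]]) -> Counter:
    """Count the distinct other pages that link to each page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


pages = {
    "a": {"title": "Heart disease basics", "header": "The heart",
          "body": "The heart pumps blood"},
    "b": {"title": "Cookie recipes", "body": "heart heart heart shaped cookies"},
}
links = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a", "a"]}

popularity = inbound_links(links)
scores = {pid: field_score(page, "heart") + LINK_WEIGHT * popularity[pid]
          for pid, page in pages.items()}
print(scores)  # → {'a': 7.5, 'b': 3.5}
```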
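The idea behind user-assigned weights can be sketched as a weighted sum of term counts. This does not reproduce AltaVista's actual query syntax or scoring; the documents, weights and the `rank` helper are hypothetical. Raising the weight on "attack" flips the ranking toward the page that is actually about heart attacks.

```python
def weighted_score(text: str, weights: dict[str, float]) -> float:
    """Sum each query term's count in `text` times the weight assigned to it."""
    words = [w.strip('.,;:!?"\'').lower() for w in text.split()]
    return sum(weight * words.count(term.lower()) for term, weight in weights.items())


def rank(docs: dict[str, str], weights: dict[str, float]) -> list[str]:
    """Order document IDs by weighted score, best first."""
    return sorted(docs, key=lambda d: weighted_score(docs[d], weights), reverse=True)


docs = {
    "diet": "Heart health: heart diet, heart exercise",
    "warning": "Warning signs of a heart attack",
}
even = {"heart": 1.0, "attack": 1.0}
boosted = {"heart": 1.0, "attack": 3.0}

print(rank(docs, even))     # → ['diet', 'warning']
print(rank(docs, boosted))  # → ['warning', 'diet']
```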
Meta Tags
Some search engines are now indexing Web documents by their meta tags. What this means is that the Web page author can have some influence over which keywords are used to index the document, and even over the description of the document that appears when it comes up as a search engine hit.
This is obviously very important if you are trying to draw people to your website based on how your site ranks in search engine hit lists.
There is no perfect way to ensure that you'll receive a high ranking. Even if you do get a great ranking, there's no assurance that you'll keep it for long. For example, at one point a page from the Spider's Apprentice was the number-one-ranked result on AltaVista for the phrase "how search engines work." A few months later, however, it had dropped lower in the listings.
There is a lot of conflicting information out there on meta-tagging. If you're confused it may be because different search engines look at meta tags in different ways. Some rely heavily on meta tags, others don't use them at all. The general opinion seems to be that meta tags are less useful than they were a few years ago, largely because of the high rate of spamdexing (web authors using false and misleading keywords in the meta tags).
Note: Google, currently the most popular search engine, does not index the keywords meta tag. Be aware of this if you are optimizing your webpages for the Google engine.
It seems to be generally agreed that the title tag and the "description" meta tag are important to write effectively, since several major search engines use them in their indices. Use relevant keywords in your title, and vary the titles on the different pages that make up your website, in order to target as many keywords as possible. As for the "description" meta tag, some search engines will use it as their short summary of your URL, so make sure your description is one that will entice surfers to your site.
Note: The "description" meta tag is generally held to be the most valuable, and the most likely to be indexed, so pay special attention to this one.
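To see what these tags expose to an indexer, here is a minimal sketch that pulls the title and named meta tags out of an HTML page using Python's standard `html.parser` module. The sample page and the `MetaExtractor` class are invented for illustration; this shows what an indexer can read, not how any particular search engine processes meta tags.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect a page's <title> text and its <meta name="..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = """<html><head>
<title>How Search Engines Work</title>
<meta name="description" content="A plain-English guide to search engines.">
<meta name="keywords" content="search engines, keywords, meta tags">
</head><body><h1>Search engines</h1></body></html>"""

parser = MetaExtractor()
parser.feed(page)
keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",")]

print(parser.title)                # → How Search Engines Work
print(parser.meta["description"])  # → A plain-English guide to search engines.
print(keywords)                    # → ['search engines', 'keywords', 'meta tags']
```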
In the keyword tag, list a few synonyms for keywords, or foreign translations of keywords (if you anticipate traffic from foreign surfers). Make sure the keywords refer to, or are directly related to, the subject or material on the page. Do NOT use false or misleading keywords in an attempt to gain a higher ranking for your pages.
The "keyword" meta tag has been abused by some webmasters. For example, a recent ploy has been to put words such as "sex" or "mp3" into keyword meta tags, in hopes of luring searchers to one's website by using popular keywords.
The search engines are aware of such deceptive tactics, and have devised various methods to circumvent them, so be careful. Use keywords that are appropriate to your subject, and make sure they appear in the top paragraphs of actual text on your webpage. Many search engine algorithms score the words that appear towards the top of your document more highly than the words that appear towards the bottom. Words that appear in HTML header tags (H1, H2, H3, etc.) are also given more weight by some search engines. It sometimes helps to give your page a file name that makes use of one of your prime keywords, and to include keywords in the "alt" attributes of image tags.
One thing you should not do is use some other company's trademarks in your meta tags. Some website owners have been sued for trademark violations because they've used other company names in the meta tags. I have, in fact, testified as an expert witness in such cases. You do not want the expense of being sued!
Remember that all the major search engines have slightly different policies. If you're designing a website and meta-tagging your documents, we recommend that you take the time to check out what the major search engines say in their help files about how they each use meta tags. You might want to optimize your meta tags for the search engines you believe are sending the most traffic to your site.
Concept-based searching (The following information is outdated, but might have historical interest for researchers)
Excite used to be the best-known general-purpose search engine site on the Web that relied on concept-based searching. It is now effectively extinct.
Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say. In the best circumstances, a concept-based search returns hits on documents that are "about" the subject/theme you're exploring, even if the words in the document don't precisely match the words you enter into the query.
How did this method work? There are various methods of building clustering systems, some of which are highly complex, relying on sophisticated linguistic and artificial intelligence theory that we won't even attempt to go into here. Excite used a numerical approach. Its software determined meaning by calculating the frequency with which certain important words appeared. When several words or phrases that were tagged to signal a particular concept appeared close to each other in a text, the search engine concluded, by statistical analysis, that the piece was "about" a certain subject.
For example, the word heart, when used in the medical/health context, would be likely to appear with such words as coronary, artery, lung, stroke, cholesterol, pump, blood, attack, and arteriosclerosis. If the word heart appears in a document with other words such as flowers, candy, love, passion, and valentine, a very different context is established, and a concept-oriented search engine returns hits on the subject of romance.
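The heart example can be sketched as counting co-occurring context words. The cue-word lists come from the paragraph above; the scoring rule (pick the concept with the most distinct cue words present) is a deliberate simplification, not Excite's actual statistical method, and `guess_concept` is a hypothetical name.

```python
from typing import Optional

# Cue words taken from the heart example above.
CONCEPTS = {
    "medicine": {"coronary", "artery", "lung", "stroke", "cholesterol",
                 "pump", "blood", "attack", "arteriosclerosis"},
    "romance": {"flowers", "candy", "love", "passion", "valentine"},
}


def guess_concept(text: str) -> Optional[str]:
    """Return the concept with the most distinct cue words in `text`, or None."""
    words = {w.strip('.,;:!?"\'').lower() for w in text.split()}
    scores = {concept: len(words & cues) for concept, cues in CONCEPTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_concept("Blood flow to the heart through a coronary artery"))  # → medicine
print(guess_concept("Flowers and candy for the one who has your heart"))   # → romance
print(guess_concept("My heart"))                                           # → None
```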
The web has made it much easier to get information about people, including old friends and classmates, old boyfriends/girlfriends, ancestors, celebrities, politicians, public figures, criminals, and even your next-door neighbor. There are various opinions about this new flow of personal information. Most of us seem quite pleased to be able to get the information we need, but we're not necessarily happy if others can get the goods on us!
What follows are a few tips about finding people via the web. It can be harder than you think. Although many people access the internet on a daily basis now, they often use screen names that are known only to their friends. Other people, particularly women, tend to change their last name one or more times during the course of their lives.
Who is relatively easy to find on the web?
• People who have become famous.
• People who post on the internet under their real names.
• Academics and other people who publish articles and speak at conferences. Professors also tend to have personal web pages.
• Self-employed consultants and other people who own their own businesses.
• Senior management of companies, especially public companies.
• People active in community organizations.
• Anybody with a career that causes them to get written about, cited in articles, or quoted in the press.
Step 1: Enter The Name
Although this will often be a waste of time, you might as well begin with the quick and easy type of search: type the full name of the person you're seeking into a search engine. When you do this, the most likely outcome is that you will get lots of hits on people who are NOT the person you're seeking. Many, many people have the same first and last names. If the names are unusual, though, you might get lucky. Ditto if the person you seek already has a notable web presence, with lots of webpages citing him or her for some achievement.
You have a slightly higher chance of getting good results if you enter the first and last name as a phrase, surrounded by quotation marks. The middle name usually isn't important, unless the person typically uses her middle name. If the person typically uses his initials instead of the first and middle name, make sure you search as a phrase when looking him up.
Warning: Entering names will frequently bring up many hits on genealogical records. Instead of getting info on a living person, you'll find yourself staring at data about someone who lived and died a hundred years ago. Although it's great that so much genealogical data is available via the web, these webpages can hopelessly muddy your chances of finding a living person using only her name.
Step 2: Enter The Domain Name
Many search engines will list a name that also appears as a web domain name among their top results. So if you suspect your friend may be active on the web, you can also try a search using his first and last names run together as one word. Most people's domain names tend to use both first and last name: e.g., firstnamelastname.com. On rare occasions, you might find that your friend has registered a domain using only her last name. Example: when this was written, the top two hits for a Google search on "Monash" were monash.edu in Australia and monash.com, which is the domain name of this website. This site is owned by Curt Monash, who registered his last name as a domain name many years ago.
Public figures, web geeks, and small business owners are more likely to have registered their names as web domains than the average person-on-the-street. Maintaining a domain costs money, and running a website requires knowledge of web design and programming.
Step 3: Refine Your Search
Remember that search engines are simply software programs that cannot anticipate your needs. To a search engine, a name is just a collection of letters. All it cares about (usually) is matching those letters with all the other identical arrays of letters in its database. For example, when this was written, entering the name "James Johnson" in Google returned 7,250,000 hits.
Therefore, in most cases, you will need to provide the search engine with more information. How can you narrow the search? It often helps to envision the result you're looking for. If you could find a page on the web that mentioned the person you're looking for, what would it say? If you think the person might be mentioned in a webpage that also refers to her hometown, add that, if you know it. If the person is interested in a particular career or activity, use that activity as one of your search terms. For me, a search on "Linda Barlow" and "novelist" brings up pages that rule out most of the zillion other "Linda Barlows" in the world.
If you happen to know where the person works, or even just what his profession is, try using the business or the profession as another keyword. Most businesses have websites, although not all employees are listed on such sites. But if your friend owns his or her own business, they probably have a website. If your friend is one of the executives of a public company, he or she may be listed in the company's regulatory filings (such as SEC filings), press releases or corporate reports.
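The steps so far amount to building progressively more specific queries. The hypothetical helper below only assembles query strings for you to paste into a search engine; it does not contact any search service. It follows Step 1 (the name as a quoted phrase, plus an optional initials variant), Step 2 (the name run together, as in a domain name) and Step 3 (adding context terms such as a profession).

```python
def people_queries(first: str, last: str, initials: str = "",
                   context: tuple[str, ...] = ()) -> list[str]:
    """Build candidate query strings; this does not contact any search engine."""
    full = f'"{first} {last}"'                # Step 1: full name as a quoted phrase
    queries = [full]
    if initials:                              # the person is known by initials
        queries.append(f'"{initials} {last}"')
    queries.append((first + last).lower())    # Step 2: name run together, as in a domain
    if context:                               # Step 3: hometown, profession, etc.
        queries.append(" ".join([full, *context]))
    return queries


print(people_queries("Linda", "Barlow", context=("novelist",)))
# → ['"Linda Barlow"', 'lindabarlow', '"Linda Barlow" novelist']
```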
Did you and your friend attend the same school or college together? Try to get information through the website of the school or college. If your friend is not listed anywhere on these sites, try the various class reunion websites, like classmates.com.
Is your friend a member of a professional organization that has a web presence? Has he or she written a book, an article, or been cited in one? More and more books and articles are published to the web every month.
When The Info is Correct, but Over-abundant
If you are looking for a celebrity, a public figure, or someone who is extremely active on the internet, the above Step 1 and/or Step 2 are usually enough to find real information about that person. In fact, you'll probably find yourself confronted with far too much information. You'll need a way to winnow it down.
In the case of celebrities, a simple search on their names is likely to produce more results than you want. You're likely to find fan sites, which can be excellent resources, but beware of the ones that offer nude pictures. Generally, the offers of clothed pictures are legitimate and the offers of nude ones are fakes, come-ons to try to get you to buy a subscription to a porn site. (In case you haven't learned it already, the many varieties of sexual content on the internet are rarely offered free of charge).
To narrow it down, put in both the celebrity's name and the name of a movie or song or book or TV show they're associated with. Multiple titles are an even better way to find their sites. This works particularly well for authors.
If the celebrity is an athlete in a major US team sport, several big sports sites keep pages on every single player, including links to up-to-date news. These include Yahoo, ESPN, Sportsline, as well as the sites for leagues such as the NBA and NFL.
For-a-Fee Searches
If the person you're looking for is careful enough about his privacy to have removed his personal info from various websites and databases, you may find it difficult to get any information about him. You can pay to access special databases at sites like peoplefind.com and peoplesearch.com.
Anything that is a matter of public record is probably recorded in an electronic database. Not all such databases are web-accessible, and those that are usually charge a fee. What is available is often determined by individual state laws or policies. What you can find from the state of Iowa might be quite different from what you can find from the state of New York.
Information that is likely to be contained in public records includes birth and death certificates, marriage certificates, divorce judgments (sometimes), home purchase and sales information, professional credentials verification, court and legal proceedings (not always), arrest records, bankruptcy filings, and other events that are recorded by public officials, state and federal divisions of vital statistics, and other public entities.
Don't Forget the Phone Book
Telephone books (white pages and yellow pages) are widely available now on the web. This means that if you know someone's name and what town they live in, you can access their address, phone number, even their age. There are also databases ("reverse look-up") that allow you to type in a phone number and get the name and address of the person who owns the phone.
If you know the address of the person you are seeking, you can easily get a map of his town, street, and neighborhood on the many web map sites. Some maps are precise enough to show the exact location of his home.
Try Yahoo! PeopleSearch, which offers basic phone book style look-up and links you to a site that can execute background checks (for a fee).
Privacy Issues
The same things that you can find out about other people, other people can find out about you.
Here's a list of some of the databases someone might access when researching you:
• Real estate sales and ownership
• Postal service change-of-address records
• Telephone books, past and present
• Automobile registration records
• Other vehicle registration, i.e., boats, private airplanes
• Subscription lists for magazines and periodicals
• National marketing databases
• National email directories
• Voter registration lists
• Bankruptcy filings
• Court proceedings/judgments
• Professional licenses
• SEC filings
• Tax lien records
• Credit bureau records
• Whois domain name registration records
• Social Security Death Index
If you are concerned about your privacy, you can ask to have your personal information removed from web databases. It is difficult to remove all trace of yourself, though. Some events and transactions are legitimately matters of public record, and more of these public records are becoming available every year via the web.
