SEO Chapter 2

Search Engine Optimization

     Search engine optimization can be defined as the process of fine-tuning a web page so that it achieves a higher ranking in search engine results. There is a very thin line separating search engine optimization from search engine spam. A web page should not be optimized to a level where it qualifies as search engine abuse. If a web page is detected as search engine spam, it may be penalized or even removed from the search engine index. The website will then not show up in search results, nor will it be crawled by a search engine spider, until it is added back to the index.

Factors affecting SEO

     Search engine optimization of a website can be broken down into two distinct groups.

  1. Web page optimization
  2. Web site optimization

Both categories are interrelated, and all factors within each should receive equal attention to achieve proper optimization of the site.

Factors which tend to improve the ranking of a web page in search engine results are:

  1. Key Terms
  2. Title Tag
  3. Meta Tags
  4. Body Text
    1. Alternative Text in Img Tag
    2. H1-H6 tags
  5. Menu bar
  6. Keyword density analysis
  7. HTML code validation
  8. Absolute vs Relative URL
  9. Tables in HTML code

Factors which play a role in improving the ranking of the entire website are:

  1. Sitemap
  2. Inbound links
  3. Outbound links
  4. Reciprocal linking and link building
  5. Search engine friendly URL
  6. Domain name
  7. 404 Error page
  8. 301 Redirection
  9. Robots.txt file
  10. Website submission
  11. Visitor analysis

Factors that must be avoided during webpage construction:

  1. Frameset

     These factors are not discussed in any specific order. Each factor is significant and plays an important role in improving the ranking of a website in search engine results.

Key terms

     Key terms are the queries search engine users enter to find the information they are looking for. Research should be conducted to identify the most and least used search terms relevant to the web page. Once the key terms are identified, they should be incorporated into the web page in a manner that does not constitute search engine abuse, such as misleading search engines by artificially inflating the density of key terms.

Up to a point, a higher density of key terms in a web page may lead to higher search engine rankings. It is also advisable to purchase a domain name that is identical to a key term, since the domain name of a website is a primary factor used by search engine relevancy algorithms to rank a website.

A good source for identifying key terms is http://www.wordtracker.com. Wordtracker gives suggestions based on over 300 million key terms used by searchers on Metacrawler.com and Dogpile.com in the past 120 days, along with the actual frequency of each key term and its predicted frequency.

Title tag

     The title tag is a very important component used by search engine relevancy algorithms to determine ranking, and it is also displayed by search engines in the search result listing. The title tag should incorporate a high-frequency key term. At the same time, the title should convey the overall information available on the web page, so that the user is enticed to click on it. It is advisable to have a different title on each web page of the same website, and the title of a web page should reflect its overall content.
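
A hypothetical example of such a title, assuming “website templates” is the page's high-frequency key term:

<title>Website Templates - Professional Templates for Small Business Sites</title>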

Figure 9: Relationship between search engine results and Title tag & Description meta tag

Meta tags

     Meta tags are used to provide information about relevant keywords and the purpose of the web page. The meta keywords tag was used as a determinant of web page relevancy by early search engines; currently, very few search engines consider it in their relevancy algorithms. With reference to Figure 9, one can see that the description shown in search results is the information in the DESCRIPTION meta tag. This information should be precise, to entice a search engine user to click on the link.

A few meta tags that deserve close attention while optimizing web pages are:

  1. <meta name="robots" content="…" />
    The robots meta tag instructs a search engine spider whether the owner of the website allows or disallows it from indexing the web pages linked from this page.
  2. <meta name="keywords" content="…" />
    The keywords meta tag indicates keywords relevant to the web page.
  3. <meta name="description" content="…" />
    The description meta tag provides information regarding the intended purpose of the web page.

     It is worthwhile to spend the extra time to write content-related keywords and a description for every page, rather than specifying identical keywords and descriptions for the entire website.

Body text

     Search engines tend to like unadulterated HTML code. The term adulteration is used here in the context of embedded JavaScript code, Flash movies and image files. Search engines do not attempt to read these contents even though they might contain a significant density of keywords. An example is the logo image on a website, which in most cases states the domain name and a caption. The caption may be a keyword-oriented phrase much like the title of the web page, but since this information cannot be read by search engines, it cannot be taken into account when determining the relevancy of the web page. Simply put, “What a search engine cannot see does not exist on the web page”. This may or may not be true for human visitors, but it is certainly a rule adhered to by the search engine spider.

Flash text can only be read by FAST (Alltheweb.com); none of the other search engines can read Flash text or follow Flash links [9]. Similar to Flash files is JavaScript code embedded within HTML files: most search engines ignore JavaScript code and the links within it [10]. Another factor to consider is the keywords appearing in the “above the fold” region of a web page. The higher the keyword density in this region, the more relevant the web page is considered to be for a given keyword.

With these factors in mind, one can adopt the strategies outlined below to optimize body text:

  1. Include JavaScript code as a separate file. This can be done using the following HTML tag:
    <SCRIPT LANGUAGE="JavaScript" SRC="myJavaScript.js"></SCRIPT>
  2. Minimize the usage of Flash movies.
  3. Always use the ALT attribute in IMG tags. The common usage of the tag is
    <img src="myImage.gif" alt="My Image" />. For optimization purposes, it might be better to write the tag as <img alt="My Image" src="myImage.gif" />, placing the ALT text first. The objective is to bring the keyword phrase as close to the beginning of the HTML file as possible, increasing the keyword density in the “above the fold” region.

     Style sheets are incorporated in almost every web page to enhance its visual appearance. This appeals to the visitor, who will find the page more attractive than plain HTML, but to a search engine spider an embedded style sheet is unrelated text in the “above the fold” region. The web page can be optimized by including the style sheet as a separate file, using the LINK HTML tag, rather than embedding the style sheet code:
<LINK href="myStyleSheet.css" rel="stylesheet" type="text/css">

     Heading tags also play a very important role in the content of a web page. It is advisable to embed keywords in bold face within H1 to H6 tags, with preference given to the H1 tag over the H6 tag, as the numbers 1 through 6 indicate the decreasing importance of the heading. Font styles like bold, italic and underline signal the relative importance of text and should be used in conjunction with key phrases wherever applicable. A typical web page should have keyword-rich content of at least 200-250 words [11].
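
A brief sketch of such a heading, assuming “website templates” is a target key phrase:

<h1><b>Website Templates</b></h1>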

Menu bar

     The menu bar of a web page links to the most important pages on the site. Since almost every page on the site contains the menu bar, each page effectively votes for the pages the menu bar links to, which increases the link popularity of those pages within the website. These pages should have well-targeted content and adhere to the linking guidelines discussed in the Sitemap section below. This may result in a higher ranking for these pages.

Keyword density analysis

     Every search engine calculates keyword density differently. Some search engines permit a heavier keyword density on a web page; others, like Google, have stricter allowable density levels. The placement of keywords in different locations on the web page also has varying effects. A keyword density above the permissible limit will be considered spam by the search engine and will cause the website to be penalized. Google allows a maximum of 2% of the web page text to be keywords; Yahoo and MSN Search allow a keyword density of 5% [12].
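
Keyword density is simply the share of the page's words accounted for by the keyword:

keyword density (%) = (keyword occurrences / total words on the page) × 100

For example, a keyword that appears 5 times on a 250-word page has a density of 5 / 250 = 2%, the maximum Google permits according to the figures above.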

     A free tool to check keyword density of a webpage is available at http://www.searchengineworld.com/cgi-bin/kwda.cgi

HTML code validation

     It is highly advisable to validate HTML code before submitting a website to search engines. Even though a web page may look visually correct, it may contain syntax errors that are silently ignored by forgiving browsers like Internet Explorer. A free validation service is provided by the W3C at validator.w3.org; it checks for W3C XHTML 1.0 compliance and gives a detailed report. W3C cascading style sheet validation is available at
jigsaw.w3.org/css-validator/

Absolute vs relative URL

     Search engine spiders prefer absolute URLs over relative URLs, and they may miss some web pages when relative URLs are used. Absolute URLs, however, significantly reduce the portability of the website in the event of a domain name change. This can be overcome by using a global variable that contains the domain name of the website; this variable is then used to generate absolute URLs within web pages, as sketched below.
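
A minimal PHP sketch of this approach; the constant name BASE_URL and the file config.php are illustrative assumptions:

<?php
// config.php - the single place where the domain name is defined.
// If the domain ever changes, only this value needs to be edited.
define('BASE_URL', 'http://www.mysite.com');
?>

A page can then build absolute URLs from the global value:

<?php require 'config.php'; ?>
<!-- Absolute URL generated from the global value -->
<a href="<?php echo BASE_URL; ?>/gallery.htm">Gallery</a>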

Tables in HTML code

     Tables are used in web page construction to organize the layout. Some web developers nest tables within tables to simplify the web page structure for maintenance purposes. This adds a lot of markup that is irrelevant to the content, decreasing the keyword density in the “above the fold” region of the web page. Most web pages also have a menu bar on the left-hand side or at the top of the page; positioning the menu bar in this way may likewise decrease the keyword density in the “above the fold” region.

A few alternatives that address these issues are:

  1. Position the menu bar on the right side of the web page and keyword-sensitive content on the left side
  2. Use a CSS style sheet to define the presentation of individual tags, with the CSS code placed in a separate file (the matching rules are sketched below), e.g.
    <td id="centerLcolumn">My Text</td>
    instead of
    <td width="100" height="400" bgcolor="#000000" bordercolor="#CCCCCC">My Text</td>
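
A sketch of the corresponding rules in the external style sheet; the values simply mirror the inline attributes above, and the id centerLcolumn is taken from the example:

/* myStyleSheet.css */
#centerLcolumn {
  width: 100px;
  height: 400px;
  background-color: #000000;
  border: 1px solid #CCCCCC; /* CSS requires a border style; bordercolor alone has no equivalent */
}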

Sitemap

     A sitemap is a web page with links to every web page within the website, and it carries high importance within the site. Once the sitemap has been spidered by a search engine, one can be reasonably sure that every page on the website has been indexed. When designing the sitemap of a website, key points to remember are:

  1. The sitemap should contain HTML anchor tags
  2. The link text should consist of keywords relevant to the destination web page, and may contain the same phrase as the TITLE tag of that page. The link text is significant since it states what the content of the destination page may be, and it is taken into consideration by search engine relevancy algorithms.
  3. The sitemap should be visible to the search engine. This means that there must be a link from every page of the website (typically in the footer) to the sitemap, and spiders must be permitted to index the sitemap.

A typical link on a sitemap may be modeled on the following example:
<a href="http://mysite.com/gallery.htm">Gallery</a>

Avoid the following:

  1. JavaScript handlers in anchor tags
    <a href="#" onclick="gotoURL('gallery')">Gallery</a>
  2. Flash movie for sitemap
  3. Images instead of link text
    <img alt="Gallery" src="gallery.gif" />
  4. Imagemap
  5. Irrelevant link text
    <a href="http://mysite.com/gallery.htm">Check this out</a>

     If the sitemap has more than 100 links, split the sitemap into multiple pages. A guide to creating sitemaps is provided by Google and is available at http://www.google.com/webmasters/sitemaps/docs/en/about.html. It is advisable to read these guidelines and follow them while creating the sitemap.

Inbound links

     For Google, inbound links help determine the PageRank of a website. Without any inbound links, a website is practically invisible to the search engine. One way for a search engine spider to index a website is by following inbound links from another indexed website; the alternative is to manually submit the website to the spider's crawling list. Though manual submission is encouraged, there is never a guarantee that the website will be indexed. If there are inbound links from other sites, on the other hand, it is far more likely that the website will be indexed.

Inbound links from the following sources help in improving ranking of a web page [10]:

  1. All major and local directories: Yahoo, DMOZ, LookSmart, and trade, business and industry related directories
  2. Suppliers, satisfied customers, sister companies and partners
  3. Websites which provide accompanying services
    e.g. Inbound links from web hosting companies for a site selling website templates
  4. Related websites but not competing websites
    e.g. Websites that provide tutorials about web design and modification of website templates
  5. Competing websites

     Not all inbound links carry the same weight. Links from authoritative industry sources count more towards improving PageRank than links from a small private website, and some inbound links may even have a negative effect on PageRank. These are:

  1. Links from FFA (Free For All) link pages
  2. Link farms
    Link farming is the organized exchange of unrelated links between websites.
  3. Links from doorway pages
    Doorway pages are web pages created with the intent of inflating the inbound links of a website. They are created with the sole purpose of serving search engine spiders optimized content which may boost the ranking of the web page.
  4. Links from discussion forums
    Discussion forums can be maliciously used to inflate the inbound links to a website. Given a good but unmoderated message board, spammers may include links to their spam pages as part of seemingly innocent messages they post [7]. On a moderated message board, spammers post valid messages with links to their websites in their signatures.

     Most search engines penalize websites which employ malicious techniques to inflate link popularity to the extent of removing the website from the index.

Outbound links

     Outbound links may improve the ranking of a website as long as the website is citing good websites [10], i.e. websites recognized as authorities in the industry relevant to the website. Outbound links may also cause a PageRank leak, as discussed in the preceding chapter. In a reciprocal linking program, the PageRank leak can be minimized by masking the destination URL of outbound links using JavaScript code, or by using the NOINDEX, NOFOLLOW values in the robots meta tag. This is not an ethical practice, though it is followed by some websites. An ethical solution is to maintain outbound links to a few authoritative and related websites. Avoid linking to websites that mask URLs in a reciprocal linking program, as this would cause a PageRank leak with no worthwhile benefit.

Reciprocal linking and link building

     Reciprocal linking is a strategy to gain inbound links from websites that share the same theme as one's website, providing an outbound link in exchange. This strategy improves the link popularity of a website, where link popularity is defined as the number and quality of inbound links to a website. Reciprocal linking is done by searching for websites that share the same theme and requesting an inbound link in exchange for an outbound link. These websites should be rich in the keywords and phrases that are emphasized on one's website. Before starting a reciprocal linking strategy, the following web pages should be in place:

  1. A webpage which contains outbound links to websites (Link directory). This webpage should be linked from the homepage so that it gets indexed by the search engine spider.
  2. A “Link to Us” page which gives cut and paste HTML code to link to one’s website.

Once these pages are in place, one should send emails to the webmasters of shortlisted sites expressing interest in reciprocal linking. Key points to raise in this email are whether they are willing to exchange links, what format they expect (text link, image, Flash movie, etc.), and the HTML code that should be used for the inbound link. A text link is the most effective form of inbound link; pay close attention to the link text in the anchor tag, as in the sketch below.
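
A hypothetical cut-and-paste snippet of the kind a “Link to Us” page might offer; the site name and keyword-bearing link text are assumptions:

<!-- Copy and paste this code to link to us -->
<a href="http://www.mysite.com/">My Site - Quality Website Templates</a>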

Zeus by cyber-robotic.com is highly effective reciprocal link building software (US$ 195). Zeus is a robot/spider which crawls the internet to find websites with themes similar to one's own. Once the list of sites is compiled, Zeus can be used to send personalized email messages to the webmasters of these websites and to track and maintain the details of each site. It also dynamically generates keyword-tuned link directory pages which can be uploaded to one's website.

CPA affiliate programs provided by third parties like Commission Junction may bring qualified leads to one's website, but a website that hosts its own CPA program serves a dual purpose: the affiliate program not only brings qualified leads but also indirectly builds inbound links to the website. iDevAffiliate v4.0 Gold Edition (US$ 149) by idevdirect.com is popular software used by many merchants to host their own affiliate programs. One can promote an affiliate program by submitting it, along with its terms, to affiliate program directories.

Search engine friendly URL

     Many websites have dynamically generated content, produced in most cases by passing parameters in the URL. The URL of a dynamic web page resembles http://www.mysite.com/index.php?pageid=70, and such a URL will be indexed by a search engine. However, there is often more than one parameter attached to the URL, such as a sort order or a navigation setting, so different URLs end up pointing to the same web page:

http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD (Hits Descending)
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsA (Hits Ascending)
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=titleD (Title Descending)
http://www.mysite.com/album/viewcat.php?pageid=70&orderby=titleA (Title Ascending)

     There is no way for the search engine to determine which parameter identifies a new page and which is merely a setting that does not justify indexing the URL as a new page. Hence spiders have been programmed to detect and ignore dynamic pages. This can be resolved by making the URL search engine friendly, replacing the query string characters (?, &, =) with search engine friendly equivalents. The above four URLs can be made search engine friendly as follows:

http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsA
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.titleD
http://www.mysite.com/album/viewcat.php/pageid.70/orderby.titleA

The web page is then indexed, since the spider is led to believe that a URL without query string characters is not a dynamic web page. This is only an intermediate solution, adopted until spiders have a technique for indexing dynamic web pages, since the problem of isolating unique pages from their clones is not resolved by generating search engine friendly URLs. Conversion between dynamic URLs and search engine friendly URLs, and vice versa, can be achieved on almost all types of servers, either by proper configuration or by installing third party software. One should ask the hosting service provider about the software and server configuration available for generating search engine friendly URLs. The website code may also have to be modified so that a search engine friendly URL is generated in each anchor tag.

     The mod_rewrite module of the Apache server can be used to make a URL search engine friendly. A URL request for "http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD" may be translated by mod_rewrite to "http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD" depending on the regular expression specified, as in the sketch below.
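
A minimal .htaccess sketch of such a rule, assuming Apache with mod_rewrite enabled; the pattern is an assumption modeled on the URLs above:

# Turn on the rewrite engine
RewriteEngine On
# Internally map /album/viewcat.php/pageid.70/orderby.hitsD
# to /album/viewcat.php?pageid=70&orderby=hitsD
RewriteRule ^album/viewcat\.php/pageid\.([0-9]+)/orderby\.([a-zA-Z]+)$ album/viewcat.php?pageid=$1&orderby=$2 [L]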

     The web programmer will have to modify the script to generate URLs of the type "http://www.mysite.com/album/viewcat.php/pageid.70/orderby.hitsD" instead of "http://www.mysite.com/album/viewcat.php?pageid=70&orderby=hitsD" within the web pages, so that all URLs on the website are search engine friendly.

Domain name

Figure 10: Search results for keyword – betterbody – Google snapshot

The significance of the domain name in the ranking of websites in search results cannot be overlooked. Figure 10 lists the results of a Google search for the keyword betterbody. The inbound links for the listings were computed using the “Who links to you?” feature in Google search for the exact URL that appeared in the search results.

Examine listing A4 in Figure 10. http://www.betterbody.de ranks 4th in the search results out of 2,780 for the term betterbody. The page has 18 inbound links, more than A1, A2 or A3. There is no occurrence of the term betterbody in the title tag, meta keywords tag, meta description tag or the content of the web page (refer to Figure 11, keyword density analysis of http://www.betterbody.de for the term betterbody). The only occurrence of the term betterbody is in the domain name of the website; in fact, the keyword and the domain name are an exact match. This web page would not have shown up in the search results for this keyword had it not been for the domain name, yet it not only appeared but ranked 4th on the first page. This illustrates the significance of the domain name in SEO.

     Research should be conducted to determine popular keywords relevant to the site; keyword research has been discussed in the Key terms section of this chapter. Domain names are synonymous with brand names, so changing the domain name after the website has been launched and gained popularity is highly discouraged. URLs can be optimized independently of the domain name by incorporating keywords in individual web page URLs, e.g. http://www.mysite.com/betterbody/mainpage.htm or
http://www.mysite.com/better-body/mainpage.htm

Figure 11: Keyword density analysis of betterbody.de

Using more than two keywords in a web page URL may be treated as search engine spam.

404 Error page

     A 404 error page states that the requested page cannot be found. The spider receives this page from the server when it requests a URL that no longer exists. That page, along with all its rankings, will be dropped from the search engine index. Moreover, the spider makes no further attempt to crawl the website on receiving this page. Customize the 404 error page, typically with a sitemap, to ensure successful crawling of all other web pages by the spider.
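
On an Apache server, a custom 404 page can be configured with a single directive in the .htaccess file; the file name 404.htm is an illustrative assumption:

# Serve a custom error page (containing a sitemap) for missing URLs
ErrorDocument 404 /404.htm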

301 Redirection

     301 redirection is a spider- and visitor-friendly strategy for redirecting one web page to another on websites hosted on Apache servers. It is implemented by specifying the source and destination URLs in the .htaccess file, and is interpreted as “moved permanently”. This is required to ensure stability of PageRank for the site. Google interprets http://www.mysite.com and http://mysite.com as two different URLs and, as a result, assigns different PageRank to the same web pages depending on whether the domain name contains www. This causes the PageRank for mysite.com to be split between http://mysite.com and http://www.mysite.com. Implementing a 301 redirect from http://mysite.com to http://www.mysite.com ensures that all pages are indexed as http://www.mysite.com/myexample.htm; a sketch follows the list below. One should also pay close attention to ensuring that all link building strategies use www in the URL:

  1. “Link to Us” page
  2. Search engine and directory submissions
  3. Reciprocal linking code
  4. Absolute URLs within the site
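
A minimal .htaccess sketch of the non-www to www redirect described above, assuming Apache with mod_rewrite enabled:

RewriteEngine On
# Permanently (301) redirect http://mysite.com/... to http://www.mysite.com/...
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]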

Robots.txt file

     Robots.txt (the Robots Exclusion Standard) is a file containing instructions that tell a spider which web pages to crawl and which to ignore. It must be located in the root directory of the website. The same effect can be achieved with the robots meta tag; the difference is that the robots.txt file is a centralized location for these instructions, which may reduce maintenance. The robots.txt file also allows specific directories to be blocked from indexing, which is helpful for a website with member-only web pages; a minimal example follows. A free tool to validate a robots.txt file is available at http://www.searchengineworld.com/cgi-bin/robotcheck.cgi. Validating robots.txt is important, since errors in it can cause web pages to be indexed or blocked unintentionally.
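
A minimal robots.txt sketch; the /members/ directory is a hypothetical member-only section to keep out of the index:

# Applies to all spiders
User-agent: *
# Block the member-only area; everything else may be crawled
Disallow: /members/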

Website submission

     Website submission checklist

  1. The website is completed and optimized
  2. The HTML code is validated
  3. Inbound links have been established
  4. A description of the website in fewer than 25 words, with at least 2 to 3 key terms
  5. A keyword list
  6. An email address, preferably on the same domain as the website, to receive submission notifications, e.g. submit@mysite.com

There is no need to submit each and every web page on the site; most search engines prefer only the top level page in submissions. Manual submission is preferred over automated submission. Most search engines and directories have guidelines for proper submission, and one should read these carefully before submitting the site. Frequently submitting one's website to search engines is considered spamming and can cause the website to be penalized, hence it is advisable to submit the website only once to each search engine. After submission, one should regularly check the submission email address, since there may be responses from search engines and directories about improper submissions and corrections that need to be made. Also, some search engines and directories require validation of the email address for each submission.

     Important search engines and directories [17] [18] [19] that may be considered for website submission are:
http://www.google.com, http://www.yahoo.com, http://www.askjeeves.com, http://www.alltheweb.com, http://www.aol.com, http://www.hotbot.com, http://www.altavista.com, http://www.qango.com, http://www.gigablast.com, http://www.looksmart.com, http://www.lycos.com, http://www.msn.com, http://www.netscape.com, http://www.about.com, http://www.exite.com, http://www.pepesearch.com, http://www.iwon.com, http://www.dmoz.org, http://www.webcrawler.com, http://www.webwombat.com, http://www.aeiwi.com, http://www.links2go.com, http://www.searchking.com, http://www.joeant.com, http://www.zeal.com, http://www.wondir.com, http://www.illumirate.com, http://www.jayde.com, http://www.vlib.org, http://www.goguides.org, dir.yahoo.com, http://www.business.com

     It is advisable not to redesign the website or change web page content after the site has been submitted and indexed, since this can cause variations in the website's rankings in search results.

Submit the website sitemap to Google Sitemaps; this may provide the site with better crawl coverage and fresher search results.

Visitor analysis

     Visitor analysis is an important part of website maintenance. http://www.statcounter.com (US$ 29 per month) is a paid service that maintains website statistics, giving in-depth information about the geographical location of visitors, the search terms used to reach the website, referring websites, popular web pages, operating systems, monitor resolutions, browser information, the time spent on the website by each visitor, and peak traffic hours during each day. This information can be used to cater to different types of visitors and their individual needs, adding value to the time a visitor spends on the site. For example, knowing the monitor resolutions of visitors allows the site to be tuned so that minimal scrolling is needed, and knowing the peak traffic hours helps ensure that the server never goes down when traffic is heaviest.

Frameset

     Search engines tend to dislike websites with frames. Frames have inherent problems, such as bookmarking: a visitor who wants to bookmark a specific page on a framed website is unable to do so. Search engines view the pages of a frameset as separate web pages even though they may visually appear as a single page, so a search engine may misunderstand the content of a web page that makes perfect sense to the visitor. Though there are solutions that make a framed website display similar content to the visitor and the search engine alike, it is better to avoid using frames altogether.

SEO Roadmap

     Figure 12 below gives a brief overview of the search engine optimization process, categorizing the different factors into development and maintenance tasks.

Figure 12: Search engine optimization roadmap

Search engine optimization is a constantly evolving area. Website owners are constantly trying to discover better techniques to improve ranking. Unfortunately, search engine algorithms are proprietary, which adds mystery to the subject. As search engines improve their relevancy algorithms, SEO tactics will change with them.
