
Optimizing phpBB 2.0 for Search Engines, Page Rank, and Security

By: 阿里爸爸 · Posted 2008-9-26 10:56:22
DeveloperSide.NET Articles
  Search Engines -- Getting your Forums Indexed

  Problem: Search Engine bots (spiders, crawlers) will not index pages that contain Session IDs (sid) in URLs.

  Examples of URLs that will not be indexed...

  http://forums.domain.com/index.p ... 5347f170b3e4r2067b7

  http://forums.domain.com/viewtop ... 5347f170b3e4r2067b7

  Examples of URLs that will be indexed...

  http://forums.domain.com/index.php

  http://forums.domain.com/viewtopic.php?t=1689

  *Note that Session IDs are normally stored in cookies; otherwise they are transferred via the URL. For Session IDs to be visibly present in the URL, cookies have to be turned off under your browser's settings.

  Solution: Selectively remove Session IDs from URLs.

  Method one: Remove Session IDs for specific Search Engine bots by recognizing their 'User-Agent' HTTP header strings.

  Examples of 'User-Agent' strings sent with every HTTP request...

  Google -- "Googlebot/2.1 (+http://www.google.com/bot.html)"

  MSN -- "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

  Yahoo -- "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

  Benefit: This method keeps Session IDs for non-logged-in users (guests), so guests can still post.

  Downsides:

  (1) While the 'User-Agent' strings of the major Search Engine bots are known, some lesser-known bots will be missed.

  (2) The 'User-Agent' string of a bot can change from time to time; an updated list must be kept.

  Edit file 'includes/sessions.php'

  Replace function...

  (last function in file)


    function append_sid($url, $non_html_amp = false)
    {
        global $SID;

        if ( !empty($SID) && !preg_match('#sid=#', $url) )
        {
            $url .= ( ( strpos($url, '?') != false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
        }

        return $url;
    }
with function...


    function append_sid($url, $non_html_amp = false)
    {
        global $SID;

        // Skip the session ID when the User-Agent looks like a search engine bot
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

        if ( !empty($SID) && !preg_match('#sid=#', $url)
            && !strstr($ua, 'Googlebot') && !strstr($ua, 'msnbot')
            && !strstr($ua, 'Slurp') && !strstr($ua, 'almaden.ibm.com')
            && !strstr($ua, 'zyborg') && !strstr($ua, 'Jeeves')
            && !strstr($ua, 'crawler') && !strstr($ua, 'spider') )
        {
            $url .= ( ( strpos($url, '?') != false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
        }

        return $url;
    }
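Downside (2) above means the list of bot substrings has to be maintained by hand. As a hedged sketch, assuming you would rather keep that list in one array than in one long condition, a hypothetical helper named is_search_bot() (not part of phpBB 2.0) could look like this. It uses stripos(), so matching is case-insensitive, unlike the strstr() calls in the edited function; that is a deliberate change in behavior.

```php
<?php
// Sketch only: is_search_bot() is a hypothetical helper, not phpBB code.
// It holds the same substrings as the edited append_sid() in one array
// and matches them case-insensitively ("Crawler" and "crawler" both count).
function is_search_bot(string $user_agent): bool
{
    $bot_markers = array(
        'Googlebot', 'msnbot', 'Slurp', 'almaden.ibm.com',
        'zyborg', 'Jeeves', 'crawler', 'spider',
    );

    foreach ($bot_markers as $marker) {
        // stripos() returns the match position, or false if the marker is absent
        if (stripos($user_agent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Possible use inside append_sid(), guarding against a missing header:
// $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// if ( !empty($SID) && !preg_match('#sid=#', $url) && !is_search_bot($ua) ) { ... }
```

Adding a newly seen bot then means adding one string to $bot_markers. The trade-off of case-insensitive matching is that a browser whose User-Agent happens to contain one of these words would also be treated as a bot.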
  Method two: Remove Session IDs for all non-logged-in (guest/anonymous) users.

  Benefit: All Search Engine bots will be able to crawl and index the forum.

  Downside: Users will need to register and log in before they can post.

  Under Administration Panel -- Forum Admin -- Permissions, switch all forums to "Registered".

  Edit file 'includes/sessions.php'

  Replace line...

 
    $SID = 'sid=' . $session_id;

  with line...

    if ( $userdata['session_user_id'] != ANONYMOUS )
    {
        $SID = 'sid=' . $session_id;
    }
    else
    {
        $SID = '';
    }
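To see what Method two changes without touching a live forum, the logic can be re-created outside phpBB. This is a hedged sketch: build_sid() and append_sid_demo() are hypothetical names, append_sid_demo() takes the SID as a parameter instead of reading the global $SID, and ANONYMOUS is set to -1, the value phpBB 2.0 defines in includes/constants.php.

```php
<?php
// Stand-alone sketch of Method two; not phpBB code.
define('ANONYMOUS', -1); // phpBB 2.0's user_id for guests

// Mirrors the edited line in includes/sessions.php: guests get an empty SID
function build_sid(int $session_user_id, string $session_id): string
{
    return ($session_user_id != ANONYMOUS) ? 'sid=' . $session_id : '';
}

// Mirrors append_sid(), with the SID passed in rather than taken from a global
function append_sid_demo(string $url, string $sid, bool $non_html_amp = false): string
{
    if ($sid !== '' && !preg_match('#sid=#', $url)) {
        // '&amp;' for URLs printed into HTML, a bare '&' for redirects
        $separator = (strpos($url, '?') !== false) ? ($non_html_amp ? '&' : '&amp;') : '?';
        $url .= $separator . $sid;
    }
    return $url;
}

// A guest (bots browse as guests) gets a clean URL
echo append_sid_demo('viewtopic.php?t=1689', build_sid(ANONYMOUS, 'abc123')), "\n"; // prints viewtopic.php?t=1689
// A logged-in user keeps the session ID
echo append_sid_demo('viewtopic.php?t=1689', build_sid(2, 'abc123')), "\n"; // prints viewtopic.php?t=1689&amp;sid=abc123
```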
  Search Engines -- robots.txt

  Problem: Search Engine bots will try to index all available pages/links under the forum. Some of these pages/links have no value, can be harmful to page rank, and should not be indexed.

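  Solution: Create a 'robots.txt' file in the phpBB forum's root directory, listing the pages/links that should not be indexed. Crawlers only read robots.txt from the root of the host, so this works as-is when the forum is served from the domain root (as with forums.domain.com).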

  Method:

  Only 3 pages/links benefit page rank and should be indexed...

 
  •  /index.php
  •  /viewforum.php
  •  /viewtopic.php


  Every other page/link should be disallowed.

  Going through the phpBB root-level directory/file structure, disallow everything except the above 3 pages/links...

  Contents of 'robots.txt'...


    User-agent: *
    Disallow: /admin/
    Disallow: /db/
    Disallow: /images/
    Disallow: /includes/
    Disallow: /language/
    Disallow: /templates/
    Disallow: /common.php
    Disallow: /config.php
    Disallow: /faq.php
    Disallow: /groupcp.php
    Disallow: /login.php
    Disallow: /memberlist.php
    Disallow: /modcp.php
    Disallow: /posting.php
    Disallow: /privmsg.php
    Disallow: /profile.php
    Disallow: /search.php
    Disallow: /viewonline.php
  The first line (User-agent: *) applies the rules to all search engines.

  Each following Disallow line tells crawlers not to fetch any URL whose path starts with the given text.
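To make the prefix rule concrete, here is a small sketch of how a crawler might test a URL path against Disallow lines. is_disallowed() and the shortened rule list are illustrative only; individual crawlers may support extra syntax, such as wildcards, beyond plain prefixes.

```php
<?php
// Sketch of plain prefix matching for robots.txt "Disallow:" rules.
// is_disallowed() is a hypothetical helper, not a crawler's real code.
function is_disallowed(string $path, array $disallow_prefixes): bool
{
    foreach ($disallow_prefixes as $prefix) {
        // An empty "Disallow:" value blocks nothing, so skip it.
        // strpos() === 0 means the path starts with the prefix.
        if ($prefix !== '' && strpos($path, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// A subset of the rules from the robots.txt above
$rules = array('/admin/', '/images/', '/posting.php', '/profile.php', '/search.php');

var_dump(is_disallowed('/viewtopic.php?t=1689', $rules));          // bool(false)
var_dump(is_disallowed('/posting.php?mode=reply&t=1689', $rules)); // bool(true)
```

Because matching is by prefix, "Disallow: /faq.php" also covers /faq.php?mode=bbcode, while leaving /viewtopic.php unlisted keeps every topic URL crawlable.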

  Page Rank

  Problem: The most valuable on-page text for page rank is the page title. phpBB pads every title with generic text that takes up this valuable space.

  Example:


    URL /index.php : "SITENAME :: Index"
    URL /viewforum.php : "SITENAME :: View Forum - forum name here"
    URL /viewtopic.php : "SITENAME :: View topic - topic text here"
  Solution: Remove the unnecessary text.

  Remove general "SITENAME" text from all pages...

  Edit file 'templates/subSilver/overall_header.tpl'

  Replace line...

 
    <title>{SITENAME} :: {PAGE_TITLE}</title>

  with line...

    <title>{PAGE_TITLE}</title>
  Replace the index page's "Index" text with your site name or keyword text...

  Edit file 'language/lang_english/lang_main.php'

  Replace line...
    $lang['Index'] = 'Index';

  with line...

    $lang['Index'] = 'Your-site-name Forums or keyword text';
  Remove "View Forum - " text...

  Edit file 'viewforum.php'

  Replace line...
    $page_title = $lang['View_forum'] . ' - ' . $forum_row['forum_name'];

  with line...

    $page_title = $forum_row['forum_name'];
  Remove "View topic - " text...

  Edit file 'viewtopic.php'

  Replace line...
    $page_title = $lang['View_topic'] . ' - ' . $topic_title;

  with line...

    $page_title = $topic_title;
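The effect of the template and $page_title edits on the final title can be checked by composing the strings outside phpBB. The two functions below are hypothetical and only reproduce the concatenation, using the placeholder text from the examples above.

```php
<?php
// Sketch: title text before and after the edits above (not phpBB code).
function title_before(string $sitename, string $label, string $name): string
{
    // overall_header.tpl "{SITENAME} :: {PAGE_TITLE}", with PAGE_TITLE = "label - name"
    return $sitename . ' :: ' . $label . ' - ' . $name;
}

function title_after(string $name): string
{
    // overall_header.tpl "{PAGE_TITLE}", with PAGE_TITLE = name only
    return $name;
}

$before = title_before('SITENAME', 'View topic', 'topic text here');
$after  = title_after('topic text here');

// Compare the two titles and their lengths
echo $before, ' (', strlen($before), " chars)\n";
echo $after, ' (', strlen($after), " chars)\n";
```

Every character of the new title is now the topic (or forum) name itself.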
  Cosmetic Changes

  Remove the intrusive phpBB logo...

  Edit file 'templates/subSilver/overall_header.tpl'

  Delete or comment out (with <!-- -->) line...

  

  Hyperlink the sitename back to your main site...

  Edit file 'templates/subSilver/overall_header.tpl'

  Edit line...

    {SITENAME}

  with line...

    {SITENAME} - Forums
  Remove the FAQ, Memberlist, and Usergroups links from the header...

  Edit file 'templates/subSilver/overall_header.tpl'

  Find...

 
    {L_FAQ}   {L_SEARCH}   {L_MEMBERLIST}   {L_USERGROUPS}

  Replace with...

    {L_SEARCH}
  Spammers and Bots

  Tell search engines neither to index the memberlist and user profile pages nor to follow their links...

  Edit files memberlist.php and includes/usercp_viewprofile.php

  Find line...
    // Generate page

  Right under this line add...

    $template->assign_vars(array('META' => '<meta name="robots" content="noindex,nofollow" />'));
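The META template variable assigned above is printed by the {META} placeholder in overall_header.tpl. If you would rather decide in one place which scripts get a noindex,nofollow tag, a helper could map script names to the tag. This is a sketch under assumptions: robots_meta_for() and its script list are hypothetical, and the profile pages are assumed to be served through profile.php.

```php
<?php
// Sketch: pick a robots meta tag per script (hypothetical helper, not phpBB code).
function robots_meta_for(string $script_name): string
{
    $noindex_scripts = array('memberlist.php', 'profile.php');

    // basename() drops any directory part, e.g. "/forum/profile.php"
    if (in_array(basename($script_name), $noindex_scripts, true)) {
        return '<meta name="robots" content="noindex,nofollow" />';
    }
    return ''; // empty string: {META} renders nothing
}

// In phpBB the value would be handed to the template, for example:
// $template->assign_vars(array('META' => robots_meta_for($_SERVER['PHP_SELF'])));
echo robots_meta_for('/profile.php'), "\n";
```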
  Stop the Newest User link from being displayed on the main page...

  Edit file templates/subSilver/index_body.tpl

  Find...

    {TOTAL_POSTS}
    {TOTAL_USERS}
    {NEWEST_USER}

  Edit to...

    {TOTAL_POSTS}
    {TOTAL_USERS}
  Do not display un-activated or zero-post members in your memberlist...

  Edit file memberlist.php

  Find...

    $sql = "SELECT username, user_id, user_viewemail, user_posts, user_regdate, user_from, user_website, user_email, user_icq, user_aim, user_yim, user_msnm, user_avatar, user_avatar_type, user_allowavatar
        FROM " . USERS_TABLE . "
        WHERE user_id <> " . ANONYMOUS . "
        ORDER BY $order_by";

  Edit to...

    $sql = "SELECT username, user_id, user_viewemail, user_posts, user_regdate, user_from, user_website, user_email, user_icq, user_aim, user_yim, user_msnm, user_avatar, user_avatar_type, user_allowavatar
        FROM " . USERS_TABLE . "
        WHERE user_id <> " . ANONYMOUS . "
            AND user_active = 1
            AND user_posts > 0
        ORDER BY $order_by";
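For a quick check of which members the edited WHERE clause keeps, the same three conditions can be written as a PHP predicate. This is only an illustration: show_in_memberlist() is hypothetical, the sample rows are made up, and in phpBB the filtering happens in MySQL, not in PHP.

```php
<?php
// Sketch: the edited WHERE clause as a PHP predicate (not phpBB code).
define('ANONYMOUS', -1); // phpBB 2.0's guest user_id

function show_in_memberlist(array $row): bool
{
    return $row['user_id'] != ANONYMOUS   // WHERE user_id <> ANONYMOUS
        && $row['user_active'] == 1       // AND user_active = 1
        && $row['user_posts'] > 0;        // AND user_posts > 0
}

// Made-up sample rows
$rows = array(
    array('username' => 'Anonymous', 'user_id' => -1, 'user_active' => 0, 'user_posts' => 0),
    array('username' => 'admin',     'user_id' => 2,  'user_active' => 1, 'user_posts' => 120),
    array('username' => 'newbie',    'user_id' => 57, 'user_active' => 1, 'user_posts' => 0),
    array('username' => 'spambot',   'user_id' => 58, 'user_active' => 0, 'user_posts' => 0),
);

$visible = array_values(array_filter($rows, 'show_in_memberlist'));
foreach ($visible as $row) {
    echo $row['username'], "\n"; // prints: admin
}
```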
  Delete spambots and other unwanted users...

  Enter the mysql shell...
    mysql -u root -p

  Select the phpBB database...

    use phpbb2;
  Display and delete all un-activated users registered more than 2 days ago...

    SELECT username, user_id, FROM_UNIXTIME(user_regdate) FROM phpbb_users WHERE user_active = 0 AND user_id > 2 AND FROM_UNIXTIME(user_regdate) < DATE_SUB(NOW(), INTERVAL 2 DAY) ORDER BY user_regdate;

    DELETE FROM phpbb_users WHERE user_active = 0 AND user_id > 2 AND FROM_UNIXTIME(user_regdate) < DATE_SUB(NOW(), INTERVAL 2 DAY);
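phpBB 2.0 stores user_regdate as a Unix timestamp (which is why the queries wrap it in FROM_UNIXTIME()), so the 2-day rule amounts to comparing the stored integer against "now minus 2 * 86400 seconds", ignoring daylight-saving shifts in the server's time zone. A sketch of that comparison, with a hypothetical helper name:

```php
<?php
// Sketch of the "un-activated and older than 2 days" test on Unix timestamps.
// is_stale_unactivated() is hypothetical and mirrors the queries above.
const SECONDS_PER_DAY = 86400;

function is_stale_unactivated(int $user_id, int $user_active, int $user_regdate, int $now): bool
{
    $cutoff = $now - 2 * SECONDS_PER_DAY; // registered before this moment
    return $user_id > 2 && $user_active == 0 && $user_regdate < $cutoff;
}

$now = time();
var_dump(is_stale_unactivated(100, 0, $now - 3 * SECONDS_PER_DAY, $now)); // bool(true)
var_dump(is_stale_unactivated(101, 0, $now - 1 * SECONDS_PER_DAY, $now)); // bool(false)
```

Written the same way in SQL, user_regdate < UNIX_TIMESTAMP() - 172800 avoids calling FROM_UNIXTIME() on every row.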
  Display and delete all users with zero posts and a website...

    SELECT username, FROM_UNIXTIME(user_lastvisit), FROM_UNIXTIME(user_regdate), user_website FROM phpbb_users WHERE user_id > 2 AND user_posts = 0 AND user_website IS NOT NULL AND user_website <> '' ORDER BY user_lastvisit;

    DELETE FROM phpbb_users WHERE user_id > 2 AND user_posts = 0 AND user_website IS NOT NULL AND user_website <> '';
  Display and delete all users who have not logged in for 1 year...

    SELECT username, user_id, FROM_UNIXTIME(user_lastvisit) FROM phpbb_users WHERE user_id > 2 AND (UNIX_TIMESTAMP() - user_lastvisit) > 31536000 ORDER BY user_lastvisit;

    DELETE FROM phpbb_users WHERE user_id > 2 AND (UNIX_TIMESTAMP() - user_lastvisit) > 31536000;
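The constant 31536000 in these queries is one 365-day year expressed in seconds. The sketch below shows the arithmetic and mirrors the WHERE clause in PHP; inactive_for_a_year() is a hypothetical name.

```php
<?php
// Where 31536000 comes from: 365 days * 24 hours * 60 minutes * 60 seconds.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // leap days ignored

// Mirrors: user_id > 2 AND (UNIX_TIMESTAMP() - user_lastvisit) > 31536000
function inactive_for_a_year(int $user_id, int $user_lastvisit, int $now): bool
{
    return $user_id > 2 && ($now - $user_lastvisit) > ONE_YEAR_SECONDS;
}

echo ONE_YEAR_SECONDS, "\n"; // prints 31536000
```

If accounts that never logged in carry a user_lastvisit of 0, they match this condition as well, since UNIX_TIMESTAMP() - 0 is far larger than one year.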
  Ban all *@*.ru, *@*.biz, and *@*.info email addresses (this has to be done directly in the database; it will not work from the phpBB admin interface)...

    INSERT INTO phpbb_banlist (ban_email) VALUES ('*@*.ru');
    INSERT INTO phpbb_banlist (ban_email) VALUES ('*@*.biz');
    INSERT INTO phpbb_banlist (ban_email) VALUES ('*@*.info');
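A wildcard pattern such as '*@*.ru' treats each * as "any characters". The sketch below shows one way such a pattern can be matched against an address in PHP; email_is_banned() is hypothetical, and phpBB 2.0's own ban check may be implemented differently.

```php
<?php
// Sketch: match an email address against wildcard ban patterns (not phpBB code).
function email_is_banned(string $email, array $ban_patterns): bool
{
    foreach ($ban_patterns as $pattern) {
        // preg_quote() escapes '*' and '.', then each escaped '*' becomes '.*'
        $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/i';
        if (preg_match($regex, $email)) {
            return true;
        }
    }
    return false;
}

$bans = array('*@*.ru', '*@*.biz', '*@*.info');
var_dump(email_is_banned('seller@mail.ru', $bans));      // bool(true)
var_dump(email_is_banned('someone@example.com', $bans)); // bool(false)
```

Anchoring with ^ and $ means '*@*.ru' only bans addresses whose domain ends in .ru, not addresses such as user@ru.example.com.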