DeveloperSide.NET Articles
Search Engines -- Getting your Forums Indexed
Problem: Search Engine bots (spiders, crawlers) will not index pages that contain Session IDs (sid) in URLs.
Example of URLs that will not be indexed... ... 5347f170b3e4r2067b7 ... 5347f170b3e4r2067b7
Example of URLs that will be indexed...
*Note that Session IDs are normally stored in cookies; when cookies are unavailable, they are passed in the URL instead. Search Engine bots generally do not accept cookies, so the URLs they receive carry Session IDs. To see Session IDs in the URL yourself, turn off cookies in your browser's settings.
Solution: Selectively remove Session IDs from URLs.
Method one: Remove Session IDs for specific Search Engine bots by recognizing their 'User-Agent' HTTP header strings.
Examples of 'User-Agent' strings that bots send with every HTTP request...
Google -- "Googlebot/2.1 (+"
MSN -- "msnbot/1.0 (+"
Yahoo -- "Mozilla/5.0 (compatible; Yahoo! Slurp;"
Benefits: This method does not remove Session IDs for non-logged-in users (guests), so guests can still post.
Downsides:
(1) While the 'User-Agent' strings of the major Search Engine bots are known, some lesser-known bots will be missed.
(2) The 'User-Agent' string of a bot can change from time to time; an updated list must be kept.
Edit file 'includes/sessions.php'
Replace function...
(last function in file)
- function append_sid($url, $non_html_amp = false)
- {
- global $SID;
- if ( !empty($SID) && !preg_match('#sid=#', $url) )
- {
- $url .= ( ( strpos($url, '?') != false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
- }
- return $url;
- }
with function...
- function append_sid($url, $non_html_amp = false)
- {
- global $SID;
- // Skip the Session ID for known bots; stristr() matches case-insensitively
- $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
- if ( !empty($SID) && !preg_match('#sid=#', $url) && !stristr($ua, 'Googlebot') && !stristr($ua, 'msnbot') && !stristr($ua, 'Slurp') && !stristr($ua, 'zyborg') && !stristr($ua, 'Jeeves') && !stristr($ua, 'crawler') && !stristr($ua, 'spider') )
- {
- $url .= ( ( strpos($url, '?') !== false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
- }
- return $url;
- }
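Because the bot list has to be kept up to date (point (2) above), it can help to keep the markers in one array instead of a long chain of checks. The sketch below uses a hypothetical helper, is_search_bot(), which is not part of phpBB; it matches case-insensitively with stripos(), so 'spider' also catches 'Spider'.

```php
<?php
// Hypothetical helper (not part of phpBB): true when the User-Agent
// contains any known bot marker. stripos() makes the match case-insensitive.
function is_search_bot($user_agent)
{
    // Markers taken from the list above; extend this array as bots change.
    $bot_markers = array('Googlebot', 'msnbot', 'Slurp', 'zyborg', 'Jeeves', 'crawler', 'spider');

    foreach ($bot_markers as $marker) {
        if (stripos($user_agent, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Possible use inside append_sid() (sketch):
// $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// if ( !empty($SID) && !preg_match('#sid=#', $url) && !is_search_bot($ua) ) { ... }
```

Adding a new bot then means adding one string to the array rather than editing a long condition.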
Method two: Remove Session IDs for all non-logged-in (guest/anonymous) users.
Benefits: All Search Engine bots will be able to crawl and index the forum.
Downside: Users must register and log in before they can post.
In the Administration Panel, under Forum Admin -- Permissions, switch all forums to "Registered".
Edit file 'includes/sessions.php'
Replace line...
- $SID = 'sid=' . $session_id;
with line...
- if ( $userdata['session_user_id'] != ANONYMOUS ){ $SID = 'sid=' . $session_id; } else { $SID = ''; }
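The one-line change boils down to a simple rule: only logged-in users get a "sid=" parameter. The sketch below isolates that rule in a hypothetical build_sid() function; the ANONYMOUS value of -1 is taken from phpBB 2's includes/constants.php and is defined here only so the example runs on its own.

```php
<?php
// phpBB 2 defines ANONYMOUS as -1; defined here so the sketch is standalone.
define('ANONYMOUS', -1);

// Hypothetical function mirroring the replacement line above:
// logged-in users get "sid=...", guests (and therefore bots) get nothing.
function build_sid($session_user_id, $session_id)
{
    if ($session_user_id != ANONYMOUS) {
        return 'sid=' . $session_id;
    }
    return '';
}
```

Since append_sid() only appends when $SID is non-empty, an empty string for guests means their URLs stay free of Session IDs.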
Search Engines -- robots.txt
Problem: Search Engine bots will try to index all available pages/links under the forum. Some of these pages/links have no value, can be harmful to page rank, and should not be indexed.
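Solution: Create a 'robots.txt' file that lists the pages/links not to be indexed. Bots only read robots.txt from the root of the web site, so the paths below assume the forum is installed at the site root; if it lives in a subdirectory, place robots.txt at the site root and prefix every path with that directory.
Only 3 pages/links benefit page rank and should be indexed...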
- /index.php
- /viewforum.php
- /viewtopic.php
Every other page/link should be disallowed.
Based on the phpBB root-level directory/file structure, disallow everything except the 3 pages/links above...
Contents of 'robots.txt'...
- User-agent: *
- Disallow: /admin/
- Disallow: /db/
- Disallow: /images/
- Disallow: /includes/
- Disallow: /language/
- Disallow: /templates/
- Disallow: /common.php
- Disallow: /config.php
- Disallow: /faq.php
- Disallow: /groupcp.php
- Disallow: /login.php
- Disallow: /memberlist.php
- Disallow: /modcp.php
- Disallow: /posting.php
- Disallow: /privmsg.php
- Disallow: /profile.php
- Disallow: /search.php
- Disallow: /viewonline.php
The first line ('User-agent: *') applies the rules to all Search Engine bots.
The following lines tell bots not to crawl any link whose path starts with the given text.
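A bot that honors these rules compares each URL path against every Disallow value as a simple prefix. The sketch below shows that matching with a hypothetical is_disallowed() function and a shortened rule list; it ignores extensions such as Allow lines and wildcards that some major bots also support.

```php
<?php
// Hypothetical check: a path is blocked if it starts with any Disallow value.
// An empty Disallow value means "nothing is disallowed", so it is skipped.
function is_disallowed($path, array $disallow_rules)
{
    foreach ($disallow_rules as $rule) {
        // Compare only the first strlen($rule) characters of the path.
        if ($rule !== '' && strncmp($path, $rule, strlen($rule)) === 0) {
            return true;
        }
    }
    return false;
}

// A few of the rules from the robots.txt above.
$rules = array('/admin/', '/login.php', '/posting.php', '/profile.php', '/search.php');
```

With these rules, '/posting.php?mode=reply' is blocked while '/viewtopic.php?t=5' and '/index.php' stay crawlable.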
Page Rank
Problem: The page title is the most valuable on-page (local) text for page rank, and phpBB pads every title with generic text that takes up this valuable space.
- URL /index.php : "SITENAME :: Index"
- URL /viewforum.php : "SITENAME :: View Forum - forum name here"
- URL /viewtopic.php : "SITENAME :: View topic - topic text here"
Solution: Remove the unnecessary text.
Remove general "SITENAME" text from all pages...
Edit file 'templates/subSilver/overall_header.tpl'
Replace line...
- <title>{SITENAME} :: {PAGE_TITLE}</title>
with line...
- <title>{PAGE_TITLE}</title>
Replace the index page "Index" text with site name or keyword text...
Edit file 'language/lang_english/lang_main.php'
Replace line...
- $lang['Index'] = 'Index';
with line...
- $lang['Index'] = 'Your-site-name Forums or keyword text';
Remove "View Forum - " text...
Edit file 'viewforum.php'
Replace line...
- $page_title = $lang['View_forum'] . ' - ' . $forum_row['forum_name'];
with line...
- $page_title = $forum_row['forum_name'];
Remove "View topic - " text...
Edit file 'viewtopic.php'
Replace line...
- $page_title = $lang['View_topic'] .' - ' . $topic_title;
with line...
- $page_title = $topic_title;
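To see what these edits do to a topic page, the sketch below composes the title text before and after. The functions and the topic title are made-up examples; the string layout follows the template lines quoted above.

```php
<?php
// Before: overall_header.tpl printed "{SITENAME} :: {PAGE_TITLE}" and
// viewtopic.php prefixed the page title with "View topic - ".
function title_before($sitename, $topic_title)
{
    return $sitename . ' :: ' . 'View topic - ' . $topic_title;
}

// After: the template prints only {PAGE_TITLE}, which is the bare topic title.
function title_after($topic_title)
{
    return $topic_title;
}

echo title_before('SITENAME', 'Example topic title'), "\n"; // SITENAME :: View topic - Example topic title
echo title_after('Example topic title'), "\n";              // Example topic title
```

The topic text now starts the title instead of being pushed to the end.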
Cosmetic Changes
Remove the intrusive phpBB logo...
Edit file 'templates/subSilver/overall_header.tpl'
Delete or comment out (with <!-- -->) line...
Hyperlink the sitename back to your main site...
Edit file 'templates/subSilver/overall_header.tpl'
Edit line...
with line...
Remove FAQ, Memberlist, and Grouplist links from the header...
Edit file 'templates/subSilver/overall_header.tpl'
Replace with...
Spammers and Bots
Tell search engines not to index the memberlist and user profile pages, nor follow their links...
Edit files memberlist.php and includes/usercp_viewprofile.php
Find line...
Right under this line add...
- $template->assign_vars(array('META'=>''));
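The value assigned to 'META' appears to have been lost from the line above. The sketch below assumes it was a standard robots meta tag and that overall_header.tpl prints {META} inside <head>; robots_meta_tag() is a hypothetical helper, not a phpBB function.

```php
<?php
// Hypothetical helper that builds a standard robots meta tag.
// Assumption: the missing 'META' value above was a tag of this form.
function robots_meta_tag($index, $follow)
{
    $content = ($index ? 'index' : 'noindex') . ',' . ($follow ? 'follow' : 'nofollow');
    return '<meta name="robots" content="' . $content . '" />';
}

// Memberlist and profile pages: do not index, do not follow links.
$meta = robots_meta_tag(false, false);
// In memberlist.php / usercp_viewprofile.php this would be passed as:
// $template->assign_vars(array('META' => $meta));
```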
Stop the Newest User link from being displayed on the main page...
Edit file templates/subSilver/index_body.tpl
Edit to...
Do not display un-activated or zero-post members in your memberlist...
Edit file memberlist.php
- $sql = "SELECT username, user_id, user_viewemail, user_posts, user_regdate, user_from, user_website, user_email, user_icq, user_aim, user_yim, user_msnm, user_avatar, user_avatar_type, user_allowavatar
- FROM " . USERS_TABLE . "
- WHERE user_id <> " . ANONYMOUS . "
- ORDER BY $order_by";
Edit to...
- $sql = "SELECT username, user_id, user_viewemail, user_posts, user_regdate, user_from, user_website, user_email, user_icq, user_aim, user_yim, user_msnm, user_avatar, user_avatar_type, user_allowavatar
- FROM " . USERS_TABLE . "
- WHERE user_id <> " . ANONYMOUS . "
- AND user_active = 1
- AND user_posts > 0
- ORDER BY $order_by";
Delete spambot and other unwanted user accounts...
Enter the mysql shell...
Select the phpBB database...
Display, then delete, all un-activated users registered more than 2 days ago...
- SELECT username, user_id, FROM_UNIXTIME(user_regdate) FROM phpbb_users WHERE user_active=0 AND user_id > 2 AND FROM_UNIXTIME(user_regdate) < DATE_SUB(NOW(),INTERVAL 2 DAY) ORDER BY user_regdate;
- DELETE FROM phpbb_users WHERE user_active=0 AND user_id > 2 AND FROM_UNIXTIME(user_regdate) < DATE_SUB(NOW(),INTERVAL 2 DAY);
Display and delete all users with zero posts and a website...
- SELECT username, FROM_UNIXTIME(user_lastvisit), FROM_UNIXTIME(user_regdate), user_website FROM phpbb_users WHERE user_id > 2 AND user_posts = 0 AND BIN(user_website) IS NOT NULL ORDER BY user_lastvisit;
- DELETE FROM phpbb_users WHERE user_id > 2 AND user_posts = 0 AND BIN(user_website) IS NOT NULL;
Display and delete all users who have not logged in for 1 year...
- SELECT username, user_id, FROM_UNIXTIME(user_lastvisit) FROM phpbb_users WHERE user_id > 2 AND (UNIX_TIMESTAMP()-user_lastvisit) > 31536000 ORDER BY user_lastvisit;
- DELETE FROM phpbb_users WHERE user_id > 2 AND (UNIX_TIMESTAMP()-user_lastvisit) > 31536000;
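The constant 31536000 in the last two queries is one 365-day year in seconds, and user_lastvisit is a Unix timestamp (the queries pass it to FROM_UNIXTIME). The sketch below checks that arithmetic and mirrors the WHERE condition in a hypothetical PHP function.

```php
<?php
// One 365-day year in seconds: 365 * 24 * 60 * 60 = 31536000.
$seconds_per_year = 365 * 24 * 60 * 60;

// Hypothetical PHP equivalent of "(UNIX_TIMESTAMP()-user_lastvisit) > 31536000".
// Both arguments are Unix timestamps in seconds.
function inactive_for_a_year($user_lastvisit, $now)
{
    return ($now - $user_lastvisit) > 365 * 24 * 60 * 60;
}
```

Leap years are ignored, so "1 year" here always means exactly 365 days.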
Ban all *@*.ru, *@*.biz, and *@*.info email addresses [this has to be done directly in the database; it will not work from the phpBB admin interface]...
- insert into phpbb_banlist (ban_email) values ('*@*.ru');
- insert into phpbb_banlist (ban_email) values ('*@*.biz');
- insert into phpbb_banlist (ban_email) values ('*@*.info');
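In these ban entries '*' acts as a wildcard, so '*@*.ru' is meant to catch any address ending in '.ru'. The sketch below shows one way such a pattern can be matched, using a hypothetical email_matches_ban() function built on preg_quote() and preg_match(); phpBB's own ban check is not reproduced here and may behave differently.

```php
<?php
// Hypothetical matcher: '*' matches any run of characters, everything else
// is literal, and the comparison is case-insensitive.
function email_matches_ban($email, $ban_pattern)
{
    // Escape regex metacharacters first, then turn each escaped '*' into '.*'.
    $regex = '#^' . str_replace('\*', '.*', preg_quote($ban_pattern, '#')) . '$#i';
    return preg_match($regex, $email) === 1;
}
```

Because the pattern is anchored at both ends, '*@*.ru' matches 'someone@mail.ru' but not 'someone@ruby-lang.org'.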