Mario's World Forums  

24th April 2009, 11:05 PM   #1
Mario
Special Guest
 
 
Join Date: Dec 2008
Location: @home
Posts: 627
Munax webcrawler indexing sites (IP range 82.99.30.2 to 82.99.30.73)

If you run a forum and have recently noticed some strange guest activity as per the included screenshot, then know that it is Munax, a Stockholm-based Swedish Internet search company, crawling your website or forum and sucking up bandwidth. Following is their FAQ section as of the date of this post.
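If you want to check whether a guest IP from your "Who's Online" list or access log falls in that range, here is a minimal Python sketch. The range endpoints come from the thread title and the FAQ; the function name is just something I made up for illustration.

```python
import ipaddress

# Range published in the Munax FAQ: 82.99.30.2 through 82.99.30.73 (inclusive).
MUNAX_FIRST = ipaddress.IPv4Address("82.99.30.2")
MUNAX_LAST = ipaddress.IPv4Address("82.99.30.73")


def is_munax_ip(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside the published range."""
    try:
        ip = ipaddress.IPv4Address(addr)
    except ipaddress.AddressValueError:
        return False  # not a valid IPv4 address (IPv6, hostname, garbage)
    return MUNAX_FIRST <= ip <= MUNAX_LAST


if __name__ == "__main__":
    for guest in ["82.99.30.2", "82.99.30.40", "82.99.30.74", "10.0.0.1"]:
        print(guest, is_munax_ip(guest))
```

The range does not line up with a single CIDR block, which is why the sketch compares against the first and last address instead of using one network mask.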



Quote:
Crawling FAQ


Dear site owner,

Your site might have been visited by our crawlers, with network addresses in the range of 82.99.30.2 - 82.99.30.73. Here is a short FAQ answering some of the questions you might have:


What is the name of your crawler?

Our crawler does not have a "name" yet. Instead it announces itself as a standard web browser, a "Mozilla 4.0" kind of browser compatible with Microsoft Internet Explorer 6.0, running on the Windows NT 5.1 operating system. The reasons for this are: (a) Today, web servers are intelligent enough to react to the type of user agent. If our crawlers had a name, say MunaxRob or something like that, many web servers would not know about it and would return junk or maybe nothing at all. (b) We want the web server to return a page that looks as close as possible to the page viewed with a standard web browser. This lets us create the best possible index in our database and a WYSIWYG experience for anybody visiting our search engine.

It is true that many of today's search engines are doing well with a name set to 'Robot' or something like that, but those search engines are well known and site owners have given the crawlers of those search engines a chance to retrieve the best possible information. We want to be given the same chance.
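(Aside for forum admins, not part of the Munax FAQ.) The FAQ describes the browser the crawler imitates but does not give the exact header. The sketch below assumes the conventional Internet Explorer 6 on Windows XP form, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"; both the pattern and the function name are assumptions for illustration.

```python
import re

# Assumed pattern: the FAQ names the browser family, not the literal header.
# Extra tokens after "Windows NT 5.1" (such as "; SV1") are tolerated.
IE6_XP = re.compile(r"^Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\b")


def looks_like_ie6_on_xp(user_agent: str) -> bool:
    """Return True if the User-Agent header has the IE6-on-XP shape."""
    return bool(IE6_XP.match(user_agent))
```

Real IE6 visitors send the same kind of header, so a match on its own proves nothing. It is only useful together with the published IP range.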


How often do you visit my site?

The interval between visits by our crawlers to your site should be somewhere between 15 minutes and several days.


Do you store the things you index?

Unlike other search engines, we do not want to steal the things you have on your site. For instance, some search engines download and convert your images and then display them as thumbnails in their search results. We just store the links to the images. Additionally, since we only try to access your images and do not download them, we will use practically none of your bandwidth.

For pages, we do the same thing as many other search engines, i.e. our crawlers download and store a copy of the page in the database cache.

For other things, like video and audio, we follow the rules accepted on the web and take only a small snapshot. For a video we take 3 - 4 seconds and for audio about 16 seconds.


Why do you supply the URL http://www.munax.com/referer.htm as 'Referer'?

You might have set your web server to deny access to things (images for instance) on your site unless the Referer is a page on your web site. This is why the crawler accesses your site with a Referer page outside your site; the crawler wants to know if we will be denied linkage to the images on your site. If yes, the crawler must set a low rank value on, or remove, your image link from the search index to avoid displaying a broken/missing image in the search results.
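(Aside for forum admins, not part of the Munax FAQ.) The hotlink-protection rule described above can be sketched as a simple host comparison. The Referer URL is the one quoted in the FAQ; `forums.example.com` and the function name are placeholders, and real servers usually implement this in their own config language and often let empty Referers through.

```python
from urllib.parse import urlsplit

# Referer value the FAQ says the crawler sends.
MUNAX_REFERER = "http://www.munax.com/referer.htm"


def referer_is_on_site(referer: str, site_host: str) -> bool:
    """Hotlink rule: accept only Referers that point at our own host."""
    host = urlsplit(referer).hostname  # None when the header is empty
    return host is not None and host == site_host.lower()


if __name__ == "__main__":
    site = "forums.example.com"  # hypothetical site host
    print(referer_is_on_site("http://forums.example.com/showthread.php?t=1", site))
    print(referer_is_on_site(MUNAX_REFERER, site))
```

Under a strict rule like this the crawler's request is refused, which by the FAQ's own account is the signal it uses to down-rank or drop the image link. The same Referer value is also a handy string to search for in access logs.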


Do you honour the robots.txt protocol?

Yes we do. However, the crawler will almost always fetch the first page of the site, i.e. the page of the root URL "/". This is for ranking calculation reasons. When we leave beta state we will most likely change this so the first page will be skipped too. Also, if other sites link to multimedia on your site, the crawler will index those links, assuming that those links must be OK to index since other sites are allowed to link to your site.

The crawlers will ignore a robots.txt file if it is not correctly written.
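(Aside for forum admins, not part of the Munax FAQ.) Since the crawler has no name of its own, presumably only a catch-all `User-agent: *` group can apply to it; that is my assumption, the FAQ does not say so. Before relying on robots.txt it is worth checking that the file parses the way you expect. Here is a minimal sketch using Python's standard `urllib.robotparser`, with a made-up host name and helper function.

```python
from urllib.robotparser import RobotFileParser

# A catch-all rule set that asks every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Assumed header shape; the FAQ describes it but does not quote it.
CRAWLER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, CRAWLER_UA, "http://forums.example.com/index.php"))
```

This only shows what a standards-following parser concludes from the file. Per the FAQ, the crawler may still fetch "/" regardless, and may index multimedia links it finds on other sites.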


How do I exclude my site from being indexed?


Remove NOSPAM from the email address info2@NOSPAMmunax.com and send an email with the subject "Exclude my site from indexing, code: 84jdur74ud". In your email you should state the full URL of the site. Also, note that others might want to have your site excluded, so be sure to use a correct sender's email address. It should have the same domain name as the site you want to exclude.

Because we are in beta state, with so many things to do and so many requests to serve, your site might not be excluded until the next time we crawl & index the web.
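(Aside for forum admins, not part of the Munax FAQ.) The steps above can be sketched as a small Python helper that builds the message without sending it. The address and subject come from the FAQ text; the function name, the body wording, and the example site are made up, and the domain check is my own loose reading of "same domain name".

```python
from email.message import EmailMessage
from urllib.parse import urlsplit

OBFUSCATED_ADDRESS = "info2@NOSPAMmunax.com"  # as printed in the FAQ
SUBJECT = "Exclude my site from indexing, code: 84jdur74ud"  # as printed in the FAQ


def build_exclusion_request(site_url: str, sender: str) -> EmailMessage:
    """Build (but do not send) the exclusion email described in the FAQ."""
    host = urlsplit(site_url).hostname or ""
    domain = sender.rsplit("@", 1)[-1].lower()
    # Assumed interpretation: the sender's domain is the site host or a parent of it.
    if not (host == domain or host.endswith("." + domain)):
        raise ValueError("sender address should be on the same domain as the site")

    msg = EmailMessage()
    msg["To"] = OBFUSCATED_ADDRESS.replace("NOSPAM", "")
    msg["From"] = sender
    msg["Subject"] = SUBJECT
    msg.set_content(f"Please exclude the following site from indexing:\n{site_url}\n")
    return msg


if __name__ == "__main__":
    print(build_exclusion_request("http://www.example.com/", "admin@example.com"))
```

Sending is left out on purpose; hand the message to whatever mail setup you already use.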



Again... Please note that we are in beta state. We try to correct things as soon as possible and we are sorry for any inconvenience our crawlers might have caused you and your site.
Source.
All times are GMT +8.

