<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: P2P Search Engine</title>
	<atom:link href="http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/</link>
	<description>The personal web site of Brendon J. Wilson, a software developer, technologist, and entrepreneur living in Vancouver, British Columbia, Canada.</description>
	<lastBuildDate>Mon, 06 Sep 2010 10:19:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Brendon</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-385</link>
		<dc:creator>Brendon</dc:creator>
		<pubDate>Sun, 23 Oct 2005 04:07:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-385</guid>
		<description>It&#039;s not quite the same as what I had in mind, but it&#039;s close. The purpose of the system I proposed was not only to use distributed peers to index the net, but also to use collaborative filtering to improve search results by feeding individuals&#039; behaviour towards search results back into the index.</description>
		<content:encoded><![CDATA[<p>It&#8217;s not quite the same as what I had in mind, but it&#8217;s close. The purpose of the system I proposed was not only to use distributed peers to index the net, but also to use collaborative filtering to improve search results by feeding individuals&#8217; behaviour towards search results back into the index.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jayanath</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-366</link>
		<dc:creator>Jayanath</dc:creator>
		<pubDate>Sun, 02 Oct 2005 13:02:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-366</guid>
		<description>Could you tell me if the prototype presented at http://www.yacy.net/yacy is similar to the ideas you presented? Are there any ways we could improve YaCy with regard to your view of a P2P-based web search engine?</description>
		<content:encoded><![CDATA[<p>Could you tell me if the prototype presented at <a href="http://www.yacy.net/yacy" rel="nofollow">http://www.yacy.net/yacy</a> is similar to the ideas you presented? Are there any ways we could improve YaCy with regard to your view of a P2P-based web search engine?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Evan Wise</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-64</link>
		<dc:creator>Evan Wise</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-64</guid>
		<description>Put up your ideas in HTML and perhaps I will read them....</description>
		<content:encoded><![CDATA[<p>Put up your ideas in HTML and perhaps I will read them&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Boris Mann</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-65</link>
		<dc:creator>Boris Mann</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-65</guid>
		<description>Ditto Evan&#039;s comment. Although I would read it if it were a PDF as well :p

Google also partially uses this in its algorithm: if a search for &quot;ottawa baseball&quot; has people consistently clicking on the third result, that result gets a higher ranking.

There might be a way to refine this, since Google doesn&#039;t necessarily know that you &quot;found what you were looking for&quot; when you click the third link. You would have to track the entire query, possibly with a browser toolbar offering a thumbs up/down button that reports back to the search engine.

As regards the P2P aspect, I&#039;m going to put some thoughts together (since I can&#039;t read what you wrote) and put a post on my blog.

Mmm...plain links only:
&lt;a href=&quot;http://www.bmannconsulting.com/node.php?id=342&quot;&gt;http://www.bmannconsulting.com/node.php?id=342&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Ditto Evan&#8217;s comment. Although I would read it if it were a PDF as well :p</p>
<p>Google also partially uses this in its algorithm: if a search for &#8220;ottawa baseball&#8221; has people consistently clicking on the third result, that result gets a higher ranking.</p>
<p>There might be a way to refine this, since Google doesn&#8217;t necessarily know that you &#8220;found what you were looking for&#8221; when you click the third link. You would have to track the entire query, possibly with a browser toolbar offering a thumbs up/down button that reports back to the search engine.</p>
<p>As regards the P2P aspect, I&#8217;m going to put some thoughts together (since I can&#8217;t read what you wrote) and put a post on my blog.</p>
<p>Mmm&#8230;plain links only:<br />
<a href="http://www.bmannconsulting.com/node.php?id=342">http://www.bmannconsulting.com/node.php?id=342</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendon J. Wilson</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-66</link>
		<dc:creator>Brendon J. Wilson</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-66</guid>
		<description>I&#039;ve updated the entry with a link to an HTML version of the document, for the M$ haters out there.

One more note I should add: I know that the problem of indexing the &quot;deep web&quot; was being tackled by the Gnutella guys working with Infrasearch, which was eventually bought by Sun and integrated into the JXTA Search team. Just thought I&#039;d mention it.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve updated the entry with a link to an HTML version of the document, for the M$ haters out there.</p>
<p>One more note I should add: I know that the problem of indexing the &#8220;deep web&#8221; was being tackled by the Gnutella guys working with Infrasearch, which was eventually bought by Sun and integrated into the JXTA Search team. Just thought I&#8217;d mention it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendon J. Wilson</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-67</link>
		<dc:creator>Brendon J. Wilson</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-67</guid>
		<description>Boris: I&#039;m not certain that Google uses that technique, at least not anymore. If you do a &quot;view source&quot; on a search results page, you&#039;ll see that all the search result links are normal links, not links that go to a redirect script on Google&#039;s site. So, it would appear they have no way to track whether you click through search results.

That said, they may be doing that kind of tracking with the Google Toolbar, or could do so fairly easily. Of course, it would require more back-end horsepower to track user activities if they did.</description>
		<content:encoded><![CDATA[<p>Boris: I&#8217;m not certain that Google uses that technique, at least not anymore. If you do a &#8220;view source&#8221; on a search results page, you&#8217;ll see that all the search result links are normal links, not links that go to a redirect script on Google&#8217;s site. So, it would appear they have no way to track whether you click through search results.</p>
<p>That said, they may be doing that kind of tracking with the Google Toolbar, or could do so fairly easily. Of course, it would require more back-end horsepower to track user activities if they did.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Evan Wise</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-68</link>
		<dc:creator>Evan Wise</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-68</guid>
		<description>I am not an M$ hater. I am a hater of the assumption that everyone uses MS Office/MS Media Player/etc and thus the slow creep of MS owning the net continues; sorry, comes to a close.</description>
		<content:encoded><![CDATA[<p>I am not an M$ hater. I am a hater of the assumption that everyone uses MS Office/MS Media Player/etc and thus the slow creep of MS owning the net continues; sorry, comes to a close.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan Reilly</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-69</link>
		<dc:creator>Ryan Reilly</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-69</guid>
		<description>Your underlying assumption is that people search for the same things.  Although this sounds reasonable, I&#039;m not sure that it&#039;s true.  In certain cases, clearly this holds.  For example, &quot;What movies are playing now?&quot;, &quot;How does the Canon Powershot G3 compare?&quot;, etc.  Yet these are not the cases that search engines have difficulty with.

Today, I wanted to find information about entry and exit technique for kayaking on rocky coastlines.  Googling for this information was very difficult, and required a great deal of &quot;-class -course -tour&quot; type terms.  The likelihood of anyone else searching for this information, however, is very small, especially within a short time period.

As in any other branch of computer science, the shortcomings of search engines lie in the special cases: finding information that is rarely accessed.  Google rarely fails me, but when it does, I&#039;m looking for something off the beaten path.</description>
		<content:encoded><![CDATA[<p>Your underlying assumption is that people search for the same things.  Although this sounds reasonable, I&#8217;m not sure that it&#8217;s true.  In certain cases, clearly this holds.  For example, &#8220;What movies are playing now?&#8221;, &#8220;How does the Canon Powershot G3 compare?&#8221;, etc.  Yet these are not the cases that search engines have difficulty with.</p>
<p>Today, I wanted to find information about entry and exit technique for kayaking on rocky coastlines.  Googling for this information was very difficult, and required a great deal of &#8220;-class -course -tour&#8221; type terms.  The likelihood of anyone else searching for this information, however, is very small, especially within a short time period.</p>
<p>As in any other branch of computer science, the shortcomings of search engines lie in the special cases: finding information that is rarely accessed.  Google rarely fails me, but when it does, I&#8217;m looking for something off the beaten path.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendon J. Wilson</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-70</link>
		<dc:creator>Brendon J. Wilson</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-70</guid>
		<description>That&#039;s a good point, Ryan: the system is based on the assumption that people mean the same thing when they enter a given set of keywords. This is exactly why Google fails for obscure searches: no one has helped Google build that knowledge by linking to that information in that context (i.e., no one&#039;s put up a page on &quot;kayak entry and exit techniques&quot; and linked to a bunch of pages). Part of the problem also lies in the fact that Google indexes everything in a page, so if a site is popular, updated often, and includes those words, it&#039;ll be returned as a result; hence the problem Google is facing with indexing blogs.

The problem is that the feedback signal to Google is extremely attenuated. When someone such as yourself finally finds a piece of information by picking through Google&#039;s results and refining their search, that knowledge (&quot;this is what I was looking for when I said &#039;entry exit technique kayak&#039;&quot;) is lost. Google throws that knowledge away, even though it could be tracked and incorporated into its database.

Ryan&#039;s observation implies additional required functionality: a simple way for users to refine and sort through search results, with those refinements added to the meta-data. Basically, this would say &quot;hey, when someone searches for &#039;exit entry technique kayak&#039;, exclude pages that match the subsearch &#039;class course tour&#039;.&quot;

This idea reminds me of another proposal. Currently you can add a set of meta-tags to your web pages to guide search engines on what your page is about (a set of keywords and descriptions). This is failure-prone, as people have an incentive to lie in their keywords and descriptions. However, what if you could include a meta-tag listing the keywords for which your site is not a match? This would let you tell a search engine &quot;yeah, my page is about kayaking, but I don&#039;t cover exit and entry techniques. Sorry!&quot;</description>
		<content:encoded><![CDATA[<p>That&#8217;s a good point, Ryan: the system is based on the assumption that people mean the same thing when they enter a given set of keywords. This is exactly why Google fails for obscure searches: no one has helped Google build that knowledge by linking to that information in that context (i.e., no one&#8217;s put up a page on &#8220;kayak entry and exit techniques&#8221; and linked to a bunch of pages). Part of the problem also lies in the fact that Google indexes everything in a page, so if a site is popular, updated often, and includes those words, it&#8217;ll be returned as a result; hence the problem Google is facing with indexing blogs.</p>
<p>The problem is that the feedback signal to Google is extremely attenuated. When someone such as yourself finally finds a piece of information by picking through Google&#8217;s results and refining their search, that knowledge (&#8220;this is what I was looking for when I said &#8216;entry exit technique kayak&#8217;&#8221;) is lost. Google throws that knowledge away, even though it could be tracked and incorporated into its database.</p>
<p>Ryan&#8217;s observation implies additional required functionality: a simple way for users to refine and sort through search results, with those refinements added to the meta-data. Basically, this would say &#8220;hey, when someone searches for &#8216;exit entry technique kayak&#8217;, exclude pages that match the subsearch &#8216;class course tour&#8217;.&#8221;</p>
<p>This idea reminds me of another proposal. Currently you can add a set of meta-tags to your web pages to guide search engines on what your page is about (a set of keywords and descriptions). This is failure-prone, as people have an incentive to lie in their keywords and descriptions. However, what if you could include a meta-tag listing the keywords for which your site is not a match? This would let you tell a search engine &#8220;yeah, my page is about kayaking, but I don&#8217;t cover exit and entry techniques. Sorry!&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Boris Mann</title>
		<link>http://www.brendonwilson.com/blog/2003/06/25/p2p-search-engine/comment-page-1/#comment-71</link>
		<dc:creator>Boris Mann</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.brendonwilson.com/blog/?p=100#comment-71</guid>
		<description>Negative keywords are an interesting concept...but in my experience, people do not optimize their sites to not be found.

As for meta-tags...they have not been used by search engines for years now, precisely because of the &quot;gaming&quot; of them.

Link tracking: I was actually thinking of the &quot;sponsored links&quot; that appear on the right-hand side. You&#039;re right about the &quot;main&quot; links. And yes, they do some of that tracking with the Google toolbar if you accepted the &quot;we will track you&quot; option on install.</description>
		<content:encoded><![CDATA[<p>Negative keywords are an interesting concept&#8230;but in my experience, people do not optimize their sites to not be found.</p>
<p>As for meta-tags&#8230;they have not been used by search engines for years now, precisely because of the &#8220;gaming&#8221; of them.</p>
<p>Link tracking: I was actually thinking of the &#8220;sponsored links&#8221; that appear on the right-hand side. You&#8217;re right about the &#8220;main&#8221; links. And yes, they do some of that tracking with the Google toolbar if you accepted the &#8220;we will track you&#8221; option on install.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
