<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sleepydisco &#187; Programming</title>
	<atom:link href="http://www.sleepydisco.com/category/programming/feed" rel="self" type="application/rss+xml" />
	<link>http://www.sleepydisco.com</link>
	<description>A blog about technology, music, food and photography.</description>
	<lastBuildDate>Sun, 18 Apr 2010 20:53:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Documenting HttpServletRequest</title>
		<link>http://www.sleepydisco.com/programming/documenting-httpservletrequest</link>
		<comments>http://www.sleepydisco.com/programming/documenting-httpservletrequest#comments</comments>
		<pubDate>Sun, 20 Dec 2009 14:27:29 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[httpservletrequest]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[reference]]></category>

		<guid isPermaLink="false">http://www.sleepydisco.com/?p=201</guid>
		<description><![CDATA[For such a core object the HttpServletRequest javadoc is a little lacking in the Examples Dept. when it comes to documented output. With various methods returning various parts of URLs, it&#8217;s often easy to pick the wrong one, so I thought I&#8217;d knock up a little table with the getters which trip me up sometimes.
This [...]]]></description>
			<content:encoded><![CDATA[<p>For such a core object the <a title="HttpServletRequest javadoc" href="http://java.sun.com/webservices/docs/1.6/api/javax/servlet/http/HttpServletRequest.html">HttpServletRequest javadoc</a> is a little lacking in the Examples Dept. when it comes to documented output. With various methods returning various parts of URLs, it&#8217;s often easy to pick the wrong one, so I thought I&#8217;d knock up a little table with the getters which trip me up sometimes.<span id="more-201"></span></p>
<p>This assumes a simple webapp, living on a server on the local machine, sitting under the context called &#8216;context&#8217;, responding to the URL:</p>
<pre>http://localhost:8080/context/hello/world?foo=bar</pre>
<table>
<thead>
<tr>
<th>Method</th>
<th>Response</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>getContextPath()</td>
<td>/context</td>
<td>The context part of the URL. Should be obvious.</td>
</tr>
<tr>
<td>getPathInfo()</td>
<td>/hello/world</td>
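<td>The extra path after the servlet path, not including the query string. Here the servlet path is empty (see getServletPath() below), so it&#8217;s everything after the context.</td>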
</tr>
<tr>
<td>getPathTranslated()</td>
<td>/Users/david/projects/java/hello-world-webapp/target/context/hello/world</td>
<td>Hrm. The path info mapped onto the local filesystem, whether or not anything actually exists at that location.</td>
</tr>
<tr>
<td>getProtocol()</td>
<td>HTTP/1.1</td>
<td>The protocol and version from the request line. This doesn&#8217;t produce anything verbatim from the URL, such as &#8216;http&#8217; or &#8216;https&#8217; or &#8216;ftp&#8217; or&#8230; that is what getScheme() is for.</td>
</tr>
<tr>
<td>getQueryString()</td>
<td>foo=bar</td>
<td>Like it says, the query string.</td>
</tr>
<tr>
<td>getRequestURI()</td>
<td>/context/hello/world</td>
<td>Everything after the port, up to but not including the query string.</td>
</tr>
<tr>
<td>getRequestURL()</td>
<td>http://localhost:8080/context/hello/world</td>
<td>This is a StringBuffer object, containing everything but the query string.</td>
</tr>
<tr>
<td>getServerName()</td>
<td>localhost</td>
<td>The server name as presented in the URL, not the hostname of the box.</td>
</tr>
<tr>
<td>getServerPort()</td>
<td>8080</td>
<td>As expected. Good, good.</td>
</tr>
<tr>
<td>getServletPath()</td>
<td></td>
<td>This is the path, relative to the context, that the servlet is configured (e.g. in web.xml) to respond to. It&#8217;ll be an empty string if, as here, this is in response to a wildcard mapping &#8220;/*&#8221;.</td>
</tr>
</tbody>
</table>
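<p>The split in the table can be checked without a servlet container. What follows is a minimal sketch, not anything from the servlet API itself: <code>RequestUrlParts</code>, <code>fullUrl</code> and <code>pathWithinContext</code> are hypothetical names, and <code>java.net.URI</code> stands in for a real request. It also shows the usual way to glue getRequestURL() and getQueryString() back together, bearing in mind that getQueryString() returns null when there is no query string.</p>

```java
import java.net.URI;

public class RequestUrlParts {

    /** Rebuilds the full URL from the values of getRequestURL() and getQueryString(). */
    public static String fullUrl(StringBuffer requestUrl, String queryString) {
        // getQueryString() returns null when the URL carries no query string.
        if (queryString == null) {
            return requestUrl.toString();
        }
        return requestUrl + "?" + queryString;
    }

    /** Strips the context path from a request URI, leaving servlet path plus path info. */
    public static String pathWithinContext(String requestUri, String contextPath) {
        return requestUri.startsWith(contextPath)
                ? requestUri.substring(contextPath.length())
                : requestUri;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://localhost:8080/context/hello/world?foo=bar");
        System.out.println(uri.getScheme()); // http, compare getScheme()
        System.out.println(uri.getHost());   // localhost, compare getServerName()
        System.out.println(uri.getPort());   // 8080, compare getServerPort()
        System.out.println(uri.getPath());   // /context/hello/world, compare getRequestURI()
        System.out.println(uri.getQuery());  // foo=bar, compare getQueryString()
        System.out.println(pathWithinContext(uri.getPath(), "/context")); // /hello/world
    }
}
```

<p>Inside a servlet the call would look something like <code>RequestUrlParts.fullUrl(request.getRequestURL(), request.getQueryString())</code>.</p>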
]]></content:encoded>
			<wfw:commentRss>http://www.sleepydisco.com/programming/documenting-httpservletrequest/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>URL connection timeouts&#8230; not timing out</title>
		<link>http://www.sleepydisco.com/programming/url-connection-timeouts-not-timing-out</link>
		<comments>http://www.sleepydisco.com/programming/url-connection-timeouts-not-timing-out#comments</comments>
		<pubDate>Tue, 14 Apr 2009 21:48:17 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[url java apache http]]></category>

		<guid isPermaLink="false">http://www.sleepydisco.com/?p=198</guid>
		<description><![CDATA[After much fussing about, I&#8217;ve found out that the default URLConnection object in the core Java libraries doesn&#8217;t correctly obey connect or read timeouts set via the setConnectTimeout() or setReadTimeout() methods. The only way I&#8217;ve managed to get this to work is to add the following arguments to the JVM.
-Dsun.net.client.defaultConnectTimeout=&#60;CONNECT_TIMEOUT&#62;
-Dsun.net.client.defaultReadTimeout=&#60;READ_TIMEOUT&#62;
In future, I&#8217;d be more inclined [...]]]></description>
			<content:encoded><![CDATA[<p>After much fussing about, I&#8217;ve found out that the default <a title="java.net.URLConnection" href="http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLConnection.html">URLConnection</a> object in the core Java libraries doesn&#8217;t correctly obey connect or read timeouts set via the setConnectTimeout() or setReadTimeout() methods. The only way I&#8217;ve managed to get this to work is to add the following arguments to the JVM.<span id="more-198"></span></p>
<pre>-Dsun.net.client.defaultConnectTimeout=&lt;CONNECT_TIMEOUT&gt;
-Dsun.net.client.defaultReadTimeout=&lt;READ_TIMEOUT&gt;</pre>
<p>In future, I&#8217;d be more inclined to use something like the <a title="Apache HttpComponents Client" href="http://hc.apache.org/httpcomponents-client/index.html">Apache HttpComponents Client</a>, which provides a full-featured HTTP client.</p>
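<p>The two JVM arguments above can also be set from code. This is a minimal sketch under some assumptions: <code>TimeoutDefaults</code>, <code>apply</code> and <code>open</code> are hypothetical names, the properties are specific to Sun/Oracle-derived JVMs, and as far as I know they are only read when the networking classes first load, so <code>apply</code> would need to run before any connection is opened. The <code>open</code> method sets the per-connection timeouts as well, in case they are honoured on a given JVM.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class TimeoutDefaults {

    /** Sets the same defaults as the -D arguments; both values are in milliseconds. */
    public static void apply(int connectMillis, int readMillis) {
        System.setProperty("sun.net.client.defaultConnectTimeout", String.valueOf(connectMillis));
        System.setProperty("sun.net.client.defaultReadTimeout", String.valueOf(readMillis));
    }

    /** Creates (but does not connect) a URLConnection with explicit timeouts. */
    public static URLConnection open(String url, int connectMillis, int readMillis) {
        try {
            URLConnection connection = URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(connectMillis);
            connection.setReadTimeout(readMillis);
            return connection;
        } catch (IOException e) {
            // Wrapped so callers are not forced to handle a checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>Typical use would be a call such as <code>TimeoutDefaults.apply(2000, 5000)</code> at the very start of main(), followed by <code>TimeoutDefaults.open(...)</code> wherever a connection is needed.</p>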
]]></content:encoded>
			<wfw:commentRss>http://www.sleepydisco.com/programming/url-connection-timeouts-not-timing-out/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Excluding log4j dependencies</title>
		<link>http://www.sleepydisco.com/programming/excluding-log4j-dependencies</link>
		<comments>http://www.sleepydisco.com/programming/excluding-log4j-dependencies#comments</comments>
		<pubDate>Mon, 02 Feb 2009 23:22:13 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[log4j]]></category>
		<category><![CDATA[maven]]></category>
		<category><![CDATA[notes]]></category>

		<guid isPermaLink="false">http://www.sleepydisco.com/?p=133</guid>
		<description><![CDATA[While it&#8217;s noted in a few other places that there can be issues with the small world of unnecessary transitive Sun dependencies that log4j pulls into a Maven 2 project, and that the easiest thing is to exclude them, I thought I&#8217;d get it down here too, as trouble has again come a-knocking [...]]]></description>
			<content:encoded><![CDATA[<p>While it&#8217;s noted in a <a title="Maven2 Log4J and JMX dependencies" href="http://onemanwenttomow.wordpress.com/2007/12/31/maven2-log4j-and-jmx-dependencies/" target="_blank">few other places</a> that there can be issues with the small world of unnecessary transitive Sun dependencies that log4j pulls into a Maven 2 project, and that the easiest thing is to exclude them, I thought I&#8217;d get it down here too, as trouble has again come a-knocking at my door with this particular hat on.</p>
<pre>&lt;dependency&gt;
    &lt;groupId&gt;log4j&lt;/groupId&gt;
    &lt;artifactId&gt;log4j&lt;/artifactId&gt;
    &lt;version&gt;1.2.15&lt;/version&gt;
    &lt;optional&gt;false&lt;/optional&gt;
    &lt;exclusions&gt;
        &lt;exclusion&gt;
            &lt;groupId&gt;com.sun.jdmk&lt;/groupId&gt;
            &lt;artifactId&gt;jmxtools&lt;/artifactId&gt;
        &lt;/exclusion&gt;
        &lt;exclusion&gt;
            &lt;groupId&gt;com.sun.jmx&lt;/groupId&gt;
            &lt;artifactId&gt;jmxri&lt;/artifactId&gt;
        &lt;/exclusion&gt;
        &lt;exclusion&gt;
            &lt;groupId&gt;javax.jms&lt;/groupId&gt;
            &lt;artifactId&gt;jms&lt;/artifactId&gt;
        &lt;/exclusion&gt;
    &lt;/exclusions&gt;
&lt;/dependency&gt;</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.sleepydisco.com/programming/excluding-log4j-dependencies/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Evaluating Feeds</title>
		<link>http://www.sleepydisco.com/programming/evaluating-feeds</link>
		<comments>http://www.sleepydisco.com/programming/evaluating-feeds#comments</comments>
		<pubDate>Mon, 02 Feb 2009 21:05:29 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[atom]]></category>
		<category><![CDATA[feeds]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[rome]]></category>
		<category><![CDATA[rss]]></category>

		<guid isPermaLink="false">http://www.sleepydisco.com/?p=124</guid>
		<description><![CDATA[A not so uncommon situation I&#8217;m finding is that a website will have more than one feed associated with it. This is sometimes just to point to alternative markup (e.g. different versions of RSS spec, or a site offering both RSS and Atom feeds, or combinations thereof), or to hook up with feed aggregation services [...]]]></description>
			<content:encoded><![CDATA[<p>A not so uncommon situation I&#8217;m finding is that a website will have more than one feed associated with it. This is sometimes just to point to alternative markup (e.g. different versions of RSS spec, or a site offering both RSS and Atom feeds, or combinations thereof), or to hook up with feed aggregation services (<a title="Feedburner: Feed aggregation" href="http://www.feedburner.com/" target="_blank">Feedburner</a> easily being the most prevalent), but the content of the feed can also sometimes be quite different.</p>
<p>Initially, I had made the crude assumption that for me, RSS is more useful than Atom (as I had written a very lightweight RSS parser). Now that I&#8217;m incorporating the <a title="ROME Project: Java APIs for feed processing" href="https://rome.dev.java.net/" target="_blank">ROME Java API</a> for feed processing, I&#8217;m not so bothered about the choice of tech, or the spec of that tech, but I am quite interested in hooking up with the best feed for my purposes. I also don&#8217;t want to have to approve a few hundred feeds manually.</p>
<p>So what&#8217;s the best feed for my purposes? Assuming that these feeds concern the same subjects (i.e. new posts to the blog), the best feed is most likely going to be the one with the most content.</p>
<h3>A really simple algorithm for deriving the feed with the most content</h3>
<p>The first task is to pre-process each feed and assign a value to every post in it. That value is measured by the number of words in the description and the largest word count found among the representations of the post content, counted once all markup has been removed.</p>
<p>We&#8217;re then left with a mapping from feeds to lists of word counts for their respective posts, such as:</p>
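<p>One reading of this step, sketched in Java under a few assumptions: <code>PostWordCount</code>, <code>countWords</code> and <code>postValue</code> are hypothetical names, markup is stripped with a crude regular expression rather than a real HTML parser, and the value of a post is taken to be the description word count plus the largest content word count.</p>

```java
public class PostWordCount {

    /** Counts whitespace-separated words after replacing anything tag-like with a space. */
    public static int countWords(String markup) {
        if (markup == null) {
            return 0;
        }
        // Crude tag stripping; entities and comments are not treated specially.
        String text = markup.replaceAll("<[^>]*>", " ").trim();
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }

    /** Description words plus the largest word count among the content representations. */
    public static int postValue(String description, String[] contents) {
        int largest = 0;
        for (String content : contents) {
            largest = Math.max(largest, countWords(content));
        }
        return countWords(description) + largest;
    }
}
```

<p>Computing <code>postValue</code> for every entry in a feed gives one list of word counts per feed.</p>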
<p>feed<sub>1..n</sub> → { words<sub>post<sub>1</sub></sub>, words<sub>post<sub>2</sub></sub>, &#8230;, words<sub>post<sub>x</sub></sub> }</p>
<p>Since the number of posts in each feed could vary (and the number of posts a feed covers shouldn&#8217;t be a discriminating factor), we take the minimum length of all the word count lists, and sum the word counts within that range for each feed. We can then select the feed which has the highest word count as the preferred feed to use.</p>
<p>This method assumes that the feed entries are in the same order and about the same posts in each feed, on the basis that each feed most likely originates from the same blog management system and is therefore either dynamically produced or published at a similar time.</p>
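<p>The selection step described above could be sketched as follows. This is only an illustration: <code>FeedChooser</code>, <code>sumFirst</code> and <code>bestFeed</code> are hypothetical names, each feed is represented as a plain array of per-post word counts, and ties go to the feed that appears first.</p>

```java
public class FeedChooser {

    /** Sums the first `limit` word counts of a single feed. */
    public static int sumFirst(int[] wordCounts, int limit) {
        int total = 0;
        for (int i = 0; i < limit; i++) {
            total += wordCounts[i];
        }
        return total;
    }

    /** Returns the index of the feed with the most words over the shared range, or -1 if there are no feeds. */
    public static int bestFeed(int[][] feeds) {
        if (feeds.length == 0) {
            return -1;
        }
        // Only compare as many posts as the shortest feed contains.
        int shared = Integer.MAX_VALUE;
        for (int[] feed : feeds) {
            shared = Math.min(shared, feed.length);
        }
        int best = 0;
        int bestTotal = -1;
        for (int i = 0; i < feeds.length; i++) {
            int total = sumFirst(feeds[i], shared);
            if (total > bestTotal) {
                best = i;
                bestTotal = total;
            }
        }
        return best;
    }
}
```

<p>For two feeds with word counts { 120, 80, 200 } and { 300, 250 }, only the first two posts of each are compared (200 words against 550), so the second feed is chosen.</p>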
]]></content:encoded>
			<wfw:commentRss>http://www.sleepydisco.com/programming/evaluating-feeds/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
