Good Recommendations

Posted: February 7th, 2009 | Author: David | Filed under: Computer Science

A recommendation can be said to be advice about some Thing based upon an Advisor’s prior experience of the Thing, knowledge of the wider Domain, and knowledge about an Advisee. I’ve home-brewed that definition from a variety of dictionary sources, but I’m hoping it doesn’t push the levels of acceptability too far. It suits what is to follow quite well, and I’ve even drawn a diagram:

From personal experience, people generally recommend things that they know a bit about and won’t often recommend things they don’t personally like. If the assumption holds that an Advisor will have better knowledge about the Things they like, recommendations are best made about Things the Advisor personally rates.

From this we can approximate ‘things that people like’ to equal ‘things they are likely to recommend’.

Knowledge about the preferences of the Advisee matters too. I’m more likely to provide a well-received recommendation to someone I know than to someone I don’t. In this regard, I might also be able to provide a recommendation about some Thing I don’t necessarily like, although my knowledge of the Thing is likely to be more limited.

It follows that the more knowledge the Advisor has about the Thing, and about the wider Domain, and about the Advisee, the better the recommendation the Advisor will probably make.

Are recommendations best made by an Advisor who knows more about the Thing or the Domain than the Advisee? Most people I know don’t like being told things they already know, but self-affirmation is nice sometimes.

Is an Advisee with only a small amount in common with the Advisor more likely to receive a recommendation less in line with their current preferences, but one that may be more interesting as a consequence? Conversely, is an Advisee with a large amount in common with the Advisor more likely to receive a recommendation in line with their current preferences, but one that is likely to be more obvious as a consequence? How much does the variety / specialism of the Advisor’s and Advisee’s current preferences matter?

It seems there is potential for a sliding scale between interesting and obvious recommendations, both of which may be good for different reasons.

A ‘good’ recommendation depends entirely on the Advisee’s expectation of the type of recommendation anticipated from the Advisor. How much can this be inferred from the size of, and the variation within, the Domain shown in their initial preferences? It could well be that an Advisee who already shows more variety in their current preferences will be more ‘willing’ to accept off-kilter recommendations than one who is already more specialist. But it could also be that an Advisor who exhibits similar variation in their preferences to their Advisees will also make more acceptable recommendations.

We can infer that recommendations which balance interestingness with staying in keeping with existing preferences are best made when an Advisee has a good proportion of current preferences in common with an Advisor. But should this proportion be relative to their own preferences, or relative to their Advisor’s preferences? And to what extent does the variation of Domain preferences matter?

In the next post I’ll be introducing Set Theory as a mechanism for analysing these relationships.


Evaluating Feeds

Posted: February 2nd, 2009 | Author: David | Filed under: Programming

A not so uncommon situation I’m finding is that a website will have more than one feed associated with it. This is sometimes just to point to alternative markup (e.g. different versions of the RSS spec, or a site offering both RSS and Atom feeds, or combinations thereof), or to hook up with feed aggregation services (Feedburner easily being the most prevalent), but the content of the feeds can also sometimes be quite different.

Initially, I had made the crude assumption that for me, RSS is more useful than Atom (as I had written a very lightweight RSS parser). Now that I’m incorporating the ROME Java API for feed processing, I’m not so bothered about the choice of tech, or the spec of that tech, but I am quite interested in hooking up with the best feed for my purposes. I also don’t want to have to approve a few hundred feeds manually.

So what’s the best feed for my purposes? Assuming that these feeds concern the same subjects (i.e. new posts to the blog), then the best feed for my purposes is most likely going to be the one with the most content.

A really simple algorithm for deriving the feed with the most content

The first task is to pre-process each feed to determine a value for the content of each of its posts. Once all markup has been removed, that value is measured by the number of words in the post’s description and the largest number of words found in any representation of the post’s content.
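As a rough illustration of that pre-processing step, here is a minimal Python sketch. It is not what I actually run (that goes through ROME in Java); the names `count_words` and `post_value` are made up for this example, and it assumes one particular reading of the measure, namely that the description word count is added to the largest content word count.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of a markup fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(markup):
    """Count whitespace-separated words once the markup has been removed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Joining with a space keeps words in adjacent elements apart, at the
    # cost of splitting a word that has a tag in the middle of it.
    return len(" ".join(parser.chunks).split())


def post_value(description, content_representations):
    """Assumed reading: description words plus the largest content word count."""
    largest_content = max(
        (count_words(content) for content in content_representations),
        default=0,  # a post may carry no content representation at all
    )
    return count_words(description) + largest_content
```

For example, `post_value("A short summary", ["<p>one two</p>", "<div>one two three four</div>"])` counts three words for the description and four for the larger of the two content representations.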

We’re then left with a mapping from each feed to a list of word counts for its respective posts, such as:

feed_1..n → { words_post1, words_post2, .. words_postx }

Since the number of posts in each feed could vary (and the number of posts a feed covers shouldn’t be a discriminating factor), we take the minimum length of all the word count lists, and sum the word counts within that range for each feed. We can then select the feed which has the highest word count as the preferred feed to use.
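The selection step can be sketched in a few lines of Python. This is only an illustration under the assumptions stated here: the function name `pick_richest_feed` and the feed names in the usage example are hypothetical, and the input is assumed to be a dictionary from feed name to its list of per-post word counts.

```python
def pick_richest_feed(word_counts_by_feed):
    """Return the name of the feed with the most words over the shared range.

    word_counts_by_feed maps a feed name to its list of per-post word counts.
    """
    if not word_counts_by_feed:
        raise ValueError("no feeds to compare")

    # Only compare as many posts as the shortest feed provides, so that a
    # feed is not favoured just for covering more posts.
    shared = min(len(counts) for counts in word_counts_by_feed.values())

    totals = {
        feed: sum(counts[:shared])
        for feed, counts in word_counts_by_feed.items()
    }

    # On a tie, max() keeps the first feed encountered.
    return max(totals, key=totals.get)


feeds = {
    "rss": [120, 80, 40],
    "atom": [300, 250],
    "feedburner": [20, 20, 20, 500],
}
best = pick_richest_feed(feeds)
```

Here the shortest list has two entries, so only the first two posts of each feed are summed, and the long fourth post in the hypothetical "feedburner" feed does not count towards its total.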

This method assumes that the feed entries are in the same order and about the same posts in each feed, on the basis that each feed most likely originates from the same blog management system and is therefore either dynamically produced, or published at a similar time.