About this time last year, I was busy putting the finishing touches to a data harvesting program which would go off to the internet and grab posts from music blogs when notified of updates via a feed. The motivation was my MSc Computer Science project, at the time untitled, and without much of a plan or a direction to go in. I knew I wanted to do something related to music, and probably to do with recommendations, with a view to creating a hopefully fresh take on how content can be discovered from editorially subjective sources, rather than behavioural sources such as playlists.
This post is a continuation of Good Recommendations, Using Set Theory to analyse Recommendation relationships, Variation within Preferences and Predicting Preferences.
I’ve offered the idea that good recommendations lie on a scale between Obvious and Interesting. Taken to its extremes, the full line could actually run from Boring through to Random, with Obvious and Interesting somewhere in between. ‘Boring’ recommendations could be said to exist where the Advisor’s Preference Set is entirely made up of the Common Set (i.e. the Advisee knows at least as much as the Advisor). ‘Random’ recommendations could be said to exist where there is no Common Set at all (i.e. A and B are disjoint).
I’ve also attempted to explain my thoughts on the kinds of relationships that could exist between an Advisor and an Advisee, and ways in which they could be elaborated. I think I’ve done that, albeit in a not very scientific way. There’s a lot of room for refinement, there are some gaps to be filled (specifically around calculating variation within a set of preferences and analysing the effects of different weightings of those preferences), and some of my assumptions are a little more tenuous than they should be. Even so, I think this could provide the basis for some interesting results.
I’m also keen to explore the possibility that subjective relationships (rather than behavioural relationships) between Things would produce better/more interesting recommendations and routes for discovery. For my MSc project, I’m intending to focus on the Music Domain, analysing music blogs to deduce relationships between artists to augment recommendations from existing services such as last.fm.
This post is a continuation of Good Recommendations, Using Set Theory to analyse Recommendation relationships and Variation within Preferences.
Does all of this just mean that the underlying rule is to pair the Advisee with the Advisor who has the biggest Preference Set? And how does that relate to what was concluded earlier about the similarity of the two sets, the Jaccard coefficient? Take the extreme scenario illustrated here:
Advisor (A) has a Preference Set an order of magnitude larger than that of the Advisee (B). Assuming that there’s a similar variance within the two sets, and that the previous assertions were correct, this would clearly be approaching a Utopian case.
In reality it would be difficult to assume so much from such a small sample relative to the larger one. It would be much better if we could transform the smaller set into a larger sample on the basis of predicted future recommendations, as in the following diagram, where the dashed line shows the expansion of the Advisee’s original Preference Set to the Advisee’s Predicted Preference Set (B).
However, this makes the assumption that growth would be uniformly distributed out from the current Preference Set, and that the coupling of Advisor (A) with Advisee (B) was a good one to begin with and one that holds throughout the introduction of future recommendations.
My assumption would be that the variation within the Preference Sets holds the solution for this. Taking the difference in variation between the Common Set (A∩B) and the Recommendation Set (A−B) as ‘pull’ factors towards the Advisor, and the difference in variation between the Common Set (A∩B) and B−A as ‘push’ factors away from the Advisor, we could infer growth of the Common Set with the centre weighted towards or away from the Advisee. A greater pull could be shown as in the following diagram:
Whilst a greater push could be illustrated as:
The Jaccard coefficient could then be applied to work out the predicted similarity between the two sets and, from this, to predict whether the original relationship is likely to provide more interesting, average, or more obvious recommendations.
Formally this can be represented as:
n(A∩PPS(B)) / n(A∪PPS(B))
where: PPS(X) = weighted Predicted Preference Set of X
n(X) = number of members of set X
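As a rough sketch of how the predicted similarity might be compared with the current one, the following Python applies the Jaccard ratio to an Advisor's set and to two made-up Predicted Preference Sets. How PPS(B) is actually derived is not defined in this post, so `pps_pull` and `pps_push` are simply assumed inputs, and the numbers stand in for arbitrary Things.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard ratio n(A∩B) / n(A∪B); taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


advisor = set(range(0, 10))        # A: Things 0..9
advisee = {8, 9, 10, 11}           # B: the Advisee's current Preference Set

# Hypothetical predicted sets (assumed inputs, not computed by any model):
pps_pull = advisee | {5, 6, 7}     # growth pulled towards the Advisor
pps_push = advisee | {12, 13, 14}  # growth pushed away from the Advisor

current = jaccard(advisor, advisee)
pull = jaccard(advisor, pps_pull)
push = jaccard(advisor, pps_push)

print(f"current={current:.3f} pull={pull:.3f} push={push:.3f}")
```

In this toy case a pull raises the predicted similarity above the current value and a push lowers it, which is the kind of signal the prediction would be used for.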
Summary: Between Obvious and Interesting
This post is a continuation of Good Recommendations and Using Set Theory to analyse Recommendation Relationships.
Until now, I’ve kept the assumption that both the Advisor’s and Advisee’s Preference Sets exhibit a similar level of variation within the Domain. That is to say that any member of the Recommendation Set would be as likely to be as good a recommendation as any other member, which is clearly unrealistic. The following diagram shows quadrants on a Cartesian plane that relate the increasing variety of a Preference Set within a Domain (from specialist to generalist) to the size of that Preference Set.
A more specialist Preference Set is one that contains more elements from within a particular sub division (e.g. Genre or sub-Domain) of the Domain than a more generalist Preference Set.
I am interested in the influence that each of these groupings is likely to have on the other, assuming a constant proportion of commonality between them. I would offer that a specialist Preference Set is less likely to receive good recommendations from a generalist Preference Set, but a smaller specialist Preference Set is more likely to receive good recommendations from a larger Preference Set of the same specialism.
I would also suggest that a larger generalist Preference Set would be more likely to receive a good recommendation from a specialist Preference Set of any size, than a smaller generalist Preference Set.
The following matrix summarises my assumptions about the suitability of each group to produce a good recommendation to another. (• indicates a good match between groups).
| | | Niche Expert | Niche Novice | Domain Expert | Domain Novice |
| Advisee (B) | Niche Expert | • | - | - | - |
From this, a good recommendation is more likely in the case where the following holds true:
v(B) ≥ v(A) and n(A) ≥ n(B)
where: v(X) = variation of set X
n(X) = number of members of set X
That is to say the recommendation will be better received when the Advisee is more flexible about what he likes within the Domain, and the Advisor knows more about the Domain than the Advisee. Put like that, it seems a bit of a no-brainer.
Variation needs to indicate the range of member groupings within a set. This is problematic as we don’t necessarily know what all the groups are, how their boundaries lie, and where to allocate members as there may be many levels of sub-Domains or Genres within a Domain. I will make an assumption that there is some method of calculating variation within a set through some other means of set member classification (e.g. Naive Bayes, k-Nearest Neighbour, or otherwise).
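To make the rule concrete, here is a minimal sketch in Python. It assumes every member of a Preference Set has already been given a single genre label by some classifier, and it uses Shannon entropy of those labels as a stand-in for v(X). That choice of measure is my own illustration, not something established above, and the function names and genre labels are hypothetical.

```python
import math
from collections import Counter


def variation(genres: list) -> float:
    """Assumed stand-in for v(X): Shannon entropy (in bits) of the genre labels.

    0.0 means every member shares one genre (specialist);
    higher values mean the members are spread over more genres (generalist).
    """
    counts = Counter(genres)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def likely_good_match(advisor_genres: list, advisee_genres: list) -> bool:
    """Check v(B) >= v(A) and n(A) >= n(B), with one genre label per set member."""
    return (
        variation(advisee_genres) >= variation(advisor_genres)
        and len(advisor_genres) >= len(advisee_genres)
    )


# Hypothetical labelled Preference Sets.
niche_expert = ["techno"] * 8
domain_novice = ["techno", "folk", "jazz"]

print(likely_good_match(niche_expert, domain_novice))  # Advisor = niche expert
print(likely_good_match(domain_novice, niche_expert))  # Advisor = domain novice
```

With these toy inputs the first pairing satisfies the rule and the second does not, since the specialist Advisee has less variation and more members than the generalist Advisor.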
In the next post, I’ll examine the possibility of predicting Preference Sets as a method to normalize preferences for comparative analysis.
This is a continuation of the post Good Recommendations.
Recommendations are found in the extension of the preferences of an Advisee to those of an Advisor, where the Advisee and Advisor share some known common interest. I’ll call someone’s existing preferences their Preference Set, and the shared preferences between Advisee and Advisor the Common Set.
The following diagram shows the Preference Sets of an Advisor (A) and an Advisee (B).
The total knowledge about Things they currently like is expressed within the union of these two Preference Sets (A∪B). Note that they are equally sized, indicating equivalent Domain knowledge, and for now we’ll assume that each preference within the set has equal significance, and that the two sets exhibit equal variation. It’s also important to state that A∪B does not represent total knowledge of all Things in the Domain. From this we can also state that no single Advisor has total knowledge of every Thing that could possibly be recommended, and no Advisee already has knowledge of every Thing they could possibly like.
The intersection of A and B (A∩B) indicates the Common Set of things that A and B both know and like. A minus B (A−B) is the subset of things where A’s recommendations to B would be found. I’ll call this the Recommendation Set. B minus A (B−A) is the subset of things that A either doesn’t know about or wouldn’t recommend to B, because A hasn’t expressed a preference for them.
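As a minimal sketch, these regions map directly onto Python's built-in set operations. The artist names are placeholders made up for illustration.

```python
# Hypothetical Preference Sets; the artist names are placeholders.
advisor = {"Artist 1", "Artist 2", "Artist 3", "Artist 4"}  # A
advisee = {"Artist 3", "Artist 4", "Artist 5", "Artist 6"}  # B

common_set = advisor & advisee          # A∩B: things both know and like
recommendation_set = advisor - advisee  # A−B: where A's recommendations to B are found
advisee_only = advisee - advisor        # B−A: things A cannot recommend to B
total_known = advisor | advisee         # A∪B: everything the pair knows between them

print(sorted(recommendation_set))
```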
This case represents a balance between the Advisor’s knowledge of the Advisee’s Preference Set and the Advisor’s ability to recommend something the Advisee doesn’t already know.
The next diagram illustrates the case where there is a limited Common Set of preferences. It also shows the potential for a wider range of recommendations.
The following diagram shows the opposite case where there is a large overlap of preferences, but a limited scope for recommendation.
So far, we’ve established that the scope for recommendation is tied to the size of the Common Set (A∩B) in proportion to the size of the combined Preference Sets (A∪B); in other words, the similarity of the Advisor and Advisee. This is more commonly represented by a statistic known as the Jaccard coefficient. However, I’ve only been concerned with Preference Sets of equal size, whereas in reality they will most likely differ according to each individual’s Domain knowledge.
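The Jaccard coefficient is the number of members in the intersection divided by the number of members in the union. A short Python sketch, using small integer sets as stand-ins for Preference Sets (the values are arbitrary):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient n(A∩B) / n(A∪B); taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two equally sized pairs, as in the diagrams above (values are arbitrary).
limited_overlap = jaccard({1, 2, 3, 4}, {4, 5, 6, 7})  # 1 shared Thing out of 7 known
large_overlap = jaccard({1, 2, 3, 4}, {2, 3, 4, 5})    # 3 shared Things out of 5 known

print(round(limited_overlap, 3), round(large_overlap, 3))
```

A low value corresponds to the limited Common Set case with more scope for recommendation, and a high value to the large overlap case with little scope.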
The next diagram represents the relationship between an Advisor (A) who has more Domain knowledge than the Advisee (B).
Whilst the Advisor does have a far larger Preference Set (n(A) > n(B)), it could be argued that this is also a good match given the relative size of the Common Set (A∩B) to the Advisee’s Preference Set (B) and the large Recommendation Set (A−B) from which to draw recommendations.
The opposite is illustrated by the following diagram.
Here the Advisor (A) has a smaller Preference Set than the Advisee (B), and a small Recommendation Set (A−B) from which to produce recommendations. This would appear to be a difficult scenario from which to produce anything other than obvious recommendations, since the Recommendation Set is so small relative to the Advisee’s Preference Set. From these two scenarios we can deduce that the proportional sizes of the Advisor’s Recommendation Set and the Advisee’s Exclusive Set (B−A) are important.
In the next post I’ll be looking at how the variation within a Preference Set affects the types of recommendations that can be drawn.
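One way to put rough numbers on these two scenarios is to express each region relative to the size of the Advisee's Preference Set. The sketch below is only an illustration: the function name `match_ratios` and the example sets are assumptions of mine, not measurements.

```python
def match_ratios(advisor: set, advisee: set) -> dict:
    """Sizes of A∩B, A−B and B−A, each divided by n(B)."""
    n_b = len(advisee)
    if n_b == 0:
        raise ValueError("the Advisee's Preference Set must not be empty")
    return {
        "common": len(advisor & advisee) / n_b,
        "recommendation": len(advisor - advisee) / n_b,
        "exclusive": len(advisee - advisor) / n_b,
    }


large = set(range(0, 20))   # 20 Things
small = set(range(16, 22))  # 6 Things, 4 of them shared with the large set

big_advisor = match_ratios(advisor=large, advisee=small)
small_advisor = match_ratios(advisor=small, advisee=large)

print(big_advisor)
print(small_advisor)
```

With the larger set as Advisor, the Recommendation Set is several times the size of the Advisee's set; with the roles swapped it shrinks to a tenth of it, while the Advisee's Exclusive Set dominates.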