<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title></title>
	<atom:link href="http://www.craiget.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.craiget.com</link>
	<description>In which I write mostly about programming and that sort of thing</description>
	<lastBuildDate>Fri, 20 Aug 2010 17:32:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>A Little Python To Put Chartbeat in the Console</title>
		<link>http://www.craiget.com/2010/08/a-little-python-to-put-chartbeat-in-the-console/</link>
		<comments>http://www.craiget.com/2010/08/a-little-python-to-put-chartbeat-in-the-console/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 05:14:15 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=187</guid>
		<description><![CDATA[I can&#8217;t freakin&#8217; stop watching Chartbeat. So.. to reclaim a little bit of my day, I&#8217;ve thrown together a little Python script so I can cast an eye on the two important numbers (total users and page load time) without keeping a browser window open. This just uses the Chartbeat JSON API to grab the [...]]]></description>
			<content:encoded><![CDATA[<p>I can&#8217;t freakin&#8217; stop watching <a href="http://chartbeat.com">Chartbeat</a>. So.. to reclaim a little bit of my day, I&#8217;ve thrown together a little Python script so I can cast an eye on the two important numbers (total users and page load time) without keeping a browser window open.</p>
<p>This just uses the Chartbeat JSON API to grab the data and Python&#8217;s curses module to keep it all on one screen, refreshing every 10 minutes. Press q to quit.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> json
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">curses</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span>
&nbsp;
host = <span style="color: #483d8b;">'HOSTNAME'</span> <span style="color: #808080; font-style: italic;"># your .com here, sans http://www.</span>
key = <span style="color: #483d8b;">'APIKEY'</span> <span style="color: #808080; font-style: italic;"># you gotta register an API key with Chartbeat first</span>
url = <span style="color: #483d8b;">'http://api.chartbeat.com/summize?host=%s&amp;apikey=%s'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>host, key<span style="color: black;">&#41;</span>
&nbsp;
screen = <span style="color: #dc143c;">curses</span>.<span style="color: black;">initscr</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">curses</span>.<span style="color: black;">noecho</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">curses</span>.<span style="color: black;">cbreak</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># react to keypresses immediately, without waiting for Enter</span>
<span style="color: #dc143c;">curses</span>.<span style="color: black;">curs_set</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
screen.<span style="color: black;">timeout</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">600</span> <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">1000</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># getch() waits up to 10 minutes (given in ms) for a key</span>
screen.<span style="color: black;">keypad</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">try</span>:
	<span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
		page = <span style="color: #dc143c;">urllib</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
		data = json.<span style="color: black;">loads</span><span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
		screen.<span style="color: black;">clear</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
		screen.<span style="color: black;">addstr</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>,<span style="color: #483d8b;">&quot;Visitors: %d&quot;</span><span style="color: #66cc66;">%</span>data<span style="color: black;">&#91;</span><span style="color: #483d8b;">'people'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
		screen.<span style="color: black;">addstr</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">0</span>,<span style="color: #483d8b;">&quot;Pageload: %.1f&quot;</span><span style="color: #66cc66;">%</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#91;</span><span style="color: #483d8b;">'domload'</span><span style="color: black;">&#93;</span>/<span style="color: #ff4500;">1000.0</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
		screen.<span style="color: black;">refresh</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
		<span style="color: #808080; font-style: italic;"># wait for a keypress or the 10 minute timeout, then loop to refresh</span>
		<span style="color: #ff7700;font-weight:bold;">if</span> screen.<span style="color: black;">getch</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #008000;">ord</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'q'</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">break</span>
<span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">KeyboardInterrupt</span>:
	<span style="color: #ff7700;font-weight:bold;">pass</span> <span style="color: #808080; font-style: italic;"># Ctrl-C quits cleanly; other errors are still reported</span>
<span style="color: #ff7700;font-weight:bold;">finally</span>:
	<span style="color: #dc143c;">curses</span>.<span style="color: black;">nocbreak</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	screen.<span style="color: black;">keypad</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
	<span style="color: #dc143c;">curses</span>.<span style="color: black;">echo</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
	<span style="color: #dc143c;">curses</span>.<span style="color: black;">endwin</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>
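<p>If you want to check the formatting without hitting the API or setting up curses, here's a small sketch that pulls the two display lines into a plain function. It assumes, as the script above does, that the JSON has a <code>people</code> count and a <code>domload</code> value in milliseconds. The function name is my own, and the sample payload is made up for illustration; it is not real Chartbeat output.</p>

```python
import json


def format_stats(data):
    """Build the two lines the console shows from a parsed API response.

    Assumes 'people' is a visitor count and 'domload' is in milliseconds,
    which is how the curses script treats those keys.
    """
    return [
        "Visitors: %d" % data["people"],
        "Pageload: %.1f" % (data["domload"] / 1000.0),  # ms -> seconds
    ]


# Hypothetical payload for illustration only (not real Chartbeat output).
sample = json.loads('{"people": 42, "domload": 1500}')
for line in format_stats(sample):
    print(line)
# Visitors: 42
# Pageload: 1.5
```

<p>Keeping the formatting separate from the network and curses calls means you can test it on its own.</p>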

]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/08/a-little-python-to-put-chartbeat-in-the-console/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick Note: Getting WordPress Post ID from URL</title>
		<link>http://www.craiget.com/2010/08/quick-note-getting-wordpress-post-id-from-url/</link>
		<comments>http://www.craiget.com/2010/08/quick-note-getting-wordpress-post-id-from-url/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 16:46:19 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=184</guid>
		<description><![CDATA[Took a bit of googling to find this handy function to convert a URL into the corresponding WordPress Post ID, so I&#8217;m writing it down so I don&#8217;t forget. Not a ton of cases where this is useful because you&#8217;re usually inside the Loop and have the ID readily available. I&#8217;m using it as part [...]]]></description>
			<content:encoded><![CDATA[<p>It took a bit of googling to find this handy function for converting a URL into the corresponding WordPress post ID, so I&#8217;m writing it down before I forget. There aren&#8217;t a ton of cases where this is useful, because you&#8217;re usually inside the Loop and have the ID readily available. I&#8217;m using it as part of a little <a href="http://chartbeat.com">Chartbeat</a> plugin: since the Chartbeat API only gives back the page URL, you need this little gem to get the ID.</p>
<p>You know this functionality has to exist somewhere in WP, because that&#8217;s exactly what WordPress does internally (routing URLs to post IDs), but it can be tricky to figure out what it&#8217;s called.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// returns the post's ID, or 0 if the URL doesn't match any post</span>
<span style="color: #000088;">$post_id</span> <span style="color: #339933;">=</span> url_to_postid<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'http://example.com/path/to/page'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
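<p>For comparison, here's a rough Python sketch of the one case you can handle without WordPress: the default <code>?p=ID</code> permalink style, the same form as this feed's guid links. The function name is my own invention. Pretty permalinks like /2010/08/some-slug/ can't be decoded this way because they depend on WordPress's rewrite rules, and that is exactly why url_to_postid() is handy.</p>

```python
from urllib.parse import parse_qs, urlparse


def post_id_from_default_permalink(url):
    """Return the post ID from a default-style WordPress URL like /?p=184.

    Returns 0 when there is no numeric p= parameter, mirroring the
    "0 on failure" behaviour of url_to_postid(). Pretty permalinks
    cannot be resolved here; that needs WordPress's rewrite rules.
    """
    values = parse_qs(urlparse(url).query).get("p", [])
    if values and values[0].isdigit():
        return int(values[0])
    return 0


print(post_id_from_default_permalink("http://www.craiget.com/?p=184"))  # 184
print(post_id_from_default_permalink(
    "http://www.craiget.com/2010/08/quick-note-getting-wordpress-post-id-from-url/"))  # 0
```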

<p>Thanks to <a href="http://www.tech-otaku.com/blogging/posts-id-posts-url-wordpress/">http://www.tech-otaku.com/blogging/posts-id-posts-url-wordpress/</a> for posting it first.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/08/quick-note-getting-wordpress-post-id-from-url/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mass Re-Indexing for the WordPress YARPP Plugin</title>
		<link>http://www.craiget.com/2010/07/mass-re-indexing-for-the-wordpress-yarpp-plugin/</link>
		<comments>http://www.craiget.com/2010/07/mass-re-indexing-for-the-wordpress-yarpp-plugin/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 16:05:05 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=180</guid>
		<description><![CDATA[On my buddy&#8217;s WordPress blog, we wanted to add a Related Posts feature to get people hooked after they read the article they came for. A quick search of the available plugins turned up YARPP (Yet Another Related Posts Plugin). The plugin is hugely configurable and it seems to do a great job of selecting [...]]]></description>
			<content:encoded><![CDATA[<p>On my buddy&#8217;s WordPress blog, we wanted to add a Related Posts feature to get people hooked after they read the article they came for. A quick search of the available plugins turned up <a href="http://wordpress.org/extend/plugins/yet-another-related-posts-plugin/">YARPP</a> (Yet Another Related Posts Plugin). The plugin is hugely configurable and it seems to do a great job of selecting relevant posts to display.</p>
<p>The one downside? It takes about 4 seconds to index a post.</p>
<p>Now, that&#8217;s not much, but when you have ~8,000 posts, it adds up to about <strong>9 hours</strong>! Digging in a bit, I found the plugin is very smart: it caches aggressively and only does the expensive calculation when a post is saved or, for old posts, the first time someone views it. Still, it would be even nicer to do all the calculations offline and just upload a full cache.</p>
<p>So here&#8217;s a script that does that. Save it as a new file in the wp-content/plugins/yet-another-related-posts-plugin/ folder and run it with PHP from the command line; running it through the web server risks timeouts.</p>
<p>(Note: I&#8217;m using YARPP 3.1.8; your mileage may vary, use at your own risk, etc.)</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">// Hook into WP so we can access the DB</span>
<span style="color: #b1b100;">include</span> <span style="color: #0000ff;">'../../../wp-blog-header.php'</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Load all the YARPP functions</span>
<span style="color: #b1b100;">include</span> <span style="color: #0000ff;">'yarpp.php'</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Let YARPP create tables, if they don't exist already</span>
yarpp_activate<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
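<span style="color: #666666; font-style: italic;">// Progress counters (start at 0 so the first printf has values to show)</span>
<span style="color: #000088;">$time_start</span> <span style="color: #339933;">=</span> <span style="color: #990000;">time</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$time_elapsed</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$time_remaining</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Collect the IDs of all published posts, newest first</span>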
<span style="color: #000088;">$sql</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;SELECT ID FROM <span style="color: #006699; font-weight: bold;">$wpdb-&gt;posts</span> WHERE post_type='post' and post_status='publish' ORDER BY ID desc&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$ids</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$wpdb</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">get_col</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sql</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$c</span> <span style="color: #339933;">=</span> <span style="color: #990000;">count</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$ids</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">&lt;</span><span style="color: #000088;">$c</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000088;">$id</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$ids</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #990000;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;<span style="color: #009933; font-weight: bold;">%d</span>/<span style="color: #009933; font-weight: bold;">%d</span><span style="color: #000099; font-weight: bold;">\t</span>ID: <span style="color: #009933; font-weight: bold;">%d</span><span style="color: #000099; font-weight: bold;">\t</span>ELAPSED: <span style="color: #009933; font-weight: bold;">%d</span><span style="color: #000099; font-weight: bold;">\t</span>REMAINING: <span style="color: #009933; font-weight: bold;">%d</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$i</span><span style="color: #339933;">,</span><span style="color: #000088;">$c</span><span style="color: #339933;">,</span><span style="color: #000088;">$id</span><span style="color: #339933;">,</span><span style="color: #000088;">$time_elapsed</span><span style="color: #339933;">,</span><span style="color: #000088;">$time_remaining</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #990000;">flush</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// this fn causes yarpp to compute relatedness for the post</span>
    yarpp_related<span style="color: #009900;">&#40;</span><span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'post'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span><span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">false</span><span style="color: #339933;">,</span><span style="color: #000088;">$id</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'website'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$time_elapsed</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #990000;">time</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> <span style="color: #000088;">$time_start</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// $i+1 posts are done now; dividing by $i would fail on the first pass</span>
    <span style="color: #000088;">$time_remaining</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$c</span><span style="color: #339933;">-</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">-</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$time_elapsed</span><span style="color: #339933;">/</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>If it stops before reaching the last post, turn on <strong>error_reporting(E_ALL)</strong> to see what went wrong. In my case it was running out of memory until I added <strong>ini_set('memory_limit', '512M')</strong>.</p>
<p>And that&#8217;s about it. Since it takes a looong time to run, I&#8217;ve added some logging so you can be sure it isn&#8217;t stuck:</p>

<div class="wp_syntax"><div class="code"><pre class="sh" style="font-family:monospace;">...
3/10	ID: 45647	ELAPSED: 10	REMAINING: 35
4/10	ID: 45625	ELAPSED: 14	REMAINING: 28
5/10	ID: 45601	ELAPSED: 19	REMAINING: 23
6/10	ID: 45593	ELAPSED: 22	REMAINING: 17
7/10	ID: 45572	ELAPSED: 29	REMAINING: 14
8/10	ID: 45571	ELAPSED: 31	REMAINING: 8
9/10	ID: 45570	ELAPSED: 34	REMAINING: 4
...</pre></div></div>
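<p>The REMAINING column is just a straight-line estimate: it assumes every post left will take as long as the average post so far. Here's that arithmetic as a small Python sketch. The function name is hypothetical and not part of YARPP. Plugging in the ~4 seconds per post from above reproduces the roughly 9 hour figure for 8,000 posts.</p>

```python
def eta_seconds(done, total, elapsed):
    """Estimate seconds remaining by assuming each remaining item takes
    the average time of the items completed so far."""
    if done == 0:
        return None  # nothing finished yet, so there is no average to use
    return (total - done) * elapsed / done


print(eta_seconds(3, 10, 12))  # 28.0: 7 posts left at 4 seconds each
# 8,000 posts at ~4 seconds each, estimated after the first post
print(round(eta_seconds(1, 8000, 4) / 3600, 1))  # 8.9 (hours)
```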

<p>If you ran this on your live server, just activate the plugin and you&#8217;re done. Otherwise, activate the plugin and upload the two cache tables (the ones ending in _yarpp_keyword_cache and _yarpp_related_cache; the prefix depends on your install) to your live server.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/07/mass-re-indexing-for-the-wordpress-yarpp-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Collaborative Filtering in Clojure, First Try</title>
		<link>http://www.craiget.com/2010/06/collaborative-filtering-in-clojure-first-try/</link>
		<comments>http://www.craiget.com/2010/06/collaborative-filtering-in-clojure-first-try/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 04:04:37 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=176</guid>
		<description><![CDATA[For reasons that are unclear, I would apparently rather spend the day screwing around with Clojure than working on my Android apps or finding freelance gigs. I *really* want to love Clojure because the blogs make it sound so cool once you know what you&#8217;re doing. But.. I have no idea what I&#8217;m doing.. Maybe [...]]]></description>
			<content:encoded><![CDATA[<p>For reasons that are unclear, I would apparently rather spend the day screwing around with Clojure than working on my Android apps or finding freelance gigs. I *really* want to love Clojure because the blogs make it sound so cool once you know what you&#8217;re doing.</p>
<p>But.. I have no idea what I&#8217;m doing.. Maybe I can learn?</p>
<p>The trouble with learning programming languages is that you need to find a problem that is just the right difficulty. If you choose a toy problem, you solve it too quickly without learning anything. If you choose a really nasty one, you&#8217;ll get hung up and become frustrated.</p>
<p>With that in mind, I&#8217;m gonna try implementing some algorithms from the rather delightful &#8220;<a href="http://oreilly.com/catalog/9780596529321">Programming Collective Intelligence</a>&#8221; in Clojure. The book itself uses Python; it&#8217;s also available on <a href="http://books.google.com/books?id=fEsZ3Ey-Hq4C">Google Books</a>, and its <a href="http://blog.kiwitobes.com/?p=44">source code</a> is online. The examples in the book are well explained and include sample data, so you can be sure your implementation is at least getting the right answers. My hope is that by solving the problems, learning some more, and then revisiting them a few weeks later, I will eventually start writing better Clojure code.</p>
<p>Chapter 3 of the book covers collaborative filtering. Briefly, it&#8217;s a technique for using people&#8217;s ratings to make suggestions. The notion makes intuitive sense: if I like &#8220;A, B and C&#8221; and you like &#8220;A and B&#8221;, there&#8217;s a good chance you&#8217;re gonna like &#8220;C&#8221; as well.</p>
<p>Ratings take the form of a map of maps, from each person to the items they&#8217;ve rated and their scores (shown here in the book&#8217;s Python notation):</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;Person A&quot;</span>: <span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;Item 1&quot;</span>: <span style="color: #ff4500;">1.5</span>, <span style="color: #483d8b;">&quot;Item 2&quot;</span>: <span style="color: #ff4500;">2.5</span><span style="color: black;">&#125;</span>, 
 <span style="color: #483d8b;">&quot;Person B&quot;</span>: <span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;Item 2&quot;</span>: <span style="color: #ff4500;">3.5</span><span style="color: black;">&#125;</span><span style="color: black;">&#125;</span></pre></div></div>

<p>Additionally, the algorithm uses a similarity metric, which assigns a number to how closely two people&#8217;s ratings agree. Intuitively, you&#8217;d be more interested in my recommendations if we like the same things, and the similarity metric provides a way to gauge that.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">similarity<span style="color: black;">&#40;</span>ratings_a, ratings_b<span style="color: black;">&#41;</span>  <span style="color: #808080; font-style: italic;"># returns a score, e.g. 0.5</span></pre></div></div>
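<p>To make that concrete, here's a small Python sketch of the distance-based similarity used in the implementation below: 1 / (1 + Euclidean distance) over the items both people rated, the square-root form mentioned in the book's errata. It uses the two-person example from above, which happens to come out to exactly 0.5. The function name follows the book's; the details here are my own sketch.</p>

```python
from math import sqrt


def sim_distance(ratings_a, ratings_b):
    """Euclidean-distance similarity: 1 / (1 + distance) over co-rated items.

    Identical ratings give 1.0; no items in common gives 0.
    """
    shared = [item for item in ratings_a if item in ratings_b]
    if not shared:
        return 0
    sum_sq = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in shared)
    return 1 / (1 + sqrt(sum_sq))


person_a = {"Item 1": 1.5, "Item 2": 2.5}
person_b = {"Item 2": 3.5}
print(sim_distance(person_a, person_b))  # 0.5: they differ by 1.0 on Item 2
```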

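<p>To make recommendations for someone, each item they haven&#8217;t rated gets a score: a weighted average of everyone else&#8217;s ratings for that item, where each rating is weighted by how similar its rater is to the person we&#8217;re making recommendations for.</p>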
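<p>Here's a rough Python sketch of that weighted average. To keep it short, it takes precomputed similarity scores instead of a metric function, and like the book it skips raters whose similarity is zero or negative. The tiny data set is made up for illustration.</p>

```python
def recommend(prefs, me, similarity):
    """Rank items `me` hasn't rated by a similarity-weighted average rating.

    prefs:      {person: {item: rating}}, the map-of-maps format above
    similarity: {other_person: score}, precomputed with any metric
    Returns a list of (score, item) pairs, best score first.
    """
    totals, sim_sums = {}, {}
    for other, sim in similarity.items():
        if other == me or sim <= 0:
            continue  # ignore myself and raters who don't agree with me
        for item, rating in prefs[other].items():
            if item in prefs[me]:
                continue  # only score items I haven't rated yet
            totals[item] = totals.get(item, 0) + rating * sim
            sim_sums[item] = sim_sums.get(item, 0) + sim
    return sorted(((totals[i] / sim_sums[i], i) for i in totals), reverse=True)


# Made-up example: p1 agrees with me more than p2 does.
prefs = {
    "me": {"A": 4.0},
    "p1": {"A": 5.0, "B": 4.0},
    "p2": {"A": 3.0, "B": 2.0, "C": 5.0},
}
print(recommend(prefs, "me", {"p1": 0.75, "p2": 0.25}))
# [(5.0, 'C'), (3.5, 'B')]
```

<p>Note that C comes out on top even though only the less similar rater liked it. Dividing by the sum of similarities means a lone rater's score passes through unchanged.</p>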
<p>At this point, it should be clear that I&#8217;m terrible at explaining things. Just check out Chapter 3 of <a href="http://oreilly.com/catalog/9780596529321">Programming Collective Intelligence</a> and all will become clear.</p>
<p>Anyhow, here is my first attempt at an implementation:</p>

<div class="wp_syntax"><div class="code"><pre class="clojure" style="font-family:monospace;">(use 'clojure.set)
&nbsp;
;;
;; Some utility functions
;;
&nbsp;
;; sum of squares of differences
(defn sum-of-squares [a b]
  (apply + (map (fn [x y] (Math/pow (- x y) 2)) a b)))
&nbsp;
;; book has typo in definition here, see errata
(defn inverse-sum-of-squares [a b]
  (/ 1 (+ 1 (Math/sqrt (sum-of-squares a b)))))
&nbsp;
;; items this person has rated 
(defn seen-by [db person]
  (into #{} (keys (get db person))))
&nbsp;
;; items person hasn't rated
(defn unseen-by [db name items]
  (difference items (set (map first (get db name)))))
&nbsp;
;; items both people have rated
(defn co-rated-items [db name-1 name-2]
  (intersection (set (keys (get db name-1))) (set (keys (get db name-2)))))
&nbsp;
;; ratings by name of co-rated items in alphabetical order
(defn co-ratings [db name corated]
  (vals (filter (fn [v] (contains? corated (first v))) (sort (get db name)))))
&nbsp;
;;
;; Distance metrics
;;
&nbsp;
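;; Euclidean distance score: 1 / (1 + distance), 1.0 for identical ratings
;; (an empty set is truthy in Clojure, so check it with empty? rather than not)
(defn sim-distance [db name-1 name-2]
  (let [corated (co-rated-items db name-1 name-2)]
    (if (empty? corated) 0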
	(inverse-sum-of-squares (co-ratings db name-1 corated)
				(co-ratings db name-2 corated)))))
&nbsp;
;; Pearson correlation score
;; book has a typo causing incorrect float division
(defn sim-pearson [db name-1 name-2]
  (let [corated (co-rated-items db name-1 name-2)]
    (if (empty? corated) 0
	(let [n (count corated)
	      ratings-1 (co-ratings db name-1 corated)
	      ratings-2 (co-ratings db name-2 corated)
	      sum-1 (apply + ratings-1)
	      sum-2 (apply + ratings-2)
	      sum-1-sq (apply + (map (fn [x] (* x x)) ratings-1))
	      sum-2-sq (apply + (map (fn [x] (* x x)) ratings-2))
	      psum (apply + (map (fn [x y] (* x y)) ratings-1 ratings-2))
	      num (- psum (/ (* sum-1 sum-2) n))
	      den (Math/sqrt (* (- sum-1-sq (/ (* sum-1 sum-1) n))
				(- sum-2-sq (/ (* sum-2 sum-2) n))))]
	  (if (= den 0) 0
	      (/ num den))))))
&nbsp;
;;
;; Recommendation algorithm
;;
&nbsp;
(defn total-sums [db other items sim]
  (reduce (fn [m item] (assoc m item (* (get (get db other) item) sim))) {} items))
&nbsp;
(defn sim-sums [items sim]
  (reduce (fn [m item] (assoc m item sim)) {} items))
&nbsp;
;; combines sum(rating*similarity) and sum(similarity) of all raters
(defn loop-totals [db others me metric]
  (loop [others others t {} s {}]
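    (if (empty? others) [t s]
	(let [other (first others)
	      unseen-items (unseen-by db me (seen-by db other))
	      sim (metric db me other)]
	  ;; skip raters with zero or negative similarity, as the book does;
	  ;; zero-similarity raters could otherwise leave a 0/0 score
	  (if (&lt;= sim 0)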
	    (recur (rest others) t s)
	    (recur (rest others)
		   (merge-with + t (total-sums db other unseen-items sim))
		   (merge-with + s (sim-sums unseen-items sim))))))))
&nbsp;
;; returns a seq of [score item] pairs, highest score first
(defn recommend [db me metric]
  (let [others (disj (set (map first db)) me)
	[my-totals my-sims] (loop-totals db others me metric)]
    (reverse (sort (map (fn [item] [(/ (get my-totals item) (get my-sims item)) item]) (set (keys my-totals)))))))
&nbsp;
;;
;; functions dealing with creation of DB
;;
&nbsp;
;; returns DB with a new rating added
(defn add-rating [db name item rating]
  (assoc-in db [name item] rating))
&nbsp;
;; returns DB with a list of new ratings added
(defn add-ratings [db ratings]
  (reduce
   (fn [db [name item rating]]
     (add-rating db name item rating))
   db ratings))
&nbsp;
;; creates the DB used for examples in the book
(defn init-db []
  (add-ratings {}
	       [[&quot;Lisa Rose&quot; &quot;Lady in the Water&quot; 2.5]
		[&quot;Lisa Rose&quot; &quot;Snakes on a Plane&quot; 3.5]
		[&quot;Lisa Rose&quot; &quot;Just My Luck&quot; 3.0]
		[&quot;Lisa Rose&quot; &quot;Superman Returns&quot; 3.5]
		[&quot;Lisa Rose&quot; &quot;You, Me and Dupree&quot; 2.5]
		[&quot;Lisa Rose&quot; &quot;The Night Listener&quot; 3.0]
		[&quot;Gene Seymour&quot; &quot;Lady in the Water&quot; 3.0]
		[&quot;Gene Seymour&quot; &quot;Snakes on a Plane&quot; 3.5]
		[&quot;Gene Seymour&quot; &quot;Just My Luck&quot; 1.5]
		[&quot;Gene Seymour&quot; &quot;Superman Returns&quot; 5.0]
		[&quot;Gene Seymour&quot; &quot;The Night Listener&quot; 3.0]
		[&quot;Gene Seymour&quot; &quot;You, Me and Dupree&quot; 3.5]
		[&quot;Michael Phillips&quot; &quot;Lady in the Water&quot; 2.5]
		[&quot;Michael Phillips&quot; &quot;Snakes on a Plane&quot; 3.0]
		[&quot;Michael Phillips&quot; &quot;Superman Returns&quot; 3.5]
		[&quot;Michael Phillips&quot; &quot;The Night Listener&quot; 4.0]
		[&quot;Claudia Puig&quot; &quot;Snakes on a Plane&quot; 3.5]
		[&quot;Claudia Puig&quot; &quot;Just My Luck&quot; 3.0]
		[&quot;Claudia Puig&quot; &quot;The Night Listener&quot; 4.5]
		[&quot;Claudia Puig&quot; &quot;Superman Returns&quot; 4.0]
		[&quot;Claudia Puig&quot; &quot;You, Me and Dupree&quot; 2.5]
		[&quot;Mick LaSalle&quot; &quot;Lady in the Water&quot; 3.0]
		[&quot;Mick LaSalle&quot; &quot;Snakes on a Plane&quot; 4.0]
		[&quot;Mick LaSalle&quot; &quot;Just My Luck&quot; 2.0]
		[&quot;Mick LaSalle&quot; &quot;Superman Returns&quot; 3.0]
		[&quot;Mick LaSalle&quot; &quot;The Night Listener&quot; 3.0]
		[&quot;Mick LaSalle&quot; &quot;You, Me and Dupree&quot; 2.0]
		[&quot;Jack Matthews&quot; &quot;Lady in the Water&quot; 3.0]
		[&quot;Jack Matthews&quot; &quot;Snakes on a Plane&quot; 4.0]
		[&quot;Jack Matthews&quot; &quot;The Night Listener&quot; 3.0]
		[&quot;Jack Matthews&quot; &quot;Superman Returns&quot; 5.0]
		[&quot;Jack Matthews&quot; &quot;You, Me and Dupree&quot; 3.5]
		[&quot;Toby&quot; &quot;Snakes on a Plane&quot; 4.5]
		[&quot;Toby&quot; &quot;Superman Returns&quot; 4.0]
		[&quot;Toby&quot; &quot;You, Me and Dupree&quot; 1.0]]))</pre></div></div>
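<p>For cross-checking the Clojure, here's a Python sketch of the Pearson score, fed Lisa Rose's and Gene Seymour's ratings from the data above. Building both rating lists from the same list of shared items keeps them aligned without any sorting. This sketch is written for Python 3, where <code>/</code> is always float division.</p>

```python
from math import sqrt


def sim_pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items, in [-1, 1]; 0 if undefined."""
    shared = [item for item in ratings_a if item in ratings_b]
    n = len(shared)
    if n == 0:
        return 0
    # Both lists come from the same `shared` order, so they line up.
    xs = [ratings_a[i] for i in shared]
    ys = [ratings_b[i] for i in shared]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    p_sum = sum(x * y for x, y in zip(xs, ys))
    num = p_sum - sum_x * sum_y / n
    den = sqrt((sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n))
    return 0 if den == 0 else num / den


lisa = {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5, "Just My Luck": 3.0,
        "Superman Returns": 3.5, "You, Me and Dupree": 2.5, "The Night Listener": 3.0}
gene = {"Lady in the Water": 3.0, "Snakes on a Plane": 3.5, "Just My Luck": 1.5,
        "Superman Returns": 5.0, "You, Me and Dupree": 3.5, "The Night Listener": 3.0}
print(round(sim_pearson(lisa, gene), 4))  # 0.3961
```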

<p>Then to use it, you&#8217;d do something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="clojure" style="font-family:monospace;">(def db (init-db))
(recommend db &quot;Toby&quot; sim-distance)
(recommend db &quot;Toby&quot; sim-pearson)</pre></div></div>

<p>So it works, and that makes me reasonably happy, though I hope to improve it in the future. Learning more of the core library should help.</p>
<p>Comments and suggestions are very welcome.</p>
<p>A few notes:</p>
<p>* Having never made a *serious* commitment to a REPL before, I found it interesting to build up functions piece by piece, from the inside out. I can dig that.</p>
<p>* I&#8217;m not too happy with how the sum-of-squares computation works: I figure out which items both people have rated, then sort each person&#8217;s ratings by name to make sure they line up in the same order. sum-of-squares itself is cool, but the way I prepare the data feels weird. Not sure what would be better, though.</p>
<p>* Also, there are a couple of typos in the version of the book I have that cause some of the numbers to come out differently. Not sure if they&#8217;ve been corrected in later editions. More info: <a href="http://oreilly.com/catalog/errataunconfirmed.csp?isbn=9780596529321">Unconfirmed Errata</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/06/collaborative-filtering-in-clojure-first-try/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Making My Own WordPress Chartbeat Plugin</title>
		<link>http://www.craiget.com/2010/06/making-my-own-wordpress-chartbeat-plugin/</link>
		<comments>http://www.craiget.com/2010/06/making-my-own-wordpress-chartbeat-plugin/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 23:32:21 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=175</guid>
		<description><![CDATA[Instead of doing something useful this morning, I made my own little plugin using the Chartbeat API to display the most popular posts on a WordPress blog. Note: There is really no reason to do this. The Chartbeat Plugin does this exact same thing and more. However, it was an entertaining exercise for me to [...]]]></description>
			<content:encoded><![CDATA[<p>Instead of doing something useful this morning, I made my own little plugin using the <a href="http://chartbeat.pbworks.com/">Chartbeat API</a> to display the most popular posts on a WordPress blog.</p>
<p><strong>Note</strong>: There is <strong>really</strong> no reason to do this. The <a href="http://wordpress.org/extend/plugins/chartbeat/">Chartbeat Plugin</a> does this exact same thing and more. However, it was an entertaining exercise for me to practice writing WordPress plugins.</p>
<p><strong>Also Note</strong>: This only works if you have signed up for <a href="http://chartbeat.com">Chartbeat</a> and have an API key.</p>
<p>The reason this is cool? Well, most &#8220;most popular posts&#8221; plugins need to make an extra call to the database to get and set a view counter, because WordPress doesn&#8217;t track page views by default. But if you&#8217;re already using Chartbeat to track your blog&#8217;s performance, you can save some effort by using their numbers instead.</p>
<p>And with no further ado, here&#8217;s the code:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">/*
Plugin Name: Ct Most Popular
Plugin URI: http://www.craiget.com
Description: Display most viewed posts using the Chartbeat API, exposes one function: ct_most_popular_plugin_widget(); 
Version: 0.1
Author: Craige
Author URI: http://craiget.com
License: For example and testing purposes. Not suggested for use on a real site.
*/</span>
&nbsp;
<span style="color: #000088;">$ct_most_popular_plugin_version</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;0.1&quot;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$ct_most_popular_plugin_data</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// create a most_popular option</span>
register_activation_hook<span style="color: #009900;">&#40;</span><span style="color: #009900; font-weight: bold;">__FILE__</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_install'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_install<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	<span style="color: #666666; font-style: italic;">// the top-level variables above aren't in scope inside this function, so pass the values directly</span>
	add_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ct_most_popular_plugin_data&quot;</span><span style="color: #339933;">,</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	add_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ct_most_popular_plugin_version&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;0.1&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// schedule hourly update</span>
	wp_schedule_event<span style="color: #009900;">&#40;</span><span style="color: #990000;">time</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'hourly'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_update_event'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// delete the most_popular option</span>
register_deactivation_hook<span style="color: #009900;">&#40;</span><span style="color: #009900; font-weight: bold;">__FILE__</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_uninstall'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_uninstall<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	delete_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ct_most_popular_plugin_data&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	delete_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ct_most_popular_plugin_version&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// un-schedule hourly update</span>
	wp_clear_scheduled_hook<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ct_most_popular_plugin_update_event'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// appear under &quot;Settings&quot; on the admin page</span>
add_action<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'admin_menu'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_menu'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_menu<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	add_options_page<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Ct Most Popular'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'Ct Most Popular'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'manage_options'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct-most-popular'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_options'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// init option values in db</span>
add_action<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'admin_init'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_options_init'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_options_init<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
	register_setting<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ct_most_popular_plugin_options'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_validate'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// sanitize and validate input</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_validate<span style="color: #009900;">&#40;</span><span style="color: #000088;">$input</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'host'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span>  wp_filter_nohtml_kses<span style="color: #009900;">&#40;</span><span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'host'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'chartbeat_api_key'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span>  wp_filter_nohtml_kses<span style="color: #009900;">&#40;</span><span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'chartbeat_api_key'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'limit'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span>  <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'limit'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'limit'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #000088;">$input</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'limit'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">10</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$input</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// display options page html</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_options<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>current_user_can<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'manage_options'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>  <span style="color: #009900;">&#123;</span>
		wp_die<span style="color: #009900;">&#40;</span>__<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'You do not have sufficient permissions to access this page.'</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span>
&lt;div class=&quot;wrap&quot;&gt;
	&lt;h2&gt;Ct Most Popular Plugin Options Title&lt;/h2&gt;
	&lt;form method=&quot;post&quot; action=&quot;options.php&quot;&gt;
		<span style="color: #000000; font-weight: bold;">&lt;?php</span> settings_fields<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ct_most_popular_plugin_options'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
		<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #000088;">$options</span> <span style="color: #339933;">=</span> get_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ct_most_popular_plugin'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
		&lt;table class=&quot;form-table&quot;&gt;
		&lt;tr valign=&quot;top&quot;&gt;
			&lt;th scope=&quot;row&quot;&gt;Host&lt;/th&gt;
			&lt;td&gt;&lt;input type=&quot;text&quot; name=&quot;ct_most_popular_plugin[host]&quot; value=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span> esc_attr<span style="color: #009900;">&#40;</span><span style="color: #000088;">$options</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'host'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot; /&gt;&lt;/td&gt;
			&lt;td&gt;&lt;i&gt;e.g. example.com&lt;/i&gt;&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr valign=&quot;top&quot;&gt;
			&lt;th scope=&quot;row&quot;&gt;Chartbeat API Key&lt;/th&gt;
			&lt;td&gt;&lt;input type=&quot;text&quot; name=&quot;ct_most_popular_plugin[chartbeat_api_key]&quot; value=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span> esc_attr<span style="color: #009900;">&#40;</span><span style="color: #000088;">$options</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'chartbeat_api_key'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot; /&gt;&lt;/td&gt;
			&lt;td&gt;&lt;i&gt;&lt;a href=&quot;http://chartbeat.com/apikeys/&quot;&gt;http://chartbeat.com/apikeys/&lt;/a&gt;&lt;/i&gt;&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr valign=&quot;top&quot;&gt;
			&lt;th scope=&quot;row&quot;&gt;Limit&lt;/th&gt;
			&lt;td&gt;&lt;input type=&quot;text&quot; name=&quot;ct_most_popular_plugin[limit]&quot; value=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span> esc_attr<span style="color: #009900;">&#40;</span><span style="color: #000088;">$options</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'limit'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot; /&gt;&lt;/td&gt;
			&lt;td&gt;&lt;i&gt;number of items to show (default 10)&lt;/i&gt;&lt;/td&gt;
		&lt;/tr&gt;
		&lt;/table&gt;
		&lt;p class=&quot;submit&quot;&gt;
			&lt;input type=&quot;submit&quot; class=&quot;button-primary&quot; value=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> _e<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Save Changes'</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot; /&gt;
		&lt;/p&gt;
		&lt;p&gt;
		This plugin uses the &lt;a href=&quot;http://chartbeat.pbworks.com/&quot;&gt;Chartbeat API&lt;/a&gt; to show the most popular pages on your site, updated hourly.
		&lt;/p&gt;
		&lt;p&gt;
		This plugin was created for my own amusement and to practice creating WordPress plugins; it is &lt;strong&gt;NOT RECOMMENDED&lt;/strong&gt; for use.
		&lt;/p&gt;
		&lt;p&gt;
		Chartbeat has released a perfectly good plugin that does this and more: &lt;a href=&quot;http://wordpress.org/extend/plugins/chartbeat/&quot;&gt;http://wordpress.org/extend/plugins/chartbeat/&lt;/a&gt;
		&lt;/p&gt;
		&lt;p&gt;
		This plugin uses WordPress's built-in &lt;a href=&quot;http://codex.wordpress.org/Function_Reference/wp_schedule_event&quot;&gt;scheduling hooks&lt;/a&gt; to refresh the list of popular posts once every hour.
		This keeps things self-contained, but doesn't provide much flexibility. You may want to use cron instead, which would require a little hacking.
		&lt;/p&gt;
	&lt;/form&gt;
&lt;/div&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// get popularity data from chartbeat, store in db</span>
add_action<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ct_most_popular_plugin_update_event'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'ct_most_popular_plugin_update_chartbeat'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_update_chartbeat<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #666666; font-style: italic;">// construct chartbeat call</span>
	<span style="color: #000088;">$options</span> <span style="color: #339933;">=</span> get_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ct_most_popular_plugin'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$host</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$options</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'host'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$apikey</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$options</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'chartbeat_api_key'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$limit</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$options</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'limit'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// build url</span>
	<span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'http://api.chartbeat.com/toppages/?host=HOST&amp;limit=LIMIT&amp;apikey=APIKEY'</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'HOST'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$host</span><span style="color: #339933;">,</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'APIKEY'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$apikey</span><span style="color: #339933;">,</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'LIMIT'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$limit</span><span style="color: #339933;">,</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// fetch data</span>
	<span style="color: #000088;">$data</span> <span style="color: #339933;">=</span> <span style="color: #990000;">file_get_contents</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$data</span> <span style="color: #339933;">=</span> <span style="color: #990000;">json_decode</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// exit if not enough results back</span>
	<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">count</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;</span> <span style="color: #000088;">$limit</span><span style="color: #009900;">&#41;</span>
		<span style="color: #b1b100;">return</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">&lt;</span>count<span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'path'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">&quot;/&quot;</span><span style="color: #009900;">&#41;</span>
			<span style="color: #b1b100;">continue</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$result</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$data</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_slice</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$result</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #000088;">$limit</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// store in db</span>
	update_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ct_most_popular_plugin_data&quot;</span><span style="color: #339933;">,</span> <span style="color: #000088;">$result</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// add this function in your sidebar</span>
<span style="color: #000000; font-weight: bold;">function</span> ct_most_popular_plugin_widget<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000088;">$data</span> <span style="color: #339933;">=</span> get_option<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ct_most_popular_plugin_data&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&lt;ul&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$post</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&lt;li&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&lt;a href=&quot;'</span><span style="color: #339933;">.</span><span style="color: #000088;">$post</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'path'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'&quot;&gt;'</span><span style="color: #339933;">.</span><span style="color: #000088;">$post</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'visitors'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'-'</span><span style="color: #339933;">.</span><span style="color: #000088;">$post</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'i'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">'&lt;/a&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&lt;/li&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'&lt;/ul&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Go to &#8220;Settings&#8221; > &#8220;Ct Most Popular&#8221; to set your API Key and other options.</p>
<p>Updates occur once each hour.</p>
<p>You&#8217;ll almost certainly want to tweak the way the posts are displayed in the ct_most_popular_plugin_widget() function.</p>
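<p>Before tweaking the display, it can help to look at the data handling on its own. Here&#8217;s a rough Python sketch of what the update function does to the Chartbeat response: decode the JSON, drop the home page, keep the first <em>limit</em> entries, then format each entry the way the widget does. The field names (<code>path</code>, <code>visitors</code>, <code>i</code>) are taken from the PHP above; the sample payload is invented, and the sketch hasn&#8217;t been checked against the live API.</p>

```python
import json


def top_pages(raw_json, limit):
    """Return up to `limit` pages from a toppages-style JSON payload.

    Mirrors the plugin's update step: decode, skip the home page ("/"),
    then truncate. Field names are assumed from the PHP plugin code.
    """
    try:
        pages = json.loads(raw_json)
    except ValueError:
        # A failed or garbled response: return nothing rather than crash.
        return []
    if not isinstance(pages, list):
        return []
    return [page for page in pages if page.get("path") != "/"][:limit]


def widget_lines(pages):
    """Plain-text version of each widget entry: visitors, a dash, then the title."""
    return ["%s-%s" % (page["visitors"], page["i"]) for page in pages]


# Made-up payload, just to show the shape the code expects.
sample = json.dumps([
    {"path": "/", "visitors": 40, "i": "Home"},
    {"path": "/2010/06/some-post/", "visitors": 12, "i": "Some Post"},
    {"path": "/2010/05/older-post/", "visitors": 7, "i": "Older Post"},
])
print(widget_lines(top_pages(sample, 10)))  # ['12-Some Post', '7-Older Post']
```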
<p>Anyway, just fooling around. For all the frustration it has caused me, I still gotta say: WordPress is pretty friggin&#8217; cool.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/06/making-my-own-wordpress-chartbeat-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is AdMob worth it? Maybe..</title>
		<link>http://www.craiget.com/2010/06/is-admob-worth-it-maybe/</link>
		<comments>http://www.craiget.com/2010/06/is-admob-worth-it-maybe/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 14:49:47 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=173</guid>
		<description><![CDATA[I admit to having mixed feelings about internet advertising. I guess at present, I view it as a necessary evil in which I happen to participate. As I can hardly muster my thoughts into coherence on THAT subject, instead, here&#8217;s a bit on how it is working out for me. So &#8220;paid&#8221; apps on the Android [...]]]></description>
			<content:encoded><![CDATA[<p>I admit to having mixed feelings about internet advertising. At present, I guess I view it as a necessary evil in which I happen to participate. Since I can hardly muster coherent thoughts on THAT subject, here&#8217;s a bit on how it&#8217;s working out for me instead.</p>
<p>So &#8220;paid&#8221; apps on the Android Market are not making money for most developers. There may be a host of reasons, but I suspect it has to do with iPhone users being very comfortable with the $0.99 music purchases from iTunes and applying the same mentality to the App Store, while Android users simply aren&#8217;t used to pulling the trigger. Free apps, on the other hand, do just fine.</p>
<p>The conversion rate from free to paid on my two popular apps? </p>
<p>1/10,000 and 1/1,000. No joke. 400,000 free downloads. 40 paid.</p>
<p>Since that sucks, like many folks, I&#8217;ve resorted to using ads to make some money from free apps.</p>
<p>After showing ads for a while, I thought it would be interesting to try promoting my newest app by BUYING some ads as well.</p>
<p>First, I set up two House Ads, which are free but only show up in your own applications (so you can advertise your own stuff, but you don&#8217;t make any money). Over a week, these have had a rather good click-thru rate of 4.44%.</p>
<p>For the experiment, I spent $50 (the minimum allowed) for one day of regular advertising on AdMob, creating two ads identical to my House Ads. Surprisingly, these had a much lower click-thru rate of 0.47% (from 353,690 impressions).</p>
<p>I&#8217;m not sure what to make of this, but the most obvious conclusion seems to be that my purchased ads were badly targeted. Oddly, when creating ads, you can choose some basic demographic information like location, age and gender, but you don&#8217;t get to target specific keywords. However, as an app publisher, you *do* get to target keywords. How does that work? I can only guess that they match the keywords against the ad text, which is at most 35 characters. But that can&#8217;t possibly be as reliable as matching against the body text of a long web page. Maybe they do it manually? That would explain the 24-hour-ish ad approval period. Hmm&#8230;</p>
<p>Anyway, I&#8217;m guessing a bit here, since there&#8217;s no good way to know when each download occurred or whether it came from an ad click-thru or a normal download. But it looks like the $50 netted about 1,000 downloads, or about $0.05 per download.</p>
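<p>Running the post&#8217;s own numbers through the funnel makes a handy sanity check. The Python below is plain arithmetic on the figures above (spend, impressions, click-thru rate, and the estimated 1,000 downloads); since the download count is a guess, treat the derived rates as ballpark only. The implied cost per click lands right around $0.03, which lines up with the minimum bid mentioned at the end of this post.</p>

```python
def ad_funnel(spend, impressions, ctr, downloads):
    """Back-of-the-envelope ad math from spend, impressions, CTR and downloads."""
    clicks = impressions * ctr
    return {
        "clicks": clicks,
        "cost_per_click": spend / clicks,
        "click_to_download": downloads / clicks,
        "cost_per_download": spend / downloads,
    }


# Figures from this post: $50 for one day, 353,690 impressions at 0.47% CTR,
# and roughly 1,000 downloads (an estimate, not a tracked number).
funnel = ad_funnel(spend=50.0, impressions=353_690, ctr=0.0047, downloads=1000)
print(f"clicks: ~{funnel['clicks']:.0f}")                         # ~1662
print(f"cost per click: ~${funnel['cost_per_click']:.3f}")        # ~$0.030
print(f"click-to-download: ~{funnel['click_to_download']:.0%}")   # ~60%
print(f"cost per download: ~${funnel['cost_per_download']:.2f}")  # ~$0.05
```

<p>If the 1,000-download estimate is in the right neighborhood, roughly 60% of the people who clicked went on to install.</p>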
<p>That&#8217;s on a free app, by the way. Paid apps will likely have a MUCH lower conversion, resulting in a higher cost per download.</p>
<p>So, is it worth it? Well, that&#8217;s hard to say. My current feeling is that it might be worth it initially, to bootstrap a new app with a couple thousand downloads. When people download an app, they see a range indicating the approximate number of downloads (1-50, 50-250, etc.). I think it inspires confidence to see that an app has been downloaded 10,000 times. Also, more downloads seem to mean higher rankings in the &#8220;Top Free&#8221; section, more or less.</p>
<p>However, my most popular app, which was never advertised, has nearly 400,000 downloads. At $0.05 per download, that would cost $20,000!! (Yeah, I know that estimate makes *tons* of assumptions.) So paid advertising is certainly not a viable way to get all the way to the top of the &#8220;Top Free&#8221; section.</p>
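<p>The $20,000 figure is just the $0.05-per-download estimate multiplied out. A tiny Python helper shows what the same rate implies for the smaller &#8220;bootstrap&#8221; targets too. It assumes, unrealistically, that the cost per download holds steady at any volume.</p>

```python
COST_PER_DOWNLOAD = 0.05  # dollars, from the one-day AdMob experiment above


def ad_budget(target_downloads, cost_per_download=COST_PER_DOWNLOAD):
    """Rough ad spend to buy `target_downloads` at a flat cost per download.

    Assumes the cost per download stays constant at any volume, which is
    one of the many assumptions baked into this estimate.
    """
    return target_downloads * cost_per_download


# A couple thousand downloads to bootstrap, the 10,000 mark, and the
# ~400,000 downloads of the most popular (never advertised) app.
for target in (2_000, 10_000, 400_000):
    print(f"{target:,} downloads ~ ${ad_budget(target):,.0f}")
```

<p>That works out to about $100, $500 and $20,000 respectively: cheap enough to bootstrap, nowhere near cheap enough to buy your way to the top.</p>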
<p>Anyway, I would be interested in anyone else&#8217;s experiences with advertising free apps on AdMob. </p>
<p>One point to note: the AdMob Help seems to indicate that you need to bid a higher CPC (cost per click) if you want your ads to be shown. In my experience, that was not the case. Even at the lowest bid of $0.03 per click, my ads still got shown over 350k times. So don&#8217;t pay $0.20 for each click! Furthermore, as an app publisher, I rarely see a 100% fill rate. While the reality may be more complicated, that suggests there are too many publishers and not enough advertisers.</p>
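<p>To put the CPC point in numbers: with the same $50 minimum spend, here&#8217;s the most clicks each bid could buy, assuming you&#8217;re charged your full bid on every click (an auction may well charge less). The sketch works in whole cents to dodge float floor-division surprises.</p>

```python
def clicks_for_budget(budget_cents, cpc_cents):
    """Most clicks a fixed budget can buy at a given cost per click.

    Works in whole cents: with floats, 1 // 0.1 evaluates to 9.0, not 10.
    """
    return budget_cents // cpc_cents


BUDGET_CENTS = 5_000  # the $50 minimum spend from the experiment
for cpc_cents in (3, 20):
    clicks = clicks_for_budget(BUDGET_CENTS, cpc_cents)
    print(f"${cpc_cents / 100:.2f} per click: up to {clicks:,} clicks")
```

<p>That&#8217;s up to 1,666 clicks at $0.03 versus 250 at $0.20, well over six times fewer clicks for the same money.</p>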
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/06/is-admob-worth-it-maybe/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fetching Android Market Stats with Selenium RC</title>
		<link>http://www.craiget.com/2010/05/fetching-android-market-stats-with-selenium-rc/</link>
		<comments>http://www.craiget.com/2010/05/fetching-android-market-stats-with-selenium-rc/#comments</comments>
		<pubDate>Thu, 20 May 2010 22:14:34 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=170</guid>
		<description><![CDATA[Finally.. I&#8217;ve got a reasonably decent way to pull Android Market stats. For some reason I keep coming back to this topic (see here and here). This time, the way forward is to use Selenium RC, part of the Selenium browser testing suite. My example will be in Python, but Selenium has bindings for several [...]]]></description>
			<content:encoded><![CDATA[<p>Finally.. I&#8217;ve got a reasonably decent way to pull Android Market stats. For some reason I keep coming back to this topic (see <a href="http://www.craiget.com/2009/05/getting-your-stats-from-the-android-marketplace-with-phpcurl/">here</a> and <a href="http://www.craiget.com/2009/04/get-android-market-stats-with-python-mozrepl-and-beautifulsoup/">here</a>). This time, the way forward is to use Selenium RC, part of the Selenium browser testing suite.</p>
<p>My example will be in Python, but Selenium has bindings for several languages.</p>
<p>First of all, you gotta download Selenium RC from here: <a href="http://seleniumhq.org/download/">http://seleniumhq.org/download/</a></p>
<p>Then, extract it someplace you can remember. I&#8217;ve been putting things in ~/opt lately.</p>
<p>Okay, now create a new python script, like this:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #dc143c;">sys</span>.<span style="color: black;">path</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/the/path/to/selenium-python-client-driver-1.0.1'</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> selenium <span style="color: #ff7700;font-weight:bold;">import</span> selenium
&nbsp;
<span style="color: #dc143c;">email</span> = <span style="color: #483d8b;">'YOUR_GOOGLE_LOGIN'</span>
passwd = <span style="color: #483d8b;">'YOUR_PASSWORD'</span>
&nbsp;
s = selenium<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;localhost&quot;</span>, <span style="color: #ff4500;">4444</span>, <span style="color: #483d8b;">&quot;*firefox&quot;</span>, <span style="color: #483d8b;">&quot;http://market.android.com&quot;</span><span style="color: black;">&#41;</span>
s.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
s.<span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;/publish/Home&quot;</span><span style="color: black;">&#41;</span>
s.<span style="color: #008000;">type</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Email&quot;</span>, <span style="color: #dc143c;">email</span><span style="color: black;">&#41;</span>
s.<span style="color: #008000;">type</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Passwd&quot;</span>, passwd<span style="color: black;">&#41;</span>
s.<span style="color: black;">click</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;signIn&quot;</span><span style="color: black;">&#41;</span>
s.<span style="color: black;">wait_for_page_to_load</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;30000&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
n = <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>s.<span style="color: black;">get_xpath_count</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;//div[@class='listingRow']&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">3</span>,n<span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">try</span>:
    title = s.<span style="color: black;">get_text</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;xpath=(//div[@class='listingRow'])[%s]/div[1]/div[1]&quot;</span> <span style="color: #66cc66;">%</span> i<span style="color: black;">&#41;</span>
    downloaded = s.<span style="color: black;">get_text</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;xpath=(//div[@class='listingRow'])[%s]/div[2]/div[1]/span[1]&quot;</span> <span style="color: #66cc66;">%</span> i<span style="color: black;">&#41;</span>
    installed = s.<span style="color: black;">get_text</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;xpath=(//div[@class='listingRow'])[%s]/div[2]/div[2]/span[1]&quot;</span> <span style="color: #66cc66;">%</span> i<span style="color: black;">&#41;</span>
    comments = s.<span style="color: black;">get_text</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;xpath=(//div[@class='listingRow'])[%s]/table&quot;</span> <span style="color: #66cc66;">%</span> i<span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span>:-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> title, downloaded, installed, comments
  <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">Exception</span>:
    <span style="color: #ff7700;font-weight:bold;">pass</span></pre></div></div>

<p>* Be sure to fill in YOUR_GOOGLE_LOGIN with your email (or whatever login) and the matching password.</p>
<p>This script is a bit of a trainwreck.. but it works and I don&#8217;t feel like screwing with it..</p>
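<p>* Working with xpath in selenium-rc&#8217;s python binding feels really weird.. it doesn&#8217;t seem to behave quite the way you would expect. For example, a locator that starts with &#8220;(&#8221; isn&#8217;t recognized as XPath unless you add an explicit &#8220;xpath=&#8221; prefix, while get_xpath_count takes a bare expression.</p>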
<p>* Why does the iteration start at 3? I dunno.. there are some empty rows at the beginning I guess..</p>
<p>* Why is it wrapped in a try-except block? I dunno.. some empty rows at the end?</p>
<p>* It works on Ubuntu 10.04 / FF 3.6.3. Your mileage may vary. I wouldn&#8217;t be surprised if those xpath selectors needed more tweaking in some cases.</p>
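<p>If you want to tidy up the row loop without a browser handy, one option is to pull it into a function that takes anything behaving like Selenium RC&#8217;s get_text (for example, s.get_text from the script above). This is a hypothetical Python 3 sketch that has not been tested against the real Developer Console. The helper names are made up and the relative xpaths are copied from the script. It replaces the start-at-3 and try-except guesswork with two assumptions: XPath rows are numbered from 1, and rows with an empty title can be skipped.</p>

```python
# Hypothetical helper names; the xpaths are the ones used in the script above.
ROW = "xpath=(//div[@class='listingRow'])[%d]"
FIELDS = [
    ("title", "/div[1]/div[1]"),
    ("downloaded", "/div[2]/div[1]/span[1]"),
    ("installed", "/div[2]/div[2]/span[1]"),
    ("comments", "/table"),
]


def scrape_rows(get_text, n):
    """Return one dict per listing row that has a non-empty title.

    get_text: a callable like Selenium RC's get_text(locator).
    n: the row count, e.g. int(s.get_xpath_count("//div[@class='listingRow']")).
    XPath positions are 1-based, so rows 1..n are all tried.
    """
    rows = []
    for i in range(1, n + 1):
        try:
            row = dict((name, get_text((ROW % i) + path)) for name, path in FIELDS)
        except Exception:
            # Missing element (e.g. a spacer row): skip it. Unlike a bare
            # except, this still lets Ctrl-C (KeyboardInterrupt) through.
            continue
        if not row["title"].strip():
            continue  # empty placeholder row
        row["comments"] = row["comments"][1:-1]  # same trimming as the script above
        rows.append(row)
    return rows
```

<p>With a live session, that would be called as scrape_rows(s.get_text, n) after the login steps.</p>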
<p>To run the script, you first need to start the Selenium RC server. Go to the directory where you extracted it:</p>

<div class="wp_syntax"><div class="code"><pre class="sh" style="font-family:monospace;">cd /path/to/selenium
java -jar selenium-server.jar</pre></div></div>

<p>Then you should be able to run this script from a terminal. It will start Firefox, log you in to the Android Developer Console, wait for the page to load (the script allows up to 30 seconds), then use xpath to scrape each row of data from the table and print it to the terminal.</p>
<p>From there it should be pretty simple to export the results into a CSV file or make pretty charts or whatever it is you wanna do.</p>
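<p>For the CSV route, here is a minimal Python 3 sketch using the standard csv module. It assumes each scraped row has been collected as a (title, downloaded, installed, comments) tuple instead of being printed. The function name and column headers are my own choices, not anything the Developer Console defines.</p>

```python
import csv
import io

HEADER = ["title", "downloaded", "installed", "comments"]


def write_stats_csv(rows, fileobj):
    """Write a header line plus one CSV record per (title, downloaded, installed, comments) tuple."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(rows)


# Example with an in-memory buffer; for a real file use
# open("market_stats.csv", "w", newline="") instead.
buf = io.StringIO()
write_stats_csv([("My App, Lite", "1000", "800", "12")], buf)
print(buf.getvalue())
```

<p>The csv writer takes care of quoting, so an app title containing a comma still ends up in a single column.</p>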
<p>It does pop up a Firefox window on the screen, which is kinda annoying. It would be cooler to run Firefox headless; maybe some other time..</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/05/fetching-android-market-stats-with-selenium-rc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Integer overflow, a first time for everything</title>
		<link>http://www.craiget.com/2010/05/integer-overflow-a-first-time-for-everything/</link>
		<comments>http://www.craiget.com/2010/05/integer-overflow-a-first-time-for-everything/#comments</comments>
		<pubDate>Tue, 18 May 2010 22:54:47 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=169</guid>
		<description><![CDATA[Somehow until today I had avoided the bite of an integer overflow bug. I wanted to get a series of date strings in the format &#8220;yyyymmdd&#8221; for fetching some resources from a website. So the following seemed like it should work: //bad - don't do this long TS0 = 1272690000000L; //may 1st, 2010 long millis [...]]]></description>
			<content:encoded><![CDATA[<p>Somehow until today I had avoided the bite of an integer overflow bug.</p>
<p>I wanted to get a series of date strings in the format &#8220;yyyymmdd&#8221; for fetching some resources from a website. So the following seemed like it should work:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">//bad - don't do this</span>
<span style="color: #000066; font-weight: bold;">long</span> TS0 <span style="color: #339933;">=</span> 1272690000000L<span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//may 1st, 2010</span>
<span style="color: #000066; font-weight: bold;">long</span> millis <span style="color: #339933;">=</span> <span style="color: #003399;">System</span>.<span style="color: #006633;">currentTimeMillis</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #cc66cc;">1000</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">24</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">365</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//one year from today</span>
<span style="color: #000000; font-weight: bold;">while</span><span style="color: #009900;">&#40;</span>millis <span style="color: #339933;">&gt;</span> TS0<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #003399;">String</span> date <span style="color: #339933;">=</span> <span style="color: #003399;">DateFormat</span>.<span style="color: #006633;">format</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;yyyyMMdd&quot;</span>, millis<span style="color: #009900;">&#41;</span>.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  millis <span style="color: #339933;">-=</span> <span style="color: #cc66cc;">1000</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">24</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>That method worked.. kind of.. doing the next 10 days worked just fine. But doing 365 days didn&#8217;t! Huh?!</p>
<p>After bashing on it in place for waaay too long and beginning to question my sanity, I decided to write a separate little program to isolate the problem.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> WaitWhat <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> args<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">long</span> millis <span style="color: #339933;">=</span> <span style="color: #cc66cc;">1000</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">24</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">365</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// should be 31,536,000,000</span>
    <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>millis<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>And the result?</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #cc66cc;">1471228928</span></pre></div></div>

<p>Well, that came as something of a surprise..</p>
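<p>So what&#8217;s going on? Every operand in 1000*60*60*24*365 is an int literal, so Java evaluates the whole product in 32-bit int arithmetic. Since 31,536,000,000 is larger than the maximum int value of 2,147,483,647 (Integer.MAX_VALUE), the result wraps around (modulo 2^32) and comes out as 1,471,228,928.</p>
<p>I guess I assumed that multiplication would automatically use longs if it needed to, especially since the result is assigned to a long. Apparently not! The conversion to long only happens after the int multiplication has already overflowed.</p>
<p>So what&#8217;s the fix? Make the first operand a long literal, so every multiplication is done in long arithmetic and nothing overflows, like this:</p>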
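<p>To see where that odd number comes from, here is a small Python 3 sketch that mimics Java&#8217;s 32-bit int multiplication. Python&#8217;s own integers never overflow, so the wrap-around has to be done by hand, and the helper names are made up for illustration. It also shows why the 10-day version worked while the 365-day one did not: the overflowed &#8220;one year&#8221; offset is only about 17 days.</p>

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
MILLIS_PER_DAY = 1000 * 60 * 60 * 24  # 86,400,000, which fits in an int


def to_int32(x):
    """Wrap an arbitrary integer into Java's 32-bit two's-complement int range."""
    return (x + 2**31) % 2**32 - 2**31


def int_product(*factors):
    """Multiply the way Java does when every operand is an int: wrap after each step."""
    result = 1
    for factor in factors:
        result = to_int32(result * factor)
    return result


ten_days = int_product(1000, 60, 60, 24, 10)   # fits, no overflow
one_year = int_product(1000, 60, 60, 24, 365)  # overflows on the last step
print(ten_days, one_year)                      # 864000000 1471228928
print(one_year // MILLIS_PER_DAY)              # 17
```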

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">//note the &quot;L&quot;</span>
<span style="color: #000066; font-weight: bold;">long</span> millis <span style="color: #339933;">=</span> 1000L<span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">60</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">24</span><span style="color: #339933;">*</span><span style="color: #cc66cc;">365</span><span style="color: #339933;">;</span></pre></div></div>
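<p>Where the L goes matters. Java evaluates a*b*c as (a*b)*c, and it only switches a step to long arithmetic when one of that step&#8217;s operands is already a long. The Python 3 sketch below simulates that rule with a hypothetical helper that handles only products of integer literals. Putting the L on the last factor happens to work for one year in milliseconds, because the int prefix 1000*60*60*24 still fits. The same trick fails for microseconds, where the int prefix overflows before the L is reached. Putting the L on the first factor avoids the question entirely.</p>

```python
def wrap(value, bits):
    """Wrap an integer into a signed two's-complement range of the given width."""
    half = 2 ** (bits - 1)
    return (value + half) % (2 * half) - half


def java_product(expr):
    """Evaluate a Java product of integer literals, e.g. "1000L*60*60*24*365".

    Steps run left to right. A step uses 64-bit long arithmetic once either
    operand is a long; otherwise it is 32-bit int arithmetic and wraps.
    """
    terms = expr.replace(" ", "").split("*")
    is_long = terms[0].endswith(("L", "l"))
    value = int(terms[0].rstrip("Ll"))
    for term in terms[1:]:
        is_long = is_long or term.endswith(("L", "l"))
        value = wrap(value * int(term.rstrip("Ll")), 64 if is_long else 32)
    return value


print(java_product("1000*60*60*24*365"))        # 1471228928 (the bug)
print(java_product("1000L*60*60*24*365"))       # 31536000000 (the fix)
print(java_product("1000*60*60*24*365L"))       # 31536000000 (int prefix still fits)
print(java_product("1000*1000*60*60*24*365L"))  # 182738739200, not 31536000000000
```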

<p>Well, an interesting little lesson.. Stupid bugs like that are always humbling.. I wonder how many little gems like that are buried in my code, just waiting for their day..</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/05/integer-overflow-a-first-time-for-everything/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can Clojure Find Me An Apartment?</title>
		<link>http://www.craiget.com/2010/02/can-clojure-find-me-an-apartment/</link>
		<comments>http://www.craiget.com/2010/02/can-clojure-find-me-an-apartment/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 01:56:52 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[clojure]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=166</guid>
		<description><![CDATA[This post was going to be about how I spent the better part of a day trying to get clojure and emacs and slime and the java classpath all working together. The gist of it is this: I am an idiot sometimes. I spent most of an afternoon trying to figure out why it is [...]]]></description>
			<content:encoded><![CDATA[<p>This post was going to be about how I spent the better part of a day trying to get clojure and emacs and slime and the java classpath all working together.</p>
<p>The gist of it is this: I am an idiot sometimes. I spent most of an afternoon trying to figure out why it is an error to (use &#8216;clojure.contrib). Earlier in the day, my classpath was set up wrong, so (use &#8216;clojure.contrib.duck-streams) didn&#8217;t work. At some point, I stopped typing the whole thing, thinking that if &#8216;clojure.contrib.duck-streams works, then so should the parent package &#8216;clojure.contrib. A-ha! Save myself a bit of typing! Nope. clojure.contrib is just a prefix shared by the contrib libraries, not a namespace you can load, so that never works.. So, when I finally <strong>did</strong> get my classpath working, <em>I didn&#8217;t know it</em> because I was typing something that&#8217;s just plain wrong. Hilarious and Awesome, huh?</p>
<p>So, with everything finally working, I made my first little half-way real Clojure program.</p>
<p>Our current lease runs out in about 6 weeks, so my roommate and I need to find a new place to live &#8211; sounds like a job for Craigslist. There&#8217;s a problem, though: in big cities, Craigslist is absolutely flooded with apartments and the search functions just aren&#8217;t that good. I have no interest in skimming hundreds or thousands of posts looking for that perfect combination of price/location/amenities (well, mostly price and location, actually), so why not let the computer do the work instead? Usually this would be a job for Python/BeautifulSoup, but in the interest of learning Clojure, here goes..</p>
<p>Following is what I&#8217;ve come up with so far for scraping apartments off Craigslist as gently as possible by filtering out links that don&#8217;t meet my criteria. Right now, this code only generates the list of matching links; it doesn&#8217;t actually follow them. If I continue further with this program, that will be Step 2, probably using <a href="http://lethain.com/entry/2009/nov/24/scalable-scraping-in-clojure/">http://lethain.com/entry/2009/nov/24/scalable-scraping-in-clojure/</a> for inspiration.</p>
<p>This is based on the <a href="http://github.com/cgrand/enlive">Enlive</a> library, which provides a very usable syntax for ripping through HTML (though I don&#8217;t quite understand it all yet). As I&#8217;m still a complete beginner with Clojure and functional programming in general, the following code is probably far from idiomatic and may look sloppy to you pros out there. Comments and suggestions are welcome!</p>

<div class="wp_syntax"><div class="code"><pre class="clojure" style="font-family:monospace;">;; import enlive
(use 'net.cgrand.enlive-html)
&nbsp;
;; html helper
(defn fetch-url [url]
  (html-resource (java.net.URL. url)))
&nbsp;
;; pulls link from paragraph
;; ie, (map get-link (select *cl* [:p]))
(defn get-link [p]
  (:href (:attrs (first (:content p)))))
&nbsp;
;; pulls text of link from paragraph
(defn get-link-text [p]
  (:content (first (:content p))))
&nbsp;
;; pulls text of parens following link
;; usually this is zipcode/location info
;; &quot;&quot;, if absent
(defn get-paren-text [p]
  (let [content (:content p)]
    (if (&lt; 2 (count content))
      (:content (nth content 2))
      &quot;&quot;)))
&nbsp;
;; pulls link/text/location into a map
(defn get-all [p]
  {:link (get-link p)
   :text (str (get-link-text p)
	      (get-paren-text p))})
&nbsp;
;; some helpers to remove links we don't care about 
&nbsp;
;; (affordable? &quot;$800&quot; 600 1000)  ; =&gt; true
;; (affordable? &quot;$1500&quot; 600 1000) ; =&gt; false
(defn affordable? [text min max]
  (let [price (second (re-find #&quot;\$(\d+)&quot; text))]
    (if price
      (let [price (Integer/parseInt price)]
	(and (&lt;= min price)
	     (&gt;= max price))))))
&nbsp;
;; (has-kword? &quot;downtown&quot; (list &quot;down&quot;)) ; =&gt; true
;; (has-kword? &quot;down&quot; (list &quot;downtown&quot;)) ; =&gt; nil
(defn has-kword? [text kwords]
  (let [vals (map #(re-find (re-matcher (re-pattern %) text)) kwords)]
    (some #(not (= nil %)) vals)))
&nbsp;
;; parameterizes a function to decide if a link is worth retrieving
;; this would be cooler if the criteria functions
;; came in as a list too.. but that makes my head
;; spin.. maybe later
(defn keep-link? [min max areas beds]
  (fn [{link :link text :text}]
    (let [text (.toLowerCase text)]
      (and link
	   (re-find #&quot;/apa/&quot; link)
	   (affordable? text min max)
	   (has-kword? text areas)
	   (has-kword? text beds)))))
&nbsp;
;; some top level definitions
;; you may need to change these to get non-empty results
(def *url* &quot;http://losangeles.craigslist.org/apa/&quot;)
(def *min-price* 100)
(def *max-price* 10000)
;; I kinda like it in the South Bay, but whatever..
(def *areas* (list &quot;hollywood&quot; &quot;weho&quot;))
(def *beds* (list &quot;2br&quot; &quot;3br&quot;))
(def my-keep-link? (keep-link? *min-price* *max-price* *areas* *beds*))
&nbsp;
;; actually do the work
(filter my-keep-link? (map get-all (select (fetch-url *url*) [:p])))
&nbsp;
;; References
;; 1) http://wiki.github.com/cgrand/enlive/
;; 2) http://github.com/swannodette/enlive-tutorial/
;; 3) Programming Clojure, Stuart Halloway
;; 4) lots and lots of Googling</pre></div></div>
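<p>The comment in keep-link? wishes the criteria functions came in as a list. Here is a rough sketch of that idea, written in Python 3 to match the other examples here rather than in Clojure. The names are hypothetical, and it assumes each listing is a dict with &#8220;link&#8221; and &#8220;text&#8221; keys, like the maps get-all builds. In Clojure, the same shape would be roughly (fn [item] (every? #(% item) criteria)).</p>

```python
import re


def affordable(text, lo, hi):
    """True if the first "$NNN" price in text is between lo and hi inclusive."""
    match = re.search(r"\$(\d+)", text)
    return match is not None and lo <= int(match.group(1)) <= hi


def has_keyword(text, keywords):
    """True if any keyword (used as a regex, like re-pattern) occurs in text."""
    return any(re.search(k, text) for k in keywords)


def keep_link(criteria):
    """Build a filter that keeps an item only if every predicate accepts it."""
    def keep(item):
        return all(pred(item) for pred in criteria)
    return keep


def lower_text(item):
    return item["text"].lower()


my_keep_link = keep_link([
    lambda it: it["link"] is not None and "/apa/" in it["link"],
    lambda it: affordable(lower_text(it), 100, 10000),
    lambda it: has_keyword(lower_text(it), ["hollywood", "weho"]),
    lambda it: has_keyword(lower_text(it), ["2br", "3br"]),
])

listings = [
    {"link": "/apa/1.html", "text": "$1500 / 2br - Sunny WeHo apartment"},
    {"link": "/apa/2.html", "text": "$900 / 1br - Hollywood studio"},
    {"link": None, "text": "$1200 / 2br - Hollywood"},
]
print([x["link"] for x in listings if my_keep_link(x)])  # ['/apa/1.html']
```

<p>Adding a new criterion then means appending one more predicate to the list, rather than adding another parameter to keep-link?.</p>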

<p>On the whole, I&#8217;m liking Clojure a lot, but there is also a lot to learn.</p>
<p>(Shocking conclusion, I know!)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/02/can-clojure-find-me-an-apartment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A few cool videos from Google Tech Talks</title>
		<link>http://www.craiget.com/2010/01/a-few-cool-videos-from-google-tech-talks/</link>
		<comments>http://www.craiget.com/2010/01/a-few-cool-videos-from-google-tech-talks/#comments</comments>
		<pubDate>Sat, 09 Jan 2010 19:29:52 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[CS]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.craiget.com/?p=165</guid>
		<description><![CDATA[I keep meaning to find some interesting podcasts and online lectures. There&#8217;s a ton of material out there, but so much of it sucks. Anyway, browsing the topic &#8220;What are the best Google Tech Talks&#8221; on Stackoverflow, I found the following, which I now link for your viewing pleasure: XKCD visits Google &#8211; Very funny [...]]]></description>
			<content:encoded><![CDATA[<p>I keep meaning to find some interesting podcasts and online lectures. There&#8217;s a ton of material out there, but so much of it sucks. Anyway, browsing the topic &#8220;<a href="http://stackoverflow.com/questions/923486/what-are-the-best-google-tech-talks">What are the best Google Tech Talks</a>&#8221; on <a href="http://stackoverflow.com/">Stackoverflow</a>, I found the following, which I now link for your viewing pleasure:</p>
<p><a href="http://www.youtube.com/watch?v=zJOS0sV2a24">XKCD visits Google</a> &#8211; Very funny and interesting, but perhaps less enjoyable unless you&#8217;re an <a href="http://www.xkcd.com/">xkcd</a> fanboy like me. Jump to 21:30 where xkcd answers a joking question from Donald Knuth.</p>
<p><a href="http://www.youtube.com/watch?v=_m97_kL4ox0">PolyWorld: Using Evolution to Design Artificial Intelligence</a> &#8211; An interesting A-Life experiment/visualization. Jump to 5:35 for some really neat video of an older program that evolves different body morphologies for efficient movement in a simulated physical environment. (I think this is the <a href="http://www.karlsims.com/evolved-virtual-creatures.html">original work</a> the speaker is citing)</p>
<p><a href="http://www.youtube.com/watch?v=AyzOUbkUf3M">The Next Generation of Neural Networks</a> &#8211; The speaker flies through the intro material much too fast for me to follow with only a rudimentary knowledge of neural networks. Nevertheless, the demo at 21:35 is cool, as is the discussion around 31:40 of using these layered networks for document clustering and classification.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.craiget.com/2010/01/a-few-cool-videos-from-google-tech-talks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
