0

Batch Extracting MP3s from YouTube Videos

Last night I wanted to extract audio tracks from a number of YouTube videos that I’d downloaded using youtube-dl. Being only a so-so shell scripter, I’ve always resorted to ugly for-loops when manipulating multiple files. This invariably ends badly when my loop improperly handles whitespace and mangles the filenames.

No more! Skimming a tutorial last night I stumbled on something that heavy shell users already know: the -exec parameter for the find command. This allows you to specify a command to run on everything that find finds. In the case of extracing audio from MP3s, it works like this:

find . -name '*.flv' -exec ffmpeg -i '{}' '{}.mp3' ';'

This command looks in the current directory for flv files and uses ffmpeg to extract the audio to another file with the same name, plus the .mp3 extension. The funny brackets {} are substituted for the file name.

A downside to this approach – your files end up with names like .flv.mp3 instead of .mp3. If that bothers you, you can fix it with the rename command which uses regexes to rename files:

rename 's/\.flv\.mp3/\.mp3/' *.flv.mp3

Ubuntu users like myself will need to install ffmpeg and ubuntu-restricted-extras to get the necessary encoder.

There are certainly lots of other ways to encode a directory worth of files, but I think this one is pretty cool.

0

Collaborative Filtering, Hadoop and the Hazards of Copy-Paste

I’ve been working on a new App idea lately – a recommender for Android programs. Basically, it looks at what you have installed (and possibly ratings) and recommends other applications you might like by using the recommendations of other people in the same way as Amazon or the various music services – in a word – collaborative filtering.

There are different ways to do collaborative filtering, but they are all expensive when you get a lot of records to sort through. Two common approaches are 1) Calculate the similarity of users, and recommend apps liked by similar users, or 2) Calculate the similarity of apps, and recommend apps similar to ones the user likes. I am trying the second way, known as item-based collaborative filtering or the model-based approach, which allows for fast queries at the cost of an expensive offline step that re-computes the item similarities every once in awhile.

My initial tests in Python, based on the very interesting book “Programming Collective Intelligence” quickly became too slow with just a few thousand users and apps. Because there are already around 5,000 apps and a few million users of Android (with many more every day), there’s no way the script would be able to handle the future growth of the platform.

Enter MapReduce and Hadoop. The explanation is better left to the pros, but simply, MapReduce is a way of parallelizing certain types of computations across many computers and then merging the final results. With the availability of Amazon Web Services, which allows you to rent a cluster of computers by the hour, it becomes possible to run a prohibitively expensive computation once every few days for just a couple of dollars. There are several different MapReduce frameworks out there, but I choose to try Hadoop, which is available on Amazon’s services and used heavily by Yahoo and many others.

There will be a lot more to say about Hadoop as I gain more experience. But all-in-all, it is pretty fun to re-think an algorithm, even just a little bit, to make it suitable for MapReduce. I *think* I have a correct implementation of Item-Based Collaborative Filtering running on my tiny 2-node cluster and it’s pretty cool!

One snag I ran into while trying to get my cluster running using the ubiquitous WordCount example for Hadoop. Like most people, I copy-pasted the source from the Hadoop tutorial and tried to run it. It ran, great! So then instead of reading the rest of the documentation, I immediately tried to modify it. Eventually, I ended up trying to make the simplest change – to return Text instead of IntWritables from the Map operation and — WTF!?! I spent HOURS trying to figure out why there was a ClassCastException. So for other poor souls trying to modify the WordCount example, there are 3 things you need to do:

First, get the method signatures right. The Mapper has to output Text and the Reducer has to consume Text (Eclipse will help with that, of course)

Second, add the lines: “conf.setMapOutputKeyClass(Text.class);” and “conf.setMapOutputValueClass(Text.class);” to the main() method. These tell Hadoop that the Mapper is not using the default, IntWritable, for output

Third, and crucially important, remove the line “conf.setCombinerClass(Reduce.class);”. Discovering that I needed to remove that single line took me about half a day, digging through the logs and Googling everything I could think of until I discovered this thread. Because it was part of the example, I assumed it was Hadoop boiler-plate that was essential — it’s not, it’s an optimization. The Combiner is kind of like a pre-Reduce phase that saves time by combining in-memory results instead of writing them to disk and combining them later. The Combiner needs a method signature that accepts the output of the Mapper and is still suitable as input to the Reducer. Otherwise, it chokes.

So is the peril of the copy-paster who runs code without really understanding all of it ~~

0

A tale of woe, a database disaster averted

I know I’m not a great coder, but I like to think I’m at least decent or even good on occasion. The events of the past week illustrate that even that estimation may be a stretch.

When I came to work on Monday morning, I discovered that one of my database tables was exceedingly empty. For about an hour, I was convinced I had been hacked. However, after checking every last thing, I finally discovered the culprit – a leftover setup script had dropped and re-created the databases. There are so many things that went wrong, I am inclined to enumerate them:

  • I LEFT A SETUP SCRIPT ON THE SERVER
  • There was no logging in place, so it took ages to diagnose the problem and rule out hacking
  • I had left unneccesary CREATE/DROP privileges turned on when only INSERT was needed
  • I found a SQL Injection vulnerability (that had been added in a last minute tweak)
  • I didn’t have an automated backup in place
  • I almost made a manual backup on Friday afternoon but then thought “What could go wrong?”
  • I had looked at the data several times in the last week in a spreadsheet, but didn’t save the spreadsheet and it was gone from the local cache
  • The only other person who had been downloading daily had deleted the one column I needed – every single day.

So it was pretty much a conspiracy of every single possible thing going wrong. At least, I learned a good number of lessons about the value of logging and automated backups – in the future, I will be much less cavalier. Mercifully, the server administrators (who are more prudent than your author) take nightly snapshots. After prostrating myself at the altar of the Unix Gods, my database was restored after 3 days with minimal lossage.

In an unrelated matter, I am updating a website whose data access must be completely changed to implement a new protocol. Looking back at my old code, I am frankly unimpressed. The separation of data, logic, and UI is downright bad. If I had designed this well the first time (about 18 months ago), it would be a relatively simple matter of swapping out the data layer. Instead, I am now forced to go back and re-do the whole thing over, only better this time.

So it’s been a big week for humble pie. I know it’s healthy for us humans to get frequent reminders of our fallability and to learn from our mistakes, but that doesn’t make it fun.

1

Some Tricky Things About Creating a High Scores List

My project lately has been adding a Net-based high scores list to a game on the Android marketplace. While the game has a decent number of downloads, I think it gets kinda boring after a few plays through. Hopefully allowing players to compete with each other for the highest score will boost the fun/challenge (and thus the ratings and sales)

Allowing players to submit high scores to a global ranking raises quite a few issues that are absent in a non-networked high scores list. Following are some of the ideas I’ve come up with. If you’ve tried something similar, I’d love to hear your approach.

A couple of interesting issues:

1. WebView or Android UI
2. Routine networking challenges
3. Serialization
4. Storing the high scores on the server
5. Letting players choose names
6. Preventing hacks

There are two main ways to display high scores – either using a WebView to display an HTML high score list or building a regular Android UI. For now I’ve opted for the regular UI because it integrates better with the look of the app. However, that decision comes at a price — with a WebView/HTML UI, you can make changes on the webserver and it immediately affects every client. With the regular UI, if there’s a problem, you can’t push the fix. All you can do is publish an update and hope everyone downloads it.

There are two important things to remember about networking for a high scores list (or just in general) – first, the connection may be unavailable for a variety of reasons (the device may not be connected to the network, or the server could be down, etc). So the code needs to detect being unable to connect and handle it appropriately. Secondly, a connection could be slow or time-out. Together this means the connection should be handled in a separate thread or the UI will hang and Android will kill the app.

There are several reasonable ways to transfer the data. I found JSON to be the simplest encoding because the Java API is straightforward and it is only requires 1 line of PHP on the server to unpack a JSON structure into either an Object or an associate array. Nice!

Storing the high scores on the server is pretty easy – just drop them in a SQL database. I’m still struggling with the best way to handle the growth of the table. Depending on the number of players, the scores table may grow quite substantially (though, the growth should be somewhat bounded as the top scores trend higher and closer to the absolute limit). One possibility is to setup cron or some automated task to periodically wipe out scores below a certain threshold. Another possibility is to check each time a score is submitted whether anything needs to be deleted. Yet another possibility is to just keep all the scores and delete them manually from time to time, since there will be thousands or tens-of-thousands, not millions or billions. Fortunately, deciding what to keep is a server-only issue, so it is easy to keep everything now and postpone the decision based on actual usage and growth.

Another tricky issue is letting players choose names for the high score list. Since my game is appropriate for all ages, I don’t really want some bozo submitting filthy or offensive names. One approach is to not worry about it. Another approach is to try filtering the submitted names. Inevitably, any filtering is bound to miss something that’s offensive to someone. Furthermore, if the filtering is noticed, that turns it into a challenge for the player, who may now wonder: “what’s the most awful thing i can sneak past the filter”? While unlikely, that could get bad fast and having a high score list that requires daily moderation is no good. Another idea would be allowing only 3 letters for the names. That eliminates the 4-letter words, but initials-only makes a high score list less fun. Right now, I plan to wait and see if there is actually a problem before trying to fix it.

And saving the best for last — how to prevent someone from hacking the high scores list? Alas, it can’t really be done. No matter how you arrange the code, ultimately the player is responsible for truthfully submitting his own score. Because it is pretty easy to watch the data in-transit, the best you can do is to make it more complicated. One option is a checksum – send an additional parameter that is a salted hash of, say, the name and score. This is not bad and should deter half-hearted attempts, however the salt may be discoverable and any encryption or checksum is doomed to fail since the player controls the client. Another interesting option is submitting snapshot data of the game in-progress. However, that requires the server to know quite a bit about the game and what a valid state looks like. A super-crazy option would be storing all the game state on the server and only transferring player input (keypresses, screen touches) over the wire. That could actually work – if the game has an element of randomness, the only way to cheat would be to write a semi-intelligent bot. (If the game is deterministic, the player could just replay a winning series of moves) But.. the overhead of running all the logic and serving hundreds or thousands of simultaneous clients makes it unrealistic for a typical game. Since there is no monetary incentive to winning my game, I’m not too worried about this, but my optimism may be misplaced. Much more on this here.

Finally, an idea which I’ve dropped for the moment is a weekly high scores list (in addition to all-time high scores). This would make a high score achievable for a new or less determined player and prevents that inevitable stagnation as the all-time high scores get closer and closer to the maximum possible. One tricky part about a weekly list is that once again you need to rely on the client to be truthful about when each score was achieved. The current plan is to see how things go with the regular high score list, and then evaluate whether a weekly/daily/monthly list makes sense.

Eh, well. That’s what’s been on my mind lately. For something that sounds simple at first, there really are quite a lot of issues to consider in every part of the code. For now, the implementation is almost complete, so I can’t wait to see how it goes. If it works out well, I’ll try to follow up with actual code instead of random musings.

1

Another interesting python snippet

Well, I think it’s interesting anyway..

So today I was trying to express the idea “do something to all these files, unless the filename matches the list of things we don’t care about”. I have to do a BUNCH of find-and-replace kinda stuff in the next week on a couple thousand webpages, so my plan is to write a little script to make sure I don’t miss anything. Sometimes the script will turn up a false positive that I know I want to ignore..

EDIT – The following snippet doesn’t do quite what I thought it did.. more later..

There are at least three distinct and reasonable ways to do this in Python:

# skip if name matches any of these
ignore = ["thing1", "thing2"]
 
# Iterate over the strings, see if a substring matches
# This is reasonably clear, but it seems long and requires a flag
for filepath in dirwalk(path):
    found = False
    for item in ignore:
        if filepath.find(item) > 0:
            found = True
            break
    if found:
        continue
    # do stuff..
 
# The next two ways build a list of matches
# If there are any matches then skip this file
 
# This is a more functional style
for filepath in dirwalk(path):
    if filter(lambda item: filepath.find(item) != -1, ignore):
        continue
    # do stuff..
 
# Another way to do the same thing
for filepath in dirwalk(path):
    if [item for item in ignore if filepath.find(item) != -1]:
        continue
    # do stuff..

Interestingly, the last two use the same number of characters. I’m not sure which I prefer. While I suspect the final one would be considered the most Pythonic, I do have a soft spot for lambda. Eh, maybe someday when I’m not lazy I will see which is the fastest.. though that doesn’t really matter for my purposes.

0

Really? ‘ is not valid HTML?

Well, I learned something today. ' is not valid in HTML. Incredibly, IE gets it right and FF/etc gets it wrong by interpreting it as an apostrophe character. I had no idea.. the simplest correct thing to do is use: ' instead.

More at: http://stackoverflow.com/questions/419718/html-apostrophe and http://fishbowl.pastiche.org/2003/07/01/the_curse_of_apos/ and http://www.alistapart.com/articles/emen.

The A List Apart article is interesting, basically saying that everybody is doing everything wrong – technically. Yet I can’t help but feel that what people actually do trumps something written in a style guide. It’s kinda like when people say you can’t end a sentence with a preposition… but you can, and people do. Is it really bad grammar when everybody does it? Or does that just mean our rules of grammar are incorrect for modern usage? Well, linguistic and philosophical excursions aside, ' is bad. Who knew?

2

Extending Android’s Chronometer to get Elapsed Time

In revamping one of my Games on the Android Market, I wanted to add an onscreen timer. The built-in Chronometer class does 95% of what I needed, but getting that extra 5% was annoying enough that I decided to post my solution.

The OOTB Chronometer keeps tracks of time, but it can’t tell you the number of seconds elapsed since it started. This is easily remedied, as I discovered from the fine people on StackOverflow. However, simply getting the elapsed time is not enough because the Chronometer will reset itself everytime the app is Paused or Killed. The two cases are quite different. When the app is Paused, it is still alive and may resume (ex. an incoming call). When the app is Killed, it needs to store the elapsed time somewhere so it can be recovered when the app is Restored (ex. orientation change, or b/c the OS needs memory).

So here is a stripped down solution that seems to do the job:

import android.app.Activity;
import android.content.Context;
import android.os.Bundle;
import android.os.SystemClock;
import android.util.Log;
import android.view.Menu;
import android.widget.Chronometer;
 
public class CustomChronometerActivity extends Activity {
	private static final String TAG = "CustomChronometerActivity";
	private static final String MS_ELAPSED = "com.etc.etc.MsElapsed";
 
	private MyChronometer chrono;
 
	@Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
 
        //start the chronometer
        chrono = new MyChronometer(this);
        chrono.start();
        setContentView(chrono);
    }
 
	@Override
	protected void onPause() {
		Log.i(TAG, "onPause()");
		super.onPause();
		chrono.stop();
	}
 
	@Override
	protected void onResume() {
		Log.i(TAG, "onResume()");
		super.onResume();
		chrono.start();
	}
 
	@Override
	protected void onSaveInstanceState(Bundle outState) {
		super.onSaveInstanceState(outState);
		Log.i(TAG, "onSaveInstanceState()");
		chrono.stop();
		outState.putInt(MS_ELAPSED, chrono.getMsElapsed());
	}
 
	@Override
	protected void onRestoreInstanceState(Bundle savedInstanceState) {
		super.onRestoreInstanceState(savedInstanceState);
		Log.i(TAG, "onRestoreInstanceState()");
		int ms = savedInstanceState.getInt(MS_ELAPSED);
		chrono.setMsElapsed(ms);
		chrono.start();
	}
 
	class MyChronometer extends Chronometer {
 
		public int msElapsed;
		public boolean isRunning = false;
 
		public MyChronometer(Context context) {
			super(context);
		}
 
		public int getMsElapsed() {
			return msElapsed;
		}
 
		public void setMsElapsed(int ms) {
			setBase(getBase() - ms);
			msElapsed  = ms;
		}
 
		@Override
		public void start() {
			super.start();
			setBase(SystemClock.elapsedRealtime() - msElapsed);
			isRunning = true;
		}
 
		@Override
		public void stop() {
			super.stop();
			if(isRunning) {
				msElapsed = (int)(SystemClock.elapsedRealtime() - this.getBase());
			}
			isRunning = false;
		}
	}
}

Certainly there are other ways to do this – either by implementing your own timer using Threads or Handlers, or perhaps by implementing an OnChronometerTickListener and subscribing to events. I rather like this solution, but if you’re the clever sort and see some situation where this doesn’t work or some reason why it might be a bad idea, please let me know.

0

More Reasons I Love Python..

I was cleaning up some folders the other day at work where the files had been named using one of several naming schemes (or a few with no particular scheme at all). After brief consideration, I decided to do the legwork of renaming all the files with a naming scheme that actually makes sense:

Category_YYYY-MM-DD

That way, the files will stay grouped together if they get copied around to other folders, and they sort alphabetically by date. Then there’s the task for regenerating all the HTML for these baddies. Happily, Python was up to the task:

import os
from datetime import date
files = os.listdir("Path\\To\\File")
files.sort()
files.reverse()
for file in files:
    # chop the prefix, chop the suffix, split into (year, month, date), convert to int
    x = [int(x) for x in file.split("_")[-1][:-4].split('-')]
    print "<li><a href=\"/path/to/%s\">%s</a></li>" % (file, date(x[0], x[1], x[2]).strftime('%B %d, %Y'))

Well, it’s nothing like the real pros can do. But you gotta love a few links of code that save your fingers from a repetitive and typo-prone task like manually editing hundreds of links.

2

Getting your stats from the Android Marketplace with PHP/cURL

A few weeks ago I mentioned one way to get your developer stats off the Android Developer Console automatically in the post “Fetching Android Market Stats with Python, MozRepl, and BeautifulSoup“. Unfortunately, despite being very awesome, Firefox + MozRepl is not super-great for this task. When a plugin is updated, Firefox hangs on startup. That’s fine, but it kinda sucks for scripting. I’m sure there’s a way around it, but that difficulty makes a good excuse for coming back to solve this problem the right way.

Following is a PHP script that uses cURL to login to the Developer Console and grab the market stats. Unfortunately, Google’s app is written in GWT and the its Javascript is completely obfuscated. The market stats are fetched as JSON data and then somehow parsed, but I haven’t been able to figure out how exactly. If you run this script (or just look using Firebug), you’ll see that the JSON is a gigantic array. While the data of interest are clearly present in this array (total downloads, current installed base, rating, etc..), I haven’t been able to figure out how to parse it reliably. If you’ve tried this and figured it out, I’d love to know!

This script was assembled from a bunch of random PHP/cURL tutorials and may contain redundancy, unnecessary cURL settings, etc. Python fans, see the comments of my other post on this topic where a kind soul has demonstrated the same thing in Python using mechanize.

<?php
//setup a temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
 
//do google authorization
$data = array('accountType' => 'GOOGLE',
          'Email' => 'YOUR_ACCOUNT_EMAIL_HERE',
          'Passwd' => 'YOUR_ACCOUNT_PW_HERE',
          'source' => '',
          'service' => 'androiddeveloper');  
 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/accounts/ClientLogin");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
 
//grab the AUTH token for later
$auth = '';
if($info['http_code'] == 200) {
    preg_match('/Auth=(.*)/', $output, $matches);
    if(isset($matches[1])) {
        $auth = $matches[1];
    }
}
 
//login to Android Market
//this results in a 302
//I think this is necessary for a cookie to be set
$ch = curl_init ("http://market.android.com/publish?auth=$auth");
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
 
//go to the Developer Console
$ch = curl_init ("http://market.android.com/publish/Home");
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
 
//grab the JSON data
//perm and postdata seem to have changed in the last 6 months
//if the script isn't working, try using firebug to inspect the Request when
//http://market.android.com/publish/editapp gets fetched
$perm = "81E29277804F7729E9B743A43B2EFD07";
$headers = array(
    "Content-Type: text/x-gwt-rpc; charset=utf-8",
    "X-GWT-Permutation: $perm",
    "Referer: http://market.android.com/publish/gwt/$perm.cache.html");
//not sure what x-gwt-permutation means, I think it may have to do with which version of GWT they serve based on your browser
$postdata = "5|0|4|http://market.android.com/publish/gwt/|09C42EAE15B55219550B2D800FAC1644|com.google.wireless.android.vending.developer.shared.AppEditorService|getFullAssetInfosForUser|1|2|3|4|0|";
$ch = curl_init ("http://market.android.com/publish/editapp");
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
 
//now what?!?
echo('<pre>');
$output = json_decode(substr($output, 4));
print_r($output);

If you run this script and are willing to send me your stats, that would be super-helpful. Maybe I’ll be able to get enough data to figure out why some apps have more fields than others. With only 3 apps currently on the market, I don’t have much to go on. Feel free to obscure your data, but please make the changes obvious and note whether the app is free/paid and what part of the market it appears on (games/apps and sub-category). Here is a link to my best guesses so far in an Excel worksheet: market-json

0

Finding Anagrams Is Harder Than It Should Be

So my most recent Android program is an anagram finder called Unanagram. For some reason I really like writing progams to solve word puzzles like Yahoo Word Racer and Scrabble. Anyway, while it seems like finding anagrams is a really easy thing to do, it turned out to be a little tricky. Following are some notes on the complications encountered in writing for Android and some solutions.

There is an easy algorithm to determine if two words are anagrams of each other: just sort the letters of each word alphabetically. If they match, they are anagrams. For example: “cats” and “acts” both sort alphabetically as “acst”.

Trickier though is finding ALL the anagrams for a particular set of letters. You could sort every word in the dictionary as a “key” and keep the unsorted word as the value. The trouble is that this obligates you to check every key and requires a fair amount of space as well. Also, it doesn’t really allow spaces which is vital for multi-word results. With only 16mb per program, memory-intensive algorithms are not viable.

After a fair amount of experimentation with different data structures, I decided to use my reliable friend, the Trie. This has the dual benefit of being space-efficient and having fast lookups. (A DAG might be more appropriate, but the Trie worked so well I didn’t bother investigating)

Nevertheless, a few complications arose.

First, you have to load the data structure from disk. Usually files like a dictionary would go into /assets. However, files in that directory get compressed, so they take longer to load. For a large file the time is unacceptable. For faster loading they can go in /res/raw.

Then there is actually building the data structure. A Trie is built through successive insertions which add branches to the tree as necessary. While adding 60k words takes less than a second on my laptop, it took 5 MINUTES on Android. That wasn’t gonna work. Since the application only needs lookups, and not insertions, I decided to build the Trie offline, serialize it into a pseudo-binary format and then load that in Android. While the file size increased from 600k to 1.5Mb, the loading time dropped to 6 seconds. This is still slow, but much more acceptable. By beginning the loading in a background thread as soon as the application starts, it becomes unnoticeable. Someone more clever than myself may be able to get both better compression and better load time, but that was “good enough” for my purposes.

Now because you only get 16Mb of RAM, it is necessary to build the Trie using as little memory as possible. I was able to get down to about 4.5Mb by building the Trie as a series of nodes, each with a Child pointer, a Next pointer and a Character indicating the node’s letter. This is a bit different from your typical Trie, which stores a list of child nodes. An uppercase letter was used to indicate a terminal node, rather than adding a boolean flag.

That’s great for the in-memory size of the tree, however, with a branching factor of 2, this tree has ENORMOUS depth. While again my laptop has no problem, Android hits a StackOverflow error when the recursion gets too deep. Happily, this is fixable by converting the naturally recursive traversal into an iterative algorithm with an external Stack. Yuck. But it works and requires an insignificant amount of memory. It’s almost certainly faster as well, but I didn’t check.

A final complication is that Android programs must always be ready to be Paused or Killed. Since generating ALL the anagrams for a long word can easily take over a minute, it would be bad news to re-start the search everytime the program gets killed (which happens a LOT – incoming calls, switching the screen orientation, and other programs contending for memory can all kill an app). Since the lookup is stack based, it was fairly straightforward to build a resume function which rebuilds the stack using the letters of the last word found.