Posts Tagged ‘python’

Another interesting python snippet

Thursday, July 2nd, 2009

Well, I think it’s interesting anyway..

So today I was trying to express the idea “do something to all these files, unless the filename matches the list of things we don’t care about”. I have to do a BUNCH of find-and-replace kinda stuff in the next week on a couple thousand webpages, so my plan is to write a little script to make sure I don’t miss anything. Sometimes the script will turn up a false positive that I know I want to ignore..

EDIT – The following snippet doesn’t do quite what I thought it did.. more later..

There are at least three distinct and reasonable ways to do this in Python:

# skip if name matches any of these
ignore = ["thing1", "thing2"]
# (dirwalk() isn't shown here; assume it yields every file path under `path`)
 
# Iterate over the strings, see if a substring matches
# This is reasonably clear, but it seems long and requires a flag
for filepath in dirwalk(path):
    found = False
    for item in ignore:
        if filepath.find(item) != -1:  # find() returns -1 for "not found"; 0 is a match at the start
            found = True
            break
    if found:
        continue
    # do stuff..
 
# The next two ways build a list of matches
# If there are any matches then skip this file
 
# This is a more functional style
for filepath in dirwalk(path):
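    # Python 2's filter() returns a list; on Python 3 it returns a lazy object that is
    # always truthy, so this test would skip every file there (wrap it in list())
    if filter(lambda item: filepath.find(item) != -1, ignore):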
        continue
    # do stuff..
 
# Another way to do the same thing
for filepath in dirwalk(path):
    if [item for item in ignore if filepath.find(item) != -1]:
        continue
    # do stuff..

Interestingly, the last two use the same number of characters. I’m not sure which I prefer. While I suspect the final one would be considered the most Pythonic, I do have a soft spot for lambda. Eh, maybe someday when I’m not lazy I will see which is the fastest.. though that doesn’t really matter for my purposes.
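The "someday" benchmark is easy to sketch with the standard timeit module. This is a hypothetical rewrite, not the code above: it uses `item in filepath`, which is equivalent to `filepath.find(item) != -1` (a `> 0` test would miss a match at the very start of the path), and it adds a fourth variant built on any(). The ignore list and paths are made up for illustration.

```python
import timeit

# Hypothetical ignore list and paths, for illustration only
ignore = ["thing1", "thing2"]
paths = ["site/thing1/index.html", "site/about.html", "thing2.html", "site/news.html"]


def skip_loop(filepath):
    # Explicit loop, with an early return instead of a flag
    for item in ignore:
        if item in filepath:
            return True
    return False


def skip_filter(filepath):
    # list() matters on Python 3, where filter() returns a lazy, always-truthy object
    return bool(list(filter(lambda item: item in filepath, ignore)))


def skip_listcomp(filepath):
    return bool([item for item in ignore if item in filepath])


def skip_any(filepath):
    # any() stops at the first match and builds no intermediate list
    return any(item in filepath for item in ignore)


VARIANTS = [skip_loop, skip_filter, skip_listcomp, skip_any]

if __name__ == "__main__":
    for fn in VARIANTS:
        # Each lambda is run immediately inside this iteration, so it sees the current fn
        seconds = timeit.timeit(lambda: [fn(p) for p in paths], number=20000)
        print("%-14s %.3f s" % (fn.__name__, seconds))
```

Timings depend on the machine and Python version, so run it yourself rather than trusting any numbers; with a two-item ignore list the differences are tiny, and any() is arguably the most readable of the four.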

More Reasons I Love Python..

Tuesday, May 19th, 2009

I was cleaning up some folders the other day at work where the files had been named using one of several naming schemes (or a few with no particular scheme at all). After brief consideration, I decided to do the legwork of renaming all the files with a naming scheme that actually makes sense:

Category_YYYY-MM-DD

That way, the files will stay grouped together if they get copied around to other folders, and they sort alphabetically by date. Then there’s the task of regenerating all the HTML for these baddies. Happily, Python was up to the task:

import os
from datetime import date
files = os.listdir("Path\\To\\File")
files.sort(reverse=True)  # reverse alphabetical = newest first within a category
for filename in files:
    # chop the "Category_" prefix and the 4-character suffix (e.g. ".pdf"),
    # then split "YYYY-MM-DD" into integer (year, month, day)
    year, month, day = [int(part) for part in filename.split("_")[-1][:-4].split('-')]
    print("<li><a href=\"/path/to/%s\">%s</a></li>" % (filename, date(year, month, day).strftime('%B %d, %Y')))

Well, it’s nothing like what the real pros can do. But you gotta love a few lines of code that save your fingers from a repetitive and typo-prone task like manually editing hundreds of links.
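A slightly more defensive sketch of the same idea (my variation, not the script above): os.path.splitext() drops the extension whatever its length, and datetime.strptime() raises a ValueError on a malformed date instead of silently mis-parsing it. It also prints the day without a leading zero ("March 4, 2009" rather than "March 04, 2009"), which is purely cosmetic. The function names are hypothetical, and /path/to/ is the same placeholder as above.

```python
import os
from datetime import datetime


def link_for(filename, url_prefix="/path/to/"):
    """Build one <li> link from a name like 'Category_YYYY-MM-DD.ext' (hypothetical helper)."""
    stem, _ext = os.path.splitext(filename)   # drop the extension, whatever its length
    date_part = stem.rsplit("_", 1)[-1]       # text after the last underscore
    when = datetime.strptime(date_part, "%Y-%m-%d").date()
    label = "%s %d, %d" % (when.strftime("%B"), when.day, when.year)
    return '<li><a href="%s%s">%s</a></li>' % (url_prefix, filename, label)


def links_newest_first(filenames):
    # YYYY-MM-DD sorts the same way as a string and as a date, so a reverse
    # string sort lists the newest file first (within one category)
    return [link_for(name) for name in sorted(filenames, reverse=True)]
```

Usage would be something like `print("\n".join(links_newest_first(os.listdir(folder))))`, where `folder` is your own directory of renamed files.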

Python, PIL and Pretty Polaroids

Saturday, April 4th, 2009

I suspect that by now everyone and their grandmother has written a script to convert photos so they look like Polaroids. Yesterday I spent a slow morning at work replacing all our slideshows (which used a super-ugly Flash control) with these:

[Example Polaroid images: puppies-img_0253, puppies-img_0254]

There are plenty of other neat effects that could be done: maybe add a bit of aging or apply some filters. But I think it looks pretty good. The following script uses Python, PIL (Python Imaging Library), and a pre-drawn “polaroid” frame.

import PIL, time, glob, random, os, sys
from PIL import Image, ImageOps, ImageEnhance, ImageDraw, ImageFont
 
# Generate Polaroid-looking images
def make_polaroid(infile, outfile, text=''):
    base = (300,320)    #size of polaroid background
    polaroid = Image.open('polaroid-0.png')
    # Image.LANCZOS is the filter formerly aliased as Image.ANTIALIAS (removed in Pillow 10)
    polaroid = ImageOps.fit(polaroid, base, Image.LANCZOS, 0, (0.5,0.5))
 
    target = (272,248)  # size of empty target area on polaroid background
    img = Image.open(infile)
    img = ImageOps.fit(img, target, Image.LANCZOS, 0, (0.5,0.5))
 
    #enhance the image a bit
    img = ImageOps.autocontrast(img, cutoff=2)
    img = ImageEnhance.Sharpness(img).enhance(2.0)
 
    #draw the text, if any
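    font = ImageFont.truetype("arial.ttf", 16)  # the font file must be findable on your system
    draw = ImageDraw.Draw(polaroid)
    # textbbox() replaces textsize(), which was removed in Pillow 10
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    fontxy = (base[0] // 2 - (right - left) // 2, 278)
    draw.text(fontxy, text, font=font, fill=(40,40,40))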
 
    #copy the image onto the polaroid background
    imgcorner = (14,20) #paste image onto polaroid
    polaroid.paste(img, imgcorner)
 
    #copy the whole thing onto a larger background and rotate randomly
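    angle = random.randint(-10,10)
    blank = Image.new(polaroid.mode, (400,400))
    # integer division keeps the paste offsets whole numbers on Python 3
    offset = ((blank.size[0] - polaroid.size[0]) // 2, (blank.size[1] - polaroid.size[1]) // 2)
    blank.paste(polaroid, offset)
    blank = blank.rotate(angle, Image.BICUBIC)

    # JPEG has no alpha channel, so flatten to RGB (black corners) before saving
    blank.convert("RGB").save(outfile)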
 
if __name__ == "__main__":
    # Takes 1 required argument -- the desired prefix for the output filename
    if len(sys.argv) < 2:
        print("Missing required positional argument 'prefix'")
        sys.exit(1)
 
    # Text to appear on image, use "" if none 
    text = "Some Text, or leave blank"
 
    # Erase everything in Output folder
    for f in glob.glob('output/*'):
        os.remove(f)
 
    # Create Polaroids of each JPG in Input folder
    files = [os.path.basename(f) for f in glob.glob('input/*.jpg')]
    for f in files:
        outname = 'output/%s-%s.jpg' % (sys.argv[1], os.path.splitext(f)[0])
        make_polaroid('input/' + f, outname, text)
 
    # Write index.html so Output folder can be copied/renamed elsewhere
    files = sorted(os.path.basename(f) for f in glob.glob('output/*'))
    outhtml = open('output/index.html','w')
    outhtml.write("<html><head></head><body style='background-color: #000;'><div align='center'><p>")

    for i in range(len(files)):
        outhtml.write("<img src='%s' />" % (files[i]))
        # two photos per row: close this paragraph and open the next one
        if (i+1) % 2 == 0 and i+1 < len(files):
            outhtml.write("</p><p>")
    outhtml.write("</p></div></body></html>")
    outhtml.close()

The script is a bit over-specialized to my purpose (converting a bunch of individual folders one at a time), so you may need to hack on it a bit to suit your needs. You can download the script here: polaroid.zip. It expects the frame image polaroid-0.png and the “input” and “output” folders in the current directory. Place the files you want to convert into the “input” folder and run the script with a single argument, the output filename prefix. It will take a few seconds or minutes to run, depending on how many photos you’re converting. When it finishes, copy the “output” folder elsewhere; the pre-generated “index.html” inside it shows all the photos in the folder.
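If you do hack on it, the index-writing step is easy to pull out into its own function. The sketch below is a hypothetical refactor, not what's in polaroid.zip: build_index() gives each pair of photos its own `<p>...</p>` and HTML-escapes the file names, and write_index() lists only the .jpg files, so an index.html left over from an earlier run never ends up in the page.

```python
import html
import os


def build_index(image_names, per_row=2):
    """Return index.html text showing the images, `per_row` per paragraph (hypothetical helper)."""
    rows = []
    for start in range(0, len(image_names), per_row):
        imgs = "".join("<img src='%s' />" % html.escape(name, quote=True)
                       for name in image_names[start:start + per_row])
        rows.append("<p>%s</p>" % imgs)
    return ("<html><head></head><body style='background-color: #000;'>"
            "<div align='center'>%s</div></body></html>" % "".join(rows))


def write_index(folder="output"):
    # Only the generated JPEGs; the extension filter excludes index.html itself
    names = sorted(n for n in os.listdir(folder) if n.lower().endswith(".jpg"))
    with open(os.path.join(folder, "index.html"), "w") as fh:
        fh.write(build_index(names))
```

Keeping the HTML in a pure function makes it easy to check without touching the disk; changing `per_row` gives a different grid.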

Fetching Android Market Stats with Python, MozRepl, and BeautifulSoup

Thursday, April 2nd, 2009

A few weeks ago I was quite keen on the idea of gathering stats and creating charts to track the popularity of my Android apps. Alas, despite digging around in various packages and experimenting with cURL, I could never seem to get logged in programmatically to the Android Marketplace Developer Console. So I gave up and went back to working on my next app. Now I’ve come up with another reason to do some screen-scraping, so I thought I should give this another try.

Half the magic here belongs to a very cool Firefox plugin called MozRepl, which lets you open a telnet connection to Firefox and interact with it via JavaScript. Awesome, no?

All you have to do is ask MozRepl to go to the Developer Console, download the HTML, and run it through BeautifulSoup (the rest of the magic) to extract the data.

It turns out to be just slightly trickier than that, because Python has to talk to MozRepl over Telnet. I suppose this script could be set up in cron to grab stats a couple of times each day. I think I’m just gonna run it manually every once in a while.

import BeautifulSoup, re, time
import os, telnetlib
# Install MozRepl Plugin
# http://wiki.github.com/bard/mozrepl
# Set up MozRepl to start automatically with FF; check that the port number is 4242
# Log in to the Developer Console once manually so the login credentials get saved
 
# Create a new profile and set this accordingly
# http://support.mozilla.com/en-US/kb/Managing+profiles
profile = 'my_firefox_profile'
 
# go to Developer Console using new profile
url = 'http://market.android.com/publish/Home'
os.system("firefox -no-remote -P %s %s &" % (profile, url))
time.sleep(5) #wait a sec for FF to start
 
#connect to MozRepl and fetch HTML
t = telnetlib.Telnet("localhost", 4242)
t.read_until("repl>")
t.write("content.document.body.innerHTML\n")  # the trailing newline makes the REPL evaluate it
body = t.read_until("repl>")
t.close()
 
#is there a better way to do this?
os.system("killall -9 firefox")
 
#yank stats out of HTML
now = time.strftime("%Y-%m-%d %H:%M:%S")
soup = BeautifulSoup.BeautifulSoup(body)
table = soup.find("div", { "class" : "listingTable" })
for row in table.findAll('div', {'class':'listingRow'}):
  app = row.find("div", { "class" : "listingApp" })
  rating = row.find("div", { "class" : "listingRating" })
  stats = row.find("div", { "class" : "listingStats" })
  if app and rating and stats:
    name = app.next.next.string
    total = stats.next.string.split()[0]
    active = stats.next.nextSibling.string.split()[0]
    nratings = rating.next.string[1:-1]
    stars = len(rating.findAll(attrs={'style':re.compile("scroll -78px")}))
    print now, name, total, "total", active, "active", nratings, "ratings", stars, "stars"
#that's it, now maybe save these to a CSV or a log file..
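That last comment is easy to act on with the standard csv module. The sketch below is written for Python 3 (unlike the Python 2 script above), and append_stats() and the FIELDS column names are hypothetical: inside the loop you would collect `(now, name, total, active, nratings, stars)` tuples instead of printing them, then pass the list in once at the end.

```python
import csv
import os

# Hypothetical column names, in the same order as the values the script prints
FIELDS = ["timestamp", "app", "total", "active", "ratings", "stars"]


def append_stats(path, rows):
    """Append stat rows (tuples in FIELDS order) to a CSV, adding a header to a new file."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:   # newline="" is what the csv module expects
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(FIELDS)
        writer.writerows(rows)
```

Appending (and writing the header only once) suits a script that runs every so often, whether by hand or from cron; the resulting file opens directly in a spreadsheet for charting.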

I debated whether to show my actual numbers. Here you go, enjoy:

2009-04-03 17:45:15 Measure Stuff 4 total 1 active 2 ratings 1 stars
2009-04-03 17:45:15 Measure Stuff Lite 3006 total 995 active 28 ratings 2 stars
2009-04-03 17:45:15 RGB Probe 4 total 2 active 2 ratings 1 stars
2009-04-03 17:45:15 Thumb Maze 112 total 39 active 8 ratings 3 stars
2009-04-03 17:45:15 Thumb Maze Lite 16313 total 8813 active 172 ratings 3 stars

Uh oh, those numbers are not very good at all! So far my plan to live off Android looks doomed, but maybe things will pick up in the future. Two of the apps appear twice because there is a paid version and a free one. Can you tell which is which? =). Also, I think there is something wrong with RGB Probe. I’ve gotten a couple of e-mails saying the download failed.

So I hope folks will find this script useful. Obviously, use of this code is completely at your own risk. Screen scrapers are an arguably questionable enterprise, so don’t blame me if you hose your Firefox profile or Google gets mad at you.

Also, if anyone knows the cURL incantation that will do the same thing sans Firefox, I’d love to hear it. I kept getting a 302 response and never quite figured it out. I’ve tried several suggestions, based on other Google services, that ‘should work’ but for some reason don’t.

There are certainly pros and cons to screen scraping through the browser; I’ll only point out two advantages. First, you get ‘real’ JavaScript executed right in Firefox. With many of the big data sites being Ajax-heavy, simply fetching the HTML without executing the JavaScript only gets you halfway there. Second, it is possible to detect and block screen scrapers by looking for unusual or suspicious request patterns. I don’t know if any sites actually do this, but it could be done. For example, a simple fetch via wget looks different to a server than a fetch with Firefox, and the difference goes beyond the User-Agent header: the CSS, images, JavaScript, and such will also be fetched in a particular way, and a server can look for anything unusual in the order or timing with which resources are requested. Sound crazy? You’re right! It probably is, and I’m not sure anybody actually does this. In fact, it very possibly wouldn’t work well at all in practice. For one, it could screw up text-only browsers. But I think it is still within the realm of possibility..

Now for balance, two downsides. First, the browser needs a window to run in. This means it is kinda slow, hijacks your computer for a few seconds, and doesn’t really lend itself to parallelization. Second, tools like cURL and wget, and many language-specific libraries, are practically standard, whereas this approach needs Firefox and the MozRepl plugin set up first.

The Amazing World of COM

Wednesday, March 4th, 2009

COM is one of those things that I’d heard about but never really needed. Well, yesterday I stumbled on a script for converting Word documents to PDF using COM, and that got me thinking: “Can I save myself TONS of time by writing a few scripts to do the really boring and repetitive parts of my job?” Indeed, the answer is yes.

Since Python is my first choice when there is a choice, I was pleased to discover the win32com package. Following is a very simple script using COM to convert PowerPoint slideshows from PPT to PPS (PowerPoint Show):

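import os, win32com.client

doc_template_name = os.path.abspath('test.ppt')
doc_final_name = os.path.abspath('test.pps')
ppSaveAsShow = 7  # PpSaveAsFileType value for a .pps slide show

app = win32com.client.Dispatch("PowerPoint.Application")
app.Visible = True
doc_template = app.Presentations.Open(doc_template_name)
# Save the presentation we just opened (Presentations[0] could be another,
# already-open deck), explicitly in slide-show format
doc_template.SaveAs(doc_final_name, ppSaveAsShow)
doc_template.Close()
app.Quit()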

This script is heavily influenced by the many wonderful examples at: http://win32com.goermezer.de/
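Since the whole point is to automate the repetitive parts, here is a hedged sketch of a batch version that converts every .ppt in a folder. It assumes Windows with PowerPoint and pywin32 installed and has not been run against PowerPoint here; pps_path_for() and convert_folder() are hypothetical names, and 7 is the ppSaveAsShow value from PowerPoint's PpSaveAsFileType enumeration. The win32com import sits inside convert_folder() so the path helper works on any platform.

```python
import glob
import os

PP_SAVE_AS_SHOW = 7  # PpSaveAsFileType.ppSaveAsShow in PowerPoint's object model


def pps_path_for(ppt_path):
    """Map 'deck.ppt' to an absolute 'deck.pps' path next to it (hypothetical helper)."""
    stem, _ext = os.path.splitext(os.path.abspath(ppt_path))
    return stem + ".pps"


def convert_folder(folder):
    """Save every .ppt in `folder` as a .pps slide show (Windows + PowerPoint only)."""
    import win32com.client  # pywin32; imported here so pps_path_for() works anywhere

    app = win32com.client.Dispatch("PowerPoint.Application")
    app.Visible = True
    try:
        for ppt in glob.glob(os.path.join(folder, "*.ppt")):
            pres = app.Presentations.Open(os.path.abspath(ppt))
            try:
                pres.SaveAs(pps_path_for(ppt), PP_SAVE_AS_SHOW)
            finally:
                pres.Close()  # close each deck even if SaveAs fails
    finally:
        app.Quit()  # don't leave a stray PowerPoint process behind


if __name__ == "__main__" and os.name == "nt":
    convert_folder(".")
```

The nested try/finally blocks are the main design choice: a single bad file closes its own presentation, and PowerPoint still quits when the loop ends or an error propagates.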

(This was supposed to be published more than a month ago.. I just noticed it was marked as a Draft, danggit)