Brief Tour of the Standard Library – Part II

This is part two of the standard library tour. With this chapter the tutorial is nearly done. Let's keep at it; next up is App Engine.

Output Formatting

First up: modules for formatting output.

>>> import repr
>>> repr.repr(set('supercalifragilisticexpialidocious'))
"set(['a', 'c', 'd', 'e', 'f', 'g', ...])"

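The repr() function in the repr module shortens long output, such as large or deeply nested containers, by replacing the overflow with a "..." ellipsis. A neat feature.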
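The limits behind that abbreviation can be tuned on a Repr instance. Here is a small sketch, assuming the attribute names maxlist and maxlevel, which exist on Repr in both Python 2's repr module and Python 3's reprlib (the module's Python 3 name). The values 3 and 1 are arbitrary.

```python
try:
    import reprlib              # Python 3 name of the module
except ImportError:
    import repr as reprlib      # Python 2 name, as used above

r = reprlib.Repr()
r.maxlist = 3     # show at most 3 elements of a list
r.maxlevel = 1    # descend at most 1 level into nested containers

short_list = r.repr(list(range(10)))   # '[0, 1, 2, ...]'
short_nested = r.repr([[1, 2], [3]])   # '[[...], [...]]'
```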

>>> import pprint
>>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
...     'yellow'], 'blue']]]
>>> pprint.pprint(t, width=30)
[[[['black', 'cyan'],
   'white',
   ['green', 'red']],
  [['magenta', 'yellow'],
   'blue']]]

The pprint module lays out complex data structures, such as nested lists, with indentation and line breaks so they are easy to read.
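pprint can also produce a string instead of printing, which is useful for logging, and it can cut off deep nesting. Below is a minimal sketch using pprint.pformat() and its depth parameter. The sample list is made up.

```python
import pprint

data = [1, [2, [3, [4]]]]

# pformat() returns the formatted text as a string instead of printing it.
full = pprint.pformat(data)              # '[1, [2, [3, [4]]]]'

# depth=2 replaces anything nested deeper than two levels with '[...]'.
shallow = pprint.pformat(data, depth=2)  # '[1, [2, [...]]]'
```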

>>> import textwrap
>>> doc = """The wrap() method is just like fill() except that it returns
... a list of strings instead of one big string with newlines to separate
... the wrapped lines."""
>>> print textwrap.fill(doc, width=40)
The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
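As the sample text itself says, textwrap.wrap() is the list-returning counterpart of fill(). This sketch reuses the same text and width, and it shows that joining wrap()'s lines with newlines gives fill()'s result.

```python
import textwrap

doc = """The wrap() method is just like fill() except that it returns
a list of strings instead of one big string with newlines to separate
the wrapped lines."""

lines = textwrap.wrap(doc, width=40)   # list of lines, each at most 40 characters
joined = "\n".join(lines)              # the same string fill() returns
```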

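The textwrap module breaks text into lines no wider than the given width, without splitting words in the middle. And finally, the locale module, which formats data according to a locale's regional conventions.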

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/locale.py", line 478, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

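Huh? "unsupported locale setting"? Puzzled, I dug around and found that the available locales depend on the OS. If the OS doesn't support a locale, you get an error like the one above ('English_United States.1252' is a Windows-style locale name). For reference, these are the locales available in my environment: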

% locale -a
C
POSIX
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
ja_JP.utf8

Huh, the names look nothing like that one. So I regrouped and tried again with one from the list.

>>> locale.setlocale(locale.LC_ALL, 'ja_JP.utf8')
'ja_JP.utf8'
>>> locale.format("%d", 1234567.8, grouping=True)
'1,234,567'
>>> x = 1234567.8
>>> conv = locale.localeconv()          # mapping of the current locale's conventions
>>> locale.format("%s%.*f", (conv['currency_symbol'], conv['frac_digits'], x), grouping=True)
'\xef\xbf\xa51,234,568'
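In that output, '\xef\xbf\xa5' is the UTF-8 encoding of the fullwidth yen sign "￥". frac_digits is 0 in this locale, so the amount is rounded to 1,234,568. Valid locale names differ between operating systems, so a script meant to run on both Windows and Linux might try several names. The following is a minimal sketch of that idea. The helper set_first_available_locale is hypothetical, and it relies only on setlocale() raising locale.Error for an unsupported name.

```python
import locale

def set_first_available_locale(candidates, category=locale.LC_ALL):
    """Hypothetical helper: try each locale name in turn and return the
    first one the OS accepts, falling back to 'C', which every platform has."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            pass  # not installed, or not a valid name on this OS
    return locale.setlocale(category, 'C')

# Windows-style name first, then glibc-style names like those from `locale -a`.
chosen = set_first_available_locale(
    ['English_United States.1252', 'en_US.utf8', 'en_US.UTF-8'])
```

On newer Python 3 versions, locale.format() (used in the session above) was deprecated in 3.7 and removed in 3.12. locale.format_string() accepts the same arguments.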

Templating

Next is templating. This one is quite powerful: you prepare a template string up front and fill in its placeholders later.

>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'

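In the example above, Nottingham is substituted for ${village} and "the ditch fund" for $cause. A template in the truest sense. You also don't have to fill in every placeholder, as long as you use the right method: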

>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)    # substitute() raises KeyError unless every placeholder is filled
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/string.py", line 170, in substitute
    return self.pattern.sub(convert, self.template)
  File "/usr/lib/python2.5/string.py", line 160, in convert
    val = mapping[named]
KeyError: 'owner'
>>> t.safe_substitute(d)    # safe_substitute() leaves missing placeholders as they are
'Return the unladen swallow to $owner.'
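Two details of the first Template example are worth spelling out. The braces in ${village} are needed because letters follow the placeholder name directly, and $$ is the escape for a literal dollar sign. Here is a small sketch; the noun placeholder is my own illustration.

```python
from string import Template

# Braces mark where the placeholder name ends when letters follow directly.
braced = Template('${noun}ification').substitute(noun='Python')       # 'Pythonification'

# Without braces the name runs on to 'nounification', which is not supplied,
# so safe_substitute() leaves the placeholder untouched.
unbraced = Template('$nounification').safe_substitute(noun='Python')  # '$nounification'

# '$$' is the escape for a literal dollar sign.
price = Template('$$10 to $cause').substitute(cause='the fund')       # '$10 to the fund'
```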

You can also customize the template syntax to your liking by subclassing Template and overriding its delimiter class attribute, for example to use % instead of $.
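The following sketch is adapted from the BatchRename example in the Python tutorial. The date placeholder is omitted so the output is deterministic, and rename_all is a hypothetical helper name. Once the subclass sets delimiter = '%', %n and %f become placeholders and $ is just an ordinary character.

```python
import os.path
from string import Template

class BatchRename(Template):
    # Overriding the class attribute switches the placeholder marker from $ to %.
    delimiter = '%'

def rename_all(filenames, style):
    """Hypothetical helper: apply a rename style such as 'Ashley_%n%f',
    where %n is a sequence number and %f is the original extension."""
    t = BatchRename(style)
    new_names = []
    for i, filename in enumerate(filenames):
        ext = os.path.splitext(filename)[1]      # e.g. '.jpg'
        new_names.append(t.substitute(n=i, f=ext))
    return new_names

renamed = rename_all(['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg'], 'Ashley_%n%f')
# ['Ashley_0.jpg', 'Ashley_1.jpg', 'Ashley_2.jpg']
```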

Working with Binary Data Record Layouts

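Next is a slightly unusual(?) module, and I'm a fan of modules like this. You give the struct module a description of a binary data layout, and it pulls the data out exactly as described. In C terms, it feels like casting raw binary data straight onto a struct. Let me try the sample from the tutorial as-is: zip three files into one archive, then print the following for each file in it: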

file name
CRC32 value
compressed size
uncompressed size

First, let's zip up three throwaway files.

% cat test1.txt
This is first test file.
% cat test2.txt
This is second test file.
% cat test3.txt
This is third test file.
% zip test.zip test1.txt test2.txt test3.txt 
  adding: test1.txt (deflated 4%)
  adding: test2.txt (stored 0%)
  adding: test3.txt (stored 0%)
% file test.zip 
test.zip: Zip archive data, at least v2.0 to extract

Next, create the script file (test_unpack.py).

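#!/usr/bin/python2.5
# -*- coding: utf-8 -*-
import struct

def unpack(path):
    f = open(path, 'rb')
    try:
        data = f.read()
    finally:
        f.close()

    start = 0
    for i in range(3):                  # test.zip holds exactly 3 files
        # Skip the first 14 bytes of the local file header: signature (4),
        # version (2), flags (2), compression method (2), mod time (2), mod date (2).
        start += 14

        # Here! Read CRC-32, compressed size, uncompressed size,
        # file-name length and extra-field length in one call.
        fields = struct.unpack('<IIIHH', data[start:start + 16])
        crc32, comp_size, uncomp_size, filename_size, extra_size = fields

        start += 16
        filename = data[start:start + filename_size]

        start += filename_size
        extra = data[start:start + extra_size]

        print filename, hex(crc32), comp_size, uncomp_size

        # Skip the extra field and the compressed data to reach the next header.
        start += extra_size + comp_size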

The format characters used in the unpack() call mean:

Format	Meaning
<	little-endian byte order (standard sizes, no padding)
I	unsigned int, 4 bytes
H	unsigned short, 2 bytes
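A quick way to confirm those sizes is struct.calcsize(). '<IIIHH' comes to 3×4 + 2×2 = 16 bytes, which is why the script slices data[start:start + 16]. The sketch below also does a pack/unpack round trip using the values for test1.txt from the run below, and it shows that byte order changes how the same bytes are read.

```python
import struct

# '<' = little-endian, standard sizes, no padding; I = 4 bytes, H = 2 bytes.
FIELDS = '<IIIHH'   # CRC-32, compressed size, uncompressed size, name length, extra length

size = struct.calcsize(FIELDS)   # 16

# Round trip: pack sample values into 16 bytes, then unpack them again.
packed = struct.pack(FIELDS, 0xd01431df, 24, 25, 9, 0)
fields = struct.unpack(FIELDS, packed)

# Byte order matters: bytes written big-endian, then read little-endian.
swapped = struct.unpack('<H', struct.pack('>H', 1))[0]   # 256, not 1
```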

For the other format characters, see the struct module documentation. Now let's run the script.

>>> import test_unpack
>>> test_unpack.unpack('test.zip')
test1.txt 0xd01431dfL 24 25
test2.txt 0xdab9a6eeL 26 26
test3.txt 0xe8fd7e9 25 25

Oh, it unpacked properly! You really don't get a feel for this kind of thing until you try it yourself.
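The tutorial script hardcodes range(3) and a 14-byte skip. Below is my own variation, not taken from the tutorial. It unpacks the whole 30-byte local file header, including its "PK\x03\x04" signature, and stops as soon as the signature no longer matches, which happens where the central directory begins. It assumes that sizes are stored in the local header (general-purpose flag bit 3 unset), as they were for test.zip above. list_zip_entries is a hypothetical name.

```python
import struct

LOCAL_HEADER_SIG = 0x04034b50   # "PK\x03\x04" read as a little-endian uint32
LOCAL_HEADER = '<IHHHHHIIIHH'    # full local file header: 30 bytes

def list_zip_entries(data):
    """Hypothetical helper: walk the local file headers of a zip archive
    and return (name, crc32, comp_size, uncomp_size) tuples."""
    entries = []
    start = 0
    size = struct.calcsize(LOCAL_HEADER)
    while start + size <= len(data):
        (sig, version, flags, method, mtime, mdate,
         crc32, comp_size, uncomp_size, name_len, extra_len) = \
            struct.unpack(LOCAL_HEADER, data[start:start + size])
        if sig != LOCAL_HEADER_SIG:
            break   # reached the central directory (or non-zip data)
        if flags & 0x08:
            raise ValueError('sizes are in a data descriptor; not handled here')
        start += size
        name = data[start:start + name_len]
        start += name_len + extra_len + comp_size   # skip name, extra field, file data
        entries.append((name, crc32, comp_size, uncomp_size))
    return entries
```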

Multi-threading

Last up is multi-threading. Pretty much every modern programming language can handle threads. On App Engine, however, Python's threading modules cannot be used.

An App Engine application cannot:
...
spawn a sub-process or thread.
...
http://code.google.com/intl/en/appengine/docs/python/runtime.html#The_Sandbox

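On App Engine, HTTP requests are handled by threads that App Engine itself manages (single-threaded!). If each application could start its own threads and run heavy processing in them, App Engine's resources would be consumed without any control, and it would be hard to keep the sandbox intact. If you want to do work outside the lifetime of an HTTP request, you need to use the TaskQueue.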
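For comparison, here is what the standard threading module looks like in an ordinary Python environment, loosely modeled on the tutorial's background-thread example. The Worker class and its summing task are hypothetical. Under the sandbox restriction quoted above, this would not run on App Engine.

```python
import threading

class Worker(threading.Thread):
    """Hypothetical worker: sums range(n) in a background thread."""

    def __init__(self, n):
        threading.Thread.__init__(self)
        self.n = n
        self.result = None

    def run(self):
        # Executes in the new thread once start() is called.
        self.result = sum(range(self.n))

w = Worker(1000)
w.start()    # run() now executes in a separate thread
# ... the main thread is free to do other work here ...
w.join()     # block until the background thread has finished
```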