summaryrefslogtreecommitdiff
path: root/mkpasswd
blob: 3d681b0c3fd09b88702ae84750df9b45421820fe (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Generate a random, memorizable password: http://xkcd.com/936/

Example run:

    kragen at inexorable:~/devel/inexorable-misc$ ./mkpass.py 5 12
    Your password is "learned damage saved residential stages".
    That's equivalent to a 60-bit key.

    That password would take 2.5e+03 CPU-years to crack
    on my inexpensive Celeron E1200 from 2008,
    assuming an offline attack on a MS-Cache hash,
    which is the worst password hashing algorithm in common use,
    slightly worse than even simple MD5.

    The most common password-hashing algorithm these days is FreeBSD’s
    iterated MD5; cracking such a hash would take 5.2e+06 CPU-years.

    But a modern GPU can crack about 250 times as fast,
    so that same iterated MD5 would fall in 2e+04 GPU-years.

    That GPU costs about US$1.45 per day to run in 2011,
    so cracking the password would cost about US$3e+09.

I've started using a password generated this way in place of a 9-printable-
ASCII-character random password, which is equally strong.  Munroe's assertion
that these passwords are much easier to memorize is correct.  However, there is
still a problem: because there are many fewer bits of entropy per character
(about 1.7 instead of 6.6) there is a lot of redundancy in the password, and so
attacks such as the ssh timing-channel attack (the Song, Wagner, and Tian
Herbivore attack, which I learned about from Bram Cohen in the Bagdad Café in
the wee hours one morning, years ago) and keyboard audio recording attacks have
a much better chance of capturing enough information to make the password
attackable.

My countermeasure to the Herbivore attack, which works well with 9-character
password but is extremely annoying with my new password, is to type the
password with a half-second delay between characters, so that the timing
channel does not carry much information about the actual characters used.
Additionally, the lower length of the 9-character password inherently gives the
Herbivore approach much less information to chew on.

Other possible countermeasures include using Emacs shell-mode, which prompts
you locally for the password when it recognizes a password prompt and then
sends the whole password at once, and copying and pasting the password from
somewhere else.

As you'd expect, this password also takes a little while longer to type: about
6 seconds instead of about 3 seconds.

"""

import random, itertools, os, sys

def main(argv):
    try:
        nwords = int(argv[1])
    except IndexError:
        return usage(argv[0])

    try:
        nbits = int(argv[2])
    except IndexError:
        nbits = 11

    filename = os.path.join(os.environ['HOME'], 'opt', 'share', 'wordlist')
    wordlist = read_file(filename, nbits)
    if len(wordlist) != 2**nbits:
        sys.stderr.write("%r contains only %d words, not %d.\n" %
                         (filename, len(wordlist), 2**nbits))
        return 2

    display_password(generate_password(nwords, wordlist), nwords, nbits)
    return 0

def usage(argv0):
    p = sys.stderr.write
    p("Usage: %s nwords [nbits]\n" % argv0)
    p("Generates a password of nwords words, each with nbits bits\n")
    p("of entropy, choosing words from the first entries in\n")
    p("$HOME/devel/wordlist, which should be in the same format as\n")
    p("<http://canonical.org/~kragen/sw/wordlist>, which is a text file\n")
    p("with one word per line, preceded by its frequency, most frequent\n")
    p("words first.\n")
    p("\nRecommended:\n")
    p("    %s 5 12\n" % argv0)
    p("    %s 6\n" % argv0)
    return 1

def read_file(filename, nbits):
    return [line.split()[1] for line in
            itertools.islice(open(filename), 2**nbits)]

def generate_password(nwords, wordlist):
    choice = random.SystemRandom().choice
    return ' '.join(choice(wordlist) for ii in range(nwords))

def display_password(password, nwords, nbits):
    print 'Your password is "%s".' % password
    entropy = nwords * nbits
    print "That's equivalent to a %d-bit key." % entropy
    print

    # My Celeron E1200
    # (<http://ark.intel.com/products/34440/Intel-Celeron-Processor-E1200-(512K-Cache-1_60-GHz-800-MHz-FSB)>)
    # was released on January 20, 2008.  Running it in 32-bit mode,
    # john --test (<http://www.openwall.com/john/>) reports that it
    # can do 7303000 MD5 operations per second, but I’m pretty sure
    # that’s a single-core number (I don’t think John is
    # multithreaded) on a dual-core processor.
    t = years(entropy, 7303000 * 2)
    print "That password would take %.2g CPU-years to crack" % t
    print "on my inexpensive Celeron E1200 from 2008,"
    print "assuming an offline attack on a MS-Cache hash,"
    print "which is the worst password hashing algorithm in common use,"
    print "slightly worse than even simple MD5."
    print

    t = years(entropy, 3539 * 2)
    print "The most common password-hashing algorithm these days is FreeBSD’s"
    print "iterated MD5; cracking such a hash would take %.2g CPU-years." % t
    print

    # (As it happens, my own machines use Drepper’s SHA-2-based
    # hashing algorithm that was developed to replace the one
    # mentioned above; I am assuming that it’s at least as slow as the
    # MD5-crypt.)

    # <https://en.bitcoin.it/wiki/Mining_hardware_comparison> says a
    # Core 2 Duo U7600 can do 1.1 Mhash/s (of Bitcoin) at a 1.2GHz
    # clock with one thread.  The Celeron in my machine that I
    # benchmarked is basically a Core 2 Duo with a smaller cache, so
    # I’m going to assume that it could probably do about 1.5Mhash/s.
    # All common password-hashing algorithms (the ones mentioned
    # above, the others implemented in John, and bcrypt, but not
    # scrypt) use very little memory and, I believe, should scale on
    # GPUs comparably to the SHA-256 used in Bitcoin.

    # The same mining-hardware comparison says a Radeon 5870 card can
    # do 393.46 Mhash/s for US$350.

    print "But a modern GPU can crack about 250 times as fast,"
    print "so that same iterated MD5 would fall in %.1g GPU-years." % (t / 250)
    print

    # Suppose we depreciate the video card by Moore’s law,
    # i.e. halving in value every 18 months.  That's a loss of about
    # 0.13% in value every day; at US$350, that’s about 44¢ per day,
    # or US$160 per GPU-year.  If someone wanted your password as
    # quickly as possible, they could distribute the cracking job
    # across a network of millions of these cards.  The cards
    # additionally use about 200 watts of power, which at 16¢/kWh
    # works out to 77¢ per day.  If we assume an additional 20%
    # overhead, that’s US$1.45/day or US$529/GPU-year.
    cost_per_day = 1.45
    cost_per_crack = cost_per_day * 365 * t
    print "That GPU costs about US$%.2f per day to run in 2011," % cost_per_day
    print "so cracking the password would cost about US$%.1g." % cost_per_crack

def years(entropy, crypts_per_second):
    return float(2**entropy) / crypts_per_second / 86400 / 365.2422

if __name__ == '__main__':
    sys.exit(main(sys.argv))