Fork me on GitHub

Password Score



Introduction

Password Score is designed to give a realistic estimation of the strength of a password. When speaking of "strength" we must ask for a measure. A common measure for this purpose is based on information theory and called entropy. We will define the entropy of a password as follows: when $N$ is the number of guesses needed to crack a password with certainty the entropy is given by the base-2 logarithm of $N$.

A naive approach of estimating the number of guesses needed is using a brute-force approach. Given a password $p$ we take $N := n^{|p|}$ where $|p|$ is the length of $p$ and $n$ is the number of possible characters. The brute-force approach simply tries all possible combinations of $|p|$ characters. But due to human nature assuming a password to be a random sequence of characters is far to idealistic. Most of us tend to choose passwords made up of common words, names, special numbers - passwords which are easy to remember. So the naive approach highly overestimates the strength of a password.

Therefore every password cracking software uses dictionaries, lists of common passwords and names to give better performance. Password Score will search a given password for common words, passwords or names - or in general Password Score searches for patterns within the password. Other possible patterns are keyboard patterns like `qwerty` or sequences like `1234`. Instead of using random numbers we tend to use numbers which have a meaning like dates - birthdays or anniversaries of any kind.

Usage

Password Score can be used without dependencies. However, the Demonstration uses jQuery to visualize the results. Include Password Score as follows:

<!-- Main JS file (necessary). -->
<script type="text/javascript" src="dist/js/password-score.js"></script>
<!-- This file provides the default options (including dictionaries in english and german): -->
<script type="text/javascript" src="dist/js/password-score-options.js"></script>

The file password-score.js includes two classes: Score and Keyboard. See the API Documentation for details.

Pattern Matching

To use Password Score simply fetch a password and create a new Score():

var password = 'qwerty';
var score = new Score(password);

To calculate the entropy we will use the calculateEntropyScore(options) method. This method requires some configuration, for example which dictionaries to search or whether to search for keyboard patterns. This may look as follows:

var options = [
    {
        'type': 'dictionary',
        'leet': true,
        'key': 'english',
        'dictionary': englishDictionary
    },
    {
        'type': 'dictionary',
        'leet': false,
        'key': 'passwords',
        'dictionary': commonPasswords
    },
    {
        'type': 'keyboard',
        'key': 'qwerty',
        'keyboard': QWERTY
    }
];

// The calculateEntropyScore returns a score correpsonding to the calculated
// entropy of the password considering the above configuration.
console.log(score.calculateEntropyScore(options));

Based on the given options Password Score searches the password for all patterns it can find based on the given resources and calculates the entropy for all the patterns. Given a password consisting of $k$ non-overlapping patterns $p = p_1 \ldots p_k$ the entropy of $p$ is assumed to be the sum of the pattern entropies. Because we may have multiple overlapping patterns we then try to minimize the overall entropy as to underestimate the password rather than to overestimate it. Details on the different configuration options can be found int he next sections.

Password Score provides default options including several dictionaries: english, german, english country names, german country names, city names, last names, female first names, male first names, common passwords. See Data Sources for details. In additiona, Password scores is initially configured to search all of these dictionaries for Leet Speak and searches the password for Repititions, Sequences and Dates.

Beneath returning the password entropy, score.calculateEntropyScore(options) caches all found patterns and their entropies.

var options = [
    {
        'type': 'dictionary',
        'leet': true,
        'key': 'english' // Key to access found patterns.
        'dictionary': englishDictionary
     }
];
// Returns the calculated entropy.
console.log(score.calculateEntropyScore(options));
// These are the patterns which contribute to the minimum entropy, that is all
// patterns found in the given resources which - taken together - result
// in minimum entropy.
console.log(score.cache.minimumMatches);
// The patterns found within the english dictionary can be accessed using
// the 'key' confiugraiton option used above:
console.log(score.cache.english);

Dictionaries

As seen above Password Score can be configured to match against a given dictionary the following way:

var options = {
    // For using different dictionaries different keys can be used.
    'english' : {
        'type': 'dictionary',
        'key': 'english',
        'dictionary': englishDictionary
    },
};

A dictionary is simply an object where the keys represent the words, and the corresponding values represent scoring values which are used to calculate the entropy of the word when considered as pattern. Mostly these values will be a rank, that is more ocmmon words receive a higher rank than rare words, see the xample below. Dictionaries must not be common language dictionaries like english or german dictionaries. We can also use a list of common passwords or common names as dictionary:

var englishDictionary = {
    'you': 1, // Most used english word ...
    // ... 
    'housewife': 2154, // Higher rank -> less common word!
};
var commonPasswords = {
    'password': 1, // Most common password ...
    // ... 
    'd9ebk7': 8603, // Higher rank -> less common password!
};

As mentioned above the scoring value will be used to determine the entropy by simply taking the base-2 logarithm. We will use the scoring value as to differentiate between very common patterns and less common patterns - password is the most common password whereas d9ebk7 is not that common.

1337 - Leet Speak

Using a leet speak translation table, Password Score can search dictionaries for words which occur in leet speak within the password. This translation table looks like this:

leet = {
    '1': ['i', 'l'],
    '2': ['n', 'r', 'z'],
    '3': ['e'],
    '4': ['a'],
    '5': ['s'],
    '6': ['g'],
    '7': ['t'],
    '8': ['b'],
    '9': ['g', 'o'],
    '0': ['o'],
    '@': ['a'],
    '(': ['c'],
    '[': ['c'],
    '<': ['c'],
    '&': ['g'],
    '!': ['i'],
    '|': ['i', 'l'],
    '?': ['n', 'r'],        
    '$': ['s'],
    '+': ['t'],
    '%': ['x']
};

Given a word in leet speak, Password Score generates a list of all possible substitutions using the translation table. All possible substitutions are matched against a given dictionary. The entropy can be calculated by determining the number of possible leet speak versions of the dictionary word. To configure password score to apply leet speak detection on a given dictionary use:

var options = [
    {
        'type': 'dictionary',
        'leet': true, // Enables leet speak matching for this dictionary.
        'key': 'english',
        'dictionary': englishDictionary
    }
];

Keyboard Patterns

qwerty will always be within the top ten of the most common passwords simply because it is easy to remember on the corresponding keyboard. We will assume a keyboard pattern to be a path on the keyboard when considered as undirected graph. They QWERTY and QWERTZ keyboards are includes in Password Score per default. However, additional keyboard layouts may be used as follows:

var options = {
    {
        'type': 'keyboard',
        'keyboard': AZERTY // Your definition of the AZERTY keyboard ...
    }
};

Theoretically the entropy of a keyboard pattern is given by the number of possible beginnings multiplied by the number of possible next characters for each character within the pattern.

Of course, it is possible to define additional keyboard layouts, see the definition of the QWERTY keyboard below. Each key is assumed to have a maximum of six neighbors.

QWERTY: new Keyboard({
    '~': [null, null, null, '~`', '1!', null, null], // 1
    '`': [null, null, null, '~`', '1!', null, null], // 1
    '1': [null, null, '`~', '1!', '2@', null, 'qQ'], // 3
    '!': [null, null, '`~', '1!', '2@', null, 'qQ'], // 3
    '2': [null, null, '1!', '2@', '3#', 'qQ', 'wW'], // 4
    '@': [null, null, '1!', '2@', '3#', 'qQ', 'wW'], //4
    '3': [null, null, '2@', '3#', '4$', 'wW', 'eE'], //4
    '#': [null, null, '2@', '3#', '4$', 'wW', 'eE'], //4
    '4': [null, null, '3#', '4$', '5%', 'eE', 'rR'], //4
    '$': [null, null, '3#', '4$', '5%', 'eE', 'rR'], //4
    '5': [null, null, '4$', '5%', '6^', 'rR', 'tT'], //4
    '%': [null, null, '4$', '5%', '6^', 'rR', 'tT'], //4
    '6': [null, null, '5%', '6^', '7&', 'tT', 'yY'], //4
    '^': [null, null, '5%', '6^', '7&', 'tT', 'yY'], //4
    '7': [null, null, '6^', '7&', '8*', 'yY', 'uU'], //4
    '&': [null, null, '6^', '7&', '8*', 'yY', 'uU'], //4
    '8': [null, null, '7&', '8*', '9(', 'uU', 'iI'], // 4
    '*': [null, null, '7&', '8*', '9(', 'uU', 'iI'], // 4
    '9': [null, null, '8*', '9(', '0)', 'iI', 'oO'], // 4
    '(': [null, null, '8*', '9(', '0)', 'iI', 'oO'], // 4
    '0': [null, null, '9(', '0)', '-_', 'oO', 'pP'], // 4
    ')': [null, null, '9(', '0)', '-_', 'oO', 'pP'], // 4
    '-': [null, null, '0)', '-_', '=+', 'pP', '[{'], // 4
    '_': [null, null, '0)', '-_', '=+', 'pP', '[{'], // 4
    '=': [null, null, '-_', '=+', '\\|', '{[', '}]'], // 4
    '+': [null, null, '-_', '=+', '\\|', '{[', '}]'], // 4
    '\\': [null, null, '=+', '\\|', null, '}]', null], // 2
    '|': [null, null, '=+', '\\|', null, '}]', null], // 2
    'q': ['1!', '2@', null, 'qQ', 'wW', null, 'aA'], // 4
    'Q': ['1!', '2@', null, 'qQ', 'wW', null, 'aA'], // 4
    'w': ['2@', '3#', 'qQ', 'wW','eE', 'aA', 'sS'], // 6
    'W': ['2@', '3#', 'qQ', 'wW','eE', 'aA', 'sS'], // 6
    'e': ['3#', '4$', 'wW', 'eE', 'rR', 'sS', 'dD'], // 6
    'E': ['3#', '4$', 'wW', 'eE', 'rR', 'sS', 'dD'], // 6
    'r': ['4$', '5%', 'eE', 'rR', 'tT', 'dD', 'fF'], // 6
    'R': ['4$', '5%', 'eE', 'rR', 'tT', 'dD', 'fF'], // 6
    't': ['5%', '6^', 'rR', 'tT', 'yY', 'fF', 'gG'], // 6
    'T': ['5%', '6^', 'rR', 'tT', 'yY', 'fF', 'gG'], // 6
    'y': ['6^', '7&', 'tT', 'yY', 'uU', 'gG', 'hH'], // 6
    'Y': ['6^', '7&', 'tT', 'yY', 'uU', 'gG', 'hH'], // 6
    'u': ['7&', '8*', 'yY', 'uU', 'iI', 'hH', 'jJ'], // 6
    'U': ['7&', '8*', 'yY', 'uU', 'iI', 'hH', 'jJ'], // 6
    'i': ['8*', '9(', 'uU', 'iI', 'oO', 'jJ', 'kK'], // 6
    'I': ['8*', '9(', 'uU', 'iI', 'oO', 'jJ', 'kK'], // 6
    'o': ['9(', '0)', 'iI', 'oO', 'pP', 'kK', 'lL'], // 6
    'O': ['9(', '0)', 'iI', 'oO', 'pP', 'kK', 'lL'], // 6
    'p': ['0)', '-_', 'oO', 'pP', '[{', 'lL', ':;'], // 6
    'P': ['0)', '-_', 'oO', 'pP', '[{', 'lL', ':;'], // 6
    '[': ['-_', '=+', 'pP', '[{', ']}', ':;', '\'"'], // 6
    '{': ['-_', '=+', 'pP', '[{', ']}', ':;', '\'"'], // 6
    ']': ['=+', '\\|', '[{', ']}', null, '\'"', null], // 4
    '}': ['=+', '\\|', '[{', ']}', null, '\'"', null], // 4
    'a': ['qQ', 'wW', null, 'aA', 'sS', null, 'zZ'], // 4
    'A': ['qQ', 'wW', null, 'aA', 'sS', null, 'zZ'], // 4
    's': ['wW', 'eE', 'aA', 'sS', 'dD', 'zZ', 'xX'], // 6
    'S': ['wW', 'eE', 'aA', 'sS', 'dD', 'zZ', 'xX'], // 6
    'd': ['eE', 'rR', 'sS', 'dD', 'fF', 'xX', 'cC'], // 6
    'D': ['eE', 'rR', 'sS', 'dD', 'fF', 'xX', 'cC'], // 6
    'f': ['rR', 'tT', 'dD', 'fF', 'gG', 'cC', 'vV'], // 6
    'F': ['rR', 'tT', 'dD', 'fF', 'gG', 'cC', 'vV'], // 6
    'g': ['tT', 'yY', 'fF', 'gG', 'hH', 'vV', 'bB'], // 6
    'G': ['tT', 'yY', 'fF', 'gG', 'hH', 'vV', 'bB'], // 6
    'h': ['yY', 'uU', 'gG', 'hH', 'jJ', 'bB', 'nN'], // 6
    'H': ['yY', 'uU', 'gG', 'hH', 'jJ', 'bB', 'nN'], // 6
    'j': ['uU', 'iI', 'hH', 'jJ', 'kK', 'nN', 'mM'], // 6
    'J': ['uU', 'iI', 'hH', 'jJ', 'kK', 'nN', 'mM'], // 6
    'k': ['iI', 'oO', 'jJ', 'kK', 'lL', 'mM', ',;'], // 6
    'K': ['iI', 'oO', 'jJ', 'kK', 'lL', 'mM', ',;'], // 6
    'l': ['oO', 'pP', 'kK', 'lL', ':;', ',<', '.>'], // 6
    'L': ['oO', 'pP', 'kK', 'lL', ':;', ',<', '.>'], // 6
    ':': ['pP', '[{', 'lL', ':;', '\'"', '.>', '?/'], // 6
    ';': ['pP', '[{', 'lL', ':;', '\'"', '.>', '?/'], // 6
    '\'': ['[{', ']}', ':;', '\'"', null, '?/', null], // 5
    '"': ['[{', ']}', ':;', '\'"', null, '?/', null], // 5
    'z': ['aA', 'sS', null, 'zZ', 'xX', null, null], // 4
    'Z': ['aA', 'sS', null, 'zZ', 'xX', null, null], // 4
    'x': ['sS', 'dD', 'zZ', 'xX', 'cC', null, null], // 4
    'X': ['sS', 'dD', 'zZ', 'xX', 'cC', null, null], // 4
    'c': ['dD', 'fF', 'xX', 'cC', 'vV', null, null], // 4
    'C': ['dD', 'fF', 'xX', 'cC', 'vV', null, null], // 4
    'v': ['fF', 'gG', 'cC', 'vV', 'bB', null, null], // 4
    'V': ['fF', 'gG', 'cC', 'vV', 'bB', null, null], // 4
    'b': ['gG', 'hH', 'vV', 'bB', 'nN', null, null], // 4
    'B': ['gG', 'hH', 'vV', 'bB', 'nN', null, null], // 4
    'n': ['hH', 'jJ', 'bB', 'nN', 'mM', null, null], // 4
    'N': ['hH', 'jJ', 'bB', 'nN', 'mM', null, null], // 4
    'm': ['jJ', 'kK', 'nN', 'mM', ',<', null, null], // 4
    'M': ['jJ', 'kK', 'nN', 'mM', ',<', null, null], // 4
    ',': ['kK', 'lL', 'mM', ',<', '.>', null, null], // 4
    '<': ['kK', 'lL', 'mM', ',<', '.>', null, null], // 4
    '.': ['lL', ':;', ',<', '.>', '/?', null, null], // 4
    '>': ['lL', ':;', ',<', '.>', '/?', null, null], // 4
    '/': [':;', '\'"', '.>', '/?', null, null, null], // 3
    '?': [':;', '\'"', '.>', '/?', null, null, null] // 3
}, 4.571428571), // Average number of neighbors.

Dates

Per default, Password Score searches the given password for dates. However, to configure Password Score to do so manually, use:

var options = {
    {
        'type': 'dates'
    }
};

Dates are hard to catch because they may occur in many different formats. They may consist only of a day and a month or only of a month and a year. Years on its own are common, too - my first passwords contained the last two digits of my year of birth. Fortunately regular expressions can be used to scan efficiently for the different formats of dates:

'(0?[1-9]|1[012])([\- \/.])?(0?[1-9]|[12][0-9]|3[01])([\- \/.])?([0-9]{4})' // Middle-endian, four digit year.
'(0?[1-9]|[12][0-9]|3[01])([\- \/.])?(0?[1-9]|1[012])([\- \/.])?([0-9]{2})' // Little-endian, two digit year.
// ...

The number of possible dates is dependent on the format. In general we simply take $31 \cdot 12 \cdot y$ where $y$ is the number of years being considered. When assuming $y$ to be too large we will not get any difference from considering a random eight (or six) digit number with $10^8$ respectively $10^6$ possible combinations. Therefore, choosing $y$ to be around $100$ or $200$ will be a good and realistic choice.

Sequences and Repetitions

Password Score is able to search for number sequences and substrings of the alphabet (and does so per default):

var options = [
    {
        'type': 'sequences'
    }
];

Reversed sequences are checked as well. The entropy of a sequence is only influenced by the possibilities for the first character and the length.

In addition, Password score searches for repititions (also per default):

var options = [
    {
        'type': 'repetition'
    }
];

The entropy of single character repetitions is determined the same way as for sequences. When repeating multiple characters the entropy is determined by the length of the sequence to be repeated and the number of repetitions.

Data Sources

See below for the data used for the dictionaries:

For using the raw data a simple PHP script can be used to generate JSON files with the appropriate format.

References

Mainly inspired by "zxcvbn: realistic password strength estimation", Dropbox Tech Blog.


© 2013 - 2018 David Stutz - BSD 3-Clause License - Impressum - Datenschutz