When Shell Globbing Attacks

Since when does “S” fall in the range a-z? Beware of the LC_COLLATE environment variable.

somehost# mkdir STOCK
somehost# mv [a-z]* STOCK
mv: cannot move `STOCK' to a subdirectory of itself, `STOCK/STOCK' 
somehost#

It did move everything except “STOCK” itself, which means [a-z]* also matched the uppercase name STOCK. Let’s move the files back and fire up strace:

somehost# mv STOCK/* .
somehost# strace -f mv [a-z]* STOCK
...
execve("/bin/mv", ["mv", "lmhosts", "secrets.tdb", "smb.conf", "smbpasswd", "smbusers",
"STOCK", "STOCK"], [/* 28 vars */]) = 0
...
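strace works, but there is a lighter way to see what the shell will hand to mv: put echo in front of the same command line, so the glob is expanded and printed instead of executed. The sketch below assumes a POSIX sh with mktemp -d available. The scratch directory is hypothetical and is filled with the names from the strace output above.

```shell
#!/bin/sh
# Hypothetical scratch directory holding the names from the strace output.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch lmhosts secrets.tdb smb.conf smbpasswd smbusers
mkdir STOCK

# Prefixing the command with echo prints the argument list mv would
# receive after globbing, without moving anything.
cmdline=$(echo mv [a-z]* STOCK)
echo "$cmdline"

# Remove the scratch directory.
cd / && rm -rf "$dir"
```

In an affected locale the printed line ends in "STOCK STOCK", just like the execve arguments above. Where ranges are interpreted in byte order, STOCK appears only once, as the destination.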

So the shell expanded [a-z]* to include STOCK before mv ever ran. This happens at least with BASH 2.05b1 and BASH 3.2. At the smart prodding of a coworker, let’s check the manpage for bash:

nocaseglob        If set, bash matches filenames in a case-insensitive fashion when performing
                       pathname expansion (see Pathname Expansion above).

No, nocaseglob isn’t set. (It is a shopt option rather than a “set -o” one, so “shopt nocaseglob” is the command that reports it.)

Pattern Matching

[...]                 Matches any one of the enclosed characters. A pair of characters
                      separated by a hyphen denotes a range expression; any character that
                      sorts between those two characters, inclusive, using the current
                      locale's collating sequence and character set, is matched. If the first
                      character following the ‘[’ is a ‘!’ or a ‘^’ then any character not
                      enclosed is matched. A ‘-’ may be matched by including it as the first or
                      last character in the set. A ‘]’ may be matched by including it as the
                      first character in the set. The sorting order of characters in range
                      expressions is determined by the current locale and the value of the
                      LC_COLLATE shell variable, if set.

                      For example, in the default C locale, ‘[a-dx-z]’ is equivalent to ‘[abcdxyz]’.
                      Many locales sort characters in dictionary order, and in these locales
                      ‘[a-dx-z]’ is typically not equivalent to ‘[abcdxyz]’; it might be equivalent
                      to ‘[aBbCcDdxXyYz]’, for example. To obtain the traditional interpretation
                      of ranges in bracket expressions, you can force the use of the C locale
                      by setting the LC_COLLATE or LC_ALL environment variable to the value
                      ‘C’.
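To check the C-locale behavior the manual describes, the sketch below lists what [a-z]* matches, alongside the [[:lower:]]* character class, which names lowercase letters directly instead of relying on collation order. It assumes a POSIX sh and uses a hypothetical scratch directory with the same names as before. It sets LC_ALL rather than LC_COLLATE because a set LC_ALL overrides LC_COLLATE, and it does so inside command substitutions, which run in subshells, so the calling shell’s locale is left alone.

```shell
#!/bin/sh
# Hypothetical scratch directory with the same names as in the example.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch lmhosts secrets.tdb smb.conf smbpasswd smbusers
mkdir STOCK

# Each $( ... ) runs in a subshell, so forcing the C locale there
# does not change the locale of the calling shell.
range_c=$(LC_ALL=C; echo [a-z]*)        # range expression in byte order
lower_c=$(LC_ALL=C; echo [[:lower:]]*)  # character class: lowercase letters

echo "[a-z]*       -> $range_c"
echo "[[:lower:]]* -> $lower_c"

# Remove the scratch directory.
cd / && rm -rf "$dir"
```

Under the C locale both patterns match the five lowercase names and leave STOCK out.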

Hmmmmm.

somehost# echo $LC_COLLATE
en_US.UTF-8
somehost# LC_COLLATE=C
somehost# mv STOCK/* .
somehost# mv [a-z]* STOCK
somehost#
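Assigning LC_COLLATE=C at the prompt, as above, changes collation for the rest of the session. A narrower variant, sketched here under the same assumptions (POSIX sh, hypothetical scratch directory and file names), confines the override to a subshell so only this one move sees the C locale. As before, LC_ALL is used so that an LC_ALL already present in the environment cannot take precedence.

```shell
#!/bin/sh
# Hypothetical scratch directory with the same names as in the example.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch lmhosts secrets.tdb smb.conf smbpasswd smbusers
mkdir STOCK

# The parentheses start a subshell: LC_ALL=C applies to the glob
# expansion and the mv inside it, then disappears with the subshell.
( LC_ALL=C; mv -- [a-z]* STOCK )

# List both directories in C order so the result is predictable.
moved=$(LC_ALL=C ls STOCK)
left=$(LC_ALL=C ls)
echo "moved:" $moved
echo "left: " $left

# Remove the scratch directory.
cd / && rm -rf "$dir"
```

With the override in place, all five files end up in STOCK and mv no longer complains about moving STOCK into itself.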
