64 bits is Too Many Bits

A seemingly simple move of NFS-exported home directories resulted in Apache suEXEC freaking out when referencing the new space. Here are the details and solution, because sometimes a debugging story is fun.

Friday

Setting: R&D environment of ~800 physical and virtual hosts running Windows, Linux, Solaris and using DAS, NAS, and SAN with NFS and OpenAFS. Our direct customers are scientists, researchers, and software and hardware engineers.

A small portion of our users, say 20%, have NFS-based home directories. Over the last ~18 years these have been served from varied hosts running SunOS 4.1 – 5.10. As part of our department’s ongoing weening from Sun/Oracle hardware, it was recently finally time to get this NFS data moved from the host (Sun-Fire V240). Our Isilon IQ 7200x 3-node cluster, which I’d personally had only limited interaction with (and no training with) to date, seemed the appropriate destination for the data.

After moving the data (twice[1]), changing automounter maps in LDAP, and successfully performing basic usability testing, I called it a night and ended the Planned Outage event. Because what better time to make changes than Friday at 10PM, right?

Monday

Jimbo: “Our project’s CGI scripts in Perl at http://lab.our.org/~jimbo/cgi-bin, that have been working fine for 8 years, are all giving 500 Internal Server errors. We use these scripts as part of a high-value production process.”

Let’s ignore for now everything wrong with the sentence above. I know.

I SSHed into lab.mitre.org and had a look through recent data in /var/log/httpd/lab-error_log. It showed nothing interesting.

Then I noticed that suexec_log had been written to recently. Having no idea what suexec_log even was, I looked through it to find that it’s the log for Apache httpd’s suEXEC mechanism. While I’m not entirely Apache httpd clueless, I don’t exactly deal with every nook of it on even a yearly basis. I read up on suEXEC’s role and functionality then returned to suexec_log:

cannot stat directory: (/home/jimbo/public_html/cgi-bin)

Okay, that explains the 500 HTTP errors. However:

lab.our.org# cd /home/jimbo/public_html/cgi-bin
lab.our.org# pwd
/home/jimbo/public_html/cgi-bin
lab.our.org#

From here, scratching my head for 30 minutes or so, I mucked around in the Isilon OneFS ACL for /home/jimbo, somewhat convinced I’d hit an ACL peculiarity like the previous one. That led nowhere, so it was time to examine just what the heck Apache httpd and its suexec helper were doing under the covers. Firing up strace -fq ... on the httpd PIDs while hitting one of Jimbo’s troubled CGI scripts, I saw this:

...
setgroups32(2, [1130, 67])  = 0
setuid32(21389)             = 0
getcwd("/home/jimbo/public_html/cgi-bin"..., 4096) = 32
chdir("/home/jimbo")        = 0
chdir("public_html")        = 0
getcwd("/home/jimbo/public_html"..., 4096) = 24
chdir("/home/jimbo/public_html/cgi-bin") = 0
lstat64("/home/jimbo/public_html/cgi-bin", {st_mode=S_IFDIR|0755, st_size=1955, ...}) = 0
write(2, "suexec policy violation: see sue"..., 57) = 57

According to the lstat() man page, a return code of 0 is success. The line following the lstat() call is writing to stderr that there was a suexec problem and to see our friend suexec_log for information. We already know how that goes. For some reason, lstat() was returning what we understand to be success and there was no further system call taking place before the code was logging an error line. Funky.

Pouring over the suexec.c code, this is the block delivering our line to suexec_log:

if (((lstat(cwd, &dir_info)) != 0) || !(S_ISDIR(dir_info.st_mode))) {
    log_err("cannot stat directory: (%s)\n", cwd);
    exit(115);
}

So, given this lazy code, it’s possible the lstat part of it was working fine but the POSIX S_ISDIR macro call was returning a false value. Note, too, that our strace showed lstat64() being called, which means in theory, “largefile” support is compiled in and enabled. Anyway, I messed around with a custom one-off version of the code, running it via the command-line for a bit, before I decided it was easier to just strip things down to the basics in my own new code.

Using lstat() and actually reporting what errno got set to on lstat failure lead me in the final right direction. On lab.our.org, which happens to be an old 32-bit CentOS 5.11 host, my code reported this:

Path:                   /home/jimbo/public_html/cgi-bin
File Size:              0 bytes
Number of Links:        0
File inode:             1580990986
File Permissions:       ----------
ERROR: lstat errno says "Value too large for defined data type"

As soon as I saw that, I ran the same code on one of our 64-bit hosts:

Path:                   /home/jimbo/public_html/cgi-bin
File Size:              1982 bytes
Number of Links:        4
File inode:             1580990986
File Permissions:       drwxr-xr-x
Success - Exiting 0

Searching for “Value too large for defined data type” in the EMC/Isilon support portal showed me that others had hit this as well, one even mentioning Apache httpd.

Turns out Isilon OneFS serves up 64-bit file IDs by default. Unless 32-bit applications allow for compilation with “largefile” support, and are compiled with it, they don’t know how to properly handle stat()/lstat() calls against 64-bit file-id-having files.

I hear you saying, “Right, but, your strace shows that the code was calling lstat64() which is supposed to handle things fine.”

I know, right? But, based on the instructions in the EMC/Isilon support portal, I forced 32-bit file IDs for the NFS export in question and everything immediately began functioning.

isibed-1# isi nfs exports modify 5 --return-32bit-file-ids=yes

Clearly something else is at play here. The lstat64() was in fact returning success (0) all along, but the fact remains that forcing 32-bit file IDs for the NFS export solved the problem. Maybe something in the POSIX S_ISDIR macro was failing, which doesn’t show in the strace output.

You could even go so far as to say that my one-off non-largefile-enabled code that showed the “Value too large…” error and led me to the EMC/Isilon support thread was an fortuitous goof that gave me a working solution. And I don’t 100% understand how it solved the issue.

Footnotes

  1. Prior to this debacle described in this main blog post, I’d used rsync -av /old/ /new to perform the NFS home directory migration the week before. Due to embarassingly poor testing on my part, I did not notice until the next morning that writes (and only writes) to user home directories by the users were not working. Examining the problem showed that each user did not even own his/her home directory. After an hour of digging around in various strace sessions, it became evident that only a data copy command issuing an actual chown(2) call resulted in the proper ownership of the destination data. Neither rsync or tar do this in the way OneFS wanted, but cp -rpP does.
    # cp -rpP
    chown("/newusers/testjblaine/sdfdsa", 26560, 0) = 0
    chown("/newusers/testjblaine", 26560, 0) = 0
    
    # rsync
    lchown("testjblaine", 26560, 0) = 0
    lchown("testjblaine/.sdfdsa.2Ba4BO", 26560, 0) = 0
    
    # tar
    fchownat(4, "testjblaine", 0x000067C0, 0x00000000, 0x00000000) = 0
    fchownat(4, "sdfdsa", 0x000067C0, 0x00000000, 0x00000000) = 0
    

Filed under DevOps, Linux, Sysadmin, UNIX/Linux

  • S. B.

    Clearly something else is at play here. The lstat64() was in fact
    returning success (0) all along, but the fact remains that forcing
    32-bit file IDs for the NFS export solved the problem. Maybe something
    in the POSIX S_ISDIR macro was failing, which doesn’t show in the strace output.

    Nah, glibc is trying to convert the result of the 64-bit call into the 32-bit fileid stat structure your CentOS 5.11 host’s apache is using. It can’t without losing information, so it returns a EOVERFLOW error. You can see the relevant code here: https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/xstatconv.c#L182-L217

    And I don’t 100% understand how it solved the issue.

    Forcing 32-bit file ids means the Isilon is doing some magic behind the scenes to map 64-bit fileids to 32-bit fileids for (older) NFS clients. You can’t explore more than 4 billion files in this mode, but if you have older clients it may be worth it.