{"id":1000,"date":"2011-01-02T23:13:44","date_gmt":"2011-01-03T04:13:44","guid":{"rendered":"http:\/\/www.kickflop.net\/blog\/?p=1000"},"modified":"2011-03-01T19:38:10","modified_gmt":"2011-03-02T00:38:10","slug":"tracing-linux-hostname-resolution","status":"publish","type":"post","link":"https:\/\/www.kickflop.net\/blog\/2011\/01\/02\/tracing-linux-hostname-resolution\/","title":{"rendered":"Tracing Linux Hostname Resolution"},"content":{"rendered":"<p><em>This post is a living document.  Updates since the original publish dates are noted inline as such.<\/em><\/p>\n<p><em>Update 1\/11\/2011: Nearly all of this article is bunk now, as it turns out the GNU libc developers consider getent (the basis of everything below) to be a debugging tool only.  As such, it does abnormal things.  I suggested the man page should indicate as much.  Anyway, here it is&#8230;<\/em><\/p>\n<p>Let&#8217;s examine hostname resolution on a RHEL 5.5 box on a Sunday night.  I was inspired by reading <a target=\"_blank\" href=\"http:\/\/sysadvent.blogspot.com\/2010\/12\/day-15-down-ls-rabbit-hole.html\">Down the &#8216;ls&#8217; Rabbit Hole<\/a> 2 weeks ago.  
I suspect any other modern Linux distro will provide nearly identical results.<!--more--><\/p>\n<p>The short summary is:<\/p>\n<ol>\n<li>Read \/etc\/resolv.conf<\/li>\n<li>Try to use nscd<\/li>\n<li>Try to use nscd again<\/li>\n<li>Read \/etc\/nsswitch.conf<\/li>\n<li>Load libnss_files.so<\/li>\n<li>Read \/etc\/host.conf<\/li>\n<li>Try to find IPv6 address in \/etc\/hosts<\/li>\n<li>Load libnss_dns.so<\/li>\n<li>Load libresolv.so<\/li>\n<li>Perform DNS IPv6 &#8216;AAAA&#8217; query<\/li>\n<li>Try to find IPv4 address in \/etc\/hosts<\/li>\n<li>Perform DNS IPv4 &#8216;A&#8217; query<\/li>\n<\/ol>\n<p>Read on for the full trace with commentary.<\/p>\n<pre>\r\nstrace -f getent hosts www.puppetlabs.com\r\n...\r\nopen(\"\/etc\/resolv.conf\", O_RDONLY)      = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>Looking at the source for <a target=\"_blank\" href=\"http:\/\/ftp.gnu.org\/gnu\/glibc\/\">GNU libc<\/a> 2.5 (which is what is installed on this box), it appears that \/etc\/resolv.conf is loaded in resolv\/res_init.c and the explanation is given as:<\/p>\n<pre>\r\n\/*\r\n * Resolver state default settings.\r\n *\/\r\n\r\n\/*\r\n * Set up default settings.  If the configuration file exist, the values\r\n * there will have precedence.  Otherwise, the server address is set to\r\n * INADDR_ANY and the default domain name comes from the gethostname().\r\n *\r\n * An interrim version of this code (BIND 4.9, pre-4.4BSD) used 127.0.0.1\r\n * rather than INADDR_ANY (\"0.0.0.0\") as the default name server address\r\n * since it was noted that INADDR_ANY actually meant ``the first interface\r\n * you \"ifconfig\"'d at boot time'' and if this was a SLIP or PPP interface,\r\n * it had to be \"up\" in order for you to reach your own name server.  
It\r\n * was later decided that since the recommended practice is to always\r\n * install local static routes through 127.0.0.1 for all your network\r\n * interfaces, that we could solve this problem without a code change.\r\n *\r\n * The configuration file should always be used, since it is the only way\r\n * to specify a default domain.  If you are running a server on your local\r\n * machine, you should say \"nameserver 0.0.0.0\" or \"nameserver 127.0.0.1\"\r\n * in the configuration file.\r\n *\r\n * Return 0 if completes successfully, -1 on error\r\n *\/\r\n<\/pre>\n<p>Okay.  I guess.  Let&#8217;s move on.<\/p>\n<pre>\r\n...\r\nsocket(PF_FILE, SOCK_STREAM, 0)         = 3\r\nfcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0\r\nconnect(3, {sa_family=AF_FILE, path=\"\/var\/run\/nscd\/socket\"...}, 110) = -1 ENOENT\r\n(No such file or directory)\r\nclose(3)                                = 0\r\nsocket(PF_FILE, SOCK_STREAM, 0)         = 3\r\nfcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0\r\nconnect(3, {sa_family=AF_FILE, path=\"\/var\/run\/nscd\/socket\"...}, 110) = -1 ENOENT\r\n(No such file or directory)\r\nclose(3)                                = 0\r\n<\/pre>\n<p>Why did you check nscd twice?<\/p>\n<p>GNU libc nscd\/nscd_helper.c is the only place with a connect() call referencing \/var\/run\/nscd\/socket (aka _PATH_NSCDSOCKET as defined in nscd\/nscd-client.h).  The connect() is in open_socket(), which is referenced in two places:<\/p>\n<p>One:<\/p>\n<pre>\r\n\/* Try to get a file descriptor for the shared memory segment\r\n   containing the database.  *\/\r\nstatic struct mapped_database *\r\nget_mapping (request_type type, const char *key,\r\n             struct mapped_database **mappedp)\r\n<\/pre>\n<p>Two:<\/p>\n<pre>\r\n\/* Create a socket connected to a name. 
*\/\r\nint\r\n__nscd_open_socket (const char *key, size_t keylen, request_type type,\r\n                    void *response, size_t responselen)\r\n<\/pre>\n<p>Here I took it upon myself to try to build the GNU libc code I was referencing.  I figured I&#8217;d build it with debug symbols and then run getent again under gdb.  The build with CFLAGS=-g spit out an error saying that it <em>must<\/em> be built with optimization.  So much for that, but I did at least throw in some syslog() calls.  For one, the two attempts to connect to an nscd socket above are in fact from both referenced functions.<\/p>\n<p><em>Update 1\/11\/2011: This shows my lack of gdb knowledge.  One doesn&#8217;t need to build in debugging symbols to see what I am trying to see.  Commenter Dave W. shows that below with his traces.<\/em><\/p>\n<pre>\r\nJan  3 03:57:33 new-host-2 getent: get_mapping() trying to open nscd socket with\r\nopen_socket()\r\nJan  3 03:57:33 new-host-2 getent: __nscd_open_socket() trying to open nscd socket\r\nwith open_socket() with open_socket()\r\n<\/pre>\n<p>Is that correct behavior?  Could it be better?  Beats me.  I&#8217;m only taking it that far, but it doesn&#8217;t seem ideal.<\/p>\n<pre>\r\n...\r\nopen(\"\/etc\/nsswitch.conf\", O_RDONLY)    = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>Now we actually get somewhere.  At least we&#8217;re reading the right configuration file at this point.<\/p>\n<p>This is generated from GNU libc nss\/nsswitch.c<\/p>\n<pre>\r\nint\r\n__nss_database_lookup (const char *database, const char *alternate_name,\r\n                       const char *defconfig, service_user **ni)\r\n{\r\n...\r\n    service_table = nss_parse_file (_PATH_NSSWITCH_CONF);\r\n<\/pre>\n<p>Fine, moving on.<\/p>\n<pre>\r\nopen(\"\/lib64\/libnss_files.so.2\", O_RDONLY) = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>This is due to &#8220;files&#8221; being first in \/etc\/nsswitch.conf.  
Fine.<\/p>\n<pre>\r\n...\r\nopen(\"\/etc\/host.conf\", O_RDONLY)        = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>The hell?  You already found a valid \/etc\/nsswitch.conf.  Why would you query this stupid old legacy file?<\/p>\n<p>nss\/getXXbyYY_r.c causes this read of \/etc\/host.conf:<\/p>\n<pre>\r\n#ifdef NEED__RES_HCONF\r\n          if (!_res_hconf.initialized)\r\n            _res_hconf_init ();\r\n#endif \/* need _res_hconf *\/\r\n<\/pre>\n<p>Turns out this is hardcoded and not managed\/overridden in any way by configure.<\/p>\n<pre>\r\n[jblaine@new-host-2 glibc-2.5]$ grep \"#define NEED__RES_HCONF\" *\/*\r\ninet\/gethstbyad_r.c:#define NEED__RES_HCONF     1\r\ninet\/gethstbynm2_r.c:#define NEED__RES_HCONF    1\r\ninet\/gethstbynm_r.c:#define NEED__RES_HCONF     1\r\n<\/pre>\n<p><strong>???<\/strong> &#8211; feel free to provide a comment on this below.  I don&#8217;t understand the need for this nowadays when we have \/etc\/nsswitch.conf.<\/p>\n<pre>\r\n...\r\nopen(\"\/etc\/hosts\", O_RDONLY)            = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>Makes sense finally, at least if this was the result of doing what our \/etc\/nsswitch.conf said (&#8220;files dns&#8221;).<\/p>\n<p><em>Update 1\/11\/2011: Oddly, this <em>first<\/em> opening of \/etc\/hosts is due to trying to resolve www.puppetlabs.com via an IPv6 address.<\/em><\/p>\n<pre>\r\nopen(\"\/lib64\/libnss_dns.so.2\", O_RDONLY) = 3\r\n...\r\nclose(3)                                = 0\r\n...\r\nopen(\"\/lib64\/libresolv.so.2\", O_RDONLY) = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>Fine.<\/p>\n<pre>\r\nsocket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3\r\nconnect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(\"192.168.1.1\")}, 28) = 0\r\n...\r\nsendto(3, \"uw\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\npuppetlabs\\3com\\0\"..., 36, MSG_NOSIGNAL, NULL, 0) = 36\r\n...\r\nrecvfrom(3, 
\"uw\\201\\200\\0\\1\\0\\1\\0\\1\\0\\0\\3www\\npuppetlabs\\3com\\0\"..., 1024, 0,\r\n{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(\"192.168.1.1\")}, [16]) = 120\r\nclose(3)                                = 0\r\n<\/pre>\n<p>DNS traffic, finally.<\/p>\n<p><em>Update 1\/10\/2011: After coming back to this little exercise tonight armed with Wireshark, I&#8217;ve found that this DNS request is for an IPv6 &#8220;AAAA&#8221; record.  Commenter Dave W. confirmed this below.  Again, this is odd to me that it would try IPv6 first.<\/em><\/p>\n<pre>\r\nopen(\"\/etc\/hosts\", O_RDONLY)            = 3\r\n...\r\nclose(3)                                = 0\r\n<\/pre>\n<p>Why?  What did this and what is the reason?<\/p>\n<p><em>Update 1\/11\/2011: This is the attempt to look it up as an IPv4 address.  The following lack of expected syslog() output is still a bit mysterious though.<\/em><\/p>\n<p>Opening \/etc\/hosts happens in 2 GNU libc functions:<\/p>\n<p>One:<\/p>\n<pre>\r\nvoid\r\n_sethtent(f)\r\n        int f;\r\n{\r\n        if (!hostf)\r\n                hostf = fopen(_PATH_HOSTS, \"r\" );\r\n        else\r\n                rewind(hostf);\r\n        stayopen = f;\r\n}\r\n<\/pre>\n<p>Two:<\/p>\n<pre>\r\nstruct hostent *\r\n_gethtent()\r\n{\r\n...\r\n        if (!hostf && !(hostf = fopen(_PATH_HOSTS, \"r\" ))) {\r\n                __set_h_errno (NETDB_INTERNAL);\r\n                return (NULL);\r\n        }\r\n...\r\n<\/pre>\n<p>Let&#8217;s assume our &#8220;problem&#8221; is _gethtent().  
It&#8217;s referenced in 3 places:<\/p>\n<p>One:<\/p>\n<pre>\r\nstruct hostent *\r\n_gethtbyname2(name, af)\r\n        const char *name;\r\n        int af;\r\n<\/pre>\n<p>Two:<\/p>\n<pre>\r\nstruct hostent *\r\n_gethtbyaddr(addr, len, af)\r\n        const char *addr;\r\n        size_t len;\r\n        int af;\r\n<\/pre>\n<p>Three:<\/p>\n<pre>\r\nstruct hostent *\r\ngethostent()\r\n<\/pre>\n<p>Oddly, even with plenty of syslog() calls in _sethtent() and _gethtent() around where the fopen() of \/etc\/hosts happens, none of them is ever reached.  This odd opening of \/etc\/hosts remains a mystery.<\/p>\n<p>Moving on.<\/p>\n<pre>\r\n...\r\nsocket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3\r\nconnect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(\"192.168.1.1\")}, 28) = 0\r\n...\r\nsendto(3, \"\\256\\261\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\npuppetlabs\\3com\\0\"..., 36, MSG_NOSIGNAL, NULL, 0) = 36\r\n...\r\nrecvfrom(3, \"\\256\\261\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\npuppetlabs\\3com\\0\"..., 1024, 0,\r\n{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(\"192.168.1.1\")}, [16]) = 66\r\nclose(3)                                = 0\r\n...\r\nwrite(1, \"74.207.250.144  puppetlabs.com w\"..., 5074.207.250.144  puppetlabs.com\r\nwww.puppetlabs.com) = 50\r\nexit_group(0)                           = ?\r\n<\/pre>\n<p>Another DNS query before we get our screen output and getent exits.  <strong>Why?<\/strong><\/p>\n<p><em>Update 1\/11\/2011: This is, finally, the IPv4 query for an &#8220;A&#8221; record.<\/em><\/p>\n<p>Feel free to chime in.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is a living document. Updates since the original publish dates are noted inline as such. 
Update 1\/11\/2011: Nearly&hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,11],"tags":[],"class_list":["post-1000","post","type-post","status-publish","format-standard","hentry","category-musings","category-sysadmin"],"_links":{"self":[{"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/posts\/1000","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/comments?post=1000"}],"version-history":[{"count":47,"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/posts\/1000\/revisions"}],"predecessor-version":[{"id":1228,"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/posts\/1000\/revisions\/1228"}],"wp:attachment":[{"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/media?parent=1000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/categories?post=1000"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kickflop.net\/blog\/wp-json\/wp\/v2\/tags?post=1000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}