Chalk this blog entry up under the “might help some poor Googler someday” column. This is a really weird Oracle installation error.
Lots of Clusters, Less Confusion
We have a lot of clusters here running Oracle on everything from Red Hat RHEL4 x86 and x86_64 to SuSE SLES9 x86 and x86_64. We also build clusters for certain test purposes such as analyzing how different kernels affect performance (thanks Carel-Jan), stability and so on. To keep things straight we generally build kernels and then name them with earmarks so that simple uname(1) output will tell us what the configuration is. For example, if we are running a test called “test3”, with the kernel built from the kernel-source-2.6.5-7.244.i586.rpm package, we might see the following when running the uname(1) command:
# uname -r
2.6.5-7.244-default-test3-244-0003-support
That is a long name for a kernel, but who should care? The manpage for the uname(2) call on Linux defines the arrays returned by the call as being unspecified in length:
The utsname struct is defined in <sys/utsname.h>:
struct utsname {
char sysname[];
char nodename[];
char release[];
char version[];
char machine[];
#ifdef _GNU_SOURCE
char domainname[];
#endif
};
The length of the arrays in a struct utsname is unspecified; the fields are NUL-terminated.
[Blog Correction: Before updating this page I had erroneously pointed out that NUL was a misspelling. I was wrong. See the comment stream below.]
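To make “unspecified” concrete, here is a minimal sketch that asks the running system how many bytes its release string really needs and compares that with the array size this particular libc compiled in. The helper name release_bytes_needed is mine, not anything from Oracle or the manpage, and the sizes it prints are whatever your headers define (commonly 65 bytes per field on glibc, but don't count on it).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns the number of bytes (including the
 * terminating NUL) needed to hold the running kernel's release string,
 * or 0 if uname() fails. */
size_t release_bytes_needed(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;

    size_t needed = strlen(u.release) + 1;

    /* The fields are NUL-terminated, so the string fits its own array. */
    assert(needed <= sizeof u.release);

    printf("release \"%s\" needs %zu bytes; release[] here holds %zu\n",
           u.release, needed, sizeof u.release);
    return needed;
}
```

The point is that a caller can size its storage from strlen() or sizeof at the call site instead of a hard-coded constant.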
What Does This Have to Do With Oracle?
Installation! We were trying to install Oracle10g Release 2 version 10.2.0.1 on SuSE SLES9 U3 x86 and ran into the following:
$ ./runInstaller
*** glibc detected *** free(): invalid next size (fast): 0x0807aa80 ***
*** glibc detected *** free(): invalid next size (fast): 0x0807ab00 ***
*** glibc detected *** free(): invalid next size (fast): 0x0807ab28 ***
*** glibc detected *** free(): invalid next size (fast): 0x0807ab50 ***
[…much error text deleted…]
*** glibc detected *** free(): invalid next size (fast): 0x0807ab70 ***
*** glibc detected *** free(): invalid next size (fast): 0x0807ab98 ***
*** glibc detected *** free(): invalid next size (fast): 0x0807abc0 ***
*** glibc detected *** free(): invalid next size (normal): 0x0807ad88 ***
*** glibc detected *** free(): invalid next size (normal): 0x0807aef0 ***
./runInstaller: line 63: 11294 Segmentation fault $CMDDIR/install/.oui $* -formCluster
What? The runInstaller script executed .oui which in turn suffered a segmentation fault. After investigating .oui with ltrace(1) it became clear that .oui mallocs 30 bytes and then calls uname(2). In our case, the release[] array returned to .oui from the uname(2) library call was a bit large. Larger than 30 bytes for certain. In spite of the fact that the uname(2) manpage says the size of the release[] array is unspecified, .oui presumes it will fit in 30 bytes. The strcpy(P) call that followed tried to stuff the array containing our long kernel name (2.6.5-7.244-default-test3-244-0003-support, which needs 43 bytes including the terminating NUL) into a 30 byte space at 0x8075940. Writing past the end of that heap buffer is consistent with the glibc free() complaints above, and it ended in a segmentation fault:
malloc(1024) = 0x8073118
malloc(1024) = 0x8073520
malloc(8192) = 0x8073928
malloc(8) = 0x8075930
malloc(30) = 0x8075940
uname(0xbfffba48 <unfinished …>
SYS_uname(0xbfffba48) = 0
<… uname resumed> ) = 0
strcpy(0x8075940, “2.6.5-7.244-default-test3-244-00″…) = 0x8075940
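For comparison, here is a minimal sketch of what a caller could do instead of malloc(30) followed by strcpy: check whether the string fits, or size the allocation from the string itself. The names fits and copy_release are hypothetical helpers of mine; I have no idea what the .oui source actually looks like.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if s plus its terminating NUL fits in capacity bytes. */
int fits(const char *s, size_t capacity)
{
    assert(s != NULL);
    return strlen(s) + 1 <= capacity;
}

/* Returns a heap copy sized to the string itself rather than to a
 * guessed constant. Returns NULL if malloc fails. Caller frees. */
char *copy_release(const char *release)
{
    assert(release != NULL);

    size_t n = strlen(release) + 1;   /* +1 for the NUL */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, release, n);
    return copy;
}
```

With our kernel name, fits(name, 30) is false because the string needs 43 bytes, while copy_release(name) simply allocates 43.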
The Moral
Don’t use long kernel names on Oracle systems. And, oh, when a manpage says something is unspecified, that doesn’t necessarily mean 30.
Trivia corner:
The length of the arrays in a struct utsname is unspecified; the fields are NUL-terminated.
“Let’s overlook the misspelling of NULL (there are a lot of typos in Linux manpages).”
NUL (with one L) is not a typo – it’s the official ASCII symbol name for 0x00. See eg http://www.unicode.org/charts/PDF/U0000.pdf.
(Now I’ll go get myself a life…)
I wouldn’t fault Oracle for a normal assumption
that unames are not longer than 30 characters.
This is an example of two developers working independently — one in the Linux space, one in the Oracle space !
Hemant
Hemant,
I’m not saying this is some hideous bug. On the other hand, there is nothing “normal” about assuming a documented array of unspecified size will be 30 bytes or less at runtime. Of all the arrays returned by this call, the release[] array is the one that historically has not had any sort of “convention” for contents.
Yes, Nigel, you are right. I have fallen prey to common usage. I feel exonerated after having searched out the fact that both forms (NUL/NULL) share the same etymology–Latin nullus. 🙂 Not really…precision is important (thus the blog post about the 30 byte thing).
We (re)learn something new every day. Having said that, I still hold fast that there are a LOT of typos in Linux manpages.
I only say that expecting uname to be 30 characters might
be normal. After all, what is relevant to the Installer
is the uname of the server. Whether underlying it is
a bounded or unbounded array and whether Oracle should
handle an unbounded array is taking it, probably, to too
much detail.
uname is commonly “not long” and 30 characters is “long enough”.
What is the standard on non-Linux platforms ? Oracle
would try to use a standard (or be platform independent).
Hemant
“I only say that expecting uname to be 30 characters might
be normal. After all, what is relevant to the Installer
is the uname of the server. Whether underlying it is
a bounded or unbounded array and whether Oracle should
handle an unbounded array is taking it, probably, to too
much detail.”
Hemant,
We are talking about a very unimportant bug, I know, because very few people would compile a kernel with a very long kernel description field (utsname.release). However, you are completely missing the point. It is no small oversight to take the return from a system call that is documented as an unspecified length array and cram it into a 30 byte space. And for that matter, 30 is a really odd size to pull out of a hat since it isn’t even a power of 2. Regardless of how unimportant we all think it is, it is a bug. Further, you keep stating the problem is “uname” of the server. That is not the case. It would be uname -r, or more precisely, the release[] array returned by the uname(2) call.
If you are going to access a string of undefined length but only intend to allocate 30 bytes to store the string, it would seem reasonable to sub-string the value.
Yes Mark, that would be reasonable.
As an aside, I’m told by a fellow OakTable member that this is a known bug (6006775) but I can’t find it in metalink.
Finally, I have said that this is a blog entry I’d hope to chalk up under the “help some poor googler someday” column. This isn’t rocket science.
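The sub-string idea raised above can be sketched in a few lines. This assumes the caller really does want a fixed 30 byte buffer; RELEASE_BUF and store_release are names I made up for illustration. snprintf never writes more than the given size and always NUL-terminates, and its return value tells you whether anything was cut off.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RELEASE_BUF 30   /* assumption: the fixed size the caller insists on */

/* Copies at most RELEASE_BUF - 1 characters of release into dst and
 * always NUL-terminates. Returns 1 if the value was truncated, else 0. */
int store_release(char dst[RELEASE_BUF], const char *release)
{
    assert(dst != NULL && release != NULL);

    /* snprintf returns the length the full string would have had. */
    int wanted = snprintf(dst, RELEASE_BUF, "%s", release);

    return wanted >= RELEASE_BUF;
}
```

Truncation loses information, of course, so whether this beats sizing the buffer to the string depends on what the caller does with the value afterwards.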
Kevin, I’m one of those poor Googlers. Thanks!
I get the same error even with a short name. I guess 2.6.18-8.el5 counts as a short name.
./runInstaller -silent -responseFile /home/oracle/sw/11g/LINUX/database/response/custom1.rsp
Starting Oracle Universal Installer…
Checking Temp space: must be greater than 80 MB. Actual 866020 MB Passed
Checking swap space: must be greater than 150 MB. Actual 1983 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2008-10-15_10-58-33AM. Please wait …
./runInstaller: line 81: 6998 Segmentation fault $CMDDIR/install/.oui $*
[oracle@p8-crmdb-standby database]$ vi runInstaller
[oracle@p8-crmdb-standby database]$ uname -r
2.6.18-8.el5
I even changed the hostname, but it still does not work:
[oracle@dbstdby database]$ ./runInstaller -silent \
> -responseFile $DISTRIB/response/custom1.rsp \
> FROM_LOCATION=$DISTRIB/stage/products.xml \
> ORACLE_BASE=/u01/app/oracle \
> ORACLE_HOME=/u01/app/oracle/product/11g/db_1 \
> ORACLE_HOME_NAME=11g
Starting Oracle Universal Installer…
Checking Temp space: must be greater than 80 MB. Actual 865127 MB Passed
Checking swap space: must be greater than 150 MB. Actual 1983 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2008-10-15_11-37-34AM. Please wait …
./runInstaller: line 81: 3127 Segmentation fault $CMDDIR/install/.oui $*
[oracle@dbstdby database]$
I hit an error while installing Oracle 10g on CentOS 5.4. At 62% of the installation I got: Error in invoking target ‘ntcontab.o’ makefile ‘/home/oracle/oracle/product/10.2.0/db_4/network/lib/ins_net_client.mk’. See ‘/home/oracle/oraInventory/logs/installActions2010-06-16_07-19-27PM.log’ for details. Click ‘Help’, ‘Retry’, ‘Ignore’, ‘Cancel’.
How can I solve this installation problem? Thanks for your attention.
Best regards
A. Mulyana
So I know nobody has posted here in 3 years, but I just ran into this bug and came up with a fix you can use without depending on Oracle. What I did was to create my own uname() and put it in a shared library. I will try to outline what I have done here.
1. Create a C file with the following code in it:
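#include <string.h>
#include <sys/utsname.h>

/* Replacement uname(): reports fixed short strings instead of the
 * real system values. */
int uname(struct utsname *name)
{
    /* strncpy pads the rest of each field with NUL bytes. */
    strncpy(name->sysname, "test", sizeof(name->sysname));
    strncpy(name->release, "test", sizeof(name->release));
    strncpy(name->version, "test", sizeof(name->version));
    strncpy(name->machine, "test", sizeof(name->machine));
    /* nodename is not set here, so callers see whatever was already
     * in the buffer they passed in. */
    return 0;
}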
2. Compile that code with the following command:
gcc new_uname.c -o libuname.so -ldl -shared -fPIC -I.
3. Make sure that programs you run are using your new library by setting LD_PRELOAD:
export LD_PRELOAD=/path/to/new/libuname.so
4. Test it out:
aaron@amcirillo-linux ~/code/uname $ ldd `which uname`
linux-vdso.so.1 (0x00007fff647ff000)
/home/aaron/code/uname/libuname.so (0x00007f9371c2a000)
libc.so.6 => /lib64/libc.so.6 (0x00007f9371883000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f937167f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9371e2c000)
aaron@amcirillo-linux ~/code/uname $ uname
test
aaron@amcirillo-linux ~/code/uname $ uname -r
test
aaron@amcirillo-linux ~/code/uname $ uname -v
test
aaron@amcirillo-linux ~/code/uname $ uname -m
test
5. Go ahead and run the Oracle installer again, but be sure to export LD_PRELOAD first so that it uses your custom uname().
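A variation on the steps above, in case you would rather keep the real system values and only shorten the release string: the sketch below asks the kernel directly and then clips release so it fits in 30 bytes. It assumes Linux, and it assumes the kernel fills in the same layout as the libc's struct utsname (which I believe is how glibc's own uname() works, but treat that as an assumption). RELEASE_LIMIT is my name for the 30 bytes seen in the ltrace output.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for the syscall() declaration */
#endif
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/utsname.h>
#include <unistd.h>

#define RELEASE_LIMIT 30         /* assumption: the buffer size .oui allocates */

/* Make sure the index written below is inside release[]. */
static_assert(sizeof(((struct utsname *)0)->release) >= RELEASE_LIMIT,
              "release[] is smaller than RELEASE_LIMIT");

/* Replacement uname(): real values, but release is clipped so that it
 * fits in RELEASE_LIMIT bytes including the terminating NUL. */
int uname(struct utsname *name)
{
    /* Go straight to the kernel; calling the libc uname() here would
     * just call this function again. */
    if (syscall(SYS_uname, name) != 0)
        return -1;

    if (strlen(name->release) >= RELEASE_LIMIT)
        name->release[RELEASE_LIMIT - 1] = '\0';

    return 0;
}
```

Build it into libuname.so and preload it exactly as in steps 2 and 3. On a system with a short kernel name this changes nothing, which makes it a little safer to leave in place.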