Archive for the 'oracle' Category



OpenWorld 2010 Session Update. Room Change Again.

The OOW folks informed me that they needed to move our session to a different room–again. So, if you are interested here are the new details:

ID#: S315110
Title: Optimizing Servers for Oracle Database Performance
Track: Database
Date: 20-SEP-10
Time: 17:00 – 18:00
Venue: Moscone South
Room: Rm 102

Oracle Database 11g Release 2 Patchset 1 (11.2.0.2) Is Now Available, But This Is Not Just An Announcement Blog Entry.

BLOG UPDATE: I should have peeked at my blog aggregator before posting this. I just found that my friend Greg Rahn posted about the same content on his blog earlier today. Hmmm…plagiarism!

Oracle Database 11g Release 2 Patchset 1 (11.2.0.2 Part Number E15732-03) is available as of today for x86 and x86_64 Linux as per My Oracle Support. This is not a blog post with a simple announcement of the software availability. I’d like to point out something related to this Patchset that I did not know until quite recently. I don’t apply Patchsets very often since having joined Oracle, so I learned a few new things about patch application, particularly as it pertains to 11.2.0.2.

Read This Before Touching 11.2.0.2!

I recommend reading MOS Note 1189783.1 – Important Changes to Oracle Database Patch Sets Starting With 11.2.0.2. There are two key topics this MOS note explains quite well:

  • The reason behind why the 11.2.0.2 Patchset download for x86_64 is 4.9 Gigabytes in size
  • More clarity on the concept of an “out-of-place upgrade”

I wish I had read MOS Note 1189783.1 before I trudged headlong into my first 11.2.0.1->11.2.0.2 upgrade effort!

OpenWorld 2010 Unconference Venue Is Now Open For OpenWorld Attendees Too!

In my post entitled OpenWorld 2010 Unconference Open For JavaOne And/Or Oracle Develop Registrants Only I quoted the Unconference policy which, at the time, stated Unconference attendance was only open to JavaOne and Oracle Develop folks.

I just received email stating that the policy has changed and that the new wording is as follows:

Now, Open to Oracle OpenWorld Attendees as well!
The unconference is a venue for any JavaOne, Oracle Develop or Oracle OpenWorld 2010 attendees to present their own session or workshop on a topic they’re passionate about, in an informal, interactive setting. It is a great opportunity for attendees to learn what’s on the minds of their peers in the community.

My Unconference sessions are:

Tuesday 11 AM: Lombard: Do-It-Yourself Exadata-like Performance? Kevin Closson, Performance Architect, Systems Technology Group, Oracle.

Tuesday 4PM:    Lombard: What Every Oracle Professional Wants To Ask About Exadata (Also Known as Q&A with Kevin Closson.) Kevin Closson, Performance Architect, Systems Technology Group, Oracle.

OpenWorld 2010 Unconference – Open for JavaOne And/Or Oracle Develop Registrants Only. A Poll.

It has come to my attention that the Unconference offered during this year’s OpenWorld can only be attended by registered JavaOne or Oracle Develop attendees, as per the following quote:

Participation and attendance is reserved to JavaOne and Oracle Develop attendees. You have to be registered to JavaOne or Oracle Develop 2010 to attend any of those sessions.

I think it’s time for a poll. How many folks interested in Exadata plan to be paid JavaOne/Oracle Develop attendees and might, therefore, attend Unconference sessions?

Some Blog Errors Are Just Too Serious To Ignore. A Comparison of Intel Xeon 5400 (Harpertown) to Intel Xeon 5500 (Nehalem EP).

I’d like to direct readers to an important blog update/correction.

In my post entitled An Intel Xeon 5400 System That Outperforms An Intel 5500 (Nehalem EP) System? Believe It…Or Know It I blogged about an erroneous conclusion I had drawn about a test performed on these two processor models. I think the update does the blog post justice and it all serves as a good object lesson in how important Xeon topology is.  I must remember to practice what I preach (e.g., remain ever-aware of topology).

While on the topic, the following post remains as an example of the type of workload that exhibits near-parity between Xeon 5400 and Xeon 5500:

Intel Xeon 5500 Nehalem: Is It 17 Percent Or 2.75-Fold Faster Than Xeon 5400 Harpertown? Well, Yes Of Course It Is!

Oracle Exadata Database Machine I/O Bottleneck Revealed At… 157 MB/s! But At Least It Scales Linearly Within Datasheet-Specified Bounds!

It has been quite a while since my last Exadata-related post. Since I spend all my time, every working day, on Exadata performance work this blogging dry-spell should seem quite strange to readers of this blog. However, for a while it seemed to me as though I was saturating the websphere on the topic and Exadata is certainly more than a sort of Kevin’s Dog and Pony Show. It was time to let other content filter up on the Google search results. Now, having said that, there have been times I’ve wished I had continued to saturate the namespace on the topic because of some of the totally erroneous content I’ve seen on the Web.

Most of the erroneous content is low-balling Exadata with FUD, but a surprisingly sad amount of content that over-hypes Exadata exists as well. Both types of erroneous content are disheartening to me given my profession. In actuality, the hype content is more disheartening to me than the FUD. I understand the motivation behind FUD, however, I cannot understand the need to make a good thing out to be better than it is with hype. Exadata is, after all, a machine with limits folks. All machines have limits. That’s why Exadata comes in different size configurations  for heaven’s sake! OK, enough of that.

FUD or Hype? Neither, Thank You Very Much!
Both the FUD-slinging folks and the folks spewing the ueber-light-speed, anti-matter-powered warp-drive throughput claims have something in common—they don’t understand the technology.  That is quickly changing though. Web content is popping up from sources I know and trust. Sources outside the walls of Oracle as well. In fact, two newly accepted co-members of the OakTable Network have started blogging about their Exadata systems. Kerry Osborne and Frits Hoogland have been posting about Exadata lately (e.g., Kerry Osborne on Exadata Storage Indexes).

I’d like to draw attention to Frits Hoogland’s investigation into Exadata. Frits is embarking on a series that starts with baseline table scan performance on a half-rack Exadata configuration that employs none of the performance features of Exadata (e.g., storage offload processing disabled). His approach is to then enable Exadata features and show the benefit while giving credit to which specific aspect of Exadata is responsible for the improved throughput. The baseline test in Frits’ series is achieved by disabling both Exadata cell offload processing and Parallel Query Option! To that end, the scan is being driven by a single foreground process executing on one of the 32 Intel Xeon 5500 (Nehalem EP) cores in his half-rack Database Machine.

Frits cited throughput numbers but left out what I believe is a critical detail about the baseline result—where was the bottleneck?

In Frits’ test, a single foreground process drives the non-offloaded scan at roughly 157MB/s. Why not 1,570MB/s (I’ve heard everything Exadata is supposed to be 10x)? A quick read of any Exadata datasheet will suggest that a half-rack Version 2 Exadata configuration offers up to 25GB/s scan throughput (when scanning both HDD and FLASH storage assets concurrently). So, why not 25 GB/s? The answer is that the flow of data has to go somewhere.

In Frits’ particular baseline case the data is flowing from cells via iDB (RDS IB) into heap-buffered PGA in a single foreground executing on a single core on a single Nehalem EP processor. Along with that data flow is the CPU cost paid by the foreground process in its marshalling all the I/O (communicating with Exadata via the intelligent storage layer) which means interacting with cells to request the ASM extents as per its mapping of the table segments to ASM extents (in the ASM extent map). Also, the particular query being tested by Frits performs a count(*) and predicates on a column. To that end, a single core in that single Nehalem EP socket is touching every row in every block for predicate evaluation. With all that going on, one should not expect more than 157MB/s to flow through a single Xeon 5500 core. That is a lot of code execution.

What Is My Point?
The point is that all systems have bottlenecks somewhere. In this case, Frits is creating a synthetic CPU bottleneck as a baseline in a series of tests. The only reason I’m blogging the point is that Frits didn’t identify the bottleneck in that particular test. I’d hate to see the FUD-slingers suggest that a half-rack Version 2 Exadata configuration bottlenecks at 157 MB/s for disk-throughput-related reasons about as badly as I’d hate to see the hype-spewing-light-speed-anti-matter-warp rah-rah folks suggest that this test could scale up without bounds.

I mean to say that I would hate to see someone blindly project how Frits’ baseline test would scale with concurrent invocations. After all, there are 8 cores and 16 threads on each host in the Version 2 Database Machine and therefore 32/64 in a half rack (there are 4 hosts). Surely Frits could invoke 32 or 64 sessions each performing this query without exhibiting any bottlenecks, right? Indeed, 157 MB/s by 64 sessions is about 10 GB/s, which fits within the datasheet claims. And, indeed, since the memory bandwidth in this configuration is about 19 GB/s into each Nehalem EP socket there must surely be no reason this query wouldn’t scale linearly, right? The answer is I don’t have the answer. I haven’t tested it.

What I would not advise, however, is dividing maximum theoretical arbitrary bandwidth figures (e.g., the 25GB/s scan bandwidth offered by a half-rack) by a measured application throughput requirement (e.g., Frits’ 157 MB/s) and claiming victory just because the math happens to work out in your favor. That would be junk science.
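For what it’s worth, the back-of-envelope arithmetic can be sketched in a few lines of shell. The figures come straight from this post (157 MB/s measured, 64 hardware threads in a half rack, the 25 GB/s datasheet number); the linear projection is precisely the kind of junk science I’m warning against, shown here only to make the numbers concrete:

```shell
# Figures from the post above -- not new measurements:
per_session=157    # MB/s, the single-core non-offloaded scan baseline
sessions=64        # hardware threads in a half rack (4 hosts x 16 threads)
datasheet=25600    # the 25 GB/s half-rack scan figure, expressed in MB/s

# The naive linear projection (the "junk science" step):
aggregate=$(( per_session * sessions ))
echo "naive linear projection: ${aggregate} MB/s"
echo "datasheet headroom:      $(( datasheet - aggregate )) MB/s"
```

The fact that roughly 10,000 MB/s fits under 25,600 MB/s proves nothing about whether 64 concurrent sessions would actually scale linearly. That is the point.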

Frits is not blogging junk science. I recommend following this fellow OakTable member to see where it goes.

What’s Really Happening at OpenWorld 2010? Part II.

BLOG UPDATE: Yet another room change for Optimizing Servers for Oracle Database Performance

The OOW folks informed me that they needed to move our session to a larger room. So, if you are interested here are the new details:

ID#: S315110
Title: Optimizing Servers for Oracle Database Performance
Track: Database
Date: 20-SEP-10
Time: 17:00 – 18:00
Venue: Moscone South
Room: Rm 302

I’ll also be giving the following Unconference sessions:

Tuesday – Sept 21st

11AM
Lombard:  Do-It-Yourself Exadata-like Performance?

4PM
Lombard:  What Every DBA Wants To Ask About Exadata (also known as Q&A with Kevin Closson).


What’s Really Happening at OpenWorld 2010?

This is a quick blog entry to share a few of my plans for OOW. I’ll be co-presenter with Wallis Pereira, Sr. Technical Program Manager in the Mission Critical Segment of Intel’s Data Center Group. Wally is a very old friend of mine and we’ll be delivering the following session.

ID#: S315110
Title: Optimizing Servers for Oracle Database Performance
Track: Database
Date: 20-SEP-10
Time: 17:00 – 18:00
Venue: Moscone South
Room: Rm 270

“Unconference”
I’ll also be offering a couple of short presentations in the “Unconference” venue on Tuesday, September 21 at 11 AM and 2 PM:

Tuesday – Sept 21st

11AM
Lombard: Do-It-Yourself Exadata-like Performance? Kevin Closson, Performance Architect, Systems Technology Group, Oracle.

2PM
Mason: What Every DBA Wants To Ask About Exadata Also Known as Q&A with Kevin Closson. Kevin Closson, Performance Architect, Systems Technology Group, Oracle.

For more information on the Unconference venue, please visit OOW 2010 Unconferences.

I also recommend joining me as I attend the following presentation:

Realworld Performance Group Round Table Discussion

By the way, since my Monday session is at 5:00 PM I should be done for the day afterward, so any of the folks that owe me a drink can catch me after the presentation (vice versa on the drink debt, of course) 🙂

Do-It-Yourself Exadata-Level Performance! Really? Part IV.

In my post entitled Do-It-Yourself Exadata-Level Performance? Really? I invited readers to visit the Oracle Mix page and vote for my suggest-a-session where I aimed to present on DIY Exadata-level performance. As the following screenshot shows I got a lot of good folks to vote on that. It must have been an interesting sounding topic!

Yes, 105 votes. I’m not positive, but that may be the largest number of votes for any suggest-a-session. Thanks for the support. The screenshot also states that back in the week of July 5 the results and notifications would be posted. I waited a few weeks after July 5 and, having received no notice, emailed some of the Mix folks. Here’s what I got:

Oracle employees were not eligible to be selected. The Mix process is meant to give external folks another opportunity to submit their sessions for review and possible inclusion.

I wish they would have stipulated the fact that Oracle Employees need not participate in suggest-a-session. I would have saved those 105 folks the headache of voting.

So, I’m sorry to say that if the topic I suggested in my abstract was something you wanted to hear in a general session, your want is in vain. However, the syllabus for the show suggests to me that there will be plenty of content that you need to hear as per the powers that be. I think the old Stones’ lyric should change to:

You can’t always get what you want. We’ll give you what you need.

I’ll blog more about this seemingly seedy concept of DIY Exadata-level performance soon. I’ll also post about the sessions I am involved with at OOW 2010. I’m hoping my dry-spell on blogging is going to ease. I have a large amount of content to get out.

Little Things Doth Crabby Make – Part XIV. Verbose Linux Command Output Should Be Very Trite. Shouldn’t It?

Not all topics I blog about in my Little Things Doth Crabby Make series make me crabby. Often times I’ll blog something that I presume would make at least one individual somewhere, sometime crabby. This one actually did make me crabby.

Huh? Was That Verbose?
I’m blogging about the --verbose option to the Linux mdadm(8) command. Consider the command I issued in the following text box.


$ mdadm --create --verbose /dev/md11 --level=stripe -c 4096 --raid-devices=16 /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac
mdadm: failed to create /dev/md11

OK, that wasn’t very verbose. Indeed, it only reported to me that the command failed. I could have figured that out by the obvious missing RAID device after my command prompt returned to me. In my mind, verbose shouldn’t mean what but why. That is, if I ask for verbose output I want something to help me figure out why something just happened. The what is obvious—command failure results in no RAID device.

As you’ll see in the following text box I checked to make sure I was superuser and indeed I was not. So I picked up superuser credentials and the command succeeded nicely. However, even when the command succeeds the verbose option isn’t exactly chatting my ear off! That said, getting brief output from a successful execution of a command, when I stipulate verbosity, would certainly not make it as an installment in the Little Things Doth Crabby Make series.


$ id
uid=1002(oracle) gid=700(dba) groups=700(dba)
$ su
Password:
#  mdadm --create --verbose /dev/md11 --level=stripe -c 4096 --raid-devices=16 /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac
mdadm: array /dev/md11 started.

The moral of the story is if you want to do things that require superuser become superuser.

I still want why-based output when I opt for verbosity. In this case there was a clear permissions problem. The command could have at least let the errno.h goodies trickle up!
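mdadm offers nothing of the sort, but here is a hedged sketch of the kind of why-based pre-flight check I mean. The function name is mine, the uid value 1002 comes from the id(1) output above, and EACCES is simply the obvious errno for a permissions failure:

```shell
# Hypothetical pre-flight check: report *why* mdadm --create would fail,
# rather than just that it failed.
explain_failure() {
  uid=$1
  if [ "$uid" -ne 0 ]; then
    echo "mdadm --create needs root; uid ${uid} would fail with EACCES"
  else
    echo "uid ${uid}: permissions OK"
  fi
}

explain_failure 1002   # the non-root oracle user from the transcript above
explain_failure 0      # superuser
```

When a real command won’t tell you why, strace (if available) will at least show the failing system call and its errno.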

Little Things Doth Crabby Make – Part XIII. When Startup Means Shutdown And Stay That Way.

This seemed worthy of Little Things Doth Crabby Make status mostly because I was surprised to see that Oracle Database 11g STARTUP command worked this way…

Consider the following text box. I purposefully tried to do a STARTUP FORCE specifying a PFILE that doesn’t exist. Well, it did exactly what I told it to do, but I was surprised to find that the abort happens before sqlplus checks for the existence of the specified PFILE. I ended up with a down database instance.


SQL> startup force pfile=./p4.ora
ORACLE instance started.

Total System Global Area 3525079040 bytes
Fixed Size                  2217912 bytes
Variable Size            1107298376 bytes
Database Buffers         2231369728 bytes
Redo Buffers              184193024 bytes
Database mounted.

Database opened.
SQL> SQL>
SQL> HOST ls foo.ora
ls: foo.ora: No such file or directory

SQL> startup force pfile=./foo.ora
LRM-00109: could not open parameter file './foo.ora'
ORA-01078: failure in processing system parameters
SQL> show sga
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 737 Serial number: 5

This one goes in the don’t-do-stupid-stuff category I guess. Please don’t ask how I discovered this…
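Since SQL*Plus aborts the instance before validating the PFILE path, one defensive habit is to check the file from the shell before issuing STARTUP FORCE. A minimal sketch follows; the function name is mine, and the actual sqlplus invocation is left as a comment because it is site-specific:

```shell
# Refuse to issue STARTUP FORCE unless the PFILE actually exists and is
# readable -- avoiding the aborted-and-down state shown above.
safe_startup_force() {
  pfile=$1
  if [ ! -r "$pfile" ]; then
    echo "refusing STARTUP FORCE: PFILE '${pfile}' not readable" >&2
    return 1
  fi
  # sqlplus / as sysdba <<EOF
  # startup force pfile=${pfile}
  # EOF
  echo "PFILE '${pfile}' found; safe to issue STARTUP FORCE"
}

safe_startup_force ./foo.ora || echo "guard caught the missing PFILE"
```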

Running Oracle Database On A System With 40% Kernel Mode Overhead? Are You “Normal?”

Fellow OakTable Network member Charles Hooper has undertaken a critical reading of a recently published book on the topic of Oracle performance. Some folks have misconstrued his coverage as just being hyper-critical, but as Charles points out, his motive is just to bring the content alive. It has been an interesting series of blog entries. I’ve commented on a couple of these blog posts, but as I began to comment on his latest installment I realized I should just do my own blog entry on the matter and refer back. The topic at hand is how “system time” relates to Oracle performance.

The quote from the book that Charles is blogging about reads:

System time: This is when a core is spending time processing operating system kernel code. Virtual memory management, process scheduling, power management, or essentially any activity not directly related to a user task is classified as system time. From an Oracle-centric perspective, system time is pure overhead.

To say “[…] any activity not directly related to a user task is classified as system time” is too simplistic to be correct. System time is the time processors spend executing code in kernel mode. Period. But therein lies my point. The fact is the kernel doesn’t do much of anything that is not directly related to a user task. It isn’t as if the kernel is running interference for Oracle. It is only doing what Oracle (or any user mode code for that matter)  is driving it to do.

For instance, the quote lists virtual memory, process scheduling and so on. That list is really too short to make the point come alive. It is missing the key kernel internals that have to do with Oracle such as process birth, process death, IPC (e.g., Sys V semaphores), timing (e.g., gettimeofday()), file and network I/O, heap allocations and stack growth and page table internals (yes, Virtual Memory).

In my opinion, anyone interested in the relationship between Oracle and an operating system kernel must read Section 8.1 of my friend James Morle’s book Scaling Oracle8i. In spite of the fact that the title sounds really out of date, it goes a long way toward making the topic at hand a lot easier to understand.

If this topic is of interest to you feel free to open the following link and navigate down to section 8.1 (page 417). Scaling Oracle8i ( in PDF form).

How Normal Are You?
The quote on Charles’ blog entry continues:

From an Oracle-centric perspective, system time is pure overhead. It’s like paying taxes. It must be done, and there are good reasons (usually) for doing it, […]

True, processor cycles spent in kernel mode are a lot like tax. However, as James pointed out in his book, the VOS layer, and the associated OSD underpinnings, have historically allowed for platform-specific optimizations. That is, the exact same functionality on one platform may impose a larger tax than on others. That is the nature of porting. The section of James’ book starting at page 421 shows some of the types of things that ports have done historically to lower the “system time” tax.

Finally, Charles posts the following quote from the book he is reviewing:

Normally, Oracle database CPU subsystems spend about 5% to 40% of their active time in what is called system mode.

No, I don’t know what “CPU subsystems” is supposed to mean. That is clearly a nickname for something. But that is not what I’m blogging about.

If you are running Oracle Database (any version since about 8i) on a server dedicated to Oracle and running on the hardware natively (not a Virtual Machine), I simply cannot agree with that upper-bound figure of 40%. That is an outrageous amount of kernel-mode overhead. I should think the best way to get to that cost level would be to use file system files without direct I/O. Can anyone with a system losing 40% to kernel mode please post a comment with any specifics about what is driving that much overhead and whether you are happy with the performance of your server?
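To see where your own system falls in that 5% to 40% range, the user/nice/system counters in /proc/stat (on Linux) are enough for a rough since-boot figure. A sketch, with a synthetic line included so the arithmetic is visible:

```shell
# Percentage of *busy* CPU time spent in system (kernel) mode, computed
# from a /proc/stat "cpu" line: label user nice system idle iowait irq ...
cpu_sys_share() {
  set -- $1                    # word-split the cpu line into fields
  user=$2; nice=$3; system=$4
  busy=$(( user + nice + system ))
  echo $(( 100 * system / busy ))
}

if [ -r /proc/stat ]; then
  cpu_sys_share "$(head -1 /proc/stat)"   # live since-boot share
fi
cpu_sys_share "cpu 60 0 40 900 0 0 0"     # synthetic line: the dreaded 40%
```

For per-interval numbers rather than a since-boot average, vmstat or mpstat do this same bookkeeping for you.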

Little Things Doth Crabby Make – Part XII.1 Please, DD, Lose My Data! I Didn’t Need That Other 4K Anyway.

I’ve never done a “dot release” for any post in my Little Things Doth Crabby Make series, but there is a first time for everything. In yesterday’s post (Part XII) I blogged about how dd(1) on Linux will happily let you specify arbitrarily huge values for the bs (block size) option, yet doing so will cause silent data loss. One reader commented on the blog and several emailed me to point out that the proof I specified is faulty. They are right, but that fact doesn’t change the truth. I discovered this data loss with a simple file-to-file dd operation but thought piping it to wc(1) would make the point easier to understand. The problem is doing so meant I was checking the return from wc(1), not dd(1). No matter:


# dd if=OH.tar.gz bs=2147483648 of=newfile count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 3.57895 seconds, 600 MB/s
# echo $?
0
# bc -l
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
2147483648-2147479552
4096
# ls -l newfile
-rw-r--r-- 1 root root 2147479552 Jun 17 12:06 newfile


Little Things Doth Crabby Make – Part XII. Please, DD, Lose My Data! I Didn’t Need That Other 4K Anyway.

It’s been a while since the last installment in my Little Things Doth Crabby Make series. The full series can be found here. So, what’s made me crabby this time? Well, when a Linux utility returns a success code to me I expect that to mean it did what I told it to do. Well…

What’s 4K Between Friends?
Really? I’m just picky! If I use dd(1) to write 2GB (2 * 2^30 bytes, the binary sort of 2GB, by the way) I’m not looking for a successful transfer of (2 * 2^30) – 4096 bytes! Imagine that.

Folks, don’t trust the return code from dd(1). I’ve been burned more than once.

Is He Totally Crazy? Using dd(1) with a 2GB write size?
Sure, why not? If it doesn’t want to do what I ask it is supposed to fail the command, not lose data.

I just did this on a 2.6.18 Kernel:

# file /tmp/OH.tar
/tmp/OH.tar: POSIX tar archive
# ls -lh  /tmp/OH.tar
-rw-r--r-- 1 oradb oinstall 19G Jun 16 17:23 /tmp/OH.tar
# dd if=/tmp/OH.tar bs=2147483648 count=1 | wc -c
0+1 records in
0+1 records out
2147479552
2147479552 bytes (2.1 GB) copied, 2.52849 seconds, 849 MB/s
# echo $?
0
# bc -l
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
2147483648-2147479552
4096
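Given that the exit status can’t be trusted, the only safe habit is to compare byte counts yourself after the copy. A sketch using the exact figures from the text box above (the function name is mine):

```shell
# Compare what dd was asked to move against what actually landed.
verify_copy() {
  expected=$1; actual=$2
  if [ "$actual" -ne "$expected" ]; then
    echo "short transfer: lost $(( expected - actual )) of ${expected} bytes"
    return 1
  fi
  echo "transfer complete: ${actual} bytes"
}

# The figures from the transcript above:
verify_copy 2147483648 2147479552 || echo "(the silent 4K loss, caught)"
```

In practice the actual byte count comes from stat(1) or wc -c on the output file. Keeping bs modest and scaling count instead (e.g., bs=1M count=2048) also sidesteps the giant-single-read behavior entirely.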

Do It Yourself Exadata Performance! Really? Part III.

I just noticed that the vote count for my Oracle Mix Suggest-A-Session is up to 92! I’m flattered and thanks for the votes, folks. I promise this is the last post on this thread!

The session I aim to present has some content that I delivered to our EMEA Sales Consultants during an event we had in Berlin back in April. I did that presentation during one of our evening sessions and was surprised to find that one of the attendees took a photo of the costume I chose.

Perhaps I should have saved that hat for viewing any World Cup games featuring team Germany?


DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.