“Feel” Your Processor Cache. Oracle Does. Part II.

That’s funny, but I had a sneaking suspicion it was going to happen, so….

In yesterday’s post entitled “Feel” Your Processor Cache. Oracle Does. Part I., I pointed out that the newest comer to the ever-growing crowd of in-memory open source database products and accelerators got it a bit wrong when they described what a level-two processor cache is. However, before I made that blog post I took a screen shot of the Csqlcache blog. Notice the description of L2:

"Before" Description: L2 Cache Soldered to the Motherboard

I just checked and it looks like they took the hint, but in what some would consider poor blogging style by simply changing it as opposed to making an edit to draw attention to the change. But that’s not why I’m blogging. The fact that someone made a blog correction is not interesting to me. Please see the “after rendition” in the next screen shot:

"After" Description: L2 On-die.

So, yes, they took the hint that expressing cache latency in wall clock time is messy, but they now cite fixed latencies for L1, L2 and memory. First off, 5-cycle L1 would be disastrous! Then citing 10-cycle L2 is truly a number out of the hat. But that is not what I’m blogging about.

The new page is citing memory latency at 5-50ns. Oh how I’d love to have a system that chewed up memory at 50ns! But what about that low bound 5ns? Wow, memory latencies at modern L2 cache speed. That would be so cool! I wonder where these Csqlcache folks get their hardware? It is definitely out-of-this-worldly.

It’s All About Cache Lines
I don’t get this bit about “granularity” in that page either. Folks, modern microprocessors map memory to cache with in nits known as cache lines. All processors that matter (no names but their initials are x64) use a cache line size of 64 bytes (8 words). In order to access any bits within a 64-byte line of memory, the entire line must be installed in the processor cache. So I think it would be a bit more concise to specify granularity at the base operational chunk the processor deals with, which is a cache line. That’s the point of the Silly Little Benchmark by the way.

The workhorse of SLB (memhammer) randomly picks a line and writes a word in the line. The control loop is tight and the work loop is otherwise light so this test creates maximum processor stalls with minimum extraneous cycles. That is, it exhibits a miserably high CPI (cycles per instruction) cost. That’s why it is called memhmmer.

I’ve got the “before” screen shot. Let’s see if it silently changes. I hate to sound critical, but these Csqlcache folks are hanging their hat on producing a database accelerator. You have to know a lot about memory and how it works to do that sort of thing well. And, my oh my how the in-memory and in-line database accelerators field is so saturated. That reminds me of the company that my SQL Server-focused counterparts at PolyServe were all excited about back in about 2005 called Xprime. It looked like a holy grail back then. I recall it even took a best new product sort of award at a large SQL Server convention.

It didn’t work very well.

	kevinclosson on Announcing SLOB 2.5.4
	Hell Dip on Announcing SLOB 2.5.4
	kevinclosson on Introducing SLOB – The S…
	Amey Bobade on Introducing SLOB – The S…
	Amey Bobade on Introducing SLOB – The S…

Kevin Closson's Blog: Platforms, Databases and Storage