
Yes, You Must Use CALIBRATE_IO. No, You Mustn’t Use It To Test Storage Performance.

I occasionally get questions from customers and colleagues about performance expectations for the Oracle Database procedure called calibrate_io on XtremIO storage. This procedure must be executed in order to update the data dictionary. I assert, however, that it shouldn't be used to measure platform suitability for Oracle Database physical I/O. The main reason I say this is that calibrate_io is, as it were, a black box.

The procedure is, indeed, documented so it can’t possibly be a “black box”, right? Well, consider the fact that the following eight words are the technical detail provided in the Oracle documentation regarding what calibrate_io does:

This procedure calibrates the I/O capabilities of storage.

OK, I admit it. I’m being too harsh. There is also this section of the Oracle documentation that says a few more words about what this procedure does but not enough to make it useful as a platform suitability testing tool.

A Necessary Evil?

Yes, you must run calibrate_io. The measurements gleaned by calibrate_io are used by the query processing runtime (specifically involving Auto DOP functionality). The way I think of it is similar to how I think of gathering statistics for CBO. Gathering statistics generates I/O but I don’t care about the I/O it generates. I only care that CBO might have half a chance of generating a reasonable query plan given a complex SQL statement, schema and the nature of the data contained in the tables. So yes, calibrate_io generates I/O—and this, like I/O generated when gathering statistics, is I/O I do not care about. But why?

Here are some facts about the I/O generated by calibrate_io (a minimal invocation sketch follows this list):

  • The I/O is 100% read
  • The reads are asynchronous
  • The reads are buffered in the process heap (not shared buffers in the SGA)
  • The code doesn’t even peek into the contents of the blocks being read!
  • There is limited control over what tablespaces are accessed for the I/O
  • The results are not predictable
  • The results are not repeatable
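For the record, here is a minimal sketch of invoking the procedure (run as a suitably privileged user such as SYSDBA; the parameter values shown are examples only, not recommendations):

SET SERVEROUTPUT ON
DECLARE
  l_max_iops   INTEGER;
  l_max_mbps   INTEGER;
  l_actual_lat INTEGER;
BEGIN
  -- The first two arguments are the IN parameters: approximate number of
  -- physical disks and maximum tolerable latency in milliseconds.
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 1,
    max_latency        => 20,
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_lat);
  DBMS_OUTPUT.PUT_LINE('max_iops = '       || l_max_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = '       || l_max_mbps);
  DBMS_OUTPUT.PUT_LINE('actual_latency = ' || l_actual_lat);
END;
/

The outcome of a completed run is also recorded in the DBA_RSRC_IO_CALIBRATE view.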

My Criticisms

Having provided the above list of calibrate_io characteristics, I feel compelled to elaborate.

About Asynchronous I/O

My main issue with calibrate_io is that it performs single-block random reads with asynchronous I/O calls buffered in the process heap. This type of I/O has nothing in common with the main reason random single-block I/O is performed by Oracle Database. The vast majority of single-block random I/O is known as db file sequential read, which is buffered in the SGA and is synchronous I/O. The wait event is called db file sequential read because each synchronous call to the operating system is made sequentially, one after the other, by foreground processes. But there is more to SGA-buffered reads than just I/O.
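As an aside, it is easy to see how dominant this wait event is on a real OLTP system. A quick sketch against the standard v$system_event view:

-- Single-block (and multi-block) read wait events, busiest first
SELECT event, total_waits, time_waited_micro
  FROM v$system_event
 WHERE event LIKE 'db file%read'
 ORDER BY time_waited_micro DESC;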

About Server Metadata and Mutual Exclusion

Wrapped up in SGA-buffered I/O is all the necessary overhead of shared-cache management. Oracle can't just plop a block of data from disk into the SGA and expect that other processes will be able to locate it later. When a process is reading a block into the SGA buffer cache it has to navigate spinlocks for the protected cache-contents metadata known as cache buffers chains. Cache buffers chains tracks which blocks are in the buffer cache by their on-disk address. Buffer caches, like that in the SGA, also need to track the age of buffers. Oracle processes can't just use any shared buffer. Oracle maintains buffer age in metadata known as cache buffers lru, which is also spinlock-protected.
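You can observe the pressure on this metadata directly. A quick sketch against the standard v$latch view:

-- Misses and sleeps on these latches climb with buffer cache churn
SELECT name, gets, misses, sleeps
  FROM v$latch
 WHERE name IN ('cache buffers chains', 'cache buffers lru chain');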

All of this talk about server metadata means that as the rate of SGA buffer cache block replacement increases—with newly-read blocks from storage—there is also increased pressure on these spinlocks. In other words, faster storage means more pressure on CPU. Scaling spinlocks is a huge CPU problem. It always has been—and even more so on NUMA systems. Testing I/O performance without also involving these critical CPU-intensive code paths provides false comfort when trying to determine platform suitability for Oracle Database.

Since applications do not drive random single-block asynchronous reads in Oracle Database, why measure it? I say don't! Yes, execute calibrate_io, for reasons related to Auto DOP functionality, but not for a relevant reading of storage subsystem performance.

About User Data

This is one that surprises me quite frequently. It astounds me how quick some folks are to dismiss the importance of test tools that access user data. Say what? Yes, I routinely point out that neither calibrate_io nor Orion access the data that is being read from storage. All Orion and calibrate_io do is perform the I/O and let the data in the buffer remain untouched. It always seems strange to me when folks dismiss the relevance of this fact. Is it not database technology we are talking about here? Databases store your data. When you test platform suitability for Oracle Database I hold fast that it is best to 1) use Oracle Database (thus an actual SQL-driven toolkit as opposed to an external kit like Orion or fio or vdbench or any other such tool) and 2) ensure the test kit accesses rows of data in the blocks! I'm just that way.
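To make the distinction concrete, here is the sort of SQL-driven read I mean: a statement that cannot complete without visiting the rows inside the blocks it reads (the table and column names here are hypothetical):

-- Unlike calibrate_io or Orion, this read must examine row contents
SELECT SUM(LENGTH(payload))
  FROM payload_table
 WHERE id BETWEEN :lo AND :hi;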

Of course SLOB (and other SQL-driven test kits such as Swingbench) does indeed access rows of data. Swingbench handily tests Oracle Database transaction capabilities and SLOB uses SQL to perform maximum I/O per host CPU cycle. Different test kits for different testing.

A Look At Some Testing Results

The first thing about calibrate_io I’ll discuss in this section is how the user is given no control or insight into what data segments are being accessed. Consider the following screenshot which shows:

  1. Use of the calibrate.sql script found under the misc directory in the SLOB kit (SLOB/misc/calibrate.sql) to achieve 371,010 peak IOPS and zero latency. This particular test was executed with a Linux host attached to an XtremIO array. Um, no, the actual latencies are not zero.
  2. I then created a 1TB tablespace. What is not seen in the screenshot is that all the tablespaces in this database are stored in an ASM disk group consisting of 4 XtremIO volumes. So the tablespace called FOO resides in the same ASM disk group. The ASM disk group uses external redundancy.
  3. After adding a 1TB tablespace to the database I once again executed calibrate_io and found that the IOPS increased 13% and latencies remained at zero. Um, no, the actual latencies are not zero!
  4. I then offlined the tablespace called FOO and executed calibrate_io to find that IOPS fell back to within 1% of the first sample.
  5. Finally, I onlined the tablespace called FOO and the IOPS came back to within 1% of the original sample that included the FOO tablespace. (The DDL behind steps 2, 4 and 5 is sketched just below.)
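For reference, the tablespace manipulation behind steps 2, 4 and 5 amounts to DDL along these lines (a sketch; the disk group name and the bigfile choice are assumptions):

-- Step 2: create a 1TB tablespace in the existing ASM disk group
CREATE BIGFILE TABLESPACE FOO DATAFILE '+DATA' SIZE 1T;

-- Step 4: take it offline before re-running calibrate_io
ALTER TABLESPACE FOO OFFLINE;

-- Step 5: bring it back online and run calibrate_io once more
ALTER TABLESPACE FOO ONLINE;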

A Black Box

My objection to this result is that calibrate_io is a black box. I'm left with no way to understand why adding a 1TB tablespace improved IOPS. After all, the tablespace was created in the same ASM disk group consisting of block devices provisioned from an all-flash array (XtremIO). There is simply no storage-related reason for the test result to improve as it did.


[Screenshot: calibrate_after_file_add]

More IOPS, More Questions. I Prefer Answers.

I decided to spend some time taking a closer look at calibrate_io but since I wanted more performance capability I moved my testing to an XtremIO array with 4 X-Bricks and used a 2-Socket Xeon E5-2699v3 (HSW-EP 2s36c72t) server to drive the I/O.

The following screenshot shows the result of calibrate_io. This test configuration yielded 572,145 IOPS and, again, zero latency. Um, no, the real latency is not zero. The latencies are sub-millisecond though. The screen shot also shows the commands in the SLOB/misc/calibrate.sql file. The first two arguments to DBMS_RESOURCE_MANAGER.CALIBRATE_IO are “in” parameters. The value seen for parameter 2 is not the default. The next section of this blog post shows a variety of testing with varying values assigned to these parameters.

[Screenshot: calibrate_572K-iops]

As per the documentation, the first parameter to calibrate_io is “approximate number of physical disks” being tested and the second parameter is “the maximum tolerable latency in milliseconds” for the single-block I/O.

[Screenshot: table of calibrate_io test results]

As the table above shows I varied the “approximate number of physical disks” from 1 to 10,000 and the “maximum tolerable latency” from 10 to 20 and then 100. For each test I measured the elapsed time.
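For those who want to reproduce this sort of sweep, the harness can be as simple as a shell loop. The following is a sketch (the connection method and parameter values are illustrative; timing comes from the shell):

#!/bin/bash
# Hedged sketch: time calibrate_io across "approximate disks" and
# "maximum tolerable latency" values. Runs serially; calibrate_io
# does not permit concurrent calibration runs.
for disks in 1 10 100 1000 10000; do
  for lat in 10 20 100; do
    echo "=== num_physical_disks=$disks max_latency=$lat ==="
    time sqlplus -s / as sysdba <<EOF
SET SERVEROUTPUT ON
DECLARE
  l_iops INTEGER; l_mbps INTEGER; l_lat INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO($disks, $lat, l_iops, l_mbps, l_lat);
  DBMS_OUTPUT.PUT_LINE('iops='||l_iops||' mbps='||l_mbps||' latency='||l_lat);
END;
/
EXIT
EOF
  done
done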

The results show us that the test requires twice the elapsed time with 1 approximate physical disk as it does with 10,000 approximate physical disks. This is a nonsensical result, but without any documentation on what calibrate_io actually does we are simply left scratching our heads. Another oddity is that with 10,000 approximate disks the throughput in megabytes per second is reduced by nearly 40%, and that is without regard for the "tolerable latency" value. This is clearly a self-imposed limit within calibrate_io, but why is the big question.

I’ll leave you, the reader, to draw your own conclusions about the data in the table. However, I use the set of results with “tolerable latency” set to 20 as validation for one of my indictments above. I stated calibrate_io is not predictable. Simply look at the set of results in the 20 “latency” parameter case and you too will conclude calibrate_io is not predictable.

So How Does CALIBRATE_IO Compare To SLOB?

I get this question quite frequently. Jokingly I say it compares in much the same way a chicken compares to a snake. They both lay eggs. Well, I should say they both perform I/O.

I wrote a few words above about how calibrate_io uses asynchronous I/O calls to test single-block random reads. I also have pointed out that SLOB performs the more correct synchronous single block reads. There is, however, an advanced testing technique many SLOB users employ to test PGA reads with SLOB as opposed to the typical SLOB reads into the SGA. What’s the difference? Well, revisit the section above where I discuss the server metadata management overhead related to reading blocks into the SGA. If you tweak SLOB to perform full scans you will test the flow of data through the PGA and thus the effect of eliminating all the shared-cache overhead. The difference is dramatic because, after all, “everything is a CPU problem.”

In a subsequent blog post I’ll give more details on how to configure SLOB for direct path with single-block reads!

To close out this blog entry I will show a table of test results comparing some key time model data. I collected AWR reports when calibrate_io was running as well as SLOB with direct path reads and then again with the default SLOB with SGA reads. Notice how the direct path SLOB increased IOPS by 19% just because blocks flowed through the PGA as opposed to the SGA. Remember, both of the SLOB results are 100% single-block reads. The only difference is the cache management overhead is removed. This is clearly seen by the difference in DB CPU. When performing the lightweight PGA reads the host was able to drive 29,884 IOPS per DB CPU, but the proper SLOB results (SGA buffered) show the host could only drive 19,306 IOPS per DB CPU. Remember, DB CPU represents processor-thread utilization on a threaded processor. These results are from a 2s36c72t (HSW-EP) server, so these figures could also be stated as IOPS per CPU thread.

If you are testing platform suitability for Oracle it's best to not use a test kit that is artificially lightweight. Your OLTP/ERP application uses the SGA, so test that!

The table also shows that calibrate_io achieved the highest IOPS but I don’t care one bit about that!

[Screenshot: tale-of-the-calibrate-versus-slob-tape]

AWR Reports

I’d like to offer the following links to the full AWR reports summarized in the above table:

Additional Reading

Summary

Use calibrate_io. Just don’t use it to test platform suitability for Oracle Database.

Is SLOB AWR Generation Really, Really, Really Slow on Oracle Database 11.2.0.4? Yes, Unless…

If you are testing SLOB against 11.2.0.4 and find that the AWR report generation phase of runit.sh is taking an inordinate amount of time (e.g., more than 10 seconds) then please be aware that, in the SLOB/awr subdirectory, there is a remedy script rightly called 11204-awr-stall-fix.sql.

Simply execute this script when connected to the instance with sysdba privilege and the problem will be solved. 
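A minimal sketch of the invocation, assuming your current working directory is the top of the SLOB kit:

$ sqlplus / as sysdba @awr/11204-awr-stall-fix.sql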

11204-awr-stall-fix.sql


Performance Data Visualization for SLOB. The SLOB Expert Community is Vibrant!

Thanks to Nikolay Savvinov (@oradiag) for his excellent post on how to wrap his scripts around the SLOB test driver (runit.sh) to capture and produce performance data visualization graphs.  I recommend a visit to his post here:

Performance Data Visualization with SLOB


As always, the link for SLOB is: Obtain the SLOB Kit and Helpful Information Here

Little Things Doth Crabby Make – Part XVIV: Enterprise Manager 12c Cloud Control 12.1.0.5 Install Problem.

This is a short post to help out any possible “googlers” looking for an answer to why their 12.1.0.5 EM Cloud Control install is failing in the make phase with ins_calypso.mk.

Note, this EM install was taking place on an Oracle Linux 7.1 host.

The following snippet shows the text that was displayed in the dialogue box when the error was hit:


INFO: 11/12/15 12:10:37 PM PST: ----------------------------------
INFO: 11/12/15 12:10:37 PM PST: Exception thrown from action: make
Exception Name: MakefileException
Exception String: Error in invoking target 'install' of makefile '/home/oracle/app/oracle/oms12cr5/Oracle_WT/webcache/lib/ins_calypso.mk'. See '/home/oracle/oraInventory/logs/cloneActions2015-11-12_12-10-18-PM.log' for details.
Exception Severity: 1
INFO: 11/12/15 12:10:37 PM PST: POPUP WARNING:Error in invoking target 'install' of makefile '/home/oracle/app/oracle/oms12cr5/Oracle_WT/webcache/lib/ins_calypso.mk'. See '/home/oracle/oraInventory/logs/cloneActions2015-11-12_12-10-18-PM.log' for details.

Click "Retry" to try again.
Click "Ignore" to ignore this error and go on.
Click "Cancel" to stop this installation.
INFO: 11/12/15 12:20:14 PM PST: The output of this make operation is also available at: '/home/oracle/app/oracle/oms12cr5/Oracle_WT/install/make.log'

The following shows the simple fix:


$ diff ./app/oracle/oms12cr5/Oracle_WT/lib/sysliblist.orig ./app/oracle/oms12cr5/Oracle_WT/lib/sysliblist
1c1
< -ldl -lm -lpthread -lnsl -lirc -lipgo
---
> -ldl -lm -lpthread -lnsl -lirc -lipgo -ldms2
$
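In other words, append -ldms2 to the Oracle_WT/lib/sysliblist file and then click "Retry" in the installer dialog. A minimal sketch of the edit (paths as per this install; back up the original first):

$ cd /home/oracle/app/oracle/oms12cr5/Oracle_WT/lib
$ cp sysliblist sysliblist.orig
$ sed -i '1s/$/ -ldms2/' sysliblist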

So if this post hath made at least one googler less crabby I'll consider this installment in the Little Things Doth Crabby Make series all worth it.

Oracle OpenWorld 2015. Additional Attractions to Consider: OakTable World and EMC Rocks Oracle OpenWorld.

This is a quick blog entry to share some information with readers who are attending Oracle OpenWorld 2015.

EMC Rocks Oracle OpenWorld

EMC has a concurrent event at the Elan Event Center (directly across the street from Moscone West) during OpenWorld. This event is a great opportunity to come see the unique and powerful solutions and products EMC has to offer folks using Oracle Database. You can register for the event, and get more information, at the following link, or just show up with your OpenWorld badge: Link to register for EMC Rocks Oracle OpenWorld. I hope to see you there.


Oaktable World 2015

Folks who are aware of the Oaktable Network organization will be pleased to hear that Oaktable World is once again being held concurrently with Oracle OpenWorld. Please visit the following link to get more information about Oaktable World. You'll find that no registration is necessary and the speaker list is quite attractive. Link to information about Oaktable World 2015

Shameless Plug

Well, it is my blog after all!  I’ll be delivering another installment of my Modern Platform Topics for Modern DBAs track. This session will show how to use SLOB to study how CPU-intensive your Oracle OLTP-related *waits* are. I’ll also be showing CPU costs associated with key DW/BI/Analytics processing “underpinnings” like scan, filtration and projection on modern 2-socket servers running Linux and Oracle Database 12c. Please join us on Monday October 26 at 3PM as per the Oaktable World schedule (see below or follow the links).


Here is a screenshot of the EMC Rocks Oracle OpenWorld event information:

[Screenshot: EMC Rocks Oracle OpenWorld event information]

The following is the schedule for technical sessions at EMC Rocks Oracle OpenWorld. I’ve highlighted the XtremIO related sessions since that is the business unit of EMC I work in.

[Screenshot: EMC Rocks Oracle OpenWorld technical session schedule, XtremIO sessions highlighted]


The following is the Oaktable World schedule:

[Screenshot: Oaktable World 2015 schedule]

[Screenshot: Oaktable World 2015 schedule, continued]


Copy Data Management for Oracle Database with EMC AppSync and XtremIO

This is a quick blog entry to invite readers to view this little demonstration video I created. The topic is Copy Data Management in an Oracle Database environment. We all know the pains involved with the number of database copies needed in today’s Oracle environment. Well, how about technology with these characteristics:

  1. 100% space efficient. There is no need for any full-copy "donor" in this solution. You can create 8192 XtremIO Virtual Copies of volumes in an XtremIO array with no reduction in user capacity at the storage level. For example, 512 copies of a 1TB volume with Oracle tablespaces in it take exactly 1TB from the array.
  2. Self service. With EMC AppSync permissions can be set up so that developers can create their own copies, refresh their own copies and expire their own copies.
  3. Speed. AppSync copy operations such as creation and refresh are measured in seconds.
  4. Data Services. All XtremIO Virtual Copies enjoy data reduction services. So as users begin to make changes to their database copies the modified blocks are first treated with de-duplication and then compression.

You more than likely need XtremIO in any case. However, now it's also time to think about the ease of provisioning copies of Oracle databases to test/dev and other functions the XtremIO way.

It only takes minutes so please give this a view:

Focusing on Ext4 and XFS TRIM Operations – Part I.

I’ve been doing some testing that requires rather large file systems. I have an EMC XtremIO Dual X-Brick array from which I provision a 10 terabyte volume. Volumes in XtremIO are always thinly provisioned. The testing I’m doing required me to scrutinize default Linux mkfs(8) behavior for both Ext4 and XFS. This is part 1 in a short series and it is about Ext4.

Discard the Discard Option

The first thing I noticed in this testing was the fantastical "throughput" demonstrated at the array while running the mkfs(8) command with the "-t ext4" option/arg pair. As the following screenshot shows, the "throughput" at the array level was just shy of 72GB/s.

That’s not real I/O…I’ll explain…

[Screenshot: EMC XtremIO Dual X-Brick Array during Ext4 mkfs(8), default options]

The default options for Ext4 include the discard (TRIM under the covers) option. The mkfs(8) manpage has this to say about the discard option:

Attempt to discard blocks at mkfs time (discarding blocks initially is useful on solid state devices and sparse / thin-provisioned storage). When the device advertises that discard also zeroes data (any subsequent read after the discard and before write returns zero), then mark all not-yet-zeroed inode tables as zeroed. This significantly speeds up filesystem initialization. This is set as default.

I’ve read that quoted text at least eleventeen times but the wording still sounds like gibberish-scented gobbledygook to me–well, except for the bit about significantly speeding up filesystem initialization.

Since XtremIO volumes are created thin I don't see any reason for mkfs to take action to make them, what, thinner? Please let me share test results challenging the assertion that the discard mkfs option results in faster file system initialization. This is the default functionality after all.

In the following terminal output you’ll see that the default mkfs options take 152 seconds to make a file system on a freshly-created 10TB XtremIO volume:


# time mkfs -t ext4 /dev/xtremio/fs/test
mke2fs 1.43-WIP (20-Jun-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=16 blocks
335544320 inodes, 2684354560 blocks
134217728 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
81920 block groups
32768 blocks per group, 32768 fragments per group
4096 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
real 2m32.055s
user 0m3.648s
sys 0m17.280s
#

The mkfs(8) Command Without Default Discard Functionality

Please bear in mind that the default 152-second result is not due to languishing on pathetic physical I/O. The storage is fast. Please consider the following terminal output where I passed in the non-default -E option with the nodiscard argument. The file system creation took 4.8 seconds:

# time mkfs -t ext4 -E nodiscard /dev/xtremio/fs/test
mke2fs 1.43-WIP (20-Jun-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=16 blocks
335544320 inodes, 2684354560 blocks
134217728 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
81920 block groups
32768 blocks per group, 32768 fragments per group
4096 inodes per group
Superblock backups stored on blocks:
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
 102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
 2560000000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

real 0m4.856s
user 0m4.264s
sys 0m0.415s
#

I think 152 seconds down to 4.8 makes the point that with proper, thinly-provisioned storage the mkfs discard option does not “significantly speed up filesystem initialization.” But initializing file systems is not something one does frequently so investigation into the discard mount(8) option was in order.

Taking Ext4 For A Drive

Since I had this 10TB Ext4 file system–and a fresh focus on file system discard (storage TRIM) features–I thought I’d take it for a drive.

Discarded the Default Discard But Added The Non-Default Discard

While the default mkfs(8) command includes discard, the mount(8) command does not. I decided to investigate this option while unlinking a reasonable number of large files. To do so I ran a simple script (shown below) that copies 64 files of 16 gigabytes each, in parallel, into the Ext4 file system. I then timed a single invocation of the rm(1) command to remove all 64 of these files. Unlinking a file in a Linux file system is a metadata operation; when the discard option is used to mount the file system, however, each unlink operation includes TRIM operations sent to storage. The following screenshot of the XtremIO performance dashboard was taken while the rm(1) command was running. The discard mount option turns a metadata operation into a rather costly storage operation.

[Screenshot: array-level activity during bulk rm(1) command processing, Ext4 with the discard mount option]

The following terminal output shows the test step sequence used to test the discard mount option:

# umount /mnt ; mkfs -t ext4 -E nodiscard /dev/xtremio/fs/test; mount -t ext4 -o discard /dev/xtremio/fs/test /mnt
mke2fs 1.43-WIP (20-Jun-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=16 blocks
335544320 inodes, 2684354560 blocks
134217728 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
81920 block groups
32768 blocks per group, 32768 fragments per group
4096 inodes per group
Superblock backups stored on blocks:
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
 102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
 2560000000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

# cd mnt
# cat > cpit
for i in {1..64}; do ( dd if=/data1/tape of=file$i bs=1M oflag=direct )& done
wait
# time sh ./cpit > /dev/null 2>&1

real 5m31.530s
user 0m2.906s
sys 8m45.292s
# du -sh .
1018G .
# time rm -f file*

real 4m52.608s
user 0m0.000s
sys 0m0.497s
#

The following terminal output shows the same test repeated with the file system being mounted with the default (thus no discard) mount options:

# cd ..
# umount /mnt ; mkfs -t ext4 -E nodiscard /dev/xtremio/fs/test; mount -t ext4 /dev/xtremio/fs/test /mnt
mke2fs 1.43-WIP (20-Jun-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=16 blocks
335544320 inodes, 2684354560 blocks
134217728 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
81920 block groups
32768 blocks per group, 32768 fragments per group
4096 inodes per group
Superblock backups stored on blocks:
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
 102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
 2560000000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

# cd mnt
# cat > cpit
for i in {1..64}; do ( dd if=/data1/tape of=file$i bs=1M oflag=direct )& done
wait
#
# time sh ./cpit > /dev/null 2>&1

real 5m31.526s
user 0m2.957s
sys 8m50.317s
# time rm -f file*

real 0m16.398s
user 0m0.001s
sys 0m0.750s
#

This testing shows that mounting an Ext4 file system with the discard mount option dramatically impacts file removal operations. The default mount options (thus no discard option) performed the rm(1) command in 16 seconds whereas the same test took 292 seconds when mounted with the discard mount option.

So how can one perform the important house-cleaning that comes with TRIM operations?

The fstrim(8) Command

Ext4 supports user-invoked, online TRIM operations on mounted file systems. I would advise people to forego the discard mount option and opt for occasionally running the fstrim(8) command. The following is an example of how long it takes to execute fstrim on the same 10TB file system stored in an EMC XtremIO array. I think that foregoing the taxation of commands like rm(1) is a good thing, especially since running fstrim is allowed on mounted file systems and only takes roughly 11 minutes on a 10TB file system.

# time fstrim -v /mnt
/mnt: 10908310835200 bytes were trimmed

real 11m29.325s
user 0m0.000s
sys 2m31.370s
#

Summary

If you use thinly-provisioned storage and want file deletion in Ext4 to return space to the array you have a choice. You can choose to take serious performance hits when you create the file system (default mkfs(8) options) and when you delete files (optional discard mount(8) option) or you can occasionally execute the fstrim(8) command on a mounted file system.
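If you opt for occasional fstrim(8) runs, the scheduling can be as simple as a cron entry. The following is a minimal sketch (the schedule, log path and mount point are assumptions for illustration):

# /etc/cron.d/fstrim -- online TRIM of /mnt every Sunday at 02:00
0 2 * * 0 root /usr/sbin/fstrim -v /mnt >> /var/log/fstrim.log 2>&1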

Up Next

The next post in this series will focus on XFS.


