Announcing Obfuscated Column Data Loading with SLOB

BLOG UPDATE: SLOB 2.5.2.2 is available for download as per this blog post.

I announced availability of SLOB 2.5.2.2. It is primarily a small bug fix release but also has an undocumented new feature. I want to say a few words about the feature in this post and will update SLOB with full support (e.g., documentation and slob.conf parameter support) in SLOB 2.5.3. Some SLOB users might want this new undocumented feature as soon as possible–thus this blog post.

The new feature allows SLOB users to load unique data in SLOB tables (single table in Single Schema Model and multiple tables in the Multiple Schema Model). Default SLOB data consists of simple, repeated character strings because column data is not really that important in the SLOB Method. However, some users find that their storage compression features (e.g., enterprise-class storage arrays such as those by Dell/EMC and Netapp) achieve artificially high compression ratios with default SLOB data. For certain testing purposes, it is less desirable to test with artificially high compression ratios. Starting with SLOB 2.5.2, users can avoid this level of compression by setting OBFUSCATE_COLUMNS to TRUE in the shell environment prior to executing the SLOB data loading script (setup.sh). Again, this feature will be documented in SLOB 2.5.3.

Overview of Loading Obfuscated Data

The following is a set of screen shots that will help me explain the OBFUSCATE_COLUMNS feature. To use the undocumented figure, simply set the shell environment variable OBFUSCATE_COLUMNS to TRUE. Doing so before executing the data loader (setup.sh) will result in all column data being loaded with random data. Similarly, setting the parameter before executing the test driver (runit.sh) will result in all UPDATE statements changing columns to random values.

An Example

The best way to describe the feature is to show it in use. Figure 1 shows how one can set the OBFUSCATE_COLUMNS shell environment variable to TRUE and load data as normal with the setup.sh script.

Figure 1

Figure 2 shows how the data loader filled the Single Schema Model table with two distinct strings. As explained in the documentation, the slob.conf->LOAD_PARALLEL_DEGREE parameter controls how many concurrent data loading streams will be used to load the data. In Figure 2 I had the default number of concurrent data loading streams. With the OBFUSCATE_COLUMNS feature, each data loading stream will load uniquely generated obfuscated strings. To show a sample of the data, I randomly chose the 13th ordinal column and queried distinct values. As explained in the documentation, the SLOB Method loads only a single row per block and each row has 19 128-character VARCHAR2 columns. Obfuscated data is loaded into each column. Note: each block loaded by a data loading stream will have the same block row content–so your compression ratio might vary based on the chunk size the compressor operates on.

Figure 2

Figure 3 shows how setting slob.conf->LOAD_PARALLEL_DEGREE to 64 results in 64 distinct obfuscated strings being loaded.

Figure 3

Summary

This post offers a preview of an undocumented SLOB feature. Maybe some users will post their experiences with before and after compression levels and how that changes their testing.

	David Zheng on Announcing pgio (The SLOB Meth…
	Oracle redo log perf… on File Systems For A Database? C…
	Oracle redo log perf… on Yes, File Systems Still Need T…
	kevinclosson on Announcing SLOB 2.5.4
	pgio nutzen? - I/O W… on So pgio Does Not Accurately Re…

Kevin Closson's Blog: Platforms, Databases and Storage