Examples Of Sample; Cluster Sampling - HP Neoview SQL Reference Manual

Cluster Sampling

Cluster sampling is an option supported by the SAMPLE RANDOM clause in a SELECT statement.
A cluster, in this sense, is a logically contiguous set of disk blocks in the file in which a table is
stored. The number of blocks in a cluster is specified in the CLUSTERS subclause of the SAMPLE
RANDOM clause. For example:
SELECT * FROM customers
SAMPLE RANDOM 1 PERCENT
CLUSTERS OF 2 BLOCKS;
This query treats the CUSTOMERS table as a sequence of disk blocks in which each consecutive
pair of blocks forms one cluster. It selects one percent of those clusters at random and then
adds every row in each selected cluster to the result table.
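The selection described above can be illustrated with a small simulation. This is only a sketch in Python, not Neoview's implementation: the function name cluster_sample, the in-memory list-of-blocks layout, and the choice to pick an exact rounded count of clusters while ignoring a trailing partial cluster are all assumptions made for illustration.

```python
import random

def cluster_sample(blocks, percent, blocks_per_cluster, rng):
    """Return all rows from a random `percent` of the clusters.

    `blocks` is a list of blocks; each block is a list of rows.
    Hypothetical model: only complete clusters are candidates.
    """
    # Group consecutive blocks into complete clusters of the requested size.
    n_full = len(blocks) // blocks_per_cluster
    clusters = [blocks[i * blocks_per_cluster:(i + 1) * blocks_per_cluster]
                for i in range(n_full)]
    # Pick the number of clusters that corresponds to the percentage.
    n_selected = round(n_full * percent / 100)
    chosen = rng.sample(clusters, n_selected)
    # Every row of every block in a chosen cluster goes into the result.
    return [row for cluster in chosen for block in cluster for row in block]

# A made-up table: 1000 blocks holding 4 rows each (rows are just integers).
rng = random.Random(0)
blocks = [[b * 4 + r for r in range(4)] for b in range(1000)]
rows = cluster_sample(blocks, percent=1, blocks_per_cluster=2, rng=rng)
print(len(rows))  # 5 clusters x 2 blocks x 4 rows = 40
```

Because whole clusters are returned, the sampled rows come from physically adjacent blocks rather than being spread evenly over the table.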
Cluster sampling can be done only on a base table, not on intermediate results.
Cluster sampling is generally faster than sampling individual rows because fewer blocks are
read from disk. In random row and periodic sampling, the entire result table being sampled is
read, and each row in the table is processed. In cluster sampling, only the disk blocks in the
selected clusters, which make up the result table, are read and processed. Therefore, if the
sampling percentage is small, the performance advantage of cluster sampling over other
sampling methods can be dramatic.
Cluster sampling is designed for large tables. If you specify a large cluster size and the table
does not have enough blocks to fill at least one cluster, the query might return zero rows.
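To get a rough sense of the difference in I/O, the block counts can be worked out directly. The following Python sketch is a back-of-the-envelope model, not a description of how Neoview schedules reads. The function names are hypothetical, and the model assumes that only complete clusters are candidates for selection and that an exact rounded percentage of them is chosen.

```python
def blocks_read_row_sampling(total_blocks):
    # Row-level (random row or periodic) sampling scans the whole table,
    # so every block is read regardless of the sampling percentage.
    return total_blocks

def blocks_read_cluster_sampling(total_blocks, percent, blocks_per_cluster):
    # Assumption: only complete clusters can be selected.
    full_clusters = total_blocks // blocks_per_cluster
    selected = round(full_clusters * percent / 100)
    # Only the blocks that belong to selected clusters are read.
    return selected * blocks_per_cluster

# A made-up large table of 100,000 blocks, sampled at 1 percent.
print(blocks_read_row_sampling(100_000))            # 100000
print(blocks_read_cluster_sampling(100_000, 1, 2))  # 1000

# A made-up tiny table of 3 blocks with a cluster size of 10 blocks:
# no complete cluster exists, so nothing is read and no rows come back.
print(blocks_read_cluster_sampling(3, 50, 10))      # 0
```

Under these assumptions, a 1 percent cluster sample touches about one hundredth of the blocks that a row-level sample reads, and a table smaller than one cluster yields an empty sample.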

Examples of SAMPLE

Suppose that the data-mining tables SALESPER, SALES, and DEPT have been created as:
CREATE TABLE $db.mining.salesper
( empid NUMERIC (4) UNSIGNED NOT NULL
,dnum NUMERIC (4) UNSIGNED NOT NULL
,salary NUMERIC (8,2) UNSIGNED
,age INTEGER
,sex CHAR (6)
,PRIMARY KEY (empid) );
CREATE TABLE $db.mining.sales
( empid NUMERIC (4) UNSIGNED NOT NULL
,product VARCHAR (20)
,region CHAR (4)
,amount NUMERIC (9,2) UNSIGNED
,PRIMARY KEY (empid) );
CREATE TABLE $db.mining.dept
( dnum NUMERIC (4) UNSIGNED NOT NULL
,name VARCHAR (20)
,PRIMARY KEY (dnum) );
Suppose, too, that sample data is inserted into this database.
Return the SALARY of the youngest 50 salespeople:
SELECT salary
FROM salesper
SAMPLE FIRST 50 ROWS SORT BY age;
SALARY
-----------
90000.00
90000.00
28000.00
27000.12
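The effect of SAMPLE FIRST 50 ROWS SORT BY age can be pictured as "sort by age, keep the first rows, then project the salary column". The Python sketch below mimics that reading on a few invented rows. The helper sample_first and the data values are hypothetical and are not taken from the manual, and the sketch ignores details such as NULL ages.

```python
from operator import itemgetter

def sample_first(rows, n, sort_key):
    """Return the first n rows after sorting by sort_key (ascending)."""
    return sorted(rows, key=itemgetter(sort_key))[:n]

# Hypothetical rows standing in for the SALESPER table.
salesper = [
    {"empid": 1, "age": 45, "salary": 90000.00},
    {"empid": 2, "age": 22, "salary": 28000.00},
    {"empid": 3, "age": 31, "salary": 52000.00},
    {"empid": 4, "age": 19, "salary": 27000.12},
]

# Keep the two youngest salespeople, then project the salary column.
youngest = sample_first(salesper, 2, "age")
salaries = [row["salary"] for row in youngest]
print(salaries)  # [27000.12, 28000.0]
```

If the requested sample size exceeds the number of rows, as with 50 against these four rows, the slice simply returns every row in age order.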
