Traversing a table in SQL
Let's examine the following SQL statements:
SELECT a, b, sum(x) from T GROUP BY a, b WHERE …; SELECT c, d, max(y) from T GROUP BY c, d WHERE …; SELECT a, c, avg(y), min(z) from T GROUP BY a, c WHERE …;
T represents a large table that needs to be grouped by three methods.
GROUP BY will have to traverse the T table three times. Traversing the table three times is a lengthy operation.
To reduce the amount of traversing, you could write SQL statements like this:
CREATE TABLE T_temp as SELECT a, b, c, d, SUM(case when … then x else 0 end) sumx, MAX(case when … then y else null end) maxy, SUM(case when … then y else 0 end) sumy, COUNT(case when … then 1 else null end) county, MIN(case when … then z else null end) minz GROUP BY a, b, c, d; SELECT a, b, sum(sumx) from T_temp GROUP BY a,b WHERE…; SELECT c, d, max(maxy) from T_temp GROUP BY c,d WHERE…; SELECT a, c, sum(sumy)/sum(county), min(minz) from T_temp GROUP BY a,c WHERE …;
The SQL statements above will allow you to traverse the T table once. However, the grouping may become large, creating an even larger result.
When you have large tables (in the millions or billions of data), sorting in SQL becomes challenging.
In the following SQL statement, you will have to sort through all the data, retrieve the first five, and discard the remaining sorting results:
SELECT * FROM (SELECT x FROM T ORDER BY x DESC) WHERE rownum<=5
SQL does not support explicit set data types. In SQL, aggregate functions only return a single value, and you cannot use aggregate functions in a WHERE clause.
Sorting the top N elements in a database is a relatively straightforward operation. However, SQL operations get more challenging if you need to perform more complex operations. For example, writing a SQL statement to calculate the top N of each group after grouping is challenging:
SELECT * FROM (SELECTy, x, row_number() over (PARTITION BY y ORDER BY x DESC) rn FROM T) WHERE rn<=5
How open-source SPL can help
SPL stands for Structured Process Language. SPL is a programming language for structured data computing.
Open-source SPL supports more data types and operations and can solve many previously described problems. SPL enhances computing performance in a short time.
Calculate multiple groupings in one traversal
|3||// Define the first filtering||and grouping in the traversal|
|5||// Define the first filtering||and grouping in the traversal|
|7||// Define the first filtering||and grouping in the traversal|
|8||//End of definition||Start calculating all three types of filtering and grouping|
Calculate the top five by aggregate method
|2||\=A1.cursor@m(x).total(top(-5,x), top(5, x))|
|3||top(-5, x) calculates top 5 with the largest x，top(5, x) calculates top 5 with the smallest x|
Retrieve the top 5 of each group
|2||\=A1.cursor@m(x, y).groups(y;top(-5, x), top(5, x))|
Open-source SPL is a powerful language that can overcome SQL's shortcomings. Read more about open-source SPL language on the official website.