How To Use SPL To Optimize Data Performance

How To Use SPL To Optimize Data Performance

·

3 min read

SQL (Structured Query Language) is the primary language used to communicate with a database. However, SQL has its limitations, and this is where SPL can help you overcome those shortcomings.

Traversing a table in SQL

Let's examine the following SQL statements:

SELECT a, b, sum(x) from T GROUP BY a, b WHERE …;
SELECT c, d, max(y) from T GROUP BY c, d WHERE …;
SELECT a, c, avg(y), min(z) from T GROUP BY a, c WHERE …;

T represents a large table that needs to be grouped by three methods.

GROUP BY will have to traverse the T table three times. Traversing the table three times is a lengthy operation.

To reduce the amount of traversing, you could write SQL statements like this:

CREATE TABLE T_temp as SELECT a, b, c, d,
SUM(case whenthen x else 0 end) sumx,
MAX(case whenthen y else null end) maxy,
SUM(case whenthen y else 0 end) sumy,
COUNT(case whenthen 1 else null end) county,
MIN(case whenthen z else null end) minz
GROUP BY a, b, c, d;
SELECT a, b, sum(sumx) from T_temp GROUP BY a,b WHERE…;
SELECT c, d, max(maxy) from T_temp GROUP BY c,d WHERE…;
SELECT a, c, sum(sumy)/sum(county), min(minz) from T_temp GROUP BY a,c WHERE …;

The SQL statements above will allow you to traverse the T table once. However, the grouping may become large, creating an even larger result.

When you have large tables (in the millions or billions of data), sorting in SQL becomes challenging.

In the following SQL statement, you will have to sort through all the data, retrieve the first five, and discard the remaining sorting results:

SELECT * FROM (SELECT x FROM T ORDER BY x DESC) WHERE rownum<=5

SQL does not support explicit set data types. In SQL, aggregate functions only return a single value, and you cannot use aggregate functions in a WHERE clause.

Sorting the top N elements in a database is a relatively straightforward operation. However, SQL operations get more challenging if you need to perform more complex operations. For example, writing a SQL statement to calculate the top N of each group after grouping is challenging:

SELECT * FROM
(SELECTy, x, row_number() over (PARTITION BY y ORDER BY x DESC) rn FROM T)
WHERE rn<=5

How open-source SPL can help

SPL stands for Structured Process Language. SPL is a programming language for structured data computing.

Open-source SPL supports more data types and operations and can solve many previously described problems. SPL enhances computing performance in a short time.

Calculate multiple groupings in one traversal

A
1A1=file("T.ctx").open().cursor(a,b,c,d,x,y,z)
2cursor A1=A2.select(…).groups(a,b;sum(x))
3// Define the first filteringand grouping in the traversal
4cursor=A4.select(…).groups(c,d;max(y))
5// Define the first filteringand grouping in the traversal
6cursor=A6.select(…).groupx(a,c;avg(y),min(z))
7// Define the first filteringand grouping in the traversal
8//End of definitionStart calculating all three types of filtering and grouping

Calculate the top five by aggregate method

A
1\=file("T.ctx").open()
2\=A1.cursor@m(x).total(top(-5,x), top(5, x))
3top(-5, x) calculates top 5 with the largest x,top(5, x) calculates top 5 with the smallest x

Retrieve the top 5 of each group

A
1\=file("T.ctx").open()
2\=A1.cursor@m(x, y).groups(y;top(-5, x), top(5, x))

Conclusion

Open-source SPL is a powerful language that can overcome SQL's shortcomings. Read more about open-source SPL language on the official website.

Further reading

Did you find this article valuable?

Support Maddy by becoming a sponsor. Any amount is appreciated!