How To Use SPL To Optimize Data Performance

SQL (Structured Query Language) is the primary language used to communicate with a database. However, SQL has its limitations, and this is where SPL can help you overcome those shortcomings.

Traversing a table in SQL

Let's examine the following SQL statements:

SELECT a, b, sum(x) from T GROUP BY a, b WHERE …;
SELECT c, d, max(y) from T GROUP BY c, d WHERE …;
SELECT a, c, avg(y), min(z) from T GROUP BY a, c WHERE …;

T represents a large table that needs to be grouped by three methods.

GROUP BY will have to traverse the T table three times. Traversing the table three times is a lengthy operation.

To reduce the amount of traversing, you could write SQL statements like this:

CREATE TABLE T_temp as SELECT a, b, c, d,
SUM(case when … then x else 0 end) sumx,
MAX(case when … then y else null end) maxy,
SUM(case when … then y else 0 end) sumy,
COUNT(case when … then 1 else null end) county,
MIN(case when … then z else null end) minz
GROUP BY a, b, c, d;
SELECT a, b, sum(sumx) from T_temp GROUP BY a,b WHERE…;
SELECT c, d, max(maxy) from T_temp GROUP BY c,d WHERE…;
SELECT a, c, sum(sumy)/sum(county), min(minz) from T_temp GROUP BY a,c WHERE …;

The SQL statements above will allow you to traverse the T table once. However, the grouping may become large, creating an even larger result.

When you have large tables (in the millions or billions of data), sorting in SQL becomes challenging.

In the following SQL statement, you will have to sort through all the data, retrieve the first five, and discard the remaining sorting results:

SELECT * FROM (SELECT x FROM T ORDER BY x DESC) WHERE rownum&lt;=5

SQL does not support explicit set data types. In SQL, aggregate functions only return a single value, and you cannot use aggregate functions in a WHERE clause.

Sorting the top N elements in a database is a relatively straightforward operation. However, SQL operations get more challenging if you need to perform more complex operations. For example, writing a SQL statement to calculate the top N of each group after grouping is challenging:

SELECT * FROM
(SELECTy, x, row_number() over (PARTITION BY y ORDER BY x DESC) rn FROM T)
WHERE rn&lt;=5

How open-source SPL can help

SPL stands for Structured Process Language. SPL is a programming language for structured data computing.

Open-source SPL supports more data types and operations and can solve many previously described problems. SPL enhances computing performance in a short time.

Calculate multiple groupings in one traversal

	A
1	A1=	file("T.ctx").open().cursor(a,b,c,d,x,y,z)
2	cursor A1=	A2.select(…).groups(a,b;sum(x))
3	// Define the first filtering	and grouping in the traversal
4	cursor=	A4.select(…).groups(c,d;max(y))
5	// Define the first filtering	and grouping in the traversal
6	cursor=	A6.select(…).groupx(a,c;avg(y),min(z))
7	// Define the first filtering	and grouping in the traversal
8	//End of definition	Start calculating all three types of filtering and grouping

Calculate the top five by aggregate method

	A
1	\=file("T.ctx").open()
2	\=A1.cursor@m(x).total(top(-5,x), top(5, x))
3	top(-5, x) calculates top 5 with the largest x，top(5, x) calculates top 5 with the smallest x

Retrieve the top 5 of each group

	A
1	\=file("T.ctx").open()
2	\=A1.cursor@m(x, y).groups(y;top(-5, x), top(5, x))

Conclusion

Open-source SPL is a powerful language that can overcome SQL's shortcomings. Read more about open-source SPL language on the official website.

How To Use SPL To Optimize Data Performance

Table of contents

Traversing a table in SQL

How open-source SPL can help

Calculate multiple groupings in one traversal

Calculate the top five by aggregate method

Retrieve the top 5 of each group

Conclusion

Further reading

How To Use SPL To Optimize Data Performance

Table of contents

Traversing a table in SQL

How open-source SPL can help

Calculate multiple groupings in one traversal

Calculate the top five by aggregate method

Retrieve the top 5 of each group

Conclusion

Further reading

Did you find this article valuable?