The history of APL is littered with both missed opportunities and belated climbing onto fast-receding bandwagons. I'm sure you all can think of more than enough examples; perhaps some of you can suggest why this happens.
The purpose of this (hopefully quite brief) item is to share some experiences and lessons in another application area which appears to be growing in business importance and which is quite clearly a candidate for quite widespread APL infeed. It is called OLAP, which is short for online analytic processing. The background to the article is some recent exploration of two OLAP packes by Dogon Research.
Many businesses (especially the larger and richer ones) maintain a historic database of their activity; it tends to be oriented toward day-to-day updating of current business and is substantial in size. For example, a food company would record all sales and identify each order by a number of business indexes (type of product, manufacturing division and so forth) - this is OLTP (online transaction processing). These databases serve the purposes of recording and tracking activity quite well, but there is a further business requirement to which they are less well suited.
This is the analytic activity of seeking out trends in the data, cross-relationships and so forth. Sometimes some of this is called data warehousing, sometimes some of it is called data mining, and another (not necessarily all-embracing) term is online analytic processing (OLAP).
The main characteristic of OLAP is that it takes a multi-dimensional view of data; and the work that OLAP tools do is to make multidimensional selection, aggregation and analysis achievable by the business specialist. Sound familiar? OLAP has been around for several years, with several vendors. Typically OLAP is done using an extracted and reformulated version of the underlying transaction database, although it appears that current commercial trends are for OLAP tools to be assimilated by relational database vendors and to reach directly into the mainstream database products. Vendor events typically attract several hundred attendees.
Some of the 'philosophy' of OLAP has been coined by Ted Codd (of relational database notoriety) and there is a clear sense in which the vendors are reaching for the same sort of academic underpinning as the 'ten rules' of relational databases. One of the most relevant of the 'OLAP rules' is that data is inherently multi-dimensional, and each of the dimensions has equal standing - in other words, 'dimension' is an abstract concept in the same way as 'column' is an abstract concept for a relational table.
At heart Essbase offers a single multi-dimensional data object; you can augment this with a (single?) multi-dimensional currency object which allows data consolidation for multinational organisations. The bare-bones interface to this data object is through Excel and as such it appears to very much assume a neo-technical user.
The curious might like to examine US patent 5359724, which describes how the Essbase database is mapped onto disk; you may find the concepts somewhat familiar.
Internal processing of multi-dimensional data is through a scripting language, and there is very little evidence of an underlying multi-dimensional notation; there is a peculiar construct called 'the multi-dimensional operator' which seems to have something of the characteristics of bracket-indexing. What's most striking is that when the calculation process is described (and written into the scripting language) it is done very much in terms of a serial traversal of the database - item-by-item.
More sophisticated use of Essbase involves using an API which is documented for both C and Visual Basic; at which point all notion of multi-dimensionality is completely abandoned. The API presents data as a series of character strings, and the C or VB programmer is faced with the task of not only deciding how they want to handle multi-dimensionality in their application but also mapping the character strings into this self-defined framework.
Something which Essbase seems to be quite good at is accepting that data is likely to be sparser along some dimension combinations than others and allowing the database engine to be tuned for space and processing optimisation.
Oracle Express is based on the quite venerable PC Express multi-dimensional data application. The moment you see it the tears of nostalgia for the good old days come rolling down. At its base level it looks and behaves just like all those early-80's APL timesharing applications.
Oracle Express provides a holding mechanism for multiple multi-dimensional objects; your understanding can map what's going on here directly into (or out of) APL workspace ideas. The conceptual step that Oracle Express brings is that indices can be names rather than numbers (what would be an indirect process in APL is direct in Oracle Express - "get me the May 1997 baked beans sales for Kansas").
Since the product was purchased by Oracle a lot of work has been done on the user interface, and the most interesting aspect of this is what they market as Express Objects.
Express Objects is a sort of VB-alike, which combines the form and language characteristics of VB (it is like-VB, but not-VB) with some custom objects which are focussed on display and manipulation of the underlying multi-dimensional data. Data can be taken into the Express Objects Basic language with a relationship to their underlying multi-dimensional nature. This is much more sophisticated than Essbase, but again we see 'the mainstream' pulling up short because what the Basic programmer gets to handle is not only called a Datacube - it really is a cube. What Express Objects does is to represent higher dimensionality as a mapping onto a three-dimensional object. In the same way that Excel offers a two-and-a-half dimension extension to the old 2D spreadsheet paradigm, so Express Objects holds back from a full acceptance that the underlying data can contain many dimensions which are each of equal standing.
Oracle Express does not make the underlying data structure anywhere near as obvious or tunable as Essbase.
OLAP is a multi-dimensional concept; APL is a multi-dimensional notation.
An OLAP database engine can take care of the problems implicit in mapping multi-dimensional data onto a linear file space, handling multi-user simultaneity and other data-processing questions, relieving us of the need to solve these problems for ourselves.
OLAP data is typically sparse and, so far as I know, APL implementations have not addressed sparsity to any real degree.
At the programming level, OLAP tools offer virtually nothing in the way of a multi-dimensional data notation; indeed they seem to be very shy of this. Partly this reflects the limited perspective of both their business market and of the mainframe software world. OLAP tools look impressive if they are viewed from a spreadsheet perspective; when they are seen from a 'been there' world of multi-dimensional achievement they look like a triumph of marketing over reality.
An ideal world would combine OLAP database technology to handle bulk and other 'database' issues; the database would then present subsets of the data with multi-dimensionality retained into an appropriate algebraic vehicle (you know what I'm thinking of) for onward processing and probable return to the data repository.
We can do a lot more, in a lot less time, by applying APL (or J) technology to this sort of data than can the VB (or C) programmer who is faced with the need to construct their own multi-dimensional paradigm. To most fully exploit this advantage we should, of course, also be willing to use tools like VB in the places where they hold an advantage (quick-build user interfaces are the clear example).
The window of opportunity may be limited by vendor success in integrating their OLAP processing into mainstream relational databases.
All trademarks, explicit or implied, are acknowledged.
This page last updated 13 March 2013 (repaired links)
© Dick Bowman 1997-2013
Return to the J\APL Contents Page