Patrick Craig
NAG Ltd, Wilkinson House,
Jordan Hill Road, Oxford OX2 8DR, UK
Abstract
This paper is aimed at IRIS Explorer users who want to create their own data types. It is intended to be read in conjunction with the Creating User-defined Data Types chapter of the IRIS Explorer Module Writer's Guide. A new data type for handling statistical data is specified, and the procedure for implementing and using the new type is described.
IRIS Explorer is a powerful scientific visualisation system that is currently aimed at computational physicists, chemists and engineers [1]. The IRIS Explorer data types are therefore designed to hold the data structures used by these workers. However, IRIS Explorer was never intended to be a closed system: as well as creating new modules that use the existing IRIS Explorer types, users can create their own data types to handle unsupported data structures. The work described in this paper is part of an ongoing project to integrate the functionality of the Genstat statistical package into IRIS Explorer. Genstat is a very general statistics program that includes facilities for data management and manipulation, statistical analysis and graphical display [2]. In section 2 the new data type is described and specified as an IRIS Explorer type definition file. Section 3 describes how the type definition file is processed to produce the files required to use the type. In section 4 the automatically generated Application Programming Interface (API) for the type is described and example C code using the API functions is provided.
A data type was required that could hold the basic data structures that are used in the Genstat statistical package. These are variables that consist of an identifier (name) and a one-dimensional array of values. There are three types of variable that differ in the way the values are interpreted. A typical data set is made up of a number of variables of one or more types and each observation in the data set is represented by the values of each variable at a given position in the variable arrays. A data set can therefore be thought of as a variable by observation two-dimensional matrix. In section 2.1 the three types of Genstat variable are described and section 2.2 gives an example of how a data set could be stored in these variable types. The IRIS Explorer type definition file for the data type is described in section 2.3.
variate
The values of a variate are integer or floating point numbers. variates are normally used to store quantitative data.
text
text values are strings that are used as observation identifiers.
factor
factors are used to group points into subsets of the total data set. The values of a factor are therefore restricted to a limited set of possible levels. Each level has an identifier or label.
River        Length  Continent
Nile           6695  Africa
Amazon         6570  S.America
Mississippi    6020  N.America
Yangtze        5471  Asia
Ob             5410  Asia
'Huang He'     4840  Asia
Zaire          4630  Africa
Amur           4415  Asia
Lena           4269  Asia
Mackenzie      4240  N.America
Niger          4183  Africa
Mekong         4180  Asia
Yenisey        4090  Asia
Murray         3717  Oceania
Volga          3688  Europe
This data set shows the 15 longest rivers in the world. The three columns in this data set represent three data structures. Each column is headed by its identifier, and each subsequent row represents one observation in the data set, in this case a river. The first column gives the name of the river and could be stored in a text structure called River. The second column would be stored in a variate called Length. The third column is an example of a factor called Continent with six levels.
The river data set could therefore be stored in the following three variables:

Variable 1
    Type       = Text
    Identifier = River
    Values     = Nile, Amazon, Mississippi, Yangtze, Ob, 'Huang He', Zaire, Amur, Lena, Mackenzie, Niger, Mekong, Yenisey, Murray, Volga

Variable 2
    Type       = Variate
    Identifier = Length
    Values     = 6695, 6570, 6020, 5471, 5410, 4840, 4630, 4415, 4269, 4240, 4183, 4180, 4090, 3717, 3688

Variable 3
    Type       = Factor
    Identifier = Continent
    Values     = 0, 1, 2, 3, 3, 3, 0, 3, 3, 2, 0, 3, 3, 4, 5
    Labels     = Africa, S.America, N.America, Asia, Oceania, Europe
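The factor encoding above stores each observation as an index into the label list. As a minimal sketch of how such an encoding could be derived from a column of strings, the following C function assigns indices in order of first appearance. The name encode_factor and its signature are hypothetical and are not part of Genstat or IRIS Explorer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Encodes n observations as indices into a label table. Labels are added
 * in order of first appearance. The caller supplies room for max_levels
 * labels and n values. Returns the number of levels found, or -1 if the
 * observations contain more than max_levels distinct strings.
 */
long encode_factor(const char *const obs[], size_t n,
                   const char *labels[], long max_levels, long values[])
{
    long levels = 0;
    size_t i;

    assert(obs != NULL && labels != NULL && values != NULL);

    for (i = 0; i < n; i++) {
        long k;

        /* Look for an existing label that matches this observation */
        for (k = 0; k < levels; k++) {
            if (strcmp(obs[i], labels[k]) == 0) break;
        }

        /* Not seen before: add a new level if there is room */
        if (k == levels) {
            if (levels == max_levels) return -1;
            labels[levels++] = obs[i];
        }
        values[i] = k;
    }
    return levels;
}
```

Applied to the Continent column of the river data set, this sketch reproduces the values and labels listed for Variable 3.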
Each of the variable types could be defined as an individual IRIS Explorer data type. However, many of the modules that would use the new data type can accept data in two or all three of the above forms. To reduce the number of connections between modules, it was therefore decided to create a single data type that can hold all three data structures. The first step in creating a new data type in IRIS Explorer is to create a data type definition file that describes the type in a format that IRIS Explorer can understand. The type definition file for gnBase (Genstat basic type) is shown below.
#include <cx/DataCtlr.h>
#include <cx/Typedefs.t>

typedef enum {
    gn_Variate,
    gn_Factor,
    gn_Text
} gnPrimType;

shared typedef struct {
    long       len        "Length";
    string     identifier "Identifier";
    gnPrimType gnType     "Type";

    switch (gnType) {
        case gn_Variate :
            double values[len]    "Values";
        case gn_Text :
            string values[len]    "Values";
        case gn_Factor :
            long   values[len]    "Values";
            long   levels         "Levels";
            string labels[levels] "Labels";
    } d;
} gnData;

shared root typedef struct {
    long   nVar       "Num variables";
    gnData data[nVar] "Data array";
} gnBase;
The gnBase structure is declared as a shared root structure with two elements, nVar, the number of variables, and data, the variable array. It is declared as a root structure so that it can be used as an input and output port data type in IRIS Explorer. The shared attribute of gnBase means that the data structure will be shared between modules and allocation and deallocation of the memory used for the structure will be controlled in IRIS Explorer by reference counting. The gnBase variable array is an array of gnData which is declared above it.
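The idea behind reference counting can be illustrated with a small, self-contained C sketch. This is not the IRIS Explorer implementation: the SharedBlock type and the shared_alloc, shared_retain and shared_release functions are hypothetical names, and starting the count at one is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared structure: a counter plus a payload */
typedef struct {
    int  refs;   /* number of holders currently using the block */
    long nVar;   /* stand-in payload */
} SharedBlock;

/* Allocate a block owned by one holder; returns NULL if out of memory */
SharedBlock *shared_alloc(long nVar)
{
    SharedBlock *b = malloc(sizeof *b);
    if (b != NULL) {
        b->refs = 1;
        b->nVar = nVar;
    }
    return b;
}

/* Register one more holder of the block */
void shared_retain(SharedBlock *b)
{
    assert(b != NULL);
    b->refs++;
}

/* Drop one holder; frees the block and returns 1 when the last one goes */
int shared_release(SharedBlock *b)
{
    assert(b != NULL && b->refs > 0);
    if (--b->refs == 0) {
        free(b);
        return 1;
    }
    return 0;
}
```

In this sketch, a block passed to a second holder is retained once, and the memory is returned only after both holders have released it, so neither needs to know whether the other is still using the data.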
The gnData structure stores a single variable. Its elements are len, the number of values, identifier, the variable identifier, gnType, variable type, and values, the one dimensional array of values. The switch construct is used to set the type of the values array depending on variable type. The factor type has two additional elements, namely levels, the number of levels, and labels, the labels for the levels. The gnData structure is also shared because it will be shared between modules, but is not a root type because it was decided to only pass the complete gnBase structure between modules.
In this section, the process by which a new type is implemented on a UNIX operating system is described. This process has been simplified for the Windows NT operating system [3].
The type definition file is translated into the files required to use the new type by creating a text file called TYPES containing the single word gnBase in the same directory as gnBase.t and executing the IRIS Explorer makefile creation utility, cxmkmf. This creates the Makefile and executing the make command creates the files listed in section 3.1. To make the new type available to IRIS Explorer, the type has to be installed as described in section 3.2.
The C equivalent of gnBase.t, gnBase.h
#ifndef __GNBASE_H_
#define __GNBASE_H_
/*
 * Translated by cxtyper Tue Dec 3 17:13:31 1996
 */

#include <cx/DataCtlr.h>

typedef enum {
    gn_Variate,
    gn_Factor,
    gn_Text
} gnPrimType;

typedef struct gnData {
    cxDataCtlr ctlr;
    long len;
    char *identifier;
    gnPrimType gnType;
    union {
        struct {
            double *values;
        } gn_Variate;
        struct {
            char **values;
        } gn_Text;
        struct {
            long *values;
            long levels;
            char **labels;
        } gn_Factor;
    } d;
} gnData;

typedef struct gnBase {
    cxDataCtlr ctlr;
    long nVar;
    gnData **data;
} gnBase;

#endif
The cxDataCtlr elements of gnData and gnBase are used by IRIS Explorer for reference counting. The automatically generated API functions provide sufficient access to the data structures to make direct manipulation of structure elements by the programmer unnecessary.
Before installing a user-defined type, the EXPLORERUSERHOME environment variable should be set to a directory in the user's file space. The make install command copies the files that are required to use gnBase to the relevant destination directories as shown below. If a directory does not exist, it is created. If the files are created in $EXPLORERUSERHOME/types, the installation process will delete the .type file and gnBase will not be accessible in IRIS Explorer. The type is therefore normally built in a subdirectory of $EXPLORERUSERHOME/types before being installed.
$EXPLORERUSERHOME/types/        gnBase.type
$EXPLORERUSERHOME/lib/          libgnBase.a
$EXPLORERUSERHOME/include/cx/   gnBase.api.h  gnBase.api.inc  gnBase.h  gnBase.inc  gnBase.t
$EXPLORERUSERHOME/man/man3/     gnBase.man3
In this section, the automatically generated Application Programming Interface (API) to gnBase is described (section 4.1) and examples of its use are provided in the form of user function files for modules that use the type (section 4.2).
Because the generation of the API functions is a general-purpose automated process, some of the functions that are generated may be identical to others. For example, the gnBaseDataarrayLen function returns the length of the gnBase data array, i.e. the nVar element of gnBase, but there is also a function called gnBaseNumvariablesGet which returns the value of the same element.
The last group of API functions provides access to the elements of the gnData structure that are only relevant when the structure type is gn_Factor. The automatically generated API code for these functions checks that the passed structure is of type gn_Factor before accessing the structure elements. If it is of the wrong type, an error is generated. For example, gnDataLevelsGet contains the following code.
signed long gnDataLevelsGet( gnData *src
                            ,cxErrorCode *ec )
{
    if (!src) {
        *ec = cx_err_error;
        return (signed long) 0;
    }
    if (src->gnType != gn_Factor) {
        *ec = cx_err_error;
        return (signed long) 0;
    }
    *ec = cx_err_none;
    return src->d.gn_Factor.levels;
}
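Because the accessor returns 0 on failure, a caller has to test the error code rather than the return value. The following sketch shows that calling pattern using stand-in types so that it compiles without the IRIS Explorer headers. DemoVar, DemoErr, demo_levels_get and levels_or are hypothetical names; the real gnData also carries a cxDataCtlr member and a union of values.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for illustration only; not the IRIS Explorer definitions */
typedef enum { demo_err_none, demo_err_error } DemoErr;
typedef enum { demo_Variate, demo_Factor, demo_Text } DemoKind;

typedef struct {
    long     len;
    DemoKind kind;
    long     levels;   /* meaningful only when kind == demo_Factor */
} DemoVar;

/* Same guard logic as the generated accessor: reject NULL and non-factors */
long demo_levels_get(const DemoVar *src, DemoErr *ec)
{
    assert(ec != NULL);
    if (src == NULL || src->kind != demo_Factor) {
        *ec = demo_err_error;
        return 0;
    }
    *ec = demo_err_none;
    return src->levels;
}

/*
 * Caller-side pattern: the error code, not the returned value, says
 * whether the result is valid. Returns fallback for anything that is
 * not a factor.
 */
long levels_or(const DemoVar *v, long fallback)
{
    DemoErr ec;
    long levels = demo_levels_get(v, &ec);
    return (ec == demo_err_none) ? levels : fallback;
}
```

With the river data, a stand-in for Continent would report six levels, while a stand-in for Length would yield the fallback value.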
This module reads in variate data from an ASCII file and outputs it in a gnBase structure. The module has a single parameter input port connected to a file browser and a single gnBase output. The format of the ASCII file is:

Number of variables
Number of values for first variable
First variable identifier
First variable values
Number of values for second variable
Second variable identifier
Second variable values
etc.
Example data file for the Read ascii file module
3
7
Day
0 1 2 3 4 5 6
7
Temperature
10.2 12.7 15.9 13.6 14.4 11.6 12.3
7
Windspeed
25.2 20.6 20.8 22.8 15.3 14.8 15.7
User function file for the Read ascii file module
#include <cx/cxParameter.api.h>
#include <cx/cxLattice.api.h>
#include <cx/gnBase.api.h>
#include <cx/DataAccess.h>
#include <cx/DataOps.h>
#include <stdio.h>
#include <string.h>

#define MAX 50   /* Size of identifier buffer, including the terminating '\0' */

/* Release the partially built structure (if any) and alert the user */
void MemError (gnBase *gnb)
{
    if (gnb) cxDataRefDec(gnb);
    cxModAlert ("Unable to allocate memory");
}

void ReadAscii (char *filename, gnBase **DataOut)
{
    FILE *in;
    int i, j, var, len;
    float val;
    gnData **Array;
    cxErrorCode err;
    char Buffer[MAX];
    char *id;

    /* Return if no file name was given or the file cannot be opened */
    if (filename == NULL || *filename == '\0') return;
    in = fopen(filename, "r");
    if (in == NULL) return;

    /* Read number of variables and allocate new gnBase structure */
    fscanf (in, "%d", &var);
    *DataOut = gnBaseAlloc(var);
    if (*DataOut == NULL) {fclose(in); MemError(NULL); return;}

    /* Get pointer to gnData array */
    Array = gnBaseDataarrayGet(*DataOut, &err);

    /* Variable loop */
    for (i = 0; i < var; i++) {

        /* Read length of this variate and allocate new gnData structure */
        fscanf (in, "%d", &len);
        Array[i] = gnDataAlloc(len, gn_Variate, NULL);
        if (Array[i] == NULL) {fclose(in); MemError(*DataOut); return;}

        /* Read identifier (at most MAX-1 characters) and store a copy */
        fscanf (in, "%49s", Buffer);
        id = (char *) cxDataMalloc (strlen(Buffer) + 1);  /* +1 for '\0' */
        if (id == NULL) {fclose(in); MemError(*DataOut); return;}
        strcpy (id, Buffer);
        gnDataIdentifierSet (Array[i], id, &err);

        /* Read and store values */
        for (j = 0; j < len; j++) {
            fscanf (in, "%f", &val);
            ((double *)gnDataValuesGet(Array[i], &err))[j] = val;
        }
    }

    fclose (in);
}
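The user function above does not test the return values of fscanf, so a truncated or malformed file goes unnoticed. As a minimal sketch of how one variable record could be read with every conversion checked, the following function uses only the standard C library. The name read_variable, the ID_MAX constant and the caller-supplied value buffer are assumptions made for this sketch and are not part of the gnBase API.

```c
#include <assert.h>
#include <stdio.h>

#define ID_MAX 50   /* identifier buffer size, including the '\0' */

/*
 * Hypothetical helper, for illustration only.
 * Reads one record of the form "length identifier value value ..." from
 * in. The identifier is stored in id (at most ID_MAX-1 characters) and
 * the values in values[0..length-1]. Returns the number of values read,
 * or -1 if the record is malformed, truncated, or longer than capacity.
 */
long read_variable(FILE *in, char id[ID_MAX], double *values, long capacity)
{
    long len, j;

    assert(in != NULL && id != NULL && values != NULL);

    /* Declared length must be present and must fit the caller's buffer */
    if (fscanf(in, "%ld", &len) != 1 || len < 0 || len > capacity) return -1;

    /* The field width keeps the identifier inside its buffer */
    if (fscanf(in, "%49s", id) != 1) return -1;

    /* Every declared value must actually be present in the file */
    for (j = 0; j < len; j++) {
        if (fscanf(in, "%lf", &values[j]) != 1) return -1;
    }
    return len;
}
```

A module could call such a helper once per variable and abandon the read, releasing any partially built structure, as soon as it returns -1.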
This module prints out the contents of a gnBase structure. It has a single gnBase input port.
User function file for Print gnBase module
#include <cx/cxParameter.api.h>
#include <cx/cxLattice.api.h>
#include <cx/gnBase.api.h>
#include <cx/DataAccess.h>
#include <cx/DataOps.h>
#include <stdio.h>
#include <string.h>

#define FWIDTH 15   /* Field width of printed output */

void PrintAscii (gnBase *DataIn)
{
    long i, j, var;
    gnData **Array;
    cxErrorCode err;
    gnPrimType type;
    long maxlen;

    /* Get number of variables and gnData array pointer */
    var = gnBaseNumvariablesGet(DataIn, &err);
    Array = gnBaseDataarrayGet(DataIn, &err);

    /* Write variable identifiers and store maximum variable length */
    maxlen = 0;
    for (i = 0; i < var; i++) {
        printf ("%*s", FWIDTH, gnDataIdentifierGet (Array[i], &err));
        if (gnDataLengthGet(Array[i], &err) > maxlen)
            maxlen = gnDataLengthGet(Array[i], &err);
    }
    printf ("\n");

    /* Write values depending on type */
    for (j = 0; j < maxlen; j++) {
        for (i = 0; i < var; i++) {
            type = gnDataTypeGet(Array[i], &err);
            if (j < gnDataLengthGet(Array[i], &err)) {
                switch (type) {
                    case gn_Variate:
                        printf ("%*g", FWIDTH,
                                ((double *) gnDataValuesGet(Array[i], &err))[j]);
                        break;
                    case gn_Factor:
                        /* The value is an index into the label array */
                        printf ("%*s", FWIDTH,
                                ((char **) gnDataLabelsGet(Array[i], &err))
                                [((long *) gnDataValuesGet(Array[i], &err))[j]]);
                        break;
                    case gn_Text:
                        printf ("%*s", FWIDTH,
                                ((char **) gnDataValuesGet(Array[i], &err))[j]);
                        break;
                }
            } else {
                /* This variable is shorter than the longest: pad the column */
                printf ("%*s", FWIDTH, "");
            }
        }
        printf ("\n");
    }
}
If the input to this module came from a Read ascii file module that had read in the example file in section 4.2.1, the printed output would be:
            Day    Temperature      Windspeed
              0           10.2           25.2
              1           12.7           20.6
              2           15.9           20.8
              3           13.6           22.8
              4           14.4           15.3
              5           11.6           14.8
              6           12.3           15.7
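The column layout relies on the "%*g" and "%*s" conversions, which take the field width as an argument, and on printing an empty string for variables that are shorter than the longest one. A minimal sketch of that row-formatting step, working on plain arrays rather than gnBase, is given below. The function name format_row and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.
 * Formats row `row` of ncols numeric columns into out, right-aligned in
 * fields of `width` characters. Columns with fewer than row+1 values are
 * padded with blanks. Returns the number of characters written, or -1 if
 * the row does not fit in outsize bytes.
 */
int format_row(char *out, size_t outsize, int width,
               const double *const cols[], const long lens[],
               long ncols, long row)
{
    size_t used = 0;
    long i;

    assert(out != NULL && outsize > 0 && width > 0);
    out[0] = '\0';

    for (i = 0; i < ncols; i++) {
        int n;
        if (row < lens[i]) {
            /* Value present: print it in a field of the given width */
            n = snprintf(out + used, outsize - used, "%*g", width, cols[i][row]);
        } else {
            /* Column exhausted: print an empty field of the same width */
            n = snprintf(out + used, outsize - used, "%*s", width, "");
        }
        if (n < 0 || (size_t) n >= outsize - used) return -1;
        used += (size_t) n;
    }
    return (int) used;
}
```

Padding short columns in this way keeps later columns aligned even when the variables in a data set have different lengths.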
This module is an example of a gnBase filter that restricts the variate values to lie between a minimum and a maximum set by the user. The usual way to create a filter module in the Module Builder [3][4] is to pass the parts of the structure that will not be affected by the filter directly from the input to the output port in the connections window and simply connect the parts of the structure to be changed to the function arguments. In this case just the type and values would need to be passed to the function arguments. However, gnBase differs from other IRIS Explorer types in that it contains a double pointer to a reference counted structure (gnData). The Module Builder is not currently able to create module data wrapper code for such a structure. Instead of casting the pointer as (gnData **), it attempts to cast it to (gnData), which fails. In effect, this means that the complete gnBase structure must be passed to the function arguments.
The module has gnBase input and output ports and two parameter input ports, min and max, that are connected to sliders or dials.
User function file for Filter module
#include <cx/cxParameter.api.h>
#include <cx/cxLattice.api.h>
#include <cx/gnBase.api.h>
#include <cx/DataAccess.h>
#include <cx/DataOps.h>
#include <stdio.h>
#include <string.h>

void Filter (gnBase *DataIn, gnBase **DataOut, double min, double max)
{
    long i, j, var;
    double *val;
    gnData **Array;
    cxErrorCode err;
    gnPrimType type;

    /* Create duplicate of input gnBase structure */
    *DataOut = gnBaseDup(DataIn);
    if (*DataOut == NULL) return;

    /* Get number of variables and gnData array pointer */
    var = gnBaseNumvariablesGet(DataIn, &err);
    Array = gnBaseDataarrayGet(*DataOut, &err);

    /* Variable loop */
    for (i = 0; i < var; i++) {

        /* If this variable is a variate, restrict values */
        type = gnDataTypeGet(Array[i], &err);
        if (type == gn_Variate) {
            for (j = 0; j < gnDataLengthGet(Array[i], &err); j++) {
                val = &(((double *)gnDataValuesGet(Array[i], &err))[j]);
                if (*val < min) {
                    *val = min;
                }
                if (*val > max) {
                    *val = max;
                }
            }
        }
    }
}
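The restriction step itself is independent of IRIS Explorer and can be sketched on a plain array of doubles. The helper below is hypothetical (clamp_values is not part of the gnBase API), and it assumes that the caller supplies limits with lo no greater than hi, a case the user function above does not check.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Restricts each of the n values to the closed interval [lo, hi] in
 * place and returns how many values were changed. Assumes lo <= hi.
 */
long clamp_values(double *values, long n, double lo, double hi)
{
    long j, changed = 0;

    assert(values != NULL && lo <= hi);

    for (j = 0; j < n; j++) {
        if (values[j] < lo) {
            values[j] = lo;
            changed++;
        } else if (values[j] > hi) {
            values[j] = hi;
            changed++;
        }
    }
    return changed;
}
```

Returning the number of changed values is a design choice of this sketch; it would let a module report whether the filter had any effect on a given variate.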
It has been demonstrated that a new data type can be successfully incorporated into IRIS Explorer. The new data type was taken from an application that was previously completely unrelated to IRIS Explorer. Due to the flexibility of IRIS Explorer typing, the type could be specified to exactly match the required data structure. The automatically generated API functions provide the programmer with a means to manipulate all parts of the data structure, without having to know about the underlying type definition. Examples of how the API functions could be used within modules were provided.
The inability of the Module Builder to interpret a double pointer to a shared structure within another shared structure meant that module data wrapper code could only be generated by the Module Builder when the complete data structure was passed between ports and function arguments. This means that when writing filter modules, the programmer has to copy the parts of the data structure that remain unchanged within the user function, rather than leaving this to the module data wrapper.
1. IRIS Explorer User's Guide (1995). The Numerical Algorithms Group Ltd.
2. Genstat 5 Release 3 Reference Manual (1993). Genstat 5 Committee of the Statistics Department, Rothamsted Experimental Station. Oxford University Press.
3. IRIS Explorer Module Writer's Guide (NT) (1997). The Numerical Algorithms Group Ltd.
4. IRIS Explorer Module Writer's Guide (1997). The Numerical Algorithms Group Ltd.