Skip to content
HPCCSystems Solutions Lab
HPCCSystems Solutions Lab

TABLE

TABLE is the most commonly-used data aggregation functions in ECL. It creates a new dataset in memory while workunit is running. The new table inherits the implicit rationality the recordset has (if any), unless the optional expression is used to perform aggregation. There are two types of Table:

Vertical Number of records in the input dataset is equal to generated table, which means no aggregation is involved.

CrossTab There is at least one field using an aggregate function with the keyword Grouping Condition as its first parameter. The number of records produced is equal to the number of distinct values of the expression.

Syntax

/*** Stand-alone record definition ***/
out_record_def := RECORD
              dataset.field;
              dataset.field;
              ...
              field_name := Agg_Func(GROUP, dataset.field);
              field_name := Agg_Func(GROUP, dataset.field);
              field_name := COUNT(GROUP);
              ....
END;

attr_name := TABLE(dataset,
                    out_record_def,
                    grouping-conditions
                    [, flags]
                    );
ValueDefinition
out_record_defRecord definition that will contain both the grouping condition results and any new attributes computed as part of the aggregation
dataset.fieldfield(s) from input dataset
field_nameNewly defined fields
attr_nameThe name by which the table will be invoked
TABLERequired
datasetInput dataset to create the table from
grouping_conditionOne or more comma-delimited expressions. Please see Grouping Condition for more information
flagsOptional flags that can alter the behavior of TABLE
/*** Implicit record definition ***/
attr_name := TABLE(dataset,
                    {
                      field,  //Calling specific field from input dataset
                      field,
                      ...
                      field_name := Agg_Func(GROUP, dataset.field);
                      field_name := Agg_Func(GROUP, dataset.field);
                      field_name := COUNT(GROUP);
                      ....
                    },
                    grouping_conditions
                    [, flags]
                    );

ValueDefinition
attr_nameThe name by which the table will be invoked
TABLERequired
fieldfield(s) from input dataset
field_nameNewly defined fields
grouping_conditionOne or more comma-delimited expressions. Please see Group for more information
flagsOptional flags that can alter the behavior of TABLE

Grouping Condition

  • One or more comma-delimited expressions
  • An expression could simply be an attribute name within the dataset; this is the most common usage
  • An expression could be a computed value, such as (myValue % 2) to group on even/odd values
  • All records within dataset that evaluate to the same set of condition values will be grouped together
  • Each group will result in one output record
  • Functions evaluated within outrecorddef will operate on the group

Optional Flags

Flags can alter the behavior of TABLE. Commonly used flags are MERGE and LOCAL

FlagDefinition
FEWIndicates that the expression will result in fewer than 10,000 distinct groups. This allows optimization to produce a significantly faster result.
MANYIndicates that the expression will result in many distinct groups.
UNSORTEDSpecifies that you don’t care about the order of the groups. This allows optimization to produce a significantly faster result.
LOCALSpecifies the operation is performed on each node independently; the operation maintains the distribution of any previous DISTRIBUT.
KEYEDSpecifies the activity is part of an index read operation, which allows the optimizer to generate optimal code for the operation.
MERGESpecifies that results are aggregated on each node and then the aggregated intermediaries are aggregated globally. This is a safe method of aggregation that shines particularly well if the underlying data was skewed.
SKEWIndicates that you know the data will not be spread evenly across nodes.

GROUP

The GROUP keyword is used within output format parameter (RECORD Structure) of a TABLE definition. GROUP replaces the recordset parameter of any aggregate built-in function used in the output to indicate the operation is performed for each group of the expression. This is similar to an SQL “GROUP BY” clause.

Demo Dataset

Pickup_DateFareDistance
1/1/202125.115.5
1/2/202140.15,7.2
1/3/202125.366.5
1/2/202112023
1/3/20213060.75
2/2/20212571
1/2/2021102.2
3/10/20214512.23

/*
TABLE Example:
TABLE is used with aggregations
*/

// Input layout
Fare_Layout  :=  RECORD
    STRING Pickup_Date;
    REAL   Fare;
    REAL   Distance;
END;

// Input dataset
FareDS := DATASET([
                   {'1/1/2021', 25.1, 15.5}, {'1/2/2021', 40.15,7.2},
                   {'1/3/2021', 25.36, 6.5}, {'1/2/2021', 120, 23},
                   {'1/3/2021', 30, 60.75}, {'2/2/2021', 25, 71},
                   {'1/2/2021', 10, 2.2}, {'3/10/2021', 45, 12.23}],
                   Fare_Layout);

// Defining all fields for the table
AvgRide_Layout := RECORD
   fareDS.pickup_date;                   // Calling specific field from input dataset
   avgFare   := AVE(GROUP, fareDS.fare); // Calculating avg fare per each group
   totalFare := SUM(GROUP, fareDS.fare); // Calculating total fare per each group
END;

crossTabDs := TABLE(FareDS,           // Input dataset. please see dataset above
                     AvgRide_Layout,  // Result table definition
                     pickup_date      // Grouping field
                     );

OUTPUT(crossTabDs, NAMED('crossTabDs'));
Try Me

Demo Dataset

PersonIDFirstNameLastNameisEmployedAvgIncomeEmpGroupNum
1102FredSmithFALSE1000900
3102FactSmithTRUE200000100
1012JoeBlowTRUE11250200
2085BlueMoonTRUE185000500
3055SilverJoFALSE5000900
1265DarlingJoTRUE5000100
1265DarlingAlexTRUE5000100
5265BlueSilverTRUE75000200
7333JaneSmithFALSE50000900
6023AlexSilverTRUE102000200
1024NancyMoonTRUE201100700

/*
TABLE Example:
Cross table example.
*/

AllPeople_Layout := RECORD
  UNSIGNED  PersonID;
  STRING15  FirstName;
  STRING25  LastName;
  BOOLEAN   isEmployed;
  UNSIGNED  AvgIncome; 
  INTEGER   EmpGroupNum;
END;


AllPeopleDS := DATASET([ 
                       {1102,'Fred','Smith', FALSE, 1000, 900},
                       {3102,'Fact','Smith', TRUE, 200000, 100},
                       {1012,'Joe','Blow', TRUE, 11250, 200},
                       {2085,'Blue','Moon', TRUE, 185000, 500},
                       {3055,'Silver','Jo', FALSE, 5000, 900},
                       {1265,'Darling','Jo', TRUE, 5000, 100},
                       {1265,'Darling','Alex', TRUE, 5000, 100},
                       {5265,'Blue','Silver', TRUE, 75000, 200},
                       {7333,'Jane','Smith', FALSE, 50000, 900},
                       {6023,'Alex','Silver',TRUE, 102000, 200},
                       {1024,'Nancy','Moon', TRUE, 201100, 700}],
                       AllPeople_Layout);

VerticalSlice := Table(AllPeopleDS,
                        {
                          LastName,
                          isEmployed
                        },
                        LastName, isEmployed);
OUTPUT(VerticalSlice, NAMED('VerticalSlice'));


AvgIncome := TABLE(AllPeopleDS,
                    {
                      LastName,
                      AvgHouseIncome := AVE(GROUP, AvgIncome)
                    },
                    LastName);

OUTPUT(AvgIncome, NAMED('AvgIncome'));
Try Me