Skip to content
HPCCSystems Solutions Lab
HPCCSystems Solutions Lab

Filter

Data filtering is the process of choosing a smaller part of your data set and using that subset for further processing. It’s recommended to filter down to the desire dataset before any processing. When using filter on STRING values keep in mind that STRING values are case sensitive. For example Sun, sun, SUN are not the same.

SQL vs. ECL

Filter is similar to SELECT. In ECL the filtering fields are in ( ).

// SQL 
SELECT name, address FROM PeopleDS WHERE name = 'Jo';

// ECL
OUTPUT(peopleDS(name = 'Jo'));

Syntax

attr_name := dataset_name (filtering condition(s));
ValueDefinition
attr_nameThe name by which the function will be invoked.
dataset_nameThe dataset to perform action on.
Filtering condition(s)field or fields and required filtering conditions. Logical operators can be used to execute multiple filters.

Demo Dataset

PersonIDFirstNameLastNameisEmployedavgHouseIncome
102FredSmithFALSE0
012JoeBlowTRUE11250
085BlueMoonTRUE185000
055SilverJoFALSE5000
265DarlingJoTRUE5000
333JaneSmithFALSE50000

Example


/*
FILTER Example:
Showing different examples of FILTER function 
based on different fields or logical operator.
*/

// Creating record layout
Emp_layout := RECORD
    INTEGER  PersonID; 
    STRING   FirstName; 
    STRING   LastName; 
    BOOLEAN  IsEmp;
    INTEGER  RoundedIncome;
END; 

// Creating an inline dataset
Emp_DS := DATASET([
                {102,'Fred','Smith',FALSE,0},
                {012,'Joe','Blow',TRUE,11250},
                {085,'Blue','Moon',TRUE,185000},
                {055,'Silver','Jo',FALSE,5000},
                {265,'Darling','Jo',TRUE,5000},
                {333,'Jane','Smith',FALSE,50000}],
                Emp_layout);

// Filter Smith last name
GetSmith := Emp_DS(LastName='Smith');
OUTPUT(GetSmith, NAMED('GetSmith'));

// Notice that following filter will return an empty dataset
OUTPUT(Emp_DS(LastName='smith'), NAMED('Case_Sensitive'));


// Filter unemployed with income using logical operators
IsWorking := Emp_DS(IsEmp = FALSE AND
                    RoundedIncome > 0);

OUTPUT(IsWorking, NAMED('IsWorking'));

// Capturing everyone that is employed 
// Following filter is the same as: 
// Emp_DS(IsEmp = TRUE)
OUTPUT(Emp_DS(IsEmp), NAMED('Employed'));

Try Me

Logical Operators

OperatorDescription
=Equal
>Greater than
<Less than
>=Greater than or equal
<=Less than or equal
<>Not equal
!=Not equal
ANDLogical AND
ORLogical OR
INTo specify multiple possible values for a field/column.
NOT INTo specify multiple possible values that are not in a field/column.
BETWEENBetween a certain range.


Example


/*
Filter Example
*/

/*
Filter Example
*/
StrokRec := RECORD
    STRING   ID;	
    STRING   Gender;	
    INTEGER  Age;	
    BOOLEAN  Hypertension;	
    BOOLEAN  Heart_Disease;	
    STRING   Ever_Married;	
    STRING   Work_Type;	
    STRING   Residence_Type;	
    STRING   Avg_Glucose_Level;	
    STRING   BMI;	
    STRING   Smoking_status;	
    BOOLEAN  Stroke;
END;


StrokDS := DATASET('~raw::healthcare-dataset-stroke-data.csv', StrokRec, CSV(HEADING(1)));
// Filtering men over age of 80
Over80 := StrokDS(Age >= 80);
OUTPUT(Over80, NAMED('Over80'));
Try Me

Example


/*
Filter Example
*/

/*
Filter Example
*/
StrokRec := RECORD
    STRING   ID;	
    STRING   Gender;	
    INTEGER  Age;	
    BOOLEAN  Hypertension;	
    BOOLEAN  Heart_Disease;	
    STRING   Ever_Married;	
    STRING   Work_Type;	
    STRING   Residence_Type;	
    STRING   Avg_Glucose_Level;	
    STRING   BMI;	
    STRING   Smoking_status;	
    BOOLEAN  Stroke;
END;


StrokDS := DATASET('~raw::healthcare-dataset-stroke-data.csv', StrokRec, CSV(HEADING(1)));
// Over 80 years old men with heart disease 
OUTPUT(StrokDS(Gender = 'Male' AND Age >= 80 AND Heart_Disease), NAMED('Males'));
Try Me