HPCCSystems Solutions Lab

DEDUP

The DEDUP function removes duplicate records from a dataset based on the conditions you define. The result is a dataset in which each combination of values in the selected fields appears only once.

Note: DEDUP compares adjacent records, so your dataset must be sorted on the fields you are deduping by.
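The effect of sort order can be seen in a minimal sketch. The record layout, the dataset `ds`, and the output names below are made up for illustration and are not part of the demo dataset. The last line uses the ALL option, which compares every record against every other record instead of only adjacent ones.

```ecl
Rec := RECORD
  INTEGER ID;
  STRING  Name;
END;

// Hypothetical three-row dataset; the two {2, 'B'} rows are not adjacent.
ds := DATASET([{2, 'B'}, {1, 'A'}, {2, 'B'}], Rec);

// Unsorted input: the duplicates are never compared with each other,
// so both copies of ID 2 remain.
OUTPUT(DEDUP(ds, ID), NAMED('Unsorted'));

// Sorted input: the duplicates are adjacent and one of them is removed.
OUTPUT(DEDUP(SORT(ds, ID), ID), NAMED('Sorted'));

// ALL checks all records against each other, so no prior sort is needed.
OUTPUT(DEDUP(ds, ID, ALL), NAMED('WithAll'));
```

ALL can be costly on large datasets because it compares every pair of records, so sorting first is usually the safer default.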

Syntax

DEDUP(dataset [, condition])
| Value     | Definition |
|-----------|------------|
| DEDUP     | Required. |
| dataset   | Input dataset to process. |
| condition | A comma-delimited list of expressions or key fields in the dataset that defines "duplicate" records. |
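The two forms of condition can be sketched side by side. This is an illustrative example only: the record layout, the dataset `ds`, and the names `ByFields` and `ByExpr` are assumptions made for the sketch. A field list treats two records as duplicates when every listed field is equal, and the same rule can be written as an explicit expression using LEFT and RIGHT.

```ecl
Rec := RECORD
  INTEGER StudentID;
  STRING  Name;
END;

// Hypothetical data, sorted on the fields used for deduping.
ds := SORT(DATASET([{305, 'Liz'}, {305, 'Liz'}, {100, 'Zoro'}], Rec),
           StudentID, Name);

// Field-list form: records match when StudentID and Name are both equal.
ByFields := DEDUP(ds, StudentID, Name);

// Expression form: the same rule spelled out with LEFT and RIGHT.
ByExpr := DEDUP(ds, LEFT.StudentID = RIGHT.StudentID AND
                    LEFT.Name = RIGHT.Name);

OUTPUT(ByFields, NAMED('ByFields'));
OUTPUT(ByExpr, NAMED('ByExpr'));
```

Both outputs should contain the same two records, one for Liz and one for Zoro.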

Demo Dataset

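| StudentID | Name  | City    | State | ZipCode | Department |
|-----------|-------|---------|-------|---------|------------|
| 300       | Sarah | Dallas  | Te    | 30000   |            |
| 400       | Matt  |         |       |         | Medical    |
| 305       | Liz   | Atlanta | GA    | 30330   |            |
| 305       | Liz   | smyrna  | GA    | 30330   |            |
| 100       | Zoro  | Atlanta | GA    | 30330   |            |
| 100       | Zoro  | smyrna  | GA    | 30330   |            |
| 800       | Sandy |         |       |         | Science    |
| 604       | Danny | Newyork | NY    | 40001   |            |
| 409       | Dan   | Newyork | NY    | 40001   |            |
| 300       | Sarah | Dallas  | TX    | 30000   |            |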

Example


/*
DEDUP Example:
Deduping the input dataset based on different fields.
Keep in mind that for DEDUP your dataset must be sorted.
*/

Student_Rec := RECORD
  INTEGER   StudentID;
  STRING    Name;
  STRING    City;
  STRING2   State;
  STRING5   ZipCode;
  STRING    Department;
END;

Student_DS := DATASET([
              {300, 'Sarah', 'Dallas',  'Te', '30000', 'Art'},
              {400, 'Matt',  '',        '',   '',      'Medical'},
              {305, 'Liz',   'Atlanta', 'GA', '30330', 'Math'},
              {305, 'Liz',   'smyrna',  'GA', '30330', ''},
              {100, 'Zoro',  'Atlanta', 'GA', '30330', ''},
              {100, 'Zoro',  'smyrna',  'GA', '30330', ''},
              {800, 'Sandy', '',        '',   '',      'Science'},
              {604, 'Danny', 'Newyork', 'NY', '40001', ''},
              {409, 'Dan',   'Newyork', 'NY', '40001', 'Medical'},
              {300, 'Sarah', 'Dallas',  'Te', '30000', 'Math'}],
              Student_Rec);


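// The dataset above is not in sorted order, so sort it on the
// dedup fields before each DEDUP.
SortDS := SORT(Student_DS, StudentID, Name);

// Keep one record per StudentID and Name combination.
DupMe := DEDUP(SortDS, StudentID, Name);
OUTPUT(DupMe, NAMED('DupMe'));

// Keep one record per Name and Department combination.
SortByName := SORT(Student_DS, Name, Department);
DupExp := DEDUP(SortByName, Name, Department);
OUTPUT(DupExp, NAMED('DupExp'));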
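
Because DEDUP keeps the first record of each group of duplicates by default, the sort order also decides which record survives. The following is a minimal sketch under that assumption. The reduced record layout, the dataset `ds`, and the name `ByStudent` are hypothetical and only loosely modeled on the demo dataset.

```ecl
Rec := RECORD
  INTEGER StudentID;
  STRING  Name;
  STRING  Department;
END;

// Hypothetical data: StudentID 305 appears with and without a Department.
ds := DATASET([{305, 'Liz',  ''},
               {305, 'Liz',  'Math'},
               {100, 'Zoro', ''}], Rec);

// A descending sort on Department places 'Math' ahead of the empty
// string within StudentID 305.
ByStudent := SORT(ds, StudentID, -Department);

// The first record of each StudentID group is the one that is kept,
// so the 305 record with a Department is preferred.
OUTPUT(DEDUP(ByStudent, StudentID), NAMED('PreferDepartment'));
```

Adding secondary sort fields in this way is one option for controlling which duplicate is retained without changing the dedup condition itself.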