HPCCSystems Solutions Lab

Big Data

Big data refers to datasets so large that they are hard to process and manage with traditional data management systems.

Big data is commonly characterized by three properties known as the 3Vs: Volume, Variety, and Velocity. Other Vs, such as Value and Veracity, are sometimes added, but these three are the most widely cited.

Velocity measures how fast data arrives in the system, is processed, and is transferred to its destination. A higher velocity means data moves through these stages faster.
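
As a rough illustration of velocity as a rate, the sketch below counts how many records arrive inside a time window and divides by the window length. The function name arrival_rate and the sample arrival times are hypothetical and chosen only for this example; a real system would read timestamps from its own ingestion logs.

```python
def arrival_rate(arrival_times, window_start, window_end):
    """Average number of records per second arriving in [window_start, window_end)."""
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window_end must be greater than window_start")
    # Count only the records whose arrival time falls inside the window.
    count = sum(1 for t in arrival_times if window_start <= t < window_end)
    return count / window


# Hypothetical arrival times, in seconds since the stream started.
arrivals = [0.1, 0.4, 0.5, 1.2, 1.9, 2.3, 2.8, 3.5]

rate = arrival_rate(arrivals, 0.0, 4.0)
print(rate)  # 2.0 records per second (8 records over 4 seconds)
```

The same calculation applies to the processing and transfer stages by swapping in the timestamps recorded at those stages.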

Variety refers to the different types of data. Big data often comprises many kinds of data, each of which needs to be processed separately.

Volume is the size of the dataset. Larger datasets could require different processes or infrastructure.

Big Data Types


Structured

Structured data is clearly defined and formatted according to organizational standards and, possibly, relational database rules. Because the format is well defined, querying this kind of data is easier and faster.

Exp: Relational database tables, address books.
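
To illustrate why a fixed format makes querying straightforward, here is a minimal sketch of an address book stored as a relational table, using Python's built-in sqlite3 module with an in-memory database. The table name, column names, and rows are made up for this example.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Every row must follow the same declared columns: this is the "structure".
conn.execute(
    "CREATE TABLE address_book (name TEXT NOT NULL, city TEXT NOT NULL, zip TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO address_book VALUES (?, ?, ?)",
    [
        ("Ada", "Atlanta", "30301"),
        ("Grace", "Boston", "02108"),
        ("Linus", "Atlanta", "30305"),
    ],
)

# Because the columns are known in advance, a query can filter on them directly.
atlanta_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM address_book WHERE city = ? ORDER BY name", ("Atlanta",)
    )
]
print(atlanta_names)  # ['Ada', 'Linus']

conn.close()
```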


Unstructured

Unstructured data lacks any specific form or structure. Processing it is difficult, time-consuming, and prone to errors. Unstructured data is stored in its original format and remains that way until needed. Keep in mind that to work with data meaningfully, some kind of structure must be imposed on it.

Exp: Pictures, videos, audio recordings.
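
The idea of imposing structure can be sketched with free-form text, which is used here instead of images or audio only to keep the example short. The note, the two patterns, and the field names below are all hypothetical; the patterns assume one specific phone and date layout and would miss other layouts, which hints at why this kind of processing is error-prone.

```python
import re

# A free-form note with no predefined fields.
note = "Call Dana at 555-0142 on 2024-03-18 about the invoice; backup contact is 555-0199."

# Assumed layouts: NNN-NNNN for phone numbers, YYYY-MM-DD for dates.
PHONE = re.compile(r"\b\d{3}-\d{4}\b")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Extracting named fields turns the raw text into a record that can be queried.
record = {
    "phones": PHONE.findall(note),
    "dates": DATE.findall(note),
}
print(record)  # {'phones': ['555-0142', '555-0199'], 'dates': ['2024-03-18']}
```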


Semi-structured

Semi-structured data contains both structured and unstructured elements. It has internal tags and markings that describe the data elements.

Exp: Emails; data saved in CSV, JSON, or YAML formats.
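
As a small sketch of what the internal tags look like in practice, the JSON below labels each value with a key, but the records do not all share the same keys. The sample records are invented for this example, and Python's built-in json module is used to parse them.

```python
import json

# Each value is tagged with a key, but records are free to differ in which keys they carry.
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Grace", "tags": ["admin", "ops"]},
  {"id": 3, "name": "Linus", "email": "linus@example.com", "notes": "prefers phone"}
]
"""

records = json.loads(raw)

# The set of fields has to be discovered from the data rather than read from a schema.
fields = sorted({key for rec in records for key in rec})
print(fields)  # ['email', 'id', 'name', 'notes', 'tags']

# Missing fields must be handled explicitly; .get() returns None when a key is absent.
emails = [rec.get("email") for rec in records]
print(emails)  # ['ada@example.com', None, 'linus@example.com']
```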