3 Data Storage
In this chapter, we describe some of the common ways in which data is stored, in what formats, and with what structure.
Learning Objectives
- Get exposed to a handful of common file formats for data tables.
- Explain the difference between field-delimited files and fixed-width files.
- List three of the most common types of delimiters for delimited files.
3.1 Data Tables
Data is commonly displayed in tabular form, meaning that it is arranged in rows and columns.
In rows, items are arranged from left to right.
In columns, items are arranged from top to bottom.
Therefore tables generally have the following format
Generally speaking, the columns in a table represent the variables of the dataset, while the rows contain the observations, that is, the values of those variables for each record or entity. Storing data this way has the following properties:
- each row typically represents a single complete record or entity
- rows allow us to compare different data points related to the same entity
- each column represents a specific characteristic, attribute, or data field
- columns allow for the analysis of a single data field across all records
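The row and column roles listed above can be sketched in a few lines of Python. This is only an illustration: the table contents and the names `table`, `first_row`, `ages`, and `mean_age` are made up for the example and do not come from any particular dataset.

```python
# Hypothetical example data: each dict is one row (a single complete record).
table = [
    {"name": "Ana", "age": 34, "city": "Lima"},
    {"name": "Ben", "age": 27, "city": "Oslo"},
    {"name": "Chloe", "age": 41, "city": "Kyoto"},
]

# A row: all the fields that describe one entity.
first_row = table[0]

# A column: one field collected across all records.
ages = [row["age"] for row in table]

# Column-wise analysis: summarize a single field over every record.
mean_age = sum(ages) / len(ages)

print(first_row)
print(ages)
print(mean_age)
```

Reading along a row compares different fields of the same entity, while reading down a column compares the same field across entities.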
Most data in tabular form can be stored in spreadsheets (.xls, .xlsx), but it is usually better to store it in a plain-text file (.txt, .csv).
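As a minimal sketch of what storing a table as plain text looks like, the following Python snippet writes a small table as comma-separated text and reads it back with the standard `csv` module. The table contents are hypothetical, and an in-memory buffer stands in for a real file so that the example is self-contained.

```python
import csv
import io

# Hypothetical example table: a header row followed by two records.
rows = [
    ["name", "age", "city"],
    ["Ana", "34", "Lima"],
    ["Ben", "27", "Oslo"],
]

# Write the table as comma-separated text into an in-memory buffer.
# A real script would typically use open("some_file.csv", "w", newline="").
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Read the text back and recover the same rows.
recovered = list(csv.reader(io.StringIO(csv_text, newline="")))

print(csv_text)
print(recovered == rows)
```

Because the stored result is ordinary text, it can be opened and inspected with any text editor, not only with spreadsheet software.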
What is the difference between rows and columns? In a database, a row usually describes the properties, or fields, of a single entity. A column represents one field that is common to all entities, not just one.