2 How to Think About Data

When a data analyst thinks about data they will likely view that data in a table format with some visualization of that data

A data analyst likely thinks of mathematical and statistical abstractions. This may involve putting things in terms of relationships or associations, perhaps theoretical mathematical models that take various forms; linear, quadratic, non-parametric.

In general, we should have the following in our minds when thinking of variables in our data

quantitative (numerical) vs. qualitative (categorical)
continuous vs. discrete
ordinal vs nominal
scales: ratio, interval
dependent vs. independent
descriptors (predictors) vs. response
input vs. output
correlations
theoretical models (linear, quadratic, etc.)

Let’s define some of these terms

Quantitative:: Relating to, measuring, or measured by the quantity of something rather than its quality. This data is expressed in numbers and can be measured. Quantitative data is also referred to as numerical data.

Quantitative data is grouped into two main types; continuous and discrete.

Example: The folllowing are all examples of quantitative data

weight in pounds
length in centimeters
dollar value of a company’s stocks

Qualitative:: Relating to, measuring, or measured by the quality of something rather than its quantity. This data is usually expressed as labels, groups, or categories. Qualitative data is also referred to as categorical data.

Qualitative data is grouped into two main types; nominal or ordinal.

Continuous:: Suppose $f$ is a function that’s defined on a set $X$ of real numbers and $x_o \in X$. The $f$ is continuous at $x_o$ if

\[ \lim_{x \to x_0} f(x) = f(x_0). \]

Then we say that $f$ is continuous on the set $X$ if it is continuous at each number in $X$.

What this definition is saying is that a function has no discontinuous points such as a jump or a break along a given interval.

Jump discontinuity

\[ \lim_{x \to x_0} f(x) \; \text{does not exist} \]

Removable discontinuity

\[ \lim_{x \to x_0} f(x) \neq f(x_o) \]

Infinite discontinuity

\[ \lim_{x \to x_0} f(x) = \infty \]

Discrete:: A numerical type of data that refers to distinct elements. In other words, objects or elements that are unconnected. Discrete elements almost always finite.
Ordinal:: A categorical type of data that represents a group or category that does have an order or ranking.
Example:: The following are some examples of ordinal data

Education level (high school, bacehelor’s, master’s, etc.)
Socioeconomic status (low, middle, upper)

Nominal:: A categorical type of data that represents a group or category that does not have any order or ranking.
Example:: The following are examples of nominal data

Hair color (black, brown, blonde, etc.)
Vehicle types (SUV, coupe, sedan, crossovers, etc.)

--- title: "How to Think About Data" format: html --- When a data analyst thinks about data they will likely view that data in a table format with some visualization of that data ![](../images/output-images/scatter-plot-1.png) A data analyst likely thinks of mathematical and statistical abstractions. This may involve putting things in terms of relationships or associations, perhaps theoretical mathematical models that take various forms; **linear**, **quadratic**, **non-parametric**. In general, we should have the following in our minds when thinking of variables in our data - quantitative (numerical) vs. qualitative (categorical) - continuous vs. discrete - ordinal vs nominal - scales: ratio, interval - dependent vs. independent - descriptors (predictors) vs. response - input vs. output - correlations - theoretical models (linear, quadratic, etc.) Let's define some of these terms Quantitative: : Relating to, measuring, or measured by the quantity of something rather than its quality. This data is expressed in numbers and can be measured. **Quantitative** data is also referred to as **numerical** data. **Quantitative** data is grouped into two main types; **continuous** and **discrete**. Example : The folllowing are all examples of quantitative data - weight in pounds - length in centimeters - dollar value of a company's stocks Qualitative: : Relating to, measuring, or measured by the quality of something rather than its quantity. This data is usually expressed as labels, groups, or categories. **Qualitative** data is also referred to as **categorical** data. **Qualitative** data is grouped into two main types; nominal or ordinal. Continuous: : Suppose $f$ is a function that's defined on a set $X$ of real numbers and $x_o \in X$. The $f$ is **continuous** at $x_o$ if $$ \lim_{x \to x_0} f(x) = f(x_0). $$ Then we say that $f$ is **continuous on the set** $X$ if it is continuous at each number in $X$. What this definition is saying is that a function has no discontinuous points such as a *jump* or a *break* along a given interval. 1. Jump discontinuity $$ \lim_{x \to x_0} f(x) \; \text{does not exist} $$ 2. Removable discontinuity $$ \lim_{x \to x_0} f(x) \neq f(x_o) $$ 3. Infinite discontinuity $$ \lim_{x \to x_0} f(x) = \infty $$ Discrete: : A numerical type of data that refers to distinct elements. In other words, objects or elements that are unconnected. Discrete elements almost always *finite*. Ordinal: : A categorical type of data that represents a group or category that does have an order or ranking. Example: : The following are some examples of ordinal data - Education level (high school, bacehelor's, master's, etc.) - Socioeconomic status (low, middle, upper) Nominal: : A categorical type of data that represents a group or category that does not have any order or ranking. Example: : The following are examples of nominal data - Hair color (black, brown, blonde, etc.) - Vehicle types (SUV, coupe, sedan, crossovers, etc.)