Up to 80% of an AI project is about collecting data:
- What data is Required?
- What data is Available?
- How to Select the data?
- How to Collect the data?
- How to Clean the data?
- How to Prepare the data?
- How to Use the data?
What is Data?
Data can be many things. With Artificial Intelligence it must be a collection of facts:
|Measurements|Size. Height. Weight.|
|Words|Names and Places.|
|Descriptions|It is cold.|
Intelligence Needs Data
Human intelligence needs data:
A real estate broker needs data about sold houses to estimate prices.
Artificial intelligence needs data:
A computer program also needs data to estimate prices.
The most common data to collect are Numbers and Measurements.
Often data are stored in arrays representing the relationship between values.
This table contains house prices versus size:
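As a minimal sketch of how such paired values could be stored in arrays, assume two lists of the same length. The sizes and prices here are invented for illustration and are not real market data; the names `sizes`, `prices` and `houses` are hypothetical:

```python
# Made-up illustrative values, not real market data.
sizes = [50, 60, 70, 80, 90]        # house size in square meters
prices = [150, 180, 200, 230, 260]  # price in thousands (hypothetical currency)

# The position in each list links one size to one price.
houses = list(zip(sizes, prices))

for size, price in houses:
    print(f"{size} m2 -> {price} thousand")
```

Keeping the two lists the same length and in the same order is what preserves the relationship between the values.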
Quantitative vs. Qualitative
Quantitative data are numerical:
- 55 cars
- 15 meters
- 35 children
Qualitative data are descriptive:
- It is cold
- It is long
- It was fun
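The distinction can be sketched in code by checking whether a value is a number. This is a simplification under the assumption that numeric data is stored as numbers and descriptive data as text; the function name `kind_of_data` is hypothetical:

```python
def kind_of_data(value):
    """Label a value as quantitative (numeric) or qualitative (descriptive)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"


observations = [55, 15.0, 35, "It is cold", "It is long", "It was fun"]
labels = [kind_of_data(v) for v in observations]

for value, label in zip(observations, labels):
    print(f"{value!r}: {label}")
```

A number stored as text, such as "55", would be labeled qualitative by this sketch, so real data usually needs cleaning before a check like this is meaningful.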
Census or Sampling
A Census is when we collect data for every member of a group.
A Sample is when we collect data for some members of a group.
If we wanted to know how many Americans smoke cigarettes, we could ask every person in the US (a census), or we could ask 10 000 people (a sample).
A census is accurate, but hard to do. A sample is less accurate, but easier to do.
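The trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is simulated, the 20% smoking rate is an invented value, and the population is scaled down to 100 000 people so that the census is cheap to compute:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated population: True means "smokes". The 20% rate is invented.
population = [rng.random() < 0.20 for _ in range(100_000)]

# Census: ask every member of the population.
census_rate = sum(population) / len(population)

# Sample: ask 10 000 randomly chosen members.
sample = rng.sample(population, 10_000)
sample_rate = sum(sample) / len(sample)

print(f"census: {census_rate:.3f}, sample: {sample_rate:.3f}")
```

The sample only touches one tenth of the simulated population, and its estimate should land close to the census value without matching it exactly.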
A Population is a group of individuals (objects) we want to collect information from.
A Census is information about every individual in a population.
A Sample is information about a part of the population (chosen to represent the whole).
In order for a sample to represent a population, it must be collected randomly.
A Random Sample is a sample where every member of the population has an equal chance to appear in the sample.
A Sampling Bias (Error) occurs when samples are collected in such a way that some individuals are less (or more) likely to be included in the sample.
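The effect of sampling bias can be sketched by comparing a random sample with a sample drawn from only part of the population. The population of ages below is invented, and the rule "only people aged 60 or older can be reached" is a hypothetical source of bias:

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Invented population: ages 0 to 99, with 100 people of each age.
population = [age for age in range(100) for _ in range(100)]
true_mean = sum(population) / len(population)

# Random sample: every member has an equal chance of being picked.
random_sample = rng.sample(population, 1000)
random_mean = sum(random_sample) / len(random_sample)

# Biased sample: only people aged 60 or older can be picked.
reachable = [age for age in population if age >= 60]
biased_sample = rng.sample(reachable, 1000)
biased_mean = sum(biased_sample) / len(biased_sample)

print(f"true mean age:      {true_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

Both samples have the same size, but only the random one should stay close to the population mean. Collecting more data does not remove the bias if the selection rule stays the same.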