This article shows you how to either create a random population or import a dataset, and then take a random sample from it using R.
What is a Sample?
When you have a population of something, you'll notice that it has certain characteristics. These characteristics (or parameters) could include the average (mean) of the population, the standard deviation of the population, or something else.
In certain situations, you might not know what those population parameters are, so we estimate them by taking a sample and studying it.
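As a small illustration of what population parameters look like in R, the sketch below builds a made-up population of 1,000 weights and computes its mean and standard deviation. The numbers and the name `pop_weights` are assumptions for illustration, not data from this article. Note that R's `sd()` divides by n - 1, so the population standard deviation is computed by hand here.

```r
# Hypothetical population: 1,000 weights drawn from a normal distribution
set.seed(1)
pop_weights <- rnorm(1000, mean = 160, sd = 15)

# Population mean
pop_mean <- mean(pop_weights)

# Population standard deviation (divide by N, not N - 1 as sd() does)
pop_sd <- sqrt(mean((pop_weights - pop_mean)^2))

print(pop_mean)
print(pop_sd)
```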
Taking a Sample
When you take a sample, you need to know how many items to take. This is called the sample size (or sample count), and we will refer to it as n. From there, we can calculate a statistic that will be used to estimate a parameter.
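To make this concrete, here is a minimal sketch that draws a sample of size `n` from a made-up population and uses the sample mean (a statistic) to estimate the population mean (a parameter). The population values and the variable names are assumptions for illustration.

```r
set.seed(42)

# Hypothetical population of 10,000 values between 0 and 1
population <- runif(10000)

# Sample size
n <- 100

# Draw n items without replacement
my_sample <- sample(population, n)

# Statistic (from the sample) vs. parameter (from the population)
sample_mean <- mean(my_sample)
population_mean <- mean(population)

print(sample_mean)
print(population_mean)
```

The two printed values should be close but not identical; a larger `n` will usually bring the estimate closer to the parameter.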
Generate a Population, Take a Sample
Create a Random Population in R
We use a matrix object filled with random numbers from runif() (uniform values between 0 and 1) to represent the population.
# Optionally set the seed of R's random number generator so the random
# population can be reproduced. Uncomment the next line to use it.
# set.seed(5)
# Create a matrix object
nCols = 5
nRows = 3
population <- matrix( runif(nCols * nRows), ncol = nCols )
# Print the Population
print(population)
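The comment about `set.seed()` in the code above can be checked directly. The sketch below resets the seed before each draw and confirms that the two draws match; the names `first_draw` and `second_draw` are made up for this example.

```r
# Same seed, same random numbers
set.seed(5)
first_draw <- runif(3)

set.seed(5)
second_draw <- runif(3)

# TRUE: both draws are identical because the seed was reset
print(identical(first_draw, second_draw))
```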
Here's how to show only the first row of the population.
# Print the First Row of the Population
first_row <- population[1,]
print(first_row)
Here's how to show only the first column of the population.
# Print the First Column of the Population
first_col <- population[,1]
print(first_col)
Take a Sample from the Population
# Create a Random Population
nCols = 5
nRows = 3
population <- matrix( runif(nCols * nRows), ncol = nCols )
# Print the Population
print(population)
# Generate a Random Sample from the Population
n <- 5
random_sample = sample(population, n)
# Print the Random Sample from the Population (one line per sampled item)
sprintf("Random sample item %d of %d is %1.7f", seq_along(random_sample), length(random_sample), random_sample)
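Note that `sample(population, n)` treats the matrix as one long vector and picks individual values. If each row is meant to be one item of the population, a possible alternative (a sketch, not part of the original example) is to sample row indices instead. The names `n_rows_to_take`, `picked`, and `row_sample` are made up here.

```r
set.seed(5)
population <- matrix(runif(5 * 3), ncol = 5)

# Number of whole rows to draw (hypothetical name)
n_rows_to_take <- 2

# Pick row indices at random, then keep those rows; drop = FALSE keeps a matrix
picked <- sample(nrow(population), n_rows_to_take)
row_sample <- population[picked, , drop = FALSE]

print(row_sample)
```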
Import a Population, Take a Sample
In this example, we import CSV data from GitHub and take a random sample of its rows.
# 1. Set the seed so you can reproduce the same random results I do.
set.seed(10)
# 2. Load CSV Data
df <- read.csv('https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/women.csv', header = TRUE)
# 3. Get a Random Sample of the Data
num_of_rows = 10
my_sample = df[sample(nrow(df), num_of_rows), ]
print(my_sample)
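Once you have the sample, you can compute a statistic from it. The sketch below uses R's built-in `women` dataset (columns `height` and `weight`) as a stand-in so it runs without a network connection. Whether the CSV above has the same column names is an assumption you should check with `names(df)`.

```r
set.seed(10)

# Built-in dataset used as a stand-in for the imported CSV
df <- women

# Take a random sample of 10 rows
num_of_rows <- 10
my_sample <- df[sample(nrow(df), num_of_rows), ]

# Sample mean (statistic) vs. mean of all rows (parameter of this small population)
sample_mean_height <- mean(my_sample$height)
population_mean_height <- mean(df$height)

print(sample_mean_height)
print(population_mean_height)
```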
Resources
- Sampling Distributions on Khan Academy
- Generating Random Samples from Other Distributions