Mathematics Lesson 71 – Sampling Methods | Dataplexa

Sampling Methods

In statistics and data analysis, we rarely study an entire population. Instead, we study a smaller group called a sample.

Sampling methods define how we choose this sample so that it correctly represents the whole population.

Sampling underpins school mathematics, competitive exams, survey design, business analytics, data science, and machine learning.


Why Sampling Is Necessary

Studying an entire population is often:

  • Too expensive
  • Too time-consuming
  • Sometimes impossible

For example, we cannot test every bulb produced in a factory or interview every citizen of a country. Sampling makes analysis practical.


Population vs Sample

A population includes all individuals or items we want to study.

A sample is a smaller subset selected from the population.

The goal of sampling is to draw conclusions about the population using information from the sample.


Key Requirement of a Good Sample

A good sample must be:

  • Representative of the population
  • Free from bias as much as possible

Poor sampling leads to wrong conclusions, even if calculations are correct.


Types of Sampling Methods

Sampling methods are broadly classified into:

  • Probability sampling
  • Non-probability sampling

This classification is very important for exams.


Probability Sampling (Overview)

In probability sampling, every member of the population has a known, non-zero chance of being selected.

This type of sampling allows statistical inference and error estimation.

It is preferred in scientific and data-driven studies.


Simple Random Sampling

In simple random sampling, each member of the population has an equal chance of being selected.

Selection is purely random, similar to drawing names from a box.

This method is conceptually simple and often used in exams.


Example: Simple Random Sampling

Suppose a class has 50 students, and we randomly select 10 students using a lottery.

Each student has the same probability of selection. This is simple random sampling.

It works best when the population is fairly homogeneous.
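The lottery in this example can be sketched in Python. This is a minimal illustration rather than a prescribed procedure: the student numbers 1 to 50 and the fixed seed are assumptions added so that the run is repeatable.

```python
import random

# Hypothetical roster: students are identified by the numbers 1..50.
students = list(range(1, 51))

# A fixed seed makes the draw repeatable; drop it for a fresh draw each run.
rng = random.Random(42)

# random.sample draws without replacement, so no student is picked twice
# and every student has the same chance (10/50) of ending up in the sample.
sample = rng.sample(students, k=10)

print(sorted(sample))
```

Drawing without replacement matches the lottery: once a name leaves the box, it cannot be drawn again.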


Advantages and Limitations of Simple Random Sampling

Advantages:

  • Easy to understand
  • Minimizes selection bias

Limitations:

  • May not represent subgroups well
  • Needs a complete list of the population, which is often impractical for very large populations

Systematic Sampling

In systematic sampling, we select every k-th element from an ordered list of the population, where k is approximately the population size divided by the sample size.

The first element is chosen randomly, and the rest follow a fixed pattern.

This method is simple and fast.


Example: Systematic Sampling

Suppose a factory produces 1,000 items and we want to inspect 100 items.

We choose every 10th item after a random start.

This is systematic sampling.
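The factory example can be sketched as follows. The interval is k = 1000 / 100 = 10; the item labels 1 to 1000 and the fixed seed are assumptions made for illustration.

```python
import random

population_size = 1000   # items produced (from the example)
sample_size = 100        # items to inspect (from the example)

# Sampling interval k = N / n = 1000 / 100 = 10.
k = population_size // sample_size

# Hypothetical item labels 1..1000, in production order.
items = list(range(1, population_size + 1))

rng = random.Random(7)      # fixed seed, assumed for repeatability
start = rng.randrange(k)    # random start index in 0..k-1

# Take the item at the random start, then every k-th item after it.
sample = items[start::k]

print(len(sample), sample[:5])
```

Only the starting point is random. Once it is fixed, the rest of the sample is fully determined.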


Advantages and Risks of Systematic Sampling

Advantages:

  • Easy to implement
  • Ensures even coverage

Risk:

  • Hidden patterns in data can introduce bias

This risk is often tested conceptually in exams.
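A small sketch shows how a hidden pattern can distort a systematic sample. The production line here is hypothetical: it assumes every 10th item is defective, so the defect pattern has the same period as the sampling interval.

```python
# Hypothetical production line: every 10th item (index 0, 10, 20, ...)
# comes from a faulty mould and is defective. True defect rate is 10%.
defective = [i % 10 == 0 for i in range(1000)]
true_rate = sum(defective) / len(defective)

k = 10  # sampling interval, equal to the period of the defect pattern

# Defect rate seen by a systematic sample for each possible start.
rates_by_start = {
    start: sum(defective[start::k]) / len(defective[start::k])
    for start in range(k)
}

print(true_rate)
print(rates_by_start)
```

The true defect rate is 0.1, but a sample starting at index 0 sees a rate of 1.0 and every other start sees 0.0. No possible start gives an estimate close to the truth.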


Stratified Sampling

In stratified sampling, the population is divided into groups called strata.

A random sample is then taken from each stratum.

This ensures all important subgroups are represented.


Example: Stratified Sampling

Suppose a school has:

  • 60% boys
  • 40% girls

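A proportional stratified sample preserves this ratio when selecting students. For example, a sample of 50 students would contain 30 boys and 20 girls, each group chosen at random.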

This method is very powerful and widely used.
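Proportional allocation for this school can be sketched as below. The school size of 500 students, the sample size of 50, and the student labels are assumptions chosen to match the 60% / 40% split.

```python
import random

# Hypothetical school of 500 students matching the 60% / 40% split.
strata = {
    "boys": [f"B{i}" for i in range(300)],
    "girls": [f"G{i}" for i in range(200)],
}
total = sum(len(members) for members in strata.values())
sample_size = 50

rng = random.Random(1)  # fixed seed, assumed for repeatability

sample = {}
for name, members in strata.items():
    # Proportional allocation: each stratum's share of the sample
    # matches its share of the population.
    n_stratum = round(sample_size * len(members) / total)
    # Simple random sample within the stratum.
    sample[name] = rng.sample(members, n_stratum)

print({name: len(chosen) for name, chosen in sample.items()})
```

This prints {'boys': 30, 'girls': 20}. With less convenient numbers, rounding each stratum separately may make the total differ slightly from the target sample size, so a real design would need a rule for that case.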


Why Stratified Sampling Is Effective

Stratified sampling:

  • Improves accuracy
  • Reduces sampling error
  • Ensures subgroup representation

It is preferred when the population has clear categories.


Cluster Sampling

In cluster sampling, the population is divided into clusters, usually based on geography or location.

Instead of sampling individuals, entire clusters are randomly selected.

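This method is cost-effective because data collection is concentrated in a few clusters instead of being spread across the whole population.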


Example: Cluster Sampling

Suppose a city has 100 schools.

We randomly select 10 schools and study all students in those schools.

Each school is a cluster.
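The school example can be sketched in two steps: pick the clusters at random, then take everyone inside them. The school names and the student counts (200 to 400 per school) are made up for illustration.

```python
import random

rng = random.Random(3)  # fixed seed, assumed for repeatability

# Hypothetical city: 100 schools, each with a made-up list of students.
schools = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(rng.randint(200, 400))]
    for s in range(100)
}

# Step 1: randomly select 10 whole schools (the clusters).
chosen_schools = rng.sample(sorted(schools), k=10)

# Step 2: include every student from each selected school.
sample = [student for name in chosen_schools for student in schools[name]]

print(chosen_schools)
print(len(sample))
```

Randomness enters only when the schools are chosen. The final number of students is not fixed in advance, because it depends on the sizes of the selected schools.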


Difference Between Stratified and Cluster Sampling

Aspect          Stratified                   Cluster
Groups formed   Homogeneous within strata    Heterogeneous within clusters
Sampling        From every group             Entire groups selected
Cost            Higher                       Lower

This comparison is frequently asked in exams.


Non-Probability Sampling (Overview)

In non-probability sampling, the chance of selecting each member is unknown, and some members may have no chance of being selected at all.

These methods are easier to use but may introduce bias.

They are common in surveys and quick studies.


Convenience Sampling

Convenience sampling selects whoever is easiest to reach.

Example:

  • Surveying people in a nearby mall

This method is fast, but its results cannot be reliably generalized to the whole population.


Judgment (Purposive) Sampling

In judgment sampling, the researcher selects samples based on expertise or judgment.

Example:

  • Interviewing experienced doctors only

It is useful in expert studies, but not statistically generalizable.


Sampling Bias

Sampling bias occurs when some members of the population are systematically more likely to be selected than others, and the analysis does not account for the difference.

Bias leads to misleading conclusions, even with large sample sizes.

Avoiding bias is a core goal of sampling.
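The following simulation sketches why a large biased sample can be worse than a small random one. All numbers are hypothetical: the population, the two groups, and their spending levels are invented for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed, assumed for repeatability

# Hypothetical population: weekly spending for two groups of people.
# Group A is easy to reach (for example, mall visitors) and spends more.
group_a = [rng.gauss(80, 10) for _ in range(20_000)]
group_b = [rng.gauss(50, 10) for _ in range(80_000)]
population = group_a + group_b
true_mean = statistics.fmean(population)

# Large but biased sample: 10,000 people, all from the easy-to-reach group.
biased_sample = rng.sample(group_a, 10_000)

# Small but random sample: 200 people from the whole population.
random_sample = rng.sample(population, 200)

biased_error = abs(statistics.fmean(biased_sample) - true_mean)
random_error = abs(statistics.fmean(random_sample) - true_mean)

print(round(true_mean, 1), round(biased_error, 1), round(random_error, 1))
```

In this setup the biased sample is 50 times larger, yet its estimate is much further from the true mean, because increasing the sample size does not remove the bias.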


Sample Size Considerations

Larger samples generally:

  • Reduce random error
  • Improve reliability

However, larger samples also increase cost.

Sampling quality matters more than size alone.
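One common way to quantify random error is the standard error of the sample mean, which for simple random sampling is approximately sigma / sqrt(n). The sketch below assumes a hypothetical population standard deviation of 15 and ignores the finite population correction.

```python
import math

sigma = 15.0  # hypothetical population standard deviation


def standard_error(n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


for n in (25, 100, 400, 1600):
    print(n, standard_error(n))
```

Each fourfold increase in sample size only halves the standard error (3.0, 1.5, 0.75, 0.375), so cost grows faster than precision. The formula describes random error only; it says nothing about bias.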


Sampling in Competitive Exams

Exams commonly test:

  • Types of sampling
  • Differences between methods
  • Bias and representativeness

Clear definitions and comparisons are crucial.


Sampling in Business & Surveys

Businesses use sampling to:

  • Understand customer behavior
  • Test products
  • Measure satisfaction

Correct sampling leads to reliable decisions.


Sampling in Data Science

In data science, sampling is used for:

  • Train-test split
  • Cross-validation
  • Handling large datasets

Sampling directly affects model performance.
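A train-test split is a simple random sample of the rows. The sketch below uses only the standard library; the toy dataset and the 80/20 ratio are assumptions, and real projects often rely on a library helper instead.

```python
import random

# Hypothetical dataset: 100 labelled rows (feature value, label).
data = [(x, x % 2) for x in range(100)]

rng = random.Random(5)  # fixed seed, assumed for repeatability

# Shuffle a copy so the split is a simple random sample of the rows.
shuffled = data[:]
rng.shuffle(shuffled)

# Assumed 80/20 split ratio; other ratios are equally valid.
cut = len(shuffled) * 4 // 5
train_rows, test_rows = shuffled[:cut], shuffled[cut:]

print(len(train_rows), len(test_rows))
```

Shuffling before cutting matters: if the rows were stored in some meaningful order, taking the first 80% without shuffling would behave like a biased sample.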


Sampling in Machine Learning

Machine learning depends heavily on sampling:

  • Balanced datasets
  • Random mini-batches
  • Bias reduction

Poor sampling can lead to biased models.
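Random mini-batches can be sketched as a shuffle followed by fixed-size slices. The helper name mini_batches, the toy rows, and the batch size of 4 are all hypothetical choices for illustration.

```python
import random


def mini_batches(rows, batch_size, rng):
    """Yield the rows in random order, batch_size rows at a time."""
    order = list(range(len(rows)))
    rng.shuffle(order)  # new random order each time the function is called
    for start in range(0, len(order), batch_size):
        yield [rows[i] for i in order[start:start + batch_size]]


rows = list(range(10))      # hypothetical training examples
rng = random.Random(9)      # fixed seed, assumed for repeatability
batches = list(mini_batches(rows, batch_size=4, rng=rng))

print([len(b) for b in batches])
```

With 10 rows and a batch size of 4, the batch sizes are [4, 4, 2]. Every row appears exactly once per pass, so each pass is a random reordering of the data, not a sample with replacement.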


Common Mistakes to Avoid

  • Assuming convenience samples represent the population
  • Ignoring subgroups
  • Confusing stratified and cluster sampling

Always think about representation and bias.


Practice Questions

Q1. Which sampling method ensures subgroup representation?

Stratified sampling

Q2. Which method selects entire groups?

Cluster sampling

Q3. Is convenience sampling unbiased?

No

Quick Quiz

Q1. Does probability sampling allow statistical inference?

Yes

Q2. Can large biased samples still be misleading?

Yes

Quick Recap

  • Sampling selects a subset from a population
  • Probability sampling supports statistical inference and error estimation
  • Stratified and cluster sampling serve different purposes
  • Bias is the biggest danger in sampling
  • Sampling is critical in statistics, DS, and ML

With sampling methods understood, you are now ready to learn Hypothesis Testing Basics, where data is used to make decisions.