Statistical Notions


CHAPTER 3

Statistical Notions

LEARNING OBJECTIVES

After careful consideration of this chapter, you should be able:

• To understand the significance of statistics.
• To describe the meaning of data.
• To explain the concept of variable.
• To understand the classes of variables.
• To distinguish between qualitative and quantitative variables.
• To distinguish between continuous and discrete variables.
• To compare the meaning of two concepts: population and sample.
• To explain the methods of data collection.
• To comprehend the sampling mechanism.

3.1 INTRODUCTION

Statistical techniques are useful throughout research: in designing a study, gathering data, testing the data, summarizing the outcomes, and eventually supporting sound decisions. Statistics is therefore widely used by researchers to understand the behavior of the chosen variables (parameters) under investigation. Professionals should study statistics to understand the general ideas and the related statistical terms needed for their research, which saves effort and time, reduces expenses, and supports forecasting. Moreover, scientists should understand the principles of statistics in order to understand how variables behave under different statistical methods and so obtain a clearer and deeper view of the research project. In summary, choosing a suitable statistical test at each stage of the work helps in making sound decisions with minimal time, effort, and expense.

The concept of statistics and its two branches (descriptive and inferential) are presented in this chapter, together with the basic concepts of data, sample, population, and random variable, and the techniques for gathering information (data).

3.2 THE CONCEPT OF STATISTICS

It is well recognized that professionals cannot undertake worthwhile studies without employing statistical techniques for designing, testing, and extracting useful inferences. Professionals should therefore understand the concept of statistics in order to perform high-quality studies and obtain reliable outcomes. Various definitions of statistics have been given by various writers. We can define statistics as the science of designing projects (studies), gathering data, and arranging, summarizing, testing, explaining, and drawing useful inferences from the gathered information (data). In general, statistics can be divided into two main areas: descriptive statistics and inferential statistics.

Applied Statistics for Environmental Science with R https://doi.org/10.1016/B978-0-12-818622-0.00003-4


© 2020 Elsevier Inc. All rights reserved.


Descriptive statistics is the area of statistics that covers the methods of gathering, arranging, summarizing, and displaying the gathered information (data).

Inferential statistics is the area of statistics that covers hypothesis testing, methods of estimation, the investigation of the relationships between the chosen variables under study, and prediction.

3.3 COMMON CONCEPTS

It is helpful to define several terms that are important for understanding the subjects in this book.

A population refers to the set of observations (objects, individuals, members) of interest to the scientist; the observations could be any subject under study, such as measurements, humans, or other subjects. We can recognize two types of populations: limited and unlimited. An example of a limited population is the kinds of treatment applied to landfill leachate (a limited number); an example of an unlimited population is the number of bacteria in a landfill leachate.

A sample refers to a portion (subset) of objects (observations, items, individuals, members) drawn (chosen) from a larger group called the population. Scientists in all areas can investigate the characteristics of populations using representative samples, because samples are easier to examine than the larger group (population). Moreover, samples save effort and time and reduce expenses. For example, the heavy metal concentrations in sediment can be investigated by choosing several points along a particular river (the population is the river in the research region). The chosen points form a sample whose properties are expected to be similar to those of the sediment in the river as a whole.

Data refers to a group of observations (measurements) that have been produced and gathered from an experiment; for example, survey returns, measurements, and test outcomes. We can use a random variable to describe data according to the type of data.

A random variable refers to any characteristic of interest that can be measured or counted. The value of the variable is likely to vary from one test to another or to change over time. A variable is usually represented by an uppercase letter, such as X or Y, and a value of the variable by the corresponding lowercase letter, such as x or y.

In summary, we can identify two main kinds of random variables: qualitative variables and quantitative variables.

3.3.1 Qualitative Variables

A qualitative variable, also known as a categorical or attribute variable, is not numerical. This kind of variable takes values that are labels, names, or categories, which cannot be used to compute statistical measures that require numerical values, such as the mean or variance. We can assign numbers to the categories for identification purposes when we key in the data, but these numbers carry no numerical meaning. For example, the kind of oxidation process employed for stabilized leachate treatment is a qualitative variable; seven oxidation processes are considered: O3, Fenton, Fenton followed by O3, persulfate, persulfate followed by O3, simultaneous O3/Fenton, and O3/persulfate. A second example of a qualitative variable is the outcome of an inspection classified as either negative or positive.
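The point that numeric codes for categories are only identifiers can be illustrated with a short Python sketch. The code numbers 1 to 7 and the list of eight runs below are hypothetical, made up for illustration only; the process names are the seven listed in the text. Counting how often each category occurs is meaningful, whereas averaging the codes would not be.

```python
from collections import Counter

# Identification codes for the seven oxidation processes named in the text.
# The numbers 1..7 are arbitrary labels (an assumption for this sketch).
PROCESS_CODES = {
    1: "O3",
    2: "Fenton",
    3: "Fenton followed by O3",
    4: "persulfate",
    5: "persulfate followed by O3",
    6: "simultaneous O3/Fenton",
    7: "O3/persulfate",
}

# Hypothetical record of which process was applied in each of eight runs.
runs = [1, 2, 2, 4, 7, 7, 7, 5]

# Frequencies per category are a valid summary of a qualitative variable.
counts = Counter(PROCESS_CODES[code] for code in runs)
most_common_process, frequency = counts.most_common(1)[0]
print(most_common_process, frequency)  # O3/persulfate 3

# By contrast, sum(runs) / len(runs) would be a number with no interpretation,
# because relabeling the processes would change it arbitrarily.
```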

3.3.2 Quantitative Variables

A quantitative variable refers to a measurable quantity (it takes numerical values) whose values can be arranged in order. For example, measurements of heavy metal concentrations in sediment, such as cadmium (Cd), iron (Fe), copper (Cu), and chromium (Cr), are examples of quantitative variables. Quantitative variables are classified into two types: discrete variables and continuous variables.

3.3.2.1 Discrete Variable

A variable that can take on a finite or countable number of distinct values is called a discrete variable. In other words, discrete variables are numeric variables that assume values such as 0, 1, 2, and so on (there is a gap between consecutive values) and can be counted; for example, the number of sampling points in a river or the number of visits to a landfill.


3.3.2.2 Continuous Variable

A numeric variable that can assume any value between two given values (an interval) is called a continuous variable. The values of this variable are obtained by measuring; examples are pressure, temperature, or any reading obtained from an instrument.

3.4 DATA GATHERING

Scientists perform studies to gain information that helps them respond to a research problem, get a better picture, or find out something new. Researchers should therefore understand the objectives of the project and the type of data required; knowing how and when the data are to be collected helps scientists accomplish their goal and draw sound conclusions. The data needed for the project depend on the type of variables in the study and on the source of the data.

3.4.1 Approaches for Gathering Data

Collecting data can be carried out using several approaches. Four common approaches can be recognized in the environmental field:

1. Experimental work method
2. Archives method
3. Survey method
4. Automated instrument method

We usually use a sample to collect the required data, and the type of data determines the approach for choosing the individuals (items, sample data).

1. The experimental work method

The experimental work method is widely used in the field of environmental science. We carry out experiments in the field or in laboratories to collect the information that helps us understand the behavior of the chosen variables under investigation. For example, we analyze water samples obtained from a river for physicochemical parameters such as chemical oxygen demand (COD), biochemical oxygen demand (BOD), and total suspended solids (TSS); this test requires a water sample to be chosen and then examined in the laboratory. Or, a scientist may wish to investigate the concentration of heavy metals such as lead (Pb), zinc (Zn), cadmium (Cd), copper (Cu), mercury (Hg), and chromium (Cr) in sediment obtained from two rivers. Samples are gathered from each river and analyzed for the chosen heavy metals.

2. The archives method (historical records)

Historical records are records such as archived reports, statistics, studies, and indices that are stored in the archives of offices or other bureaus. In general, historical data are inexpensive to obtain and provide a description of past events.

3. The survey approach

The survey approach is a non-experimental approach to collecting data about individuals (observations, subjects) in a population. This approach requires a proper questionnaire covering the data to be collected for the research; data are collected from a group of people or a community. The questions should be comprehensible, suitable, and linked to the topic under investigation. Data can be collected by several methods, such as mailed questionnaires, telephone interviews, personal interviews, and online interviews.

4. The automated instrument method

The last approach for collecting data is the automated instrument approach. This approach requires providing instruments (tools) to record the chosen variables and collect the desired information. With this method, the data are produced automatically without varying the setting of the instruments.

3.5 SAMPLING METHODS

Most projects are performed employing a sample (subset) to gather facts (data) regarding the chosen variables under study. It is well recognized that collecting data from samples reduces time, cost, and effort, and helps the scientist to gain the needed data on the behavior of the chosen variable as if the population itself were investigated. A sampling method is defined as the method employed by scientists to choose the individuals (observations) from the population under study to be included in the representative sample. There are various methods for choosing a sample from the population of concern that meets the study target. We must select representative and random samples to ensure independence between different individuals (observations). Each individual must be given an equal chance of being chosen for the sample. The sampling methods studied in this book are simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

3.5.1 Simple Random Sampling

Simple random sampling is a fundamental method and one of the most commonly applied sampling methods; it gives each individual (item, observation) in the population the same chance of being chosen for the sample. One way to choose the sample is to assign a number to every individual in the population, write each number on a card (each card identifies one individual), put the cards in a box, shuffle them, and draw the desired number of individuals. Alternatively, we can use random-number tables or a computer to generate random numbers to choose the individuals of the sample.

EXAMPLE 3.1 CHECKING PRODUCTS

Companies should test their product before shipping it to customers and check whether the output is ready for market (that is, whether the produced units are similar). The final output in the store of an environmental company represents the population of concern. Checking the suitability of the output requires choosing units randomly to represent the population; these units form the sample to be checked before shipping, and a decision about the entire output is made based on the data from the sample. The product is sent to market only if the analysis of the sample shows it to be suitable. Otherwise, the product should not be sent to market.
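Choosing units with computer-generated random numbers can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the function name `simple_random_sample`, the store of 500 units, and the sample of 10 are all hypothetical. The standard-library call `random.sample` draws without replacement, so every unit number has the same chance of being chosen and none is chosen twice.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct unit numbers drawn from 1..population_size."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # Sampling without replacement: each unit is equally likely to be chosen.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical store of 500 numbered units; 10 are drawn for inspection.
chosen_units = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(chosen_units)
```

Fixing the seed is a convenience for checking the work; leaving it as `None` gives a different sample on each run.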

3.5.2 Systematic Sampling

Systematic sampling is the second sampling method. The individuals (observations, units) of the sample are chosen by first computing the sampling interval N/n (where N is the population size and n is the sample size), selecting a random starting point among the first N/n units, and then choosing every (N/n)th unit after it at a fixed periodic interval until the required sample size is reached.

EXAMPLE 3.2 VARIOUS BATCHES

An environmental company needs to choose a sample of its product and check it for capability. The output in the store consists of various batches, and all batches produced by the company should be represented in the sample. We should select a systematic sample: a number is assigned to every unit in sequence, and the sample is then chosen at a fixed interval between unit numbers. As an example, suppose 10,500 units produced in various batches are in the store (the output is stored by batch) and a sample of 250 units is to be examined. The sampling interval is 10,500/250 = 42, so one unit is chosen from each successive group of 42 units (250 groups in total). A random start is chosen from the first forty-two units (between 1 and 42). If, say, number 8 is selected as the first unit in the sample, then 42 is added to 8 to obtain the second unit (number 50), and so on until the last unit of the sample is chosen. In this case, the sample contains the units numbered 8, 50, 92, and so on.
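The arithmetic of this example can be reproduced with a short Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes that the population size is an exact multiple of the sample size, as it is here (10,500 = 250 × 42); other cases would need a rounding rule that the text does not specify.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return unit numbers chosen at a fixed interval after a random start."""
    interval = population_size // sample_size  # 10,500 // 250 = 42
    if start is None:
        # Random start among the first `interval` units (1..42 in the example).
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

# Reproduce Example 3.2 with the random start fixed at unit 8.
units = systematic_sample(10_500, 250, start=8)
print(units[:3], units[-1])  # [8, 50, 92] 10466
```

With a start of 8, the last sampled unit is 8 + 249 × 42 = 10,466, which is still within the 10,500 units in the store.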

3.5.3 Stratified Sampling

Stratified sampling is applied when the population under study is heterogeneous. The observations (individuals, units) of the population are separated into different layers, called strata, based on some major (main) properties. The observations within each layer must be similar to each other and distinct from those in other layers. A simple random sample is then drawn from each layer (stratum) to be included in the sample.


EXAMPLE 3.3 PERFORMANCE OF A COMPANY

An environmentalist is trying to improve the performance of a company. She proposes a new plan (schedule) intended to improve performance, and she needs to understand how employees feel about the proposed plan. Various classes of employees work in the company, such as technicians, engineers, administrators, and so on, and the sample should cover all of them. Stratified sampling is therefore used: a random sample is chosen from each class (stratum), and the chosen random samples are combined into one sample that enables the environmentalist to study the views of all classes.
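Drawing a random sample from each stratum and combining the results can be sketched in Python as follows. The staff lists and the function name `stratified_sample` are hypothetical. The sketch also assumes proportional allocation (the same fraction, 10%, is taken from every stratum, with at least one person per stratum); the text does not say how many individuals to take from each class, so this is only one possible choice.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Proportional allocation (an assumption), at least one per stratum.
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical employee lists for three classes (strata).
strata = {
    "technicians": [f"T{i}" for i in range(1, 61)],    # 60 people
    "engineers": [f"E{i}" for i in range(1, 31)],      # 30 people
    "administrators": [f"A{i}" for i in range(1, 11)],  # 10 people
}

sample = stratified_sample(strata, fraction=0.1, seed=7)
# Combine the per-stratum samples into the single sample to be surveyed.
combined = [person for group in sample.values() for person in group]
print({name: len(group) for name, group in sample.items()})
```

Every class appears in the combined sample, which is the property the example relies on.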

3.5.4 Cluster Sampling

The fourth sampling technique is cluster sampling, which is employed when the population is partitioned into parts (sections), called clusters, for geographic reasons. Clusters are usually chosen randomly, and then all observations (units, individuals) in the chosen clusters are included in the sample. Cluster sampling differs from stratified sampling because in cluster sampling every member of each chosen cluster is used, whereas in stratified sampling a random sample is selected from each stratum.

EXAMPLE 3.4 LANDFILL LEACHATE

There are many landfill leachate locations distributed around Malaysia. An environmentalist wishes to investigate the attitude of the employees toward a certain issue, but he is unable to survey all landfills. Cluster sampling should therefore be employed by choosing landfills randomly, say four sites, and meeting all the employees at the selected landfills.
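The selection in this example can be sketched in Python. The ten landfill names, the five employees per site, and the function name `cluster_sample` are hypothetical placeholders. The sketch picks whole clusters at random and then keeps every member of each chosen cluster, which is what distinguishes it from stratified sampling.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose n_clusters clusters at random and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every member of a chosen cluster is included; no sampling within it.
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 10 landfill sites, each with 5 employees.
clusters = {
    f"landfill_{i}": [f"site{i}_employee{j}" for j in range(1, 6)]
    for i in range(1, 11)
}

selected = cluster_sample(clusters, n_clusters=4, seed=3)
print(sorted(selected))  # names of the four randomly chosen landfills
```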

Further Reading

Alkarkhi, A.F.M., Alqaraghuli, W.A.A., 2019. Easy Statistics for Food Science with R, first ed. Academic Press.
Alkarkhi, A.F.M., Low, H.C., 2012. Elementary Statistics For Technologist. Universiti Sains Malaysia Press, Pulau Pinang.
Allan, G.B., 2007. Elementary Statistics: A Step by Step Approach. McGraw-Hill.
Donald, H.S., Robert, K.S., 2000. Statistics: A First Course. McGraw-Hill.
Mario, F.T., 2004. Elementary Statistics. Pearson.