This article discusses the methods used to present statistical data.
1. Tabulation:
Tables are devices for presenting data simply from masses of statistical data. Tabulation is the first step before data are used for analysis. Tabulation can be in the form of Simple Tables or Frequency Distribution Tables (i.e., the data are split into convenient groups and the number of items in each group is shown).
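As a rough illustration of a frequency distribution table, the sketch below splits a set of values into class intervals of equal width and counts the items in each. The function name `frequency_table`, the interval width and the ages are hypothetical, chosen only for the example; it assumes no value lies below `start`.

```python
from collections import Counter

def frequency_table(values, width, start):
    """Count values in class intervals [start, start + width), [start + width, ...)."""
    # Interval index of each value, assuming every value is >= start.
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        table.append((lower, lower + width, counts.get(k, 0)))
    return table

ages = [3, 7, 12, 15, 18, 21, 22, 27, 34, 38]  # hypothetical raw data
table = frequency_table(ages, width=10, start=0)
for lower, upper, freq in table:
    print(f"{lower}-{upper - 1}: {freq}")
```

Each row gives the lower limit, the exclusive upper limit and the frequency of one class interval.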
2. Charts and Diagrams:
They are useful methods for presenting simple statistical data. Diagrams are retained in the memory better than statistical tables.
The methods used are:
(a) Bar Charts:
They are merely a way of presenting a set of numbers by the length of a bar. The bar chart can be simple, multiple or component type.
(b) Histogram:
It is a pictorial diagram of frequency distribution. It consists of a series of blocks. The class intervals are given along the horizontal axis and the frequencies along the vertical axis.
(c) Frequency Polygon:
A frequency distribution may be represented diagrammatically by the frequency polygon. It is obtained by joining the mid-points of the tops of the histogram blocks.
(d) Line Diagram:
Line diagrams are used to show the trend of events with the passage of time.
(e) Pie Charts:
Instead of comparing the length of a bar, the areas of segments of a circle are compared. The area of each segment depends upon the angle.
(f) Pictogram:
Pictogram is a popular method of presenting data to the “man in the street” and to those who cannot understand orthodox charts. Small pictures or symbols are used to present the data.
3. Statistical Maps:
When statistical data refer to geographic or administrative areas, they are presented either as “Shaded Maps” or “Dot Maps” according to suitability.
4. Statistical Averages:
The term “average” implies a value in the distribution, around which the other values are distributed. It gives a mental picture of the central value.
The types of averages used are:
(i) The Mean (Arithmetic Mean):
To obtain the mean, the individual observations are first added together (called summation, denoted by ‘Σ’) and the total is then divided by the number of observations. The mean is denoted by the sign X̅ (called “X bar”).
(ii) The Median:
It is an average of a different kind, which does not depend upon the total and number of items. To obtain the median, the data are first arranged in ascending or descending order of magnitude, and then the value of the middle observation is located. When the number of observations is even, the median is taken as the mean of the two middle values.
(iii) The Mode:
It is the most frequently occurring value in a series of observations.
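The three averages can be compared on a small set of observations. This is a minimal sketch using Python's standard `statistics` module; the observations are hypothetical.

```python
import statistics

observations = [2, 3, 3, 5, 7, 10]  # hypothetical data, already in ascending order

mean = statistics.mean(observations)      # sum (30) divided by the number of items (6)
median = statistics.median(observations)  # even count: mean of the two middle values, 3 and 5
mode = statistics.mode(observations)      # the most frequent item

print(mean, median, mode)
```

Here the mean is 5, the median is 4.0 and the mode is 3, which shows that the three averages need not coincide when the distribution is not symmetrical.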
5. Measures of Dispersion:
(a) The Range:
The range is by far the simplest measure of dispersion. It is defined as the difference between the highest and lowest figures in a given sample. If we have grouped data, the range is taken as the difference between the midpoints of the extreme categories.
(b) The Mean Deviation:
It is the average of the deviations from the arithmetic mean, ignoring their sign (the signed deviations always add up to zero).
(c) The Standard Deviation:
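It is the most frequently used measure of dispersion. In simple terms, it is defined as the “Root-Mean-Square Deviation”. It is denoted by the Greek letter σ (sigma).
It is calculated by the formula: σ = √[ Σ(x − X̅)² / n ], where x is an individual observation, X̅ is the mean and n is the number of observations.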
When the sample size is more than 30, the above basic formula may be used without modification. For smaller samples, the above formula tends to underestimate the standard deviation, and therefore needs correction i.e., use n-1 instead of n.
The meaning of standard deviation can only be appreciated fully when we study it with reference to “normal curve”. The larger the standard deviation the greater the dispersion of values about the mean.
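The measures of dispersion described above can be worked through on one small data set. The sketch below is illustrative only: the values are hypothetical, the mean deviation is taken over absolute deviations, and the standard deviation is computed both with n and with the n-1 correction for small samples.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(values)
mean = sum(values) / n  # 40 / 8 = 5.0

# Range: difference between the highest and lowest figures.
value_range = max(values) - min(values)

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in values) / n

# Standard deviation: root of the mean of the squared deviations.
sum_of_squares = sum((x - mean) ** 2 for x in values)
sd_basic = math.sqrt(sum_of_squares / n)            # divisor n
sd_corrected = math.sqrt(sum_of_squares / (n - 1))  # divisor n - 1 for small samples

print(value_range, mean_deviation, sd_basic, round(sd_corrected, 3))
```

With only eight observations the corrected value (about 2.138) is noticeably larger than the basic value (2.0); the gap narrows as the sample grows.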
(d) Normal Distribution:
The normal distribution or normal curve is an important concept in statistical theory. The shape of the curve depends upon the mean and standard deviation, which in turn depend upon the number and nature of the observations.
It is important to note that:
i. The area between one S.D. on either side of the mean will include approximately 68% of the values in the distribution.
ii. The area between two S.D. on either side of the mean will cover most of the values i.e., approximately 95% of the values.
iii. The area between three S.D. on either side of the mean will include 99.7% of the values. These limits on either side of the mean are called “confidence limits”.
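These three percentages can be checked numerically. The sketch below relies on the standard identity that the area of a normal curve within k S.D. of the mean equals erf(k/√2); the function name `area_within` is a hypothetical helper.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within k S.D. of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} S.D.: {area_within(k):.1%}")
```

The printed values are 68.3%, 95.4% and 99.7%, in line with the approximate figures quoted above.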
(e) Standard Normal Curve:
Although there is an infinite number of normal curves depending upon the mean and S.D., there is only one standard normal curve. It is a smooth, bell-shaped, perfectly symmetrical curve based on an infinitely large number of observations. The total area under the curve is 1, its mean is 0 and its S.D. is 1. The mean, median and mode all coincide. The distance of a value (x) from the mean (X̅) of the curve in units of S.D. is called the “relative deviate” or “standard normal deviate” and is usually denoted by Z.
The standard normal deviate or Z is given by the formula: Z = (x − X̅) / σ
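Since Z is the distance of a value from the mean expressed in units of S.D., it can be sketched as the difference divided by the S.D. The function name `z_score` and the figures are hypothetical.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mean) / sd

# Hypothetical distribution with mean 100 and S.D. 15.
z = z_score(130, mean=100, sd=15)
print(z)
```

A value of 130 lies two S.D. above the mean, so Z is 2.0; a value below the mean gives a negative Z.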
6. Sampling:
When a large number of individuals or items or units have to be studied, we take a sample instead of examining every one of them.
The commonly used sampling methods are:
(a) Simple Random Sample:
This is done by assigning a number to each of the units (the individuals) in the sampling frame and then selecting units with the help of random numbers. Random numbers are a haphazard collection of numbers, arranged in a running manner to eliminate personal selection or unconscious bias in taking out the sample. This technique provides the greatest number of possible samples.
(b) Systematic Random Sample:
This is done by picking every 5th or 10th unit at regular intervals. By this method, each unit in the sampling frame would have the same chance of being selected, but the number of possible samples is greatly reduced.
(c) Stratified Random Sample:
The sample is deliberately drawn in a systematic way so that each portion of the sample represents a corresponding stratum of the universe. This method is particularly useful where one is interested in analysing the data by a certain characteristic of the population e.g., religious group (such as Hindus), age-group etc.
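The three sampling methods can be sketched with Python's standard `random` module. All function names, the sampling frame of 100 numbered units and the two strata are hypothetical. The systematic sample assumes a random starting point within the first interval, and the stratified sample assumes the same sampling fraction in every stratum.

```python
import random

def simple_random_sample(frame, k, rng):
    """Draw k distinct units, every unit having the same chance of selection."""
    return rng.sample(frame, k)

def systematic_sample(frame, step, rng):
    """Pick every step-th unit after a random start within the first interval."""
    start = rng.randrange(step)
    return frame[start::step]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Draw the same fraction at random from each stratum of the frame."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample

rng = random.Random(0)        # fixed seed so the sketch is repeatable
frame = list(range(1, 101))   # hypothetical sampling frame: units numbered 1 to 100

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(frame, lambda u: "low" if u <= 40 else "high", 0.1, rng)
print(srs, systematic, stratified, sep="\n")
```

With a 10% fraction, the stratified sample takes 4 units from the 40 "low" units and 6 from the 60 "high" units, so each stratum is represented in proportion to its size.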
Sampling Errors:
If we take repeated samples from the same population or universe, the results obtained from one sample will differ to some extent from the results of another sample. This type of variation from one sample to another is called sampling error. It occurs because data were gathered from a sample rather than from the entire population of concern. The factors which influence the sampling error are the size of the sample and the variability of the individual readings.
Non-Sampling Errors:
Errors may also occur due to inadequately calibrated instruments, observer variation, incomplete coverage in examining the subjects selected, and conceptual errors.
Standard Error:
If we take a random sample from the population, and similar samples over and over again, we will find that every sample will have a different mean. The S.D. of the means is a measure of the sampling error and is given by the formula σ/√n, which is called the standard error or the standard error of the mean.
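As a worked example, the sketch below takes the standard error of the mean as the S.D. divided by the square root of the sample size, and then sets approximate 95% limits at two standard errors on either side of the sample mean. The figures and the function name are hypothetical.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean: S.D. divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical sample: mean 70, S.D. 12, size 36.
sample_mean, sd, n = 70, 12, 36
se = standard_error_of_mean(sd, n)

# Approximate 95% limits: two standard errors on either side of the mean.
lower, upper = sample_mean - 2 * se, sample_mean + 2 * se
print(se, lower, upper)
```

The standard error here is 2.0, giving limits of 66.0 and 74.0. Quadrupling the sample size would halve the standard error.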
7. Tests of Significance:
These are:
(a) Standard Error of the Mean:
The S.D. of the sample means is also called the standard error; it describes the distribution of the sample means about the true mean of the universe. It is used to set confidence limits and to find out the level of significance.
(b) Standard Error of Proportion:
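Here the sample statistic is a proportion rather than a mean, and the standard error describes how sample proportions vary about the proportion in the universe.
This is calculated by the formula: S.E. (proportion) = √(pq / n)
where p is the proportion with the characteristic, q = 1 − p (or 100 − p when working in percentages) is the proportion without it, and n = size of the sample.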
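A minimal sketch of this calculation, assuming the usual form √(pq/n) with q = 1 − p and p expressed as a fraction rather than a percentage. The function name and figures are hypothetical.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion p (as a fraction) for sample size n."""
    q = 1 - p
    return math.sqrt(p * q / n)

# Hypothetical sample: 20% of 100 subjects show the characteristic.
se_p = standard_error_of_proportion(0.2, 100)
print(round(se_p, 4))
```

For p = 0.2 and n = 100 the standard error is 0.04, i.e., 4 percentage points.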
(c) Standard Error of Difference Between Two Means:
To compare the results between two groups (e.g., control group and experimental group), the difference between the means of the two groups is examined to see whether the samples represent two different universes. This is done by calculating the standard error of difference between the two means.
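The formula is: S.E. (d) = √(σ₁²/n₁ + σ₂²/n₂), where σ₁ and σ₂ are the standard deviations and n₁ and n₂ the sizes of the two samples.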
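The comparison can be sketched as follows, assuming the usual large-sample form in which the squared S.D. of each group is divided by its sample size and the two terms are added under a square root. The function name, group figures and the "more than twice the standard error" rule of thumb for significance at about the 5% level are stated here as assumptions of the sketch.

```python
import math

def se_difference_of_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Hypothetical control and experimental groups.
mean1, sd1, n1 = 50, 10, 100
mean2, sd2, n2 = 54, 12, 144

se_d = se_difference_of_means(sd1, n1, sd2, n2)
ratio = abs(mean1 - mean2) / se_d
significant = ratio > 2  # rule of thumb, roughly the 5% level
print(round(se_d, 3), round(ratio, 2), significant)
```

Here the observed difference of 4 is about 2.83 times its standard error of 1.414, so it is unlikely to have arisen by chance alone.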
(d) Standard Error of Difference Between Proportions:
Instead of means, sometimes one has to test the significance of the difference between two proportions or ratios, to find out whether the difference has occurred by chance.
In this case, we calculate the standard error of difference between two proportions: S.E. (p₁ − p₂) = √(p₁q₁/n₁ + p₂q₂/n₂)
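A sketch of the same idea for proportions, assuming the unpooled form √(p₁q₁/n₁ + p₂q₂/n₂) with proportions written as fractions. The function name and the two groups are hypothetical.

```python
import math

def se_difference_of_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical groups of 100 subjects each: 40% and 50% show the characteristic.
p1, n1 = 0.4, 100
p2, n2 = 0.5, 100

se_diff = se_difference_of_proportions(p1, n1, p2, n2)
ratio_p = abs(p1 - p2) / se_diff
print(round(se_diff, 3), round(ratio_p, 2))
```

The difference of 0.10 is only about 1.43 times its standard error of 0.07, so with these sample sizes it could easily have occurred by chance.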
(e) Chi-Square Test:
The Chi-square (χ²) test offers an alternative method of testing the significance of the difference between two proportions. It has the advantage that it can also be used when more than two groups are to be compared. The steps are:
(i) Test the ‘Null Hypothesis’:
First, one has to set up a hypothesis, called the Null Hypothesis that there was no difference between the findings of the two groups.
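(ii) Applying the χ² Test:
χ² = Σ (O − E)² / E, where O = observed frequency and E = expected frequency under the Null Hypothesis.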
(iii) Finding the Degree of Freedom:
d.f. = (c − 1) (r − 1)
(c = number of columns, r = number of rows)
(iv) Probability Tables:
Knowing the χ² and d.f. values, the probability is read from published tables.
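The steps above can be sketched for a table of observed frequencies. The sketch assumes the usual statistic, the sum of (observed − expected)² / expected over all cells, with each expected frequency computed as row total × column total / grand total under the Null Hypothesis. The function name and the 2 × 2 table are hypothetical.

```python
def chi_square(table):
    """Return (chi-square, degrees of freedom) for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if there were no difference between the groups.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table[0]) - 1) * (len(table) - 1)  # (c - 1)(r - 1)
    return chi2, df

# Hypothetical results: rows are two groups, columns are improved / not improved.
observed = [[20, 30],
            [30, 20]]
chi2, df = chi_square(observed)
print(chi2, df)
```

This gives χ² = 4.0 with 1 degree of freedom. Published tables give 3.84 as the 5% value for 1 d.f., so the Null Hypothesis would be rejected at that level.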
(f) Correlation and Regression:
Inferential Statistics:
These assess the meaning of the data e.g.,:
i. Correlation Coefficient:
Measures the statistical relationship between two variables, without assuming that either is dependent or independent. It ranges from −1.0 to +1.0: a C.C. of +1.0 implies a perfect positive linear relationship, −1.0 a perfect negative one, and 0.0 no linear relationship.
ii. Regression Coefficient:
Measures the relationship between two variables but assumes that one is dependent and the other is independent. It gives the average change in the dependent variable for a unit change in the independent variable.
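The distinction can be illustrated on paired observations. The sketch below computes Pearson's correlation coefficient and the least-squares regression coefficient (slope) of y on x from the same sums of squares and products. The function name and the data are hypothetical, and Pearson's r is assumed to be the correlation coefficient intended.

```python
import math

def correlation_and_regression(xs, ys):
    """Return (Pearson correlation coefficient, regression coefficient of y on x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)  # symmetric in x and y
    slope = sxy / sxx               # treats x as independent, y as dependent
    return r, slope

dose = [1, 2, 3, 4, 5]      # hypothetical independent variable
response = [2, 4, 5, 4, 5]  # hypothetical dependent variable
r, slope = correlation_and_regression(dose, response)
print(round(r, 3), slope)
```

Swapping the two lists leaves r unchanged (about 0.775) but changes the slope, which reflects the fact that only regression assumes a dependent and an independent variable.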
iii. Parametric Statistics:
Assume a normal distribution (e.g., Student's t-test). Non-parametric statistics do not assume that the data are normally distributed (e.g., chi-square test).
iv. Factor Analysis:
Looks for the minimum number of dimensions which can be used to define a group. This will generate dimensions (e.g., psychotic-neurotic). Factors are an expression of the relationship between attributes, not between individuals.
v. Cluster Analysis:
Can only generate clusters not dimensions.
Reliability:
The extent to which an individual's score or a test result is repeatable.
It is of the following types:
(a) Test Retest Reliability:
High correlation between scores on the same test given on two occasions.
(b) Alternate Form Reliability:
High correlation between two forms of the same test.
(c) Split Half Reliability:
High correlation between two halves of the same test.
(d) Inter Rater Reliability:
High correlation between results of two or more raters of the same test.
Validity:
The extent to which a test measures what it is designed to measure:
(a) Predictive Validity:
Ability of the test to predict outcome.
(b) Content Validity:
Whether the test selects a representative sample of the total tests for that variable.
(c) Construct Validity:
How well the experiment tests the hypothesis underlying it.
Reliability Paradox:
A very reliable test may have low validity precisely because its results do not change i.e., it does not measure true changes.
Measurement in Psychiatric Research:
Aims may be:
i. To identify psychiatric cases
ii. To diagnose psychiatric disorder accurately
iii. To assess severity and change in severity.
Information is collected by document studies (case notes, journal articles, census etc.), mail questionnaires (cheap and easy but with a low response rate, hence sample bias), self-rating questionnaires (may be answered inaccurately) and observer-rated interviews (structured, semi-structured or informal; these allow great flexibility and accuracy but are expensive and need training).
Source of Errors:
(a) Response Set:
Subject always tends either to agree or disagree with questions.
(b) Bias towards Centre:
Subject tends to choose the middle response and shun extremes.
(c) Social Acceptability:
Subject chooses the acceptable answer rather than the true one.
(d) Halo Effect:
Answers are chosen to ‘fit’ with previously chosen answers; responses become what is expected by the observer.
(e) Hawthorne Effect:
Researchers alter the situation by their presence.
Variables:
Variables are any constructs or events which research studies.
These are of the following types:
(a) Independent Variable:
The antecedent condition manipulated by the experimenter (e.g., drug levels).
(b) Dependent Variable:
The variable used to measure the effect of the independent variable (e.g., recovery rates).
(c) Confounding Variable:
Any extraneous variable whose potential influence on the dependent variable has not been controlled for. A source of error (e.g., age and sex imbalance).
(d) Controlled Variable:
A variable whose influence is kept constant by the experimenter (e.g., other medications).
(e) Uncontrolled Variable:
A variable which is not manipulated or held constant, though it may be measured (e.g., life events).