After reading this article you will learn about the intelligence tests and measurement of intelligence.
It is perhaps not too much of an exaggeration to say that psychologists have spent more effort and time on developing tests of intelligence than on understanding what intelligence is. This is evidenced by the great variety of intelligence tests that have been developed over the past nine decades. In many instances tests have been revised again and again.
Further, there are instances where tests have been developed as an attempt to overcome limitations of earlier tests, and still have not been found to be very satisfactory. But, behind all these, there appear to be some basic and common assumptions regarding the nature of intelligent acts, if not intelligence.
In general there appears to be an informal consensus that intelligence is involved less in actions that are learnt and repetitive and more in actions that call for the solution of a problem or adaptation to environmental conditions, and especially in actions where a diversity of results is possible.
There is nothing like ‘the intelligent response’. We can talk of only less intelligent and more intelligent acts. While there may or may not be something called absolute quantum of intelligence, at the operational or measurement levels, one can only think of the measured intelligence of an individual as a relative measure, higher than or lesser than or equal to those of others or the average performance of the group to which he belongs.
Tests of intelligence, apart from helping in the assessment of intelligence, have also helped us to modify the very concept and definition of intelligence and to test that concept. One may also say that the intelligence testing movement played no mean role in bringing psychology from the isolation of the laboratory to actual action. Intelligence tests have played a very important role in the development of applied psychology.
Beginning of Intelligence Testing:
Alfred Binet is considered to be the father of “intelligence tests” as he developed his first test in Paris at the beginning of the twentieth century. However, credit should also go to Thorndike for developing his own intelligence test, which is sometimes claimed to be even earlier. One should also remember that the real pioneer in the field of mental testing was Galton. Still, we will do well to begin the account of the intelligence testing movement in the conventional manner, with the Binet tests.
The moment came in the year 1903 when the school authorities in Paris were faced with the problem of certain students repeatedly failing to achieve the expected level of performance at the examination. No medical reason or any environmental factor was found associated with this.
The authorities wondered whether there could be something called mental ability which the children might be lacking. Alfred Binet and Theodore Simon were asked to study this problem. Binet first developed a set of problems, which could be solved or tasks carried out by any normal school going child.
They identified a list of 30 such problems, involving simple attending, following of direction, comprehension, etc. It was found that many of the poor learners in the classroom were unable to carry out these tasks. A simple example of the task was ‘unwrapping a chocolate’. Initially these tests were used only as a screening device to identify those children with poor mental ability who could be sent to special schools.
However, soon the possibility of developing this list of problems into a comprehensive and well developed scale that would be helpful in measuring the intelligence of normal children, and also predict their future performance was realised. It was this that led to the first Binet-Simon Scale of Intelligence.
It again consisted of a number of items, appropriate to different chronological age groups. Thus there emerged a scale including items appropriate to every level from very young ages to higher ages. It was found that the majority of children of a particular chronological age were able to complete successfully all the items belonging to the earlier age levels and also half of the items belonging to their own chronological age level. Of course, some could go beyond this and some could not reach the level.
A large number of children were tested, and there emerged the concept of ‘mental age’ corresponding to chronological or physical age. In the case of a normal average child the mental age would be the same as the chronological age; a five year old child, for example, would have a mental age of five.
A child whose intelligence level is lower would have a mental age lower than his or her chronological age, and a very bright child, a mental age higher than his or her chronological age. The first scale of Binet was developed in 1908, and again revised in 1911.
It was at this point that the term Intelligence Quotient or IQ was introduced by William Stern. Stern suggested that the performance of anyone on any intelligence test would be very well indicated by the ratio of the mental age to the chronological age, which for reasons of convenience may be expressed as a percentage.
IQ = (Mental age / Chronological age) × 100
It may be seen that in the case of a normal average child, the mental age and the chronological age coincide, so that the IQ is 100. The IQ will be less than 100 if the mental age is less than the chronological age.
Thus a child of 8 years whose mental age is 7 will have an IQ of 87.5 (7/8 of 100). In the reverse instance, where the child is bright and the mental age is 8 and the chronological age is 7, the IQ will be around 114.3 (8/7 of 100). However, in actual measurement, the average IQ is fixed as a range, varying between 90 and 110.
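The ratio calculation described above can be sketched in a few lines of Python. The function names `ratio_iq` and `is_average` are illustrative assumptions and do not come from any test manual; the 90 to 110 band is simply the average range quoted in the text.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


def is_average(iq: float, low: float = 90, high: float = 110) -> bool:
    """True if the IQ lies inside the average band (assumed 90 to 110)."""
    return low <= iq <= high


# A child of 8 with a mental age of 7, and a child of 7 with a mental age of 8.
for mental, chronological in [(7, 8), (8, 7)]:
    iq = ratio_iq(mental, chronological)
    print(mental, chronological, round(iq, 1), is_average(iq))
```

Both example children fall just outside the average band, one below it and one above it.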
Further Development of Intelligence Testing:
The initiative of Binet in developing the first test of intelligence soon led to a rapid growth and development of a variety of intelligence tests. While the original tests of Binet were mainly developed in the context of the inability of a few students failing to reach a prescribed level of achievement in school examinations, very soon intelligence tests found their applications in other spheres of activities also.
However the first major development after Binet’s original work was the adaptation and standardization of these tests by L.M. Terman of the Stanford University in USA. The adapted forms developed by Terman became the central characters around which the intelligence testing revolved for a long time and to some extent even now continues to be so. These tests are popularly known as the Stanford-Binet tests. We may now take a brief look at the work done by Terman.
The Stanford-Binet Tests:
Terman in 1916, translated Binet’s French tests into the English language. He added a number of tests and standardized them. The tests were administered to a large population including adults and the concept of IQ was adopted to indicate a person’s level of intelligence.
In developing the tests for use with adults, Terman and his associates deviated from Binet’s original attempt. Binet, as may be recalled, primarily looked at intelligence as a factor in relation to performance of children, and his original tests were not meant to be used with adults.
Secondly Binet did not assume that intelligence was a fixed amount of ability or quality or trait which influenced all the problem-solving operations engaged in by an individual. The Stanford-Binet tests on the other hand assumed that every individual has a fixed IQ or quantum of intelligence like height or weight and proceeded to measure the same.
The tests were revised again in 1937 and later in 1960, but the basic idea of an individual being endowed with a fixed amount of intelligence never changed, even if it was not always made explicit. Somehow currency was also given to the idea that an individual’s IQ was inherited. This led to a lot of controversy in later years. Another psychologist, Goddard, developed his own version of the Binet tests.
On the basis of studies using his test, Goddard came to the conclusion that most immigrants, like Jews, Hungarians and Russians, had a much lower intelligence than native-born Americans. This conclusion did not take into account the fact that many of the items which were added, or changed in the process of translation, tended to reflect American culture, which placed non-American groups at a disadvantage and native-born Americans at an advantage. The tests were highly centered around American culture, and those who came from other societies and cultures were not able to perform as well. This limitation was soon pointed out as ‘culture-centeredness’.
Another limitation to these tests was the fact that performance in these tests mostly depended on a familiarity and proficiency in the English language. This led to a doubt as to whether one was actually measuring intelligence or language ability. Yet another problem was related to the concept of ‘mental age’.
If the assumption is made that in a normal individual mental age and chronological age are identical, then the question arises as to whether intelligence continues to grow or stops growing at a particular age. Terman and his associates assumed that intelligence stopped growing at the age of 16, and as a result, in the case of adults, 16 was used as the denominator instead of the actual chronological age.
Thus if an individual who is thirty years old takes the test, and his mental age was found to be 15, his IQ would be:
IQ = (MA / CA) × 100 = (15 / 16) × 100 ≈ 94
This situation was found to be strange and pointed out the limitation of the concept of mental age and the IQ derived from it. Further, there was not enough evidence to prove at what rate intelligence grows and whether and when it stops growing. The Stanford-Binet test therefore suffered from a number of limitations like cultural bias, undue dependence on language and verbal items and also difficulty in measuring adult intelligence through the concept of mental age.
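The adult calculation can be sketched as follows, assuming the rule stated above that the chronological age used as the denominator is never allowed to exceed 16. The names `ADULT_AGE_CAP` and `capped_ratio_iq` are hypothetical and chosen only for this illustration.

```python
ADULT_AGE_CAP = 16  # age at which intelligence was assumed to stop growing


def capped_ratio_iq(mental_age: float, chronological_age: float,
                    cap: float = ADULT_AGE_CAP) -> float:
    """Ratio IQ with the chronological age capped (assumed cap of 16)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    divisor = min(chronological_age, cap)  # adults are treated as 16
    return mental_age / divisor * 100


# A thirty year old with a mental age of 15.
print(round(capped_ratio_iq(15, 30)))  # 94, using 16 as the denominator
print(round(15 / 30 * 100))            # 50, if the actual age were used
```

The gap between the two printed values shows how strongly the result depends on the assumed age at which intelligence stops growing.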
Yet another difficulty was that the Stanford-Binet tests were individual tests, administered to one individual at a time and therefore not very suitable where a large number of individuals were tested. These limitations of the Stanford-Binet tests led to the development of a variety of tests to be used in different situations and to meet different needs. However, one cannot deny that these tests marked an important landmark in the development of psychology.
The Army Tests:
An important landmark in the development of intelligence testing was the decision of the US Armed Forces to make use of intelligence tests to select people for the American army and entrust them with various responsibilities. When America entered the First World War there was an urgent need to recruit a large number of people, and it was decided to use intelligence scores as a criterion, along with others, to eliminate those with poor intelligence.
Since recruitment to the army involved immigrants from other countries, and also the administration of the tests to a large number of people, a group of psychologists under the leadership of R. M. Yerkes developed a new set of intelligence tests. Two forms of the test, the Army Alpha and the Army Beta, were designed. The Alpha involved items which required a high degree of proficiency in the English language, and the Beta consisted mostly of non-language content that did not require a very high level of proficiency in English.
The comparability of the two forms was established. The tests were so developed that they could be administered to large groups of people at the same time not requiring to be administered individually as was the case with the Stanford-Binet tests. Further, the concept of mental age and scoring according to mental age were given up and a system known as point scoring was introduced. The introduction of point scoring was a significant step.
The tests were administered to a large number of people. However, these tests were useful only to screen out those with very poor intelligence and not to assess an individual and later tag on to him a score of intelligence or an IQ. While the army tests met some limitations of the Stanford-Binet, namely being dependent on the English language and also length of time consumed, the administration of these tests in larger groups and with very poor testing conditions in crowded rooms was found to generate a lot of strain in the people.
The test scores were generally found to be lower compared to what people obtained under more relaxed and congenial conditions. Another limiting factor was the emphasis on speed of performance rather than level or capacity.
The general level of performance was found to be low. Thus the army intelligence tests might have helped to screen out people. How valid and correct the screening was, is anybody’s guess. The tests adopted a point scoring system which to some extent solved the problem in relating to the age at which intelligence stops growing, etc.
Non-Verbal Tests and Performance Tests:
Items of the Stanford-Binet tests involved a lot of language content and were therefore not suitable for use with children, or with adults such as immigrants whose level of proficiency in the English language was low.
Thus it became necessary to develop tests which could overcome this limitation. Though the Army Beta form tried to achieve this, the difficulty could not be overcome completely. In view of this a large number of tests were developed involving very little of language.
These tests were of several types. There were the non-verbal tests, which used numbers, pictures, and other figures. They were paper-and-pencil tests, printed on paper and answered on paper. Some of the tests used were as follows. In picture arrangement, a series of pictures representing different stages of a particular activity was given in scattered order, and the person taking the test had to arrange them in the correct order.
In Digit-Symbol tests, a set of symbols was assigned different numbers like 1, 2, 3, etc., and the candidate had to substitute symbols for the digits or vice versa. A third type of item was the identification of missing parts in a series of pictures. Another type was matrices, where a series of geometric diagrams was given with one part left incomplete. The candidate had to identify the missing part from a set of four alternative answers given below the diagram.
One of the best known tests of this type is the Raven’s Progressive Matrices where such matrices were prepared with increasing level of difficulty requiring a higher level of intelligence as one proceeds from one level to another. There is a simple coloured form of the matrices meant for children, a normal form meant to those within normal range of intelligence and an advanced form requiring a still higher level of intelligence.
While the emergence of nonverbal tests certainly helped to overcome the limitation of undue emphasis on language, they still required a certain degree of school culture or paper-pencil culture. This left behind some limitation in measuring the intelligence of those who had no ‘school culture’ or of young children who had not gone to school. To meet the requirement of testing children of very low age levels, specific scales like the Merrill-Palmer scale were developed, which could test children as young as 2 or 3 years old.
Cattell developed a scale of intelligence tests which could measure the intelligence of even infants just a few months old. Most of the items in these scales tend to measure sensory motor activities like the ability to respond to sounds or objects which were projected. One wonders whether these really measured intelligence, as Binet or Terman had visualized.
An important development was the emergence of performance tests which involved activities that had to be performed by an individual with very little demand on language excepting perhaps to make a person understand the instructions as to what he or she is expected to do.
Two well-known performance scales of intelligence were the one developed by Pintner and Paterson and the one developed by Coxe at Cornell University. Another battery was developed by Alexander. These batteries or scales included items which required the person taking the test to actually carry out some activities.
Some of the activities which are popularly used are as follows:
(a) Picture Assembly:
A figure like that of a horse, made of wood, is split into different parts and the person taking the test has to put the parts together to make the complete picture. The speed of performance is taken into account.
(b) Block Design:
Here a number of blocks of different colours are given and the individual has to arrange the blocks as per the designs given to him on a separate card. These designs increase in difficulty, from simple designs involving few blocks to complex designs involving many blocks. Here again time taken is a factor.
(c) Pass Along:
This test involves a number of wooden blocks of two colours, red and blue. The blocks are arranged with the blue blocks on one side and the red on the other. The person taking the test has to move the red block to the opposite side by shuffling the blocks about, without lifting the blue ones, and in the shortest possible time.
In most of these tests, apart from the ability to complete the task, there is also a time limit. These time limits are arrived at on the basis of previous experimentation with a large number of people, and are usually fixed at the time within which 90% of the people who are pre-tested finish working on the test. The point is not that these 90% must complete the test successfully, but that they should have enough time to complete it. Of course, the norms for setting time limits often vary.
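One possible reading of the 90% rule above is that the limit is the shortest time within which 90% of the pre-tested group finished. The sketch below follows that reading; the function name `time_limit`, the nearest-rank method and the sample times are all assumptions made for illustration, and real test manuals may set limits differently.

```python
def time_limit(pretest_times: list[float], percent: int = 90) -> float:
    """Shortest time within which `percent`% of the pre-tested group finished.

    Uses the nearest-rank method (an assumption, not a rule from the text).
    """
    if not pretest_times:
        raise ValueError("at least one pre-test time is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(pretest_times)
    # Integer ceiling of percent * n / 100, avoiding floating-point error.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical completion times in seconds for ten pre-tested people.
times = [42, 55, 61, 48, 70, 95, 52, 66, 58, 120]
print(time_limit(times))  # 95: nine of the ten finished within 95 seconds
```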
There have also been other interesting kinds of tests developed. The reader certainly may recall that we were highlighting the fact that early tests of intelligence, particularly the Stanford-Binet tests were unduly loaded with verbal or language content and it was this factor that led to the emergence of non-verbal and performance tests.
At the same time it was felt that, intelligence being an ability involving thought processes, verbal processes could not be totally ignored. In view of this, tests were developed which, while being nonverbal and pictorial in content, could nevertheless measure verbal intelligence. A notable test of this type is the Ammons Full Range Picture Vocabulary Test, which makes use of pictures to measure the vocabulary or language ability of the person.
In this section an attempt has been made to introduce the reader to some of the tests which were developed to overcome the major limitations of the original Stanford-Binet test. They were also needed because the use of intelligence tests came to be extended to other contexts like war and could no longer be limited to the context of failing school children.
The Wechsler Scales:
Perhaps the most sustained and major effort at developing a scale of intelligence has been that of David Wechsler and his associates at the Bellevue Hospital. In view of this, the test came to be known as the Wechsler-Bellevue Scale and later simply as the Wechsler Scale. The Wechsler scale has also been revised continuously at regular intervals.
The real provocation for the emergence of the Wechsler scale has been the fact that the Stanford-Binet tests were not adequate to measure adult intelligence directly. This was because the tests employed the concept of mental age and there was no definite finding regarding the age at which intelligence stops growing.
Secondly, the Stanford-Binet tests depended very heavily on the language factor. The Wechsler scales were an alternative designed to overcome such limitations. The Wechsler scale is again a battery of tests consisting of two major parts. The first is a verbal sub-scale, which includes verbal items like vocabulary, analogy, meaning, etc., and consists of 6 sub-tests.
The other major part, the performance sub-scale, includes 5 sub-tests, thus making a total of 11 sub-tests. The test provides guidelines to calculate a verbal IQ and also a performance IQ. A comparison of these two helps us to see if the language factor has been a handicap. Of course, an overall IQ can also be arrived at based on all the sub-tests.
In India also attempts have been made to adapt intelligence tests. One of the best known Indian tests is the Bhatia Scale of Performance Tests, which is widely used and includes performance items like pass along tests, block designs and a few other tests.
Attempts have also been made to adapt the Stanford-Binet Scale by Mallen. Another interesting test is that of Chatterjee who developed a nonverbal test of verbal intelligence. But by and large it may be seen that the intelligence tests developed in India have been mostly adaptations of western type with attempts here and there to use Hindi in the place of English wherever language use is inevitable. One of the earliest attempts was that of Kamat to adapt the Stanford-Binet test.
While going through the above account of the development of intelligence testing the student might have noticed a few broad trends which are indicated as below:
Many tests today use both verbal and non-verbal or performance items depending on the purpose of testing and the subject whose intelligence is being measured. Thus if tests are used for admission to educational programmes for selection of higher level administrative, academic and managerial positions then there is a greater probability of verbal items being used.
Some of the verbal items included are:
(a) General information – This involves questions like: Who was the first Indian to go into space? Who was the General who led the Indian Army in the 1971 war with Pakistan? Who was the first Indian to be awarded the Nobel Prize in Science?
(b) Vocabulary or word meaning – What is the meaning of leisure? What is the meaning of the word ‘obvious’?
(c) Verbal comprehension – This requires the ability to read a passage and understand it. At the end of the passage a set of questions is given, for which answers have to be provided.
(d) Verbal Analogies – Here a few items are given, and the subject has to work out the relationship between two elements. On the basis of this he or she has to fill in a blank in the given sentence.
E.g. 15th of August is to 15th of September as 15th of September is to…………………
Here there is a gap of 31 days between 15th of August and 15th of September, and therefore the correct answer is 16th of October, since September has only 30 days. But many people may not see this and give the answer 15th of October.
(e) Synonyms and antonyms – Here a few words are given and the individual is expected to give a correct word which is similar in meaning or opposite as required.
E.g. What is the similarity between light and a burning candle?
The answer could be both give illumination or both involve expenditure of energy.
What is the opposite of generosity? It can be miserliness, stinginess etc.
In most of these cases, the items are of a multiple-choice variety, wherein four or five alternatives are given after each question, one of them being the correct answer. The person taking the test has to identify the correct answer. Such an arrangement helps people to direct their thought processes while trying to identify the correct answer instead of going on a wild goose chase.
However, it is very difficult to provide alternatives, other than the correct answer, which also appear to have a fairly high probability of being correct. If this is not done, the items become useless. The alternatives should not be glaringly improbable answers.
They should be very similar to the correct answer and yet the latter should be distinct from the other alternatives and be acceptable as the ultimate correct answer. One criticism against this scheme of ‘multiple-choice answering’ is that people may arrive at the right answer by merely guessing.
This probability is there but can be reduced by a careful choice of the alternatives which would discourage wild guessing. Statistical techniques have also been developed to enable corrections for answers that might have resulted from pure guessing on the part of the persons taking the test.
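One widely used correction for guessing subtracts a fraction of the wrong answers from the number of right answers, on the assumption that wrong answers come from blind guessing among equally likely alternatives. The sketch below shows that formula; it is offered as a common example and not necessarily the technique the text has in mind, and the name `corrected_score` is hypothetical.

```python
def corrected_score(num_right: int, num_wrong: int, num_choices: int) -> float:
    """Correction for guessing: right - wrong / (choices - 1).

    Omitted items are neither rewarded nor penalised.
    """
    if num_choices < 2:
        raise ValueError("a multiple-choice item needs at least two choices")
    return num_right - num_wrong / (num_choices - 1)


# 30 right and 8 wrong on items with five alternatives each.
print(corrected_score(30, 8, 5))   # 28.0

# Blind guessing on 40 four-choice items gives about 10 right and 30 wrong.
print(corrected_score(10, 30, 4))  # 0.0
```

Under this formula a person who guesses blindly on every item is expected to score about zero, which is the intended effect of the correction.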