Using the Internet for Psychological Research: Personality Testing on the World Wide Web

By: Buchanan, Tom, & Smith, John L., British Journal of Psychology

February 1999 Vol. 90, Issue 1

Presented by:

Alexia DaSilva, Ann Dorlet, Stacey Smith and Jennifer Traver

            This article examined several aspects of using the Internet for psychological research and testing.  The Internet has been used increasingly for psychological research over the years, and it offers many benefits.  It provides access to a large sample of participants at little cost.  The need for a laboratory and materials is reduced, as is the time it takes to conduct the research.  Because so many people have access to the studies, groups with special characteristics can be located through sources such as Usenet newsgroups.  Ethical guidelines still apply to research conducted on the Internet.  The majority of online tests are questionnaire based, and personality tests and surveys appear to be the most common of these questionnaire-based tests.  These tests can be administered in several ways, including e-mail, newsgroups and the World Wide Web (WWW).  Batinic (1997) found e-mail to be the least effective method, whereas Anderson and Gansneder (1995) had better experiences with it but still believed that the WWW is the best technique for administering these tests.  Typically, a WWW study is a page containing a fill-out form; once the form is completed, the participant submits the answers to be scored or e-mailed to the experimenter.  A problem with such studies is that their validity has yet to be adequately assessed.

            Two searches were used to see how much work has examined the Internet and its validity as a research tool.  A search of the Deja News archive yielded 14,016 references, while a search of the PsycLIT database of psychology journals yielded eight references, only four of which were directly related to Internet-mediated research.  This shows that the validity of Internet research needs further testing.

            Past research on this topic includes Smith and Leigh (1997), who compared online and pencil/paper test results on sexual fantasies.  They compared the demographic characteristics of the two groups and found that, while the groups differed in age and gender, they did not differ significantly in sexual orientation, marital status, ethnicity, education or religiosity.  From these results, they concluded that Internet samples are as representative of the general population as traditional student samples.  In a comparison of the two groups on a subset of the questions, they found no significant differences on the individual items.  These findings were consistent with Ellis and Symons (1990), and thus they determined that the Internet can serve both as a primary participant pool and as a supplement to locally recruited participants, suggesting that the two types of participants can be combined to form one sample.  The present researchers, however, find this conclusion to be premature.  They suggest that participants be recruited from different websites to determine whether responses differ.  So, while there is potential for research, there are also threats to validity which have yet to be assessed.

            Computerized testing was developed before testing on the WWW.  Many computerized tests and assessments are designed for use in personality evaluation and are simply translations of pencil and paper tests, viewed on a screen and answered by typing.  Computerized tests are popular because they automate the tasks of test administration, scoring and, in some cases, interpretation.  However, some people have questions about the interpretation of results by computer.

            The equivalence of the computerized and pencil/paper versions has come into question.  Bartram and Bayliss (1987) found that the two versions of a test are equivalent provided that the test is not speeded and uses some form of multiple-choice or forced-choice format, though there are instances in which the versions were not found to be equivalent.  Some studies have found a higher level of self-disclosure with computerized testing, possibly due to the increased anonymity of the participant.  Some studies have found different patterns of responses in each version, which has been attributed to computer anxiety.  Meier (1994) and Cohen et al. believe that the equivalence of computerized and pencil/paper tests needs to be demonstrated, not just assumed.

            There are many threats to the reliability and validity of Web-based tests.  With regard to the nature of the sample, it is impossible to truly know who the participants are and what their true demographics are.  With a large, unknown sample, many confounding variables may be introduced, and they are impossible to control for.  Participants in Internet-based research may have a different motivation for taking a test.  They may be taking it out of pure interest or curiosity, while a person taking a pencil/paper test may be required to do so.  Internet participants are therefore likely to be true volunteers.  This difference in motivation may affect the results of a study.  The environmental factors surrounding the test taker also have an effect.  With Web-based tests there is no control over the environment in which the test is taken, so the results may be affected by noise or distraction, while pencil/paper test takers are in a controlled environment with everyone experiencing the same distractions.  Technological factors also play a role in varying results.  Different browser software packages are likely to be used, each configured differently and running on different platforms with different displays, so test takers see different presentations of the test.  Also, the speed at which pages load and display will differ, and those with slower connections may become frustrated with the delays and quit the test.  Another problem is that participants may take the test more than once, scoring differently each time.  A participant can also go back and change answers after a score is given in order to obtain the desired score.  This could skew the results of the test in general.

            Reliability and validity must be established when developing a new test for the Internet.  Computerized tests have shown favorable reliability and validity as translations of pencil and paper tests, and initially Internet tests will most likely also be translations of pencil/paper tests.  Scores on the two versions of a test must correlate at .9 or above for the test to be considered reliable (Kline, 1993).  Validation of the test is more difficult.  There are two major ways in which an Internet test differs from the paper and pencil version: the format of the presentation, and the nature of the participants together with the circumstances under which the test is likely to be taken.  Establishing validity would require the same presentation, the same participant composition, and the same test-taking circumstances.  Because of the anonymity of the participants, it is virtually impossible to ensure this.  Test-retest reliability cannot be assessed if the participant is unknown and cannot be physically located.  Instead, a confirmatory factor analysis is used to test equivalence.
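
The .9 criterion above can be illustrated with a short sketch.  It assumes that the same participants completed both versions of a test, which, as noted, is usually not possible with anonymous Internet samples.  The function names and the scores are hypothetical and are not taken from the article; the .9 threshold is the only value drawn from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores in each list must vary")
    return cross / sqrt(ss_x * ss_y)


def versions_equivalent(paper_scores, web_scores, threshold=0.9):
    """True if the two versions correlate at or above the threshold."""
    return pearson_r(paper_scores, web_scores) >= threshold


# Hypothetical total scores for six people who took both versions.
paper = [4, 9, 12, 7, 15, 10]
web = [5, 9, 11, 7, 14, 10]
r = pearson_r(paper, web)
print(f"r = {r:.3f}, meets criterion: {versions_equivalent(paper, web)}")
```

A correlation of this kind only speaks to agreement between versions for known participants, which is why the article turns to confirmatory factor analysis instead.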

            The present research began with the selection of an appropriate scale, given considerations of psychometric properties, length and format, previous publication and the construct being measured.  The instrument had to meet four requirements.  The first requirement was that it should have well-established and satisfactory psychometric properties.  Reliability indices should be known and should be high (for comparison with other formats).  The second requirement was that the test should be suitable for the intended mode of administration.  The test should not be speeded, and it should be brief to reduce the likelihood of fatigue and boredom.  The third requirement was that the test should be published and publicly available, with the authors' permission granted for research use.  The fourth requirement involved the subject matter and feedback of the test.  Meaningful feedback must be given to the test taker in a sensitive and responsible manner.  The easiest way to accomplish this is to provide a score and sufficient information for the test taker to interpret it.  Because of this, the test itself should measure something innocuous.

            The Self-Monitoring Scale (SMS) was used in the study.  It is referred to as a “popular measure of personality” (Buchanan & Smith, 1999).  The scale measures the tendency to observe and regulate expressive behaviors and self-presentation.  A revised version of the SMS, the SMS-R, was developed for improved factorial purity and is the version used in this study (Buchanan & Smith, 1999).  The reliability of the SMS-R appears satisfactory, with a reported coefficient alpha of .70.  The construct validity of the measure is generally considered to be well established.  Self-monitoring is best described in terms of three rotated factors: ‘other-directedness’, ‘extraversion’, and ‘acting ability’.

The researchers' main purpose was to see whether a Web-mediated version of the test behaves in the same way as the pencil-and-paper version, and whether the same factorial structure can be found in both.  The Web-based questionnaire was developed according to the guidelines suggested by Szabo & Frenkl (1996).  Data were acquired through an HTML form and passed to a program that processed the input, saved the data to a file, and then provided feedback to the participant.  Anonymity was assured, and participants were asked to complete the test only once.  Participants were presented with an interactive form bearing the questions from the SMS-R and a set of instructions adapted from those used with the paper-and-pencil version.  Participants answered either “true” or “false” to each question, and demographic details were also requested.  After the participant completed the test, a “Results and Debriefing” page was displayed, showing the score achieved along with guidelines on how to interpret it.
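
The process-save-feedback flow described above might look something like the following sketch.  It is not the authors' program: the four-item scoring key, the field names (`q1` to `q4`, `age`, `gender`) and the feedback wording are all hypothetical placeholders, and an in-memory buffer stands in for the data file.

```python
import csv
import io

# Hypothetical scoring key: the keyed answer for each item.
# These four items are placeholders, not the actual SMS-R key.
SCORING_KEY = {"q1": "false", "q2": "true", "q3": "true", "q4": "false"}


def score_submission(form_data, key=SCORING_KEY):
    """Count the answers that match the keyed direction.

    Raises ValueError if any item is missing or is not 'true'/'false'.
    """
    score = 0
    for item, keyed_answer in key.items():
        answer = form_data.get(item, "").strip().lower()
        if answer not in ("true", "false"):
            raise ValueError(f"missing or invalid answer for {item}")
        if answer == keyed_answer:
            score += 1
    return score


def save_record(out_file, form_data, score, key=SCORING_KEY):
    """Write one CSV row: item answers, demographics, then the score."""
    row = [form_data[item].strip().lower() for item in key]
    row += [form_data.get("age", ""), form_data.get("gender", ""), score]
    csv.writer(out_file).writerow(row)


def feedback(score, key=SCORING_KEY):
    """Build the score line for a results page (wording is hypothetical)."""
    return f"Your score is {score} out of {len(key)}."


# One hypothetical submission, as a form handler might receive it.
submission = {"q1": "false", "q2": "true", "q3": "false", "q4": "false",
              "age": "25", "gender": "female"}
storage = io.StringIO()  # stands in for the data file
total = score_submission(submission)
save_record(storage, submission, total)
message = feedback(total)
print(message)
```

Rejecting incomplete forms before scoring is a design choice made here for simplicity; the article does not say how the original program handled missing answers.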

            After the test had been developed and extensively tested, participants were recruited by means of messages posted to Usenet newsgroups selected for their relevance to the study.  The messages were posted four times, at two-week intervals, from December 23, 1996, to February 3, 1997.  Once the researchers felt that they had received sufficient responses, no further messages were posted.  The researchers then posted a message outlining the goals of the study and thanking the participants.  After data screening and processing, the total number of ‘unique and valid’ responses was 963.  A comparison group was also recruited.  These participants were undergraduate students at the University of Sunderland, who were asked to complete the paper-and-pencil version of the SMS-R.  The online and in-person instructions were similar, and both included proper debriefing and score interpretation.
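
The summary does not say how ‘unique and valid’ responses were identified, so the following is only one possible screening rule, sketched under stated assumptions: a record is treated as valid if every item has a “true” or “false” answer, and as a repeat if an earlier valid record carries the same `source_id`.  The `source_id` field and the item names are hypothetical.

```python
def screen_responses(records, required_items):
    """Keep complete records, retaining only the first one per source_id.

    Records without a source_id cannot be matched, so they are kept
    whenever they are complete.
    """
    seen_sources = set()
    kept = []
    for record in records:
        complete = all(record.get(item) in ("true", "false")
                       for item in required_items)
        if not complete:
            continue  # drop invalid (incomplete) submissions
        source = record.get("source_id")
        if source is not None:
            if source in seen_sources:
                continue  # drop repeat submissions from the same source
            seen_sources.add(source)
        kept.append(record)
    return kept


# Hypothetical submissions: one incomplete, one repeat.
records = [
    {"source_id": "a", "q1": "true", "q2": "false"},
    {"source_id": "b", "q1": "true", "q2": ""},
    {"source_id": "a", "q1": "false", "q2": "false"},
    {"source_id": "c", "q1": "false", "q2": "true"},
]
valid = screen_responses(records, ["q1", "q2"])
print(len(valid), "unique and valid responses")
```

Any rule of this kind is imperfect with anonymous participants, which is the concern raised earlier about people taking a test more than once.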

            In the results section, the researchers reported a coefficient alpha of 0.75 for the Internet sample.  This compares ‘favorably’ with the value of 0.70 consistently reported for the paper-and-pencil version of the test.  The coefficient alpha for the comparison group was 0.73.  In the discussion section, the researchers reported that the psychometric properties of the Internet-based version of the SMS-R seem to compare favorably with those of its conventional equivalent.  Since the model consisted of three intercorrelated latent variables, or factors, a confirmatory factor analysis was used to show whether the model fit the data.  The similarity of the item loadings on the factors suggested that the two formats of the test were measuring substantially the same thing.  The higher correlation suggests that the Internet version may actually provide a better measure.  So why might a Web-based test appear to provide a better measure of a personality trait than its conventional equivalent (Buchanan & Smith, 1999)?  The researchers offer several possible reasons.  First, increased levels of honesty and self-revelation may be found when computerized assessments are employed, which might allow more accurate measurement of a construct.  Second, the heterogeneity of the sample may play a part.  The factor solutions derived from particular groups may not be entirely representative of the wider population.  It is possible that the Internet sample, which includes many non-students and people from more diverse backgrounds, is more heterogeneous than the student sample.
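
Coefficient alpha, the reliability index reported above, can be computed from item-level data with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items.  The sketch below applies it to made-up true/false data coded 1 and 0; the data and the function name are hypothetical, and the result has no connection to the values reported in the article.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of per-participant item-score lists.

    Each inner list holds one participant's item scores
    (here 1 = keyed answer, 0 = other answer).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every participant must answer every item")
    # Sample variance of each item across participants.
    item_variances = [variance([row[i] for row in responses])
                      for i in range(n_items)]
    # Sample variance of participants' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores must vary")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: six participants, four dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.2f}")
```

Because alpha needs only a single administration, it can be estimated for anonymous Internet participants, unlike test-retest reliability.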

Buchanan & Smith (1999) also reported on problems within the study, as well as potential problems with Internet-based testing in general, and suggested further research on some of the main issues.  Whether Web-based tests can be considered both valid and reliable is an important question.  While the study can tell us something about the reliability of the Internet-mediated test, it cannot tell us anything about its construct validity.  The sampling strategy used, as well as the population from which an Internet sample is recruited, can affect the generalizability of the results.  The researchers recruited participants mainly from psychologically oriented newsgroups, which may have affected the external validity of the study.  Also, dishonesty among experimenters, as well as among participants, must always be taken into consideration.  Finally, participants should not receive results without proper interpretation and debriefing.  This is a major problem faced by Internet-based tests.


Buchanan, Tom, & Smith, John L. (1999). Using the Internet for Psychological Research: Personality Testing on the World Wide Web. British Journal of Psychology, 90(1).


I.                    Why Psychological Testing Is on the Internet

A.     The many benefits to the internet

1.      Easy access to a great deal of information.

2.      Low cost

a.       No lab space or materials.

b.      No use of a professional’s time.

3.      It is automatic in both scoring and interpretation.

B.     The downside to internet use

1.      Batinic (1997) found e-mail to be the least effective method of testing.

2.      Validity has not been properly assessed.

II.                 Internet Validity

A.     Exploring the web

1.      Deja News archive contained 14,016 references for articles.

2.      PsycLIT database contained eight articles with only four relevant to this research. (same keywords used for both searches)

B.     Past research

1.      Smith & Leigh (1997) compared online and paper/pencil test results for sexual fantasies.

a.       Seventy-two online participants recruited from a psychology site and fifty-six psychology students.

b.      Differences in age and gender.

c.       Little difference in marital status, ethnicity, education, as well as religiosity and sexual orientation.

2.      Researchers concluded that Internet samples are as representative of the general population as psychology student samples.

3.      The Internet can supplement locally recruited participants and can also provide the main subject pool.

4.      Present researchers feel it is too early to come to these conclusions.

5.      Participants should be recruited from websites other than psychology sites.

C.     Computerized testing

1.      Mostly designed for translation of paper personality tests into computerized tests.

2.      Questionable equivalence between the two versions of tests.

a.       Need to avoid speeded testing.

b.      Need multiple choice and forced choice items.

3.      Studies found increased levels of self-disclosure due to increased anonymity.

4.      Each test needs to be tested for its equivalence to the paper version.



D.     Web reliability and validity issues

1.      We can never truly know who our participants are and their demographics.

2.      A larger sample, though, can mean increased heterogeneity and increased representativeness.

3.      Unfortunately, there may be too many confounding (unknown) variables.

4.      Internet participants may be genuinely interested (“true volunteers”) rather than students required to take part for course credit.

5.      There is no control over the participants’ environments during test-taking.

6.      Issue of one person taking the test many times.

7.      Different computers have different presentations and delays may cause frustration.

E.      Establishing reliability and validity

1.      Computerized tests have shown favorable reliability and validity as translations of paper versions.

a.       Test how a person scores on each version.

b.      Must have a correlation of .9 or above to be reliable.

2.      Validity must include same presentation, same participant composition, and the same circumstances in test-taking.

3.      Test-retest reliability not available if we do not know our participants.

4.      Use confirmatory factor analysis to test equivalence.

III.               Present researchers’ method

A.     Participants

1.      Sample 1 used 963 internet participants ages 11 to 67 with 491 male and 472 female.

2.      Sample 2 used 224 undergraduate volunteers ages 18 to 53 with 35 male and 176 female.

B.     Procedure

1.      Internet participants recruited through Usenet newsgroups.

2.      Snyder’s Self-Monitoring Scale was translated to internet format and tested for equivalence using confirmatory factor analysis.

3.      University of Sunderland students recruited through their classes and given paper/pencil version.

4.      Instructions online and in-person were similar including proper debriefing and score interpretation.

IV.              Results

A.     Coefficient alpha for sample 1 was .75 and sample 2 was .73.

B.     Confirmatory factor analysis with chi square statistic showed a poor fit, but may be misleading.

C.     The internet sample had a higher GFI, AGFI, and RMS than paper sample.

D.     No significant difference in means between groups according to an independent-samples t-test.


V.                 Conclusions and discussion

A.     Reliability for the internet version was even higher than that of previously reported studies.

B.     Each version measured basically the same thing.

C.     Likely that internet research is possible and can be fruitful.

D.     Need to validate Internet instruments rigorously and continue tests of reliability and validity.

E.      Problem with random tests made by nonprofessionals.

F.      Problem of demographic information.

G.     Only available to those with internet access.

H.     Participants were recruited from psychology-oriented newsgroups, which may mean that sample 1 comprises the same type of people as the psychology students.