Context: People make personality attributions about strangers from images and observed behavior.
This study investigates the basis of such attributions and their influence on decision-making
processes. It also examines how other people's personality can be judged accurately.
Aim: The aim of this study is to identify the personality assessment tool that best demonstrates
validity and reliability.
Materials and Methods: A group of four researchers investigated the impact of personality
attributions on decision-making processes. Three sets of behaviors were used to assess one
personality trait, arrogance, in Captain Kirk and Captain Picard.
Results: In Tables 2 and 3, Captain Kirk's group mean on the first trait (12.25) is higher than
Captain Picard's (10.75), but his mean on the second trait (12) is slightly lower than Picard's
(12.5). On the third trait, Captain Picard's group mean (21.75) is higher than Captain Kirk's (15.75).
Conclusions: The findings suggest that people do not always make accurate judgments about
strangers, because they either simply guess or assess behavior through their own social norms and
expectations, or through the situation at that moment.

Examining the Accuracy of the Systematic Observation Approach to Personality Assessment


Personality refers to a person's consistent patterns of thinking, feeling, and behavior. It is
driven primarily by a person's motivations, needs, or desires, which in turn direct behavior (Larsen
& Buss, 2014). The study of personality began many decades ago. Early theories argued that
people's personalities are expressed through their physical appearance. These included
phrenology, developed by the German physician Franz Joseph Gall (1758-1828), which held that
the pattern of bumps on a person's skull could measure their personality. Phrenology was later
followed by William Herbert Sheldon's somatotype approach, which assessed personality from
body type. Sheldon argued, for example, that muscular physiques (mesomorphs) were bold and
assertive, whereas thin physiques (ectomorphs) were intellectual and introverted. Another such
approach is physiognomy, which assumes that personality can be assessed from an individual's
facial characteristics. However, these methods have not been validated by scientific research and
are therefore discredited in contemporary personality psychology (Anon, 2012).
Every day, knowingly or unknowingly, we judge other people. The judgments we make about other
people's personalities form a significant part of their social world. Accurate personality
assessment therefore matters, because these judgments influence people's opportunities,
expectancies, and reputations (Dumont, 2010). According to Funder, "In the end, we become what
other people perceive or misperceive us to be" (Funder, 2013, p. 176). In other words, the way
others judge us can affect our opportunities, for better or worse, which illustrates the need for
accurate personality assessment. Several approaches to personality assessment have been
developed. These include self-report questionnaires, structured interviews, projective techniques,
objective tests, and systematic observation. According to Dumont, the easiest strategy for
assessing personality is systematic observation in naturalistic situations while keeping a record
of the behaviors of the person of interest (Dumont, 2010, p. 345). This paper explores the
effectiveness of behavioral observation in assessing personality, to determine whether this
approach demonstrates validity and reliability.
A group of four researchers watched scenes from the Star Trek movies to determine the
validity of personality attributions for Captain James T. Kirk and Captain Jean-Luc Picard
(Herringer, 2000). The aim of this investigation was to understand the role of behavioral
observation in personality psychology research and how to determine the validity of this
assessment method. The study hypothesis was: "Perceived personality traits will be reflected in
observed behavior. Behavioral data findings will confirm that Captain Kirk is more arrogant and
Captain Picard is less arrogant."
The video lasted 51.58 minutes. While watching the two scenes, each researcher
independently rated three behavioral indicators of arrogance developed for the purpose of this
study: a) the number of times a captain interrupts another person's conversation, b) the number of
occasions on which a captain belittles someone else's experiences or opinions, and c) the number
of times a captain talks about or praises himself. Data analysis consisted of calculating the means,
standard deviations, and inter-observer reliability of the independently collected ratings. The
independent ratings are shown in Tables 2 and 3.

Tables 2 and 3 present the independent ratings made by the four group members. Each
table includes the group mean and standard deviation for each behavioral indicator as well as the
inter-observer reliability scores. The study hypothesis was that Captain Kirk is more arrogant
than Captain Picard, and it is tested by comparing the group means for the two captains'
behaviors. Captain Kirk's group mean for the first behavior (12.25; Table 2) is higher than Captain
Picard's (10.75; Table 3), but his mean for the second behavior (12) is slightly lower than Picard's
(12.5). On the third behavior, Captain Picard's group mean (21.75) is clearly higher than Captain
Kirk's (15.75). In addition, the inter-observer reliability scores for all traits for both captains
were unacceptable. Therefore, the hypothesis that Captain Kirk is more arrogant than Captain
Picard is not supported. The standard deviation indicates how varied or uniform the collected
data are: a value close to 0 indicates that the individual ratings lie close to the mean. Here, the
standard deviations are large for most of the traits for both captains, which indicates that the
ratings are spread over a wide range of values; that is, the raters disagreed considerably and the
data may not be reliable.
The importance of inter-observer reliability during data analysis cannot be overlooked.
In this study, inter-observer reliability is expressed as the percentage agreement among the
raters. To calculate it, the number of ratings in agreement is divided by the total number of
ratings, and the resulting fraction is converted to a percentage. Good inter-observer reliability
should produce scores close to 100%. In practice, published benchmarks are used to interpret
agreement scores, as shown in Table 1 below (Wongpakaran et al., 2013).
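The percentage-agreement calculation just described can be sketched in Python. The paper does not specify what counts as a rating "in agreement" when observers record frequency counts, so this sketch makes a labeled assumption: it compares every pair of observers and treats a pair as agreeing when their counts for a behavior differ by at most `tolerance` (0, an exact match, by default). The function name `percent_agreement` and the `tolerance` parameter are hypothetical, and the output need not reproduce the reliability values reported in Tables 2 and 3, which may have been computed under a different rule.

```python
from itertools import combinations


def percent_agreement(counts, tolerance=0):
    """Pairwise percentage agreement among observers for one behavior.

    Assumption (not stated in the paper): each pair of observers is one
    comparison, and a pair "agrees" when their counts differ by at most
    `tolerance`.
    """
    pairs = list(combinations(counts, 2))  # every pair of observers
    if not pairs:
        raise ValueError("need ratings from at least two observers")
    agreements = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    # comparisons in agreement / total comparisons, expressed as a percentage
    return 100 * agreements / len(pairs)


# Captain Picard, behavior 3: counts from the four group members
picard_b3 = [26, 17, 22, 22]
print(f"{percent_agreement(picard_b3):.1f}%")
```

For these counts only one of the six observer pairs matches exactly, giving about 16.7%; loosening the tolerance raises the score, which is one reason the agreement rule should be stated explicitly before rating begins.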

Table 1: Inter-observer reliability score benchmarks

According to these benchmarks, for ratings with 4-7 categories the minimal acceptable agreement
is 75%, whereas 90% indicates high agreement. Captain Kirk's inter-observer reliability scores for
all three traits (0.2, 0, and 0) fall well below this minimal benchmark, indicating that agreement
among the observers was unacceptable (see Table 2). The same is true of Captain Picard's scores
(0, 0, and 0.5; see Table 3).
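The benchmark comparison described above amounts to a simple decision rule. This sketch assumes the reported reliability values (0.2, 0, 0.5, and so on) are proportions, so it multiplies them by 100 before comparing them with the 75% and 90% thresholds; the function `classify_agreement` and its labels are illustrative, not taken from Wongpakaran et al. (2013).

```python
def classify_agreement(percent, minimal=75.0, high=90.0):
    """Label a percentage-agreement score using the 75% / 90% benchmarks
    quoted in the text for ratings with 4-7 categories."""
    if percent >= high:
        return "high"
    if percent >= minimal:
        return "acceptable"
    return "unacceptable"


# Reported inter-observer reliability per behavior, read as proportions (an assumption).
reported = {
    "Kirk": [0.2, 0, 0],
    "Picard": [0, 0, 0.5],
}

for captain, scores in reported.items():
    labels = [classify_agreement(100 * s) for s in scores]
    print(captain, labels)
```

Every reported value falls into the "unacceptable" band under this rule, consistent with the interpretation in the text.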
Table 2: Captain Kirk ratings (frequency of behaviors)

                              Behavior 1    Behavior 2    Behavior 3
Group member 1                    13            14            24
Group member 2                    14             8             9
Group member 3                    10            11            13
Group member 4                    12            15            17
Group mean (SD)               12.25 (1.7)     12 (3.2)    15.75 (6.4)
Inter-observer reliability        0.2            0             0


Table 3: Captain Picard ratings (frequency of behaviors)

                              Behavior 1    Behavior 2    Behavior 3
Group member 1                    12            13            26
Group member 2                     7             5            17
Group member 3                    15            17            22
Group member 4                     9            15            22
Group mean (SD)               10.75 (3.5)    12.5 (5.3)   21.75 (3.7)
Inter-observer reliability         0             0            0.5
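The group means and standard deviations in Tables 2 and 3 can be recomputed from the raw counts with Python's standard `statistics` module. This sketch assumes the reported SDs are sample standard deviations (`statistics.stdev`, n - 1 denominator); the dictionary layout and the `summarize` helper are illustrative choices, and the "higher" column simply compares the two group means without any significance test.

```python
from statistics import mean, stdev

# Frequency counts per behavior, one entry per group member (members 1-4).
ratings = {
    "Kirk": {
        "behavior 1": [13, 14, 10, 12],
        "behavior 2": [14, 8, 11, 15],
        "behavior 3": [24, 9, 13, 17],
    },
    "Picard": {
        "behavior 1": [12, 7, 15, 9],
        "behavior 2": [13, 5, 17, 15],
        "behavior 3": [26, 17, 22, 22],
    },
}


def summarize(counts):
    """Group mean and sample standard deviation (n - 1 denominator)."""
    return mean(counts), stdev(counts)


for behavior in ratings["Kirk"]:
    kirk_m, kirk_sd = summarize(ratings["Kirk"][behavior])
    picard_m, picard_sd = summarize(ratings["Picard"][behavior])
    # Compare group means only; no statistical test is performed here.
    higher = "Kirk" if kirk_m > picard_m else "Picard" if picard_m > kirk_m else "tie"
    print(f"{behavior:10}  Kirk {kirk_m:.2f} ({kirk_sd:.2f})  "
          f"Picard {picard_m:.2f} ({picard_sd:.2f})  higher: {higher}")
```

Recomputing the summary rows from the raw counts is a quick way to catch transcription errors between the ratings and the reported means.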


The means of the three behaviors do not show a consistent difference in arrogance between
Captain Kirk and Captain Picard (see Tables 2 and 3). In addition, the inter-observer reliability
scores for all three traits fall below the acceptable level for both captains. Taken together, the
group means and the inter-observer reliability scores fail to support the hypothesis that
"perceived personality traits will be reflected in observed behavior. Behavioral data findings will
confirm that Captain Kirk is more arrogant and Captain Picard is less arrogant."
Various factors influence the accuracy of personality assessment through observation. One
is the social expectancy effect, in which an incorrect belief or assumption held by the rater or
observer leads them to act (in this case, to score) in a biased manner (Jamieson et al., 2016).
Here, some observers' ratings may have been partly influenced by their social expectations,
leading them to rate one or both captains in a biased manner. Another factor is observer drift, a
gradual shift over time in how an observer applies the original scoring criteria, which leads to
inconsistent recording. This raises the issue of observer accuracy vs. observer agreement (Hall,
Goh, Mast, & Hagedorn, 2015). Most people judge other people's personality based on their own
constructions of reality. From this perspective, there are no accurate or inaccurate interpretations
in personality assessment, because all interpretations are merely "social constructions" (Funder,
2013, p. 177). However, the concept of critical realism holds more water in personality
psychology. It entails critically evaluating the personality attributes presented and then gathering
all the information that can inform the assessor's final judgment.
According to Funder, other variables likely to affect the accuracy of personality
assessment include: a) the good judge, the possibility that some people's judgments are more
accurate than others'; b) the good target, the possibility that some people are easier to judge than
others; c) the good trait, the possibility that some traits are easier to judge accurately than others;
and d) good information, the extent to which the judge is well informed when making the
assessment. In particular, the accuracy of personality assessment depends on the quantity and
quality of the available information and its relevance to the traits being judged (Funder, 2013).
This leads to the Realistic Accuracy Model (RAM). According to this model, four conditions
must be met for a personality attribute to be judged accurately. First, the person being assessed
must do something relevant to the attribute (relevance). Second, this behavioral information must
be available to the judge (availability). Third, the judge must detect it (detection). Fourth, the judge
must utilize the information appropriately (utilization). Each of these four stages poses hurdles
that must be overcome to make an accurate judgment (Borkenau, Mosch, Tandler, & Wolf, 2014;
Funder, 2013). Making an accurate judgment is therefore difficult, but accuracy can be improved
at each of the four stages. Previous efforts at improvement have focused on making judges or
observers think harder and use sound logic to avoid inferential errors. Although necessary, these
efforts address only the utilization stage. To improve the accuracy of personality assessment
further, one should also develop an interpersonal environment in which the people being judged
can be themselves. Importantly, one must minimize tensions and other distractions that could
cause the judge to miss relevant information (Funder, 2013).
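To make the stage structure of the RAM concrete, the toy sketch below treats each of the four stages as a probability of success between 0 and 1 and, as a simplifying assumption for illustration, multiplies them, so that failure at any single stage drives the overall score to zero. The function name `ram_accuracy`, the multiplicative rule, and all example values are hypothetical, not quantities measured in this study.

```python
from math import prod
from typing import Mapping

# The four RAM stages named in the text, in order.
STAGES = ("relevance", "availability", "detection", "utilization")


def ram_accuracy(stage_success: Mapping[str, float]) -> float:
    """Toy RAM score: product of per-stage success probabilities.

    Assumption for illustration: stages combine multiplicatively, so a
    zero at any stage gives zero overall accuracy.
    """
    for stage in STAGES:
        p = stage_success[stage]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{stage} must be in [0, 1], got {p}")
    return prod(stage_success[stage] for stage in STAGES)


# Hypothetical values, not data from this study.
baseline = {"relevance": 0.6, "availability": 0.5, "detection": 0.7, "utilization": 0.9}
better_logic = {**baseline, "utilization": 1.0}     # improve the judge's reasoning only
better_setting = {**baseline, "availability": 0.8}  # relaxed setting reveals more behavior

for name, stages in [("baseline", baseline), ("better logic", better_logic),
                     ("better setting", better_setting)]:
    print(f"{name:15} {ram_accuracy(stages):.3f}")
```

With these made-up numbers, perfecting utilization alone lifts the score only from 0.189 to 0.21, while making more behavior available lifts it to about 0.30, echoing the point that efforts aimed only at the judge's reasoning address just one stage.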
One limitation of this research is the poor operational definition of the term "arrogance."
Appropriate conceptual analysis and operational definitions are important because they influence
a study's validity. In this case, it can be argued that the low inter-observer reliability scores were
caused by disagreements over the operational definitions. In addition, the researchers watched
the video separately, so their ratings of the captains' behaviors could have been influenced by
different experiences or circumstances. These findings indicate that observing individual actions
alone is not adequate to judge someone's personality.
In summary, people commonly make personality attributions from observed information,
as shaped by their own social constructions of reality. This paper has indicated ways in which
personality attributions can influence people's decision-making processes. Although first
impressions can sometimes be accurate, the accuracy of personality assessment cannot rest on
individual observations alone; to increase the predictive validity of the observation method, one
should also consider information or judgments from other sources. Finally, the RAM outlines four
stages of accurate judgment and suggests ways in which one can judge others more accurately.

Anon. (2012). Introduction to psychology. M- Libraries.

