Branching Paths: A Novel Teacher Evaluation Model for Faculty Development
James P. Bavis and Ahn G. Nu
Department of English, Purdue University
ENGL 101: Course Name
Dr. Richard Teeth
Jan. 30, 2020
A large body of assessment literature suggests that students’ evaluations of their teachers
(SETs) can fail to measure the construct of teaching in a variety of contexts. This can
compromise faculty development efforts that rely on information from SETs. The disconnect
between SET results and faculty development efforts is exacerbated in educational contexts
that demand particular teaching skills that SETs do not value in proportion to their local
importance (or do not measure at all). This paper responds to these challenges by proposing an
instrument for the assessment of teaching that allows institutional stakeholders to define the
teaching construct in a way they determine to suit the local context. The main innovation of this
instrument relative to traditional SETs is that it employs a branching “tree” structure populated
by binary-choice items based on the Empirically derived, Binary-choice, Boundary-definition
(EBB) scale developed by Turner and Upshur for ESL writing assessment. The paper argues
that this structure can allow stakeholders to define the teaching construct by changing the order
and sensitivity of the nodes in the tree of possible outcomes, each of which corresponds to a
specific teaching skill. The paper concludes by outlining a pilot study that will examine the
differences between the proposed EBB instrument and a traditional SET employing a series of
multiple-choice questions (MCQs) that correspond to Likert scale values.
Keywords: college teaching, student evaluations of teaching, scale development, EBB
scale, pedagogies, educational assessment, faculty development
Branching Paths: A Novel Teacher Evaluation Model for Faculty Development
According to Theall (2017), “Faculty evaluation and development cannot be considered
separately … evaluation without development is punitive, and development without evaluation is
guesswork” (p. 91). As the practices that constitute modern programmatic faculty development
have evolved from their humble beginnings to become a commonplace feature of university life
(Lewis, 1996), a variety of tactics to evaluate the proficiency of teaching faculty for development
purposes have likewise become commonplace. These include measures as diverse as peer
observations, the development of teaching portfolios, and student evaluations.
One such measure, the student evaluation of teacher (SET), has been virtually
ubiquitous since at least the 1990s (Wilson, 1998). Though records of SET-like instruments can
be traced to work at Purdue University in the 1920s (Remmers & Brandenburg, 1927), most
modern histories of faculty development suggest that their rise to widespread popularity went
hand-in-hand with the birth of modern faculty development programs in the 1970s, when
universities began to adopt them in response to student protest movements criticizing
mainstream university curricula and approaches to instruction (Gaff & Simpson, 1994; Lewis,
1996; McKeachie, 1996). By the mid-2000s, researchers had begun to characterize SETs in
terms like “…the predominant measure of university teacher performance […] worldwide”
(Pounder, 2007, p. 178). Today, SETs play an important role in teacher assessment and faculty
development at most universities (Davis, 2009). Recent SET research practically takes the
presence of some form of this assessment on most campuses as a given. Spooren et al.
(2017), for instance, merely note that SETs can be found at “almost every institution of
higher education throughout the world” (p. 130). Similarly, Darwin (2012) refers to teacher
evaluation as an established orthodoxy, labeling it a “venerated,” “axiomatic” institutional
practice (p. 733).
Moreover, SETs do not only help universities direct their faculty development efforts.
They have also come to occupy a place of considerable institutional importance for their role in
personnel considerations, informing important decisions like hiring, firing, tenure, and
promotion. Seldin (1993, as cited in Pounder, 2007) finds that 86% of higher educational
institutions use SETs as important factors in personnel decisions. A 1991 survey of department
chairs found 97% used student evaluations to assess teaching performance (US Department of
Education). Since the mid-late 1990s, a general trend towards comprehensive methods of
teacher evaluation that include multiple forms of assessment has been observed
(Berk, 2005). However, recent research suggests the usage of SETs in personnel decisions is
still overwhelmingly common, though precise percentages are hard to come by, perhaps owing to
the multifaceted nature of these decisions (Boring et al., 2017; Galbraith et al., 2012). In certain
contexts, student evaluations can also have ramifications beyond the level of individual
instructors. Particularly as public schools have experienced pressure in recent decades to adopt
neoliberal, market-based approaches to self-assessment and adopt a student-as-consumer
mindset (Darwin, 2012; Marginson, 2009), information from evaluations can even feature in
department- or school-wide funding decisions (see, for instance, the Obama Administration’s
Race to the Top initiative, which awarded grants to K-12 institutions that adopted value-added
models for teacher evaluation).
However, while SETs play a crucial role in faculty development and personnel decisions
for many education institutions, current approaches to SET administration are not as well-suited
to these purposes as they could be. This paper argues that a formative, empirical approach to
teacher evaluation developed in response to the demands of the local context is better-suited
for helping institutions improve their teachers. It proposes the Heavilon Evaluation of Teacher,
or HET, a new teacher assessment instrument that can strengthen current approaches to
faculty development by making them more responsive to teachers’ local contexts. It also
proposes a pilot study that will clarify the differences between this new instrument and the
Introductory Composition at Purdue (ICaP) SET, a more traditional instrument used for similar
purposes. The results of this study will direct future efforts to refine the proposed instrument.
The Methods section, which follows, will propose a pilot study that compares the results of the
proposed instrument to the results of a traditional SET (and will also provide necessary
background information on both of these evaluations). The paper will conclude with a discussion
of how the results of the pilot study will inform future iterations of the proposed instrument and,
more broadly, how universities should argue for local development of assessments.
Effective Teaching: A Contextual Construct
The validity of the instrument this paper proposes is contingent on the idea that it is
possible to systematically measure a teacher’s ability to teach. Indeed, the same could be said
for virtually all teacher evaluations. Yet despite the exceeding commonness of SETs and the
faculty development programs that depend on their input, there is little scholarly consensus on
precisely what constitutes “good” or “effective” teaching. It would be impossible to review the
entire history of the debate surrounding teaching effectiveness, owing to its sheer scope (such
a summary might need to begin with, for instance, Cicero and Quintilian). However, a cursory
overview of important recent developments (particularly those revealed in meta-analyses of
empirical studies of teaching) can help situate the instrument this paper proposes in relevant
conversations.
Meta-analysis 1. One core assumption that undergirds many of these conversations is
the notion that good teaching has effects that can be observed in terms of student achievement.
A meta-analysis of 167 empirical studies that investigated the effects of various teaching factors
on student achievement (Kyriakides et al., 2013) supported the effectiveness of a set of
teaching factors that the authors group together under the label of the “dynamic model” of
teaching. Seven of the eight factors (Orientation, Structuring, Modeling, Questioning,
Assessment, Time Management, and Classroom as Learning Environment) corresponded to
moderate average effect sizes (between 0.34 and 0.41 standard deviations) in measures of
student achievement. The eighth factor, Application (defined as seatwork and small-group tasks
oriented toward practice of course concepts), corresponded to only a small yet still significant
effect size of 0.18. The lack of any single decisive factor in the meta-analysis supports the idea
that effective teaching is likely a multivariate construct. However, the authors also note the
context-dependent nature of effective teaching. Application, the least-important teaching factor
overall, proved more important in studies examining young students (p. 148). Modeling, by
contrast, was especially important for older students.
Meta-analysis 2. A different meta-analysis that argues for the importance of factors like
clarity and setting challenging goals (Hattie, 2009) nevertheless also finds that the effect sizes
of various teaching factors can be highly context-dependent. For example, effect sizes for
homework range from 0.15 (a small effect) to 0.64 (a moderately large effect) based on the level
of education examined. Similar ranges are observed for differences in academic subject (e.g.,
math vs. English) and student ability level. As Snook et al. (2009) note in their critical response
to Hattie, while it is possible to produce a figure for the average effect size of a particular
teaching factor, such averages obscure the importance of context.
Meta-analysis 3. A final meta-analysis (Seidel & Shavelson, 2007) found generally
small average effect sizes for most teaching factors; organization and academic
domain-specific learning activities showed the biggest cognitive effects (0.33 and 0.25, respectively).
Here, again, however, effectiveness varied considerably due to contextual factors like domain of
study and level of education in ways that average effect sizes do not indicate.
These pieces of evidence suggest that there are multiple teaching factors that produce
measurable gains in student achievement and that the relative importance of individual factors
can be highly dependent on contextual factors like student identity. This is in line with a
well-documented phenomenon in educational research that complicates attempts to measure
teaching effectiveness purely in terms of student achievement: “the largest source of
variation in student learning is attributable to differences in what students bring to school – their
abilities and attitudes, and family and community” (McKenzie et al., 2005, p. 2). Student
achievement varies greatly due to non-teacher factors like socio-economic status and home life
(Snook et al., 2009). This means that, even to the extent that it is possible to observe the
effectiveness of certain teaching behaviors in terms of student achievement, it is difficult to set
generalizable benchmarks or standards for student achievement. Thus it is also difficult to make
true apples-to-apples comparisons about teaching effectiveness between different educational
contexts: due to vast differences between different kinds of students, a notion of what
constitutes highly effective teaching in one context may not apply in another. This difficulty has
featured in criticism of certain meta-analyses that have purported to make generalizable claims
about what teaching factors produce the biggest effects (Hattie, 2009). A variety of other
commentators have also made similar claims about the importance of contextual factors in
teaching effectiveness for decades (see, e.g., Bloom et al., 1956; Cashin, 1990; Theall, 2017).
The studies described above mainly measure teaching effectiveness in terms of
academic achievement. It should certainly be noted that these quantifiable measures are not
generally regarded as the only outcomes of effective teaching worth pursuing. Qualitative
outcomes like increased affinity for learning and greater sense of self-efficacy are also important
learning goals. Here, also, local context plays a large role.
SETs: Imperfect Measures of Teaching
As noted in this paper’s introduction, SETs are commonly used to assess teaching
performance and inform faculty development efforts. Typically, these take the form of an
end-of-term summative evaluation composed of multiple-choice questions (MCQs) that allow students
to rate statements about their teachers on Likert scales. These are often accompanied by
short-answer responses which may or may not be optional.
SETs serve important institutional purposes. While commentators have noted that there
are crucial aspects of instruction that students are not equipped to judge (Benton & Young,
2018), SETs nevertheless give students a rare institutional voice. They represent an opportunity
to offer anonymous feedback on their teaching experience and potentially address what they
deem to be their teacher’s successes or failures. Students are also uniquely positioned to offer
meaningful feedback on an instructor’s teaching because they typically have much more
extensive firsthand experience of it than any other educational stakeholder. Even peer
observers only witness a small fraction of the instructional sessions during a given semester.
Students with perfect attendance, by contrast, witness all of them. Thus, in a certain sense, a
student can theoretically assess a teacher’s ability more authoritatively than even peer mentors
While historical attempts to validate SETs have produced mixed results, some studies
have demonstrated their promise. Howard (1985), for instance, finds that SETs are significantly
more predictive of teaching effectiveness than self-report, peer, and trained-observer
assessments. A review of several decades of literature on teaching evaluations (Watchel, 1998)
found that a majority of researchers believe SETs to be generally valid and reliable, despite
occasional misgivings. This review notes that even scholars who support SETs frequently argue
that they alone cannot direct efforts to improve teaching and that multiple avenues of feedback
are necessary (L’hommedieu et al., 1990; Seldin, 1993).
Finally, SETs also serve purposes secondary to the ostensible goal of improving
instruction that nonetheless matter. They can be used to bolster faculty CVs and assign
departmental awards, for instance. SETs can also provide valuable information unrelated to
teaching. It would be hard to argue that it is not useful for a teacher to learn, for example, that a
student finds the class unbearably boring, or that a student finds the teacher’s personality so
unpleasant as to hinder her learning. In short, there is real value in understanding students’
affective experience of a particular class, even in cases when that value does not necessarily
lend itself to firm conclusions about the teacher’s professional abilities.
However, a wealth of scholarly research has demonstrated that SETs are prone to fail in
certain contexts. A common criticism is that SETs can frequently be confounded by factors
external to the teaching construct. The best introduction to the research that serves as the basis
for this claim is probably Neath (1996), who performs something of a meta-analysis by
presenting these external confounds in the form of twenty sarcastic suggestions to teaching
faculty. Among these are the instructions to “grade leniently,” “administer ratings before tests”
(p. 1365), and “not teach required courses” (#11) (p. 1367). Most of Neath’s advice reflects an
overriding observation that teaching evaluations tend to document students’ affective feelings
toward a class, rather than their teachers’ abilities, even when the evaluations explicitly ask
students to judge the latter.
Beyond Neath, much of the available research paints a similar picture. For example, a
study of over 30,000 economics students concluded that “the poorer the student considered his
teacher to be [on an SET], the more economics he understood” (Attiyeh & Lumsden, 1972). A
1998 meta-analysis argued that “there is no evidence that the use of teacher ratings improves
learning in the long run” (Armstrong, 1998, p. 1223). A 2010 National Bureau of Economic
Research study found that high SET scores for a course’s instructor correlated with “high
contemporaneous course achievement,” but “low follow-on achievement” (in other words, the
students would tend to do well in the course, but poorly in future courses in the same field of
study). Others observing this effect have suggested SETs reward a pandering, “soft-ball”
teaching style in the initial course (Carrell & West, 2010). More recent research suggests that
course topic can have a significant effect on SET scores as well: teachers of “quantitative
courses” (i.e., math-focused classes) tend to receive lower evaluations from students than their
humanities peers (Uttl & Smibert, 2017).
Several modern SET studies have also demonstrated bias on the basis of gender
(Anderson & Miller, 1997; Basow, 1995), physical appearance/sexiness (Ambady & Rosenthal,
1993), and other identity markers that do not affect teaching quality. Gender, in particular, has
attracted significant attention. One recent study examined two online classes: one in which
instructors identified themselves to students as male, and another in which they identified as
female (regardless of the instructor’s actual gender) (Macnell et al., 2015). The classes were
identical in structure and content, and the instructors’ true identities were concealed from
students. The study found that students rated the male identity higher on average. However, a
few studies have demonstrated the reverse of the gender bias mentioned above (that is, women
received higher scores) (Bachen et al., 1999) while others have registered no gender bias one
way or another (Centra & Gaubatz, 2000).
The goal of presenting these criticisms is not necessarily to diminish the institutional
importance of SETs. Of course, insofar as institutions value the instruction of their students, it is
important that those students have some say in the content and character of that instruction.
Rather, the goal here is simply to demonstrate that using SETs for faculty development
purposes, much less for personnel decisions, can present problems. It is also to make the
case that, despite the abundance of literature on SETs, there is still plenty of room for scholarly
attempts to make these instruments more useful.
Empirical Scales and Locally-Relevant Evaluation
One way to ensure that teaching assessments are more responsive to the demands of
teachers’ local contexts is to develop those assessments locally, ideally via a process that
involves the input of a variety of local stakeholders. Here, writing assessment literature offers a
promising path forward: empirical scale development, the process of structuring and calibrating
instruments in response to local input and data (e.g., in the context of writing assessment,
student writing samples and performance information). This practice contrasts, for instance, with
deductive approaches to scale development that attempt to represent predetermined theoretical
constructs so that results can be generalized.
Supporters of the empirical process argue that empirical scales have several
advantages. They are frequently posited as potential solutions to well-documented reliability and
validity issues that can occur with theoretical or intuitive scale development (Brindley, 1998;
Turner & Upshur, 1995, 2002). Empirical scales can also help researchers avoid issues caused
by subjective or vaguely-worded standards in other kinds of scales (Brindley, 1998) because
they require buy-in from local stakeholders who must agree on these standards based on
their understanding of the local context. Fulcher et al. (2011) note the following, for instance:
Measurement-driven scales suffer from descriptional inadequacy. They are not sensitive
to the communicative context or the interactional complexities of language use. The level
of abstraction is too great, creating a gulf between the score and its meaning. Only with
a richer description of contextually based performance, can we strengthen the meaning
of the score, and hence the validity of score-based inferences. (pp. 8–9)
There is also some evidence that the branching structure of the EBB scale specifically
can allow for more reliable and valid assessments, even if it is typically easier to calibrate and
use conventional scales (Hirai & Koizumi, 2013). Finally, scholars have also argued that
theory-based approaches to scale development do not always result in instruments that
realistically capture ordinary classroom situations (Knoch, 2007, 2009).
The most prevalent criticism of empirical scale development in the literature is that the
local, contingent nature of empirical scales basically discards any notion of their results’
generalizability. Fulcher (2003), for instance, makes this basic criticism of the EBB scale even
as he subsequently argues that “the explicitness of the design methodology for EBBs is
impressive, and their usefulness in pedagogic settings is attractive” (p. 107). In the context of
this particular paper’s aims, there is also the fact that the literature supporting empirical scale
development originates in the field of writing assessment, rather than teaching assessment.
Moreover, there is little extant research into the applications of empirical scale development for
the latter purpose. Thus, there is no guarantee that the benefits of empirical development
approaches can be realized in the realm of teaching assessment. There is also no guarantee
that they cannot. In taking a tentative step towards a better understanding of how these
assessment schema function in a new context, then, the study described in the next section
asks whether the principles that guide some of the most promising practices for assessing
students cannot be put to productive use in assessing teachers.
Materials and Methods
This section proposes a pilot study that will compare the ICaP SET to the Heavilon
Evaluation of Teacher (HET), an instrument designed to combat the statistical ceiling effect
described above. In this section, the format and composition of the HET are described, with
special attention paid to its branching scale design. Following this, the procedure for the study is
outlined, and planned interpretations of the data are discussed.
The Purdue ICaP SET
The SET employed by the Introductory Composition at Purdue (ICaP) program as of
January 2019 serves as an example of many of the prevailing trends in current SET
administration. The evaluation is administered digitally: ICaP students receive an invitation to
complete the evaluation via email near the end of the semester, and must complete it before
finals week (i.e., the week that follows the normal sixteen-week term) for their responses to be
counted. The evaluation is entirely optional: teachers may not require their students to complete
it, nor may they offer incentives like extra credit as motivation. However, some instructors opt to
devote a small amount of in-class time for the evaluations. In these cases, it is common practice
for instructors to leave the room so as not to coerce high scores.
The ICaP SET mostly takes the form of a simple multiple-choice survey. Thirty-four
MCQs appear on the survey. Of these, the first four relate to demographics: students must
indicate their year of instruction, their expected grade, their area of study, and whether they are
taking the course as a requirement or as an elective. Following these are two questions related
to the overall quality of the course and the instructor (students must rate each from “very poor”
to “excellent” on a five-point scale). These are “university core” questions that must appear on
every SET administered at Purdue, regardless of school, major, or course. Students are
also invited to respond to two short-answer prompts: “What specific suggestions do you have for
improving the course or the way it is taught?” and “What is something that the professor does
well?” Responses to these questions are optional.
The remainder of the MCQs (thirty in total) are chosen by department administrators from a
list of 646 possible questions provided by the Purdue Instructor Course Evaluation Service
(PICES). Each of these PICES questions requires students to respond to a statement
about the course on a five-point Likert scale. Likert scales are simple scales used to indicate
degrees of agreement. In the case of the ICaP SET, students must indicate whether they
strongly agree, agree, disagree, strongly disagree, or are undecided. These thirty Likert scale
questions assess a wide variety of the course and instructor’s qualities. Examples include “My
instructor seems well-prepared for class,” “This course helps me analyze my own and other
students’ writing,” and “When I have a question or comment I know it will be respected,” for
instance.
One important consequence of the ICaP SET within the Purdue English department is
the Excellence in Teaching Award (which, prior to Fall 2018, was named the Quintilian or,
colloquially, “Q” Award). This is a symbolic prize given every semester to graduate instructors
who score highly on their evaluations. According to the ICaP site, “ICaP instructors whose
teaching evaluations achieve a certain threshold earn [the award], recognizing the top 10% of
teaching evaluations at Purdue.” While this description is misleading (the award actually goes
to instructors whose SET scores rank in the top decile in the range of possible outcomes, but
not necessarily ones who scored better than 90% of other instructors), the award nevertheless
provides an opportunity for departmental instructors to distinguish their CVs and teaching
Insofar as it is distributed digitally, is composed of MCQs (plus a few short-answer
responses), and is intended as an end-of-term summative assessment, the ICaP SET embodies
the current prevailing trends in university-level SET administration. In this pilot study, it serves
as a stand-in for current SET administration practices (as generally conceived).
Like the ICaP SET, the HET uses student responses to questions to produce a score
that purports to represent their teacher’s pedagogical ability. It has a similar number of items
(28, as opposed to the ICaP SET’s 34). However, despite these superficial similarities, the
instrument’s structure and content differ substantially from the ICaP SET’s.
The most notable differences are the construction of the items on the instrument and the way
that responses to these items determine the teacher’s final score. Items on the HET do not use
the typical Likert scale, but instead prompt students to respond to a question with a simple
“yes/no” binary choice. By answering “yes” and “no” to these questions, student responders
navigate a branching “tree” map of possibilities whose endpoints correspond to points on a
33-point ordinal scale.
The items on the HET are grouped into six suites according to their relevance to six
different aspects of the teaching construct (described below). The suites of questions
correspond to directional nodes on the scale: branching paths where an instructor can move
either “up” or “down” based on the student’s responses. If a student awards a set number of
“yes” responses to questions in a given suite (signifying a positive perception of the instructor’s
teaching), the instructor moves up on the scale. If a student does not award enough “yes”
responses, the instructor moves down. Thus, after the student has answered all of the
questions, the instructor’s “end position” on the branching tree of possibilities corresponds to a
point on the 33-point scale. A visualization of this structure is presented in Figure 1.
Figure 1
Illustration of HET’s Branching Structure
Note. Each node in this diagram corresponds to a suite of HET/ICALT items, rather than to a single item.
The questions on the HET derive from the International Comparative Analysis of
Learning and Teaching (ICALT), an instrument that measures observable teaching behaviors for
the purpose of international pedagogical research within the European Union. The most recent
version of the ICALT contains 32 items across six topic domains that correspond to six broad
teaching skills. For each item, students rate a statement about the teacher on a four-point Likert
scale. The main advantage of using ICALT items in the HET is that they have been
independently tested for reliability and validity numerous times over 17 years of development
(see, e.g., Van de Grift, 2007). Thus, their results lend themselves to meaningful comparisons
between teachers (as well as providing administrators a reasonable level of confidence in their
ability to model the teaching construct itself).
The six “suites” of questions on the HET, which correspond to the six topic domains on
the ICALT, are presented in Table 1.
Table 1
HET Question Suites
Suite | # of Items | Description
Safe learning environment | 4 | Whether the teacher is able to maintain positive, nonthreatening relationships with students (and to foster these sorts of relationships
Classroom management | 4 | Whether the teacher is able to maintain an orderly, predictable
Clear instruction | 7 | Whether the teacher is able to explain class topics comprehensibly, provide clear sets of goals for assignments, and articulate the connections between the assignments and the class topics in helpful ways.
Activating teaching methods | 7 | Whether the teacher uses strategies that motivate students to think about the class’s topics.
Learning strategies | 6 | Whether teachers take explicit steps to teach students how to learn (as opposed to merely providing students informational content).
Differentiation | 4 | Whether teachers can successfully adjust their behavior to meet the diverse learning needs of individual
Note. Item numbers are derived from original ICALT item suites.
The items on the HET are modified from the ICALT items only insofar as they are phrased
as binary choices, rather than as invitations to rate the teacher. Usually, this means the addition
of the word “Does” at the beginning of the sentence and a question mark at the end. For example, the second
safe learning climate item on the ICALT is presented as “The teacher maintains a relaxed
atmosphere.” On the HET, this item is rephrased as, “Does the teacher maintain a relaxed
atmosphere?” See Appendix for additional sample items.
As will be discussed below, the ordering of item suites plays a decisive role in the teacher’s
final score because the branching scale rates earlier suites more powerfully. So too does the
“sensitivity” of each suite of items (i.e., the number of positive responses required to progress
upward at each branching node). This means that it is important for local stakeholders to
participate in the development of the scale. In other words, these stakeholders must be involved
in decisions about how to order the item suites and adjust the sensitivity of each node. This is
described in more detail below.
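The role of node sensitivity can be illustrated with a minimal sketch. The suite names and item counts below come from the HET question suites; the `SuiteNode` class, the `branch_directions` function, the threshold values, and the sample responses are all hypothetical and chosen only for illustration. They are not settings proposed for the HET.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteNode:
    """One branching node: a suite of yes/no items plus its sensitivity."""
    name: str
    n_items: int
    yes_needed: int  # hypothetical sensitivity: "yes" answers required to move up


def branch_directions(nodes, responses):
    """Return the "up"/"down" decision at each node, in the order given.

    `responses` maps a suite name to one student's list of yes/no answers
    (True means "yes").
    """
    path = []
    for node in nodes:
        answers = responses[node.name]
        if len(answers) != node.n_items:
            raise ValueError(f"expected {node.n_items} answers for {node.name}")
        path.append("up" if sum(answers) >= node.yes_needed else "down")
    return path


# Hypothetical responses from one student for two of the six suites.
responses = {
    "Clear instruction": [True, True, True, True, True, True, False],
    "Differentiation": [True, True, True, True],
}

# A strict configuration requires every item in a suite to be answered "yes".
strict = [SuiteNode("Clear instruction", 7, 7), SuiteNode("Differentiation", 4, 4)]
# A more lenient configuration tolerates one "no" in the longer suite.
lenient = [SuiteNode("Clear instruction", 7, 6), SuiteNode("Differentiation", 4, 4)]

strict_path = branch_directions(strict, responses)
lenient_path = branch_directions(lenient, responses)
print(strict_path)   # ['down', 'up']
print(lenient_path)  # ['up', 'up']
```

In this sketch, the same set of student responses produces a different path through the tree once the threshold for a single node is relaxed, which is why stakeholders' choices about sensitivity matter.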
Once the scale has been developed, the assessment has been administered, and the
teacher’s endpoint score has been obtained, the student rater is prompted to offer any textual
feedback that s/he feels summarizes the course experience, good or bad. Like the short
response items in the ICaP SET, this item is optional. The short-response item is as follows:
• What would you say about this instructor, good or bad, to another student considering
taking this course?
The final four items are demographic questions. For these, students indicate their grade
level, their expected grade for the course, their school/college (e.g., College of Liberal Arts,
School of Agriculture), and whether they are taking the course as an elective or as a
degree requirement. These questions are identical to the demographic items on the ICaP SET.
To summarize, the items on the HET are presented as follows:
• Branching binary questions (32 different items; six branches)
o These questions provide the teacher’s numerical score
• Short response prompt (one item)
• Demographic questions (four items)
The main data for this instrument are derived from the endpoints on a branching ordinal
scale with 33 points. Because each question is presented as a binary yes/no choice (with “yes”
suggesting a better teacher), and because paths on the branching scale are decided in terms of
whether the teacher receives all “yes” responses in a given suite, each of the first five suites
resolves to a single “up” or “down” branch, and together they yield 32 possible outcomes
(2 × 2 × 2 × 2 × 2). For example, the worst possible outcome would be
five successive “down” branches, the second-worst possible outcome would be four “down”
branches followed by an “up,” and so on. The sixth suite is a tie-breaker: instructors receive a
single additional point if they receive all “yes” responses on this suite, which extends the
scale to 33 points.
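The arithmetic behind these endpoints can be sketched in a few lines of Python. The function name `endpoint_score` and the 1-to-33 numbering are assumptions made for illustration; the description above specifies the number of outcomes and their ordering, not how the points are labeled.

```python
from itertools import product
from typing import Sequence


def endpoint_score(branches: Sequence[bool], tiebreak: bool) -> int:
    """Map five up/down branches plus the tie-breaker to a scale point.

    `branches` lists the first five suites in order (True = "up").
    Numbering the points from 1 is an assumption of this sketch.
    """
    if len(branches) != 5:
        raise ValueError("expected exactly five branching decisions")
    index = 0
    for up in branches:
        # Earlier suites occupy higher-order binary digits, so they
        # move the final score further than later suites do.
        index = index * 2 + int(up)
    return 1 + index + int(tiebreak)


# Enumerate every path through the first five suites.
paths = list(product([False, True], repeat=5))
base_scores = {endpoint_score(p, tiebreak=False) for p in paths}
all_scores = base_scores | {endpoint_score(p, tiebreak=True) for p in paths}

worst = endpoint_score([False] * 5, tiebreak=False)
second_worst = endpoint_score([False, False, False, False, True], tiebreak=False)
best = endpoint_score([True] * 5, tiebreak=True)
```

Under this numbering, the 32 paths give 32 distinct base scores, and the tie-breaker adds one further point at the top of the scale, for 33 distinct points in total.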
By positioning certain suites of items early in the branching sequence, the HET gives
them more weight. For example, the first suite is the most important of all: an “up” here
automatically places the teacher above 16 on the scale, while a “down” precludes all scores
faculty in US higher ed. AAUP Updates. https://www.aaup.org/news/data-snapshotcontingent-faculty-us-higher-ed#.Xfpdmy2ZNR4
of Educational Psychology, 87(4), 656â€“665. http://dx.doi.org/10.1037/0022-
Becker, W. (2000). Teaching economics in the 21st century. Journal of Economic Perspectives,
14(1), 109â€“120. http://dx.doi.org/10.1257/jep.14.1.109
Benton, S., & Young, S. (2018). Best practices in the evaluation of teaching. Idea paper, 69.
Anderson, K., & Miller, E. D. (1997). Gender and student evaluations of teaching. PS: Political
Science and Politics, 30(2), 216â€“219. https://doi.org/10.2307/420499
Armstrong, J. S. (1998). Are student ratings of instruction useful? American Psychologist,
53(11), 1223â€“1224. http://dx.doi.org/10.1037/0003-066X.53.11.1223
Attiyeh, R., & Lumsden, K. G. (1972). Some modern myths in teaching economics: The U.K.
experience. American Economic Review, 62(1), 429â€“443.
Bachen, C. M., McLoughlin, M. M., & Garcia, S. S. (1999). Assessing the role of gender in
college students’ evaluations of faculty. Communication Education, 48(3), 193â€“210.
Basow, S. A. (1995). Student evaluations of college professors: When gender matters. Journal
Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin
slices of nonverbal behavior and physical attractiveness. Journal of Personality and
Social Psychology, 64(3), 431â€“441. http://dx.doi.org/10.1037/0022-35126.96.36.1991
American Association of University Professors. (n.d.). Background facts on contingent faculty
American Association of University Professors. (2018, October 11). Data snapshot: Contingent
Berk, R. A. (2005). Survey of 12 strategies to measure teaching effectiveness. International
Bloom, B. S., Englehart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of
educational objectives: The classification of educational goals. Addison-Wesley
Brandenburg, D., Slinde, C., & Batista, J. (1977). Student ratings of instruction: Validity and
normative interpretations. Research in Higher Education, 7(1), 67â€“78.
Carrell, S., & West, J. (2010). Does professor quality matter? Evidence from random
assignment of students to professors. Journal of Political Economy, 118(3), 409â€“432.
Cashin, W. E. (1990). Students do rate different academic fields differently. In M. Theall, & J. L.
Franklin (Eds.), Student ratings of instruction: Issues for improving practice. New
Directions for Teaching and Learning (pp. 113â€“121).
Centra, J., & Gaubatz, N. (2000). Is there gender bias in student evaluations of
teaching? The Journal of Higher Education, 71(1), 17â€“33.
Davis, B. G. (2009). Tools for teaching (2nd ed.). Jossey-Bass.
Denton, D. (2013). Responding to edTPA: Transforming practice or applying
shortcuts? AILACTE Journal, 10(1), 19â€“36.
Dizney, H., & Brickell, J. (1984). Effects of administrative scheduling and directions upon
student ratings of instruction. Contemporary Educational Psychology, 9(1), 1â€“7.
DuCette, J., & Kenney, J. (1982). Do grading standards affect student evaluations of teaching?
Some new evidence on an old question. Journal of Educational Psychology, 74(3), 308â€“
Journal of Teaching and Learning in Higher Education, 17(1), 48â€“62.
Edwards, J. E., & Waters, L. K. (1984). Halo and leniency control in ratings as influenced by
format, training, and rater characteristic differences. Managerial Psychology, 5(1), 1â€“16.
Fink, L. D. (2013). The current status of faculty development internationally. International
Journal for the Scholarship of Teaching and Learning, 7(2).
Fulcher, G. (2003). Testing second language speaking. Pearson Education.
Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking
tests: Performance decision trees. Language Testing, 28(1), 5â€“29.
Gaff, J. G., & Simpson, R. D. (1994). Faculty development in the United States. Innovative
Higher Education, 18(3), 167â€“76. https://doi.org/10.1007/BF01191111
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to
Hoffman, R. A. (1983). Grade inflation and student evaluations of college courses. Educational
and Psychological Research, 3(3), 51â€“160. https://doi.org/10.1023/A:101557981
Howard, G., Conway, C., & Maxwell, S. (1985). Construct validity of measures of college
teaching effectiveness. Journal of Educational Psychology, 77(2), 187â€“96.
Kane, M. T. (2013) Validating interpretations and uses of test scores. Journal of Educational
Measurement, 50(1), 1â€“73.
Kelley, T. (1927) Interpretation of educational measurements. World Book Co.
Knoch, U. (2007). Do empirically developed rating scales function differently to conventional
rating scales for academic writing? Spaan Fellow Working Papers in Second or Foreign
Language Assessment, 5, 1â€“36. English Language Institute, University of Michigan.
Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating
scales. Language Testing, 26(2), 275-304.
Sample ICALT Items Rephrased for HET
Suite | Sample ICALT Item | HET Phrasing
Safe learning environment | The teacher promotes mutual | Does the teacher promote mutual
Classroom management | The teacher uses learning time | Does the teacher use learning time
Clear instruction | The teacher gives feedback to | Does the teacher give feedback to
Activating teaching methods | The teacher provides interactive instruction and activities. | Does the teacher provide interactive instruction and activities?
Learning strategies | The teacher provides interactive instruction and activities. | Does the teacher provide interactive instruction and activities?
Differentiation | The teacher adapts the instruction to the relevant differences between | Does the teacher adapt the instruction to the relevant differences between pupils?