
Next, we turn to measuring human accuracy at predicting whether a group escaped or not. We blur the photos to obfuscate any explicit performance-signaling information (e.g., the signs), but keep the faces of the group members clear (Fig. 4). We also make sure that the photos are not blurred so heavily that study participants cannot see the body posture and the physical proximity of the group members. We run two studies. In the first study, we measure the baseline accuracy of the participants' predictions and test whether training participants by showing them only four labeled photos improves their accuracy. In the second study, we investigate whether four training examples are enough and whether showing more labeled photos significantly improves accuracy.

Figure 4 A sketch of a real photo shown to study participants, who were asked to predict whether the group escaped or not (blurred version of the photo in Fig. 1). All performance-signaling information has been obfuscated, while still allowing participants to see the body posture and physical proximity of the group members.

Study 1

The goals of this study are to measure the baseline accuracy of human predictions and to test whether training participants by showing them a sample of labeled photos improves their prediction accuracy.

Experimental setup

For this study, we used a random sample of 2,000 photos extracted from the full dataset. We recruited 400 participants (U.S. residents only) using the Mechanical Turk platform for a compensation of 60 cents per participant. We randomly assigned participants to one of two conditions: training (treatment) and no training (control). Participants in both groups went through a “warm-up” phase. However, in the training (treatment) condition, participants were first shown four blurred photos, asked to predict whether the group in each photo escaped or not, and then given the answer, i.e., shown the original (unblurred) photo. The photos were shown one at a time, giving participants immediate feedback before the next prediction. In the no training (control) condition, participants went through the same process but were not shown the original, unblurred photos. Participants in both conditions were told that this stage was only meant to help them familiarize themselves with the task and that their answers would not be taken into account. After the initial training stage, we gave participants in both conditions the main task. We showed each participant 10 blurred photos (similar to Fig. 4) and asked them to predict whether the group shown in each photo escaped or not. To avoid confusion, we showed participants one photo at a time. To ensure that participants in the two conditions were shown the exact same set of photos, we randomly assigned each photo to one and only one participant in each condition. The participants made a total of 4,000 predictions (400 participants, 10 predictions per participant), and each photo was rated by exactly two participants, one from each condition.
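A minimal sketch of this assignment scheme is shown below; it is a hypothetical Python reconstruction under stated assumptions (the variable names, counts layout, and shuffling strategy are ours, not the authors' code), illustrating how each photo can be shown to exactly one participant per condition.

```python
import random

# Hypothetical sketch of the photo-to-participant assignment in Study 1:
# 2,000 blurred photos, 400 participants split evenly into two conditions,
# each photo rated by exactly one participant per condition.

N_PHOTOS = 2000
N_PER_CONDITION = 200          # 400 participants, two conditions
PHOTOS_PER_PARTICIPANT = 10    # 200 * 10 = 2,000 photos per condition

photo_ids = list(range(N_PHOTOS))
random.shuffle(photo_ids)

assignments = {}  # (condition, participant_index) -> list of photo ids
for condition in ("control", "treatment"):
    for p in range(N_PER_CONDITION):
        start = p * PHOTOS_PER_PARTICIPANT
        assignments[(condition, p)] = photo_ids[start:start + PHOTOS_PER_PARTICIPANT]

# Every photo appears exactly once in each condition, so the two conditions
# see the exact same set of 2,000 photos, as described above.
```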

Results

To compare the accuracy of the participants' predictions in the two conditions, we first compute the accuracy of each participant on the ten photos they were shown and then compute the mean accuracy of all the participants in the same condition. We find that training increases the accuracy of the participants' predictions by 5 percentage points (Fig. 5a), a statistically significant increase (p = 0.002). Trained participants achieve a mean accuracy of 64% (95% CI [62%, 66%]), compared with 59% (95% CI [57%, 61%]) for untrained participants. It is worth noting that since participants in both conditions were shown the same four blurred photos before being asked to perform the main task, we know that the improved accuracy of the trained participants is not due to mere exposure to more blurred photos or greater familiarity with them. Instead, the difference in accuracy must be due to exposure to ground truth (i.e., the original photos), a process similar to the training of machine learning models.
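As an illustrative sketch of this analysis, assuming per-participant correctness matrices and a standard two-sample t-test with t-based confidence intervals (the exact test used in the study is not specified here, and the data below are simulated placeholders):

```python
import numpy as np
from scipy import stats

# Hypothetical sketch of the Study 1 analysis: per-participant accuracy,
# condition means with 95% confidence intervals, and a two-sample test.
# responses[cond] is a (participants x 10) boolean array marking whether
# each of the participant's 10 predictions was correct (placeholder data).

def summarize(correct):
    acc = correct.mean(axis=1)                  # accuracy of each participant
    mean, sem = acc.mean(), stats.sem(acc)
    ci = stats.t.interval(0.95, len(acc) - 1, loc=mean, scale=sem)
    return acc, mean, ci

rng = np.random.default_rng(0)
responses = {
    "control": rng.random((200, 10)) < 0.59,    # roughly 59% correct
    "treatment": rng.random((200, 10)) < 0.64,  # roughly 64% correct
}

acc_c, mean_c, ci_c = summarize(responses["control"])
acc_t, mean_t, ci_t = summarize(responses["treatment"])
t_stat, p_value = stats.ttest_ind(acc_t, acc_c)
print(f"control: {mean_c:.1%} [{ci_c[0]:.1%}, {ci_c[1]:.1%}], "
      f"treatment: {mean_t:.1%} [{ci_t[0]:.1%}, {ci_t[1]:.1%}], p = {p_value:.3f}")
```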

Study 2


In Study 1, we found that training participants by showing them four labeled photos significantly improves their performance compared to no training at all. This prompts the question of whether participants would make even more accurate predictions if they went through a longer training phase and were shown more labeled photos. The answer is not obvious and could unfold in three different ways: (1) by seeing more labeled photos, participants may have a better chance of detecting more nuanced patterns in the photos of successful vs. unsuccessful groups and further improve their performance; (2) participants may have already inferred all important patterns of success, so a longer training phase will have no effect on their accuracy; or (3) a longer training phase may lead to respondent fatigue and degrade prediction accuracy. In this study, we run another experiment to answer this question empirically.

Experimental setup

We randomly assigned participants to one of four conditions: no training (control), training with 4, training with 8, or training with 12 photos. We recruited 400 participants (100 per condition) from the U.S., prohibiting participants who took part in Study 1 from participating again. We increased the compensation rate to $1, as we expected the task to take significantly longer due to the longer training phase in two of the four conditions. For the training phase, we used a random sample of 1,200 photos from the full dataset. To ensure that the training photos were as similar as possible across conditions, we selected them in batches. In each batch, we first took 12 of the 1,200 photos and used them in the training-with-12 condition. Then we took subsamples of 8 and 4 photos out of these 12 and used them in the training-with-8 and training-with-4 conditions, respectively. We repeated this process 100 times. In all cases, we ensured that the samples contained an equal number of positive and negative examples. To select the photos for the testing phase, we randomly sampled a different set of 1,000 photos, divided them into groups of 10, and used the same groups across all four conditions. Photos shown in the training phase did not appear in the test phase. The task was exactly the same as in Study 1. In the training phase, participants were shown a blurred photo, asked to guess whether the group escaped or not, and then shown the original photo, repeating this process 4, 8, or 12 times depending on the condition they were assigned to. Participants in the no training condition skipped this phase. In the test phase, participants were shown only blurred photos and asked to guess whether the group escaped and to indicate their confidence. Participants in all conditions were shown 10 test photos.

Figure 5 (a) Results of Study 1. Comparison between the prediction accuracy of participants in the two conditions: no training (control) vs. training (treatment). Trained participants, who were shown four labeled photos, perform significantly better than untrained participants. (b) Results of Study 2. Comparison between the prediction accuracy of participants who were not shown any training photos and participants who were shown 4, 8, or 12 training photos. As in Study 1, showing participants four labeled photos significantly improves their prediction accuracy. However, the improvements from showing additional labeled photos are not statistically significant. (Error bars represent 95% confidence intervals; ***p < 0.001, **p < 0.01, *p < 0.05.)
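The nested batch-sampling scheme described above can be sketched as follows; this is a hypothetical Python reconstruction (function and variable names are ours, and the photo-ID inputs are placeholders), which keeps the 4- and 8-photo training sets as balanced subsets of the 12-photo set.

```python
import random

# Hypothetical sketch of the nested training-photo sampling in Study 2.
# Each batch contains 12 photos (6 escaped, 6 not); the 8- and 4-photo
# conditions reuse balanced subsets of the same batch.

def make_training_batches(escaped_ids, failed_ids, n_batches=100, seed=0):
    # Assumes at least 6 * n_batches ids per class (600 each for 1,200 photos).
    rng = random.Random(seed)
    escaped = rng.sample(escaped_ids, 6 * n_batches)
    failed = rng.sample(failed_ids, 6 * n_batches)
    batches = []
    for i in range(n_batches):
        pos = escaped[6 * i: 6 * i + 6]
        neg = failed[6 * i: 6 * i + 6]
        train12 = pos + neg                 # 6 positive + 6 negative examples
        train8 = pos[:4] + neg[:4]          # balanced subset of the 12
        train4 = pos[:2] + neg[:2]          # balanced subset of the 8
        batches.append({"12": train12, "8": train8, "4": train4})
    return batches
```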

Results

As in Study 1, we compute the accuracy of each participant over the 10 test photos and then compute the mean accuracy of the participants in each of the four conditions. First, we observe that our findings from Study 1 replicate: training participants by giving them feedback on their predictions for only four training photos significantly increases their accuracy over no training (p = 0.01). Beyond replicating the same pattern, we observe very similar levels of accuracy in the two conditions across the two studies: no training, 58.2% accuracy; and training on four photos, 64.5% accuracy (Fig. 5b). This also suggests that the increase in compensation from 60 cents to $1 did not have a significant effect on accuracy.


Second, we find diminishing returns to training participants on more photos. Training participants on 8 instead of 4 photos leads to almost the same level of accuracy, 63.96% (p = 0.84). Similarly, training participants on 12 instead of 4 photos leads to an accuracy of 67.4% vs. 64.5%; the improvement of 2.9 percentage points is not statistically significant (p = 0.2).
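These pairwise comparisons can be sketched as below; this is illustrative only, as the per-participant accuracies are simulated placeholders drawn around the reported condition means, and the use of a two-sample t-test is our assumption rather than the study's documented procedure.

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical sketch of pairwise condition comparisons in Study 2.
# Placeholder per-participant accuracies (100 participants per condition),
# simulated around the reported means purely for illustration.
rng = np.random.default_rng(1)
accuracies = {
    "no training": rng.normal(0.582, 0.15, 100),
    "4 photos":    rng.normal(0.645, 0.15, 100),
    "8 photos":    rng.normal(0.640, 0.15, 100),
    "12 photos":   rng.normal(0.674, 0.15, 100),
}
for a, b in combinations(accuracies, 2):
    _, p = stats.ttest_ind(accuracies[a], accuracies[b])
    print(f"{a} vs {b}: p = {p:.3f}")
```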

Human predictions and group characteristics

Besides asking study participants to make predictions, we also asked them to explain how they made their decisions. At the end of the survey, participants were given a list of group characteristics and asked to report which, if any, of the listed characteristics they considered while making their predictions (they could select multiple characteristics).

Figure 6 Analysis of the group characteristics used by the participants in Study 1 to make their predictions. The left panel shows the “stated” relevance, i.e., how often each group characteristic was reported as important by the participants. The right panel shows the “actual” relevance, i.e., the absolute value of the correlation (Spearman correlation) between the predictions made by participants (escaped or not) and each of the nine characteristics.

Here we look at which group characteristics participants used to make their predictions. We compute two metrics: (1) the fraction of participants who reported that a given group characteristic was useful in making their prediction, and (2) the actual correlation between the participants' predictions and the group characteristics (Fig. 6). Since the direction of the correlation is irrelevant, to ease comparison we report the absolute values of the correlations. While the first metric captures the group characteristics that participants were aware of using, the second captures the group characteristics that actually influenced participants the most. We find that the overall emotional expression of the group members (i.e., the mean smiling index) is by far the most used group characteristic, both according to the participants' reports and the correlation with their predictions (Fig. 6). We also find that participants tend to overestimate the importance of the diversity of emotional expressions among group members (i.e., the standard deviation of the smiling index), but underestimate the importance of the physical distance between the group members and the overall age of the group. The last two, although more correlated with the participants' predictions, are less often reported as important characteristics by the participants. This suggests that these characteristics may have subconsciously influenced the participants' predictions.
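A minimal sketch of the “actual relevance” metric, assuming a table of group characteristics and a vector of binary predictions (the column names and data below are placeholders, not the study's actual variables):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sketch: absolute Spearman correlation between each group
# characteristic and the participants' binary predictions (placeholder data).
rng = np.random.default_rng(2)
n = 2000
characteristics = pd.DataFrame({
    "mean_smiling_index": rng.random(n),
    "std_smiling_index": rng.random(n),
    "physical_distance": rng.random(n),
    "mean_age": rng.random(n),
})
predictions = rng.integers(0, 2, n)  # 1 = predicted "escaped", 0 = "did not escape"

actual_relevance = {}
for col in characteristics.columns:
    rho, _ = stats.spearmanr(characteristics[col], predictions)
    actual_relevance[col] = abs(rho)  # direction is irrelevant, keep magnitude

for col, rel in sorted(actual_relevance.items(), key=lambda kv: -kv[1]):
    print(f"{col}: {rel:.3f}")
```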
