Celebrity Memory Test Version Update


Figure 1: Percentage of correctly named celebrities derived from the Beta version of the task. The average percentage correct was 35%, far too low for the task to be deployed in a wider population in its current form.

With this post we would like to give you an update on the progress of the Celebrity Memory Test. Since its initial launch, we have piloted the test with two cohorts of university students. The results of the first run (see Fig. 1) showed that the initial test was difficult: scores were low even though participants were young, healthy students. Clearly, some changes were necessary, and this motivated a change in the way celebrity faces were selected for the test. Here, we compare the initial and second versions of the Celebrity Memory Test. For the demographic comparison between the two runs, see the “Participants” section at the end of this post.

Comparison of run 1 vs. run 2

Figure 2: Example of an image that participants were presented with. In this case, the celebrity was Michael Jackson.

The task consisted of naming celebrities from displayed images (see Figure 2 for an example). Participants entered a name into a text box beneath the image and were then taken to the next page, where they were asked to rate their own performance. The possible answers were:

  • they got the answer right
  • they had the name on the tip of their tongue
  • they recognized the person but couldn’t name them
  • they didn’t recognize the person at all.

The first study yielded, on average, 7.13 correctly named celebrities out of the 20 presented stimuli (SD = 3.23). The mean number of unknown celebrities was 9.05 (SD = 3.20). On average, 2.98 celebrities were recognized but participants were unable to recall their names (SD = 1.96). Very few reported having the name on the tip of their tongue (M = .84, SD = 1.00) [provide a figure reference]. This represents a mere 35.56% accuracy, as opposed to the intended 70-80%. The second study was therefore run with a different stimulus selection algorithm, aimed at improving performance.
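The descriptive statistics above (a mean and SD per response category) could be derived from each participant's self-ratings roughly as follows. This is a minimal sketch, not the project's actual analysis code: it assumes that "correct" means the participant chose the "got the answer right" option (the post does not say whether typed names were also checked), and the names `Rating`, `summarize` and `percent_correct` as well as the toy data are hypothetical.

```python
from collections import Counter
from enum import Enum
from statistics import mean, stdev


class Rating(Enum):
    """The four self-assessment options shown after each celebrity image."""
    CORRECT = "got the answer right"
    TIP_OF_TONGUE = "had the name on the tip of their tongue"
    RECOGNIZED_NOT_NAMED = "recognized but couldn't name the person"
    UNKNOWN = "didn't recognize the person at all"


def summarize(participants: list[list[Rating]]) -> dict[Rating, tuple[float, float]]:
    """Return (mean, sample SD) of each rating's per-participant count."""
    per_person = [Counter(ratings) for ratings in participants]
    summary = {}
    for rating in Rating:
        counts = [c[rating] for c in per_person]  # Counter yields 0 for unseen ratings
        summary[rating] = (mean(counts), stdev(counts))
    return summary


def percent_correct(ratings: list[Rating]) -> float:
    """Share of one participant's trials self-rated as correct, in percent."""
    return 100 * ratings.count(Rating.CORRECT) / len(ratings)


# Toy data for two hypothetical participants (not study data), 4 trials each.
example = [
    [Rating.CORRECT, Rating.CORRECT, Rating.UNKNOWN, Rating.RECOGNIZED_NOT_NAMED],
    [Rating.CORRECT, Rating.UNKNOWN, Rating.UNKNOWN, Rating.TIP_OF_TONGUE],
]
m, sd = summarize(example)[Rating.CORRECT]
print(f"correct: M = {m:.2f}, SD = {sd:.2f}")  # correct: M = 1.50, SD = 0.71
```

With real data, each inner list would hold one participant's 20 (first run) or 25 (second run) ratings.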

To achieve this, we changed the way celebrities were drawn from the Pantheon dataset. Using the recognition scores obtained in the first run, celebrity stimuli were sorted into four categories, ranging from very famous (A-list) to less famous (D-list).

[give a few examples: top5 %recognised vs. bottom5 in a table?].
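As a rough illustration, the four lists could be formed from first-run recognition rates as sketched below. The post does not give exact cut-offs, so this is an assumption-laden sketch: the A/B (75%) and B/C (50%) bounds borrow the expected correct rates quoted for the adaptive procedure in the next paragraph, and because the C- and D-lists are both described with ranges around 50% there, the 25% C/D bound is purely a placeholder. `Celebrity`, `TIER_BOUNDS`, `assign_tier` and `build_tiers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Celebrity:
    name: str
    recognition_rate: float  # share of run-1 participants who named this face (0..1)


# Hypothetical exclusive lower bounds for the A-, B- and C-lists; anything at
# or below the last bound falls into the D-list. 0.75 and 0.50 mirror the
# expected correct rates quoted for the adaptive procedure; 0.25 is a placeholder.
TIER_BOUNDS = [("A", 0.75), ("B", 0.50), ("C", 0.25)]


def assign_tier(rate: float) -> str:
    """Map a run-1 recognition rate to an A/B/C/D tier."""
    for tier, lower in TIER_BOUNDS:
        if rate > lower:
            return tier
    return "D"


def build_tiers(celebrities: list[Celebrity]) -> dict[str, list[Celebrity]]:
    """Group celebrities by tier, most recognized first within each tier."""
    tiers: dict[str, list[Celebrity]] = {tier: [] for tier in "ABCD"}
    for celeb in sorted(celebrities, key=lambda c: c.recognition_rate, reverse=True):
        tiers[assign_tier(celeb.recognition_rate)].append(celeb)
    return tiers
```

Sorting within each tier is not required by the procedure; it simply makes a "top 5 vs. bottom 5" comparison easy to read off.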

These categories were then built into the survey to make celebrity selection adaptive. Concretely, the first 5 questions drew samples from the A-list, and subsequent questions drew samples depending on the participant’s performance so far. If performance was over 90% correct, stimuli were taken from the D-list, whose faces are less well known and therefore had an expected correct rate of under 50%. If performance was over 80%, the C-list was used, with an expected correct rate of up to 50%. Similarly, when performance was above 70% correct, stimuli were drawn from the B-list, with an expected correct rate of up to 75%, and when performance was below 70% correct, only stimuli from the A-list were used, which have an expected correct rate of over 75%. We expected this to drastically improve participants’ performance and also to allow for a personalized analysis of their memory capacity, as discussed above.
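The selection rule described above can be sketched in a few lines. This is not the Qualtrics implementation; it assumes that "current performance" means the proportion of all questions answered so far that were correct, that an accuracy of exactly 70% falls back to the A-list, and that celebrities are drawn without replacement. `choose_tier`, `next_stimulus` and `WARMUP_QUESTIONS` are hypothetical names.

```python
import random

WARMUP_QUESTIONS = 5  # the first 5 questions always draw from the A-list


def choose_tier(n_answered: int, n_correct: int) -> str:
    """Select the list for the next question from the running accuracy.

    Assumes accuracy is computed over all questions answered so far.
    """
    if n_answered < WARMUP_QUESTIONS:
        return "A"
    accuracy = n_correct / n_answered
    if accuracy > 0.90:
        return "D"  # least famous faces
    if accuracy > 0.80:
        return "C"
    if accuracy > 0.70:
        return "B"
    return "A"  # most famous faces


def next_stimulus(pools: dict[str, list[str]], n_answered: int, n_correct: int,
                  rng: random.Random) -> str:
    """Draw one celebrity, without replacement, from the chosen pool.

    Assumes each pool still holds at least one unused celebrity.
    """
    pool = pools[choose_tier(n_answered, n_correct)]
    return pool.pop(rng.randrange(len(pool)))
```

For example, a participant with 5 correct answers out of 6 (about 83%) would next receive a C-list face, while one with 7 out of 10 would stay on the A-list.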

The second study achieved an average of 16.57 correctly named celebrities out of 25 (SD = 3.42), or 66.26% correct (see Figure 3). The mean number of unknown celebrities was 4.62 (SD = 2.85). On average, 2.91 celebrities were recognized but participants were unable to recall their names (SD = 2.1). Very few reported having the name on the tip of their tongue (M = .9, SD = 1.11). This is a sufficient improvement for the application and indicates that the tool is ready for further deployment in a wider population.


Figure 3: Percentage of correctly named celebrities derived from the Gamma version of the task. The average percentage correct rose to 66%, a substantial improvement over the Beta version. This increase in average performance allows the task to be used in a wider population.


Participants

Students were recruited from Tilburg University and received credits as remuneration for completing the questionnaire. In the initial study, ages ranged from 18 to 38, with a mean of 22.3 years (SD = 3.55). Of the 122 participants, 81 were female, 40 were male and one chose not to disclose their gender. Only 10 participants were native English speakers; 112 were not. Participants were also asked for the highest level of education they had completed. Most had completed a Bachelor’s or Associate’s Degree (N = 72), 29 had completed High School, 8 had a Master’s Degree, 5 a Doctoral or Professional Degree and 2 had completed College. Six participants reported having completed some university without finishing a degree.

In the follow-up study, ages ranged from X to Y, with a mean of Z years (SD = A). The participants consisted of B females, C males and D participants who chose not to disclose their gender, for a total of E participants. Of these, F were native English speakers and G were not. H participants had completed a Bachelor’s or Associate’s Degree, I had completed High School, J a Master’s Degree, K a Doctoral or Professional Degree and L College, while M had completed some university only.


Materials

The questionnaire itself was developed and distributed online using Qualtrics.

The celebrity images used in the online questionnaire came from the Pantheon dataset (Yu et al., 2016), which consists of images of famous individuals drawn from Wikipedia. It includes information about the celebrities’ lifetimes, occupations and birthplaces, as well as popularity measures. For this project, the dataset was restricted to individuals born after World War II, and some sensitive categories of individuals, such as extremists, mafiosos, pirates and pornographic actors, were removed. Additionally, to select the more famous individuals, only the entries with the highest page views were retained in the final dataset. This skewed the dataset towards individuals that participants were more likely to recognize, increasing the applicability of the experiment.
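The three filtering steps described above (birth year, excluded categories, top page views) can be expressed as a small pipeline. This is a sketch under stated assumptions: `PantheonEntry` and its fields are simplified, hypothetical stand-ins rather than the dataset's real column names, the occupation labels are illustrative, "born after WW2" is approximated as a birth year after 1945, and `top_n` is a hypothetical parameter because the post does not state how many entries were kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PantheonEntry:
    # Simplified stand-in fields; the real Pantheon release uses its own column names.
    name: str
    birth_year: int
    occupation: str
    page_views: int


# Categories removed as sensitive; labels are illustrative.
EXCLUDED_OCCUPATIONS = {"EXTREMIST", "MAFIOSO", "PIRATE", "PORNOGRAPHIC ACTOR"}
WW2_END_YEAR = 1945  # "born after WW2" approximated as a birth year after 1945


def select_stimuli(entries: list[PantheonEntry], top_n: int) -> list[PantheonEntry]:
    """Apply the birth-year and occupation filters, then keep the top_n most viewed."""
    eligible = [
        e for e in entries
        if e.birth_year > WW2_END_YEAR and e.occupation.upper() not in EXCLUDED_OCCUPATIONS
    ]
    eligible.sort(key=lambda e: e.page_views, reverse=True)
    return eligible[:top_n]
```

Filtering before ranking matters here: ranking first and then removing excluded entries would leave fewer than `top_n` stimuli.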


Procedure

Participants signed up through the university and could complete the questionnaire from anywhere, on any device capable of accessing it online. The first experiment was available from 04/09/2019 until 11/11/2019. The second experiment was conducted in the same way between 08/04/2020 and 28/04/2020.

The questionnaire started with basic demographic questions about age, gender and whether English was the participant’s native language. Participants then received a short tutorial on how the experiment was structured, instructing them first to type the name of the presented celebrity in a text box and then to assess whether they got the answer right, had the name on the tip of their tongue, recognized but couldn’t name the person, or didn’t recognize the person at all. After the instructions, the experiment began. In the second run, participants were presented with a total of 25 stimuli (20 in the first run). Once this was completed, the questionnaire asked for some further demographic information.


Discussion

The results show a substantial improvement in performance between the initial and follow-up studies, as intended. Changing the selection criteria to favor more famous individuals greatly improved the applicability of the tool and allows for meaningful analyses in a wider population, where performance is bound to vary. Ensuring that young adults, one of the groups least likely to suffer from cognitive decline, perform well on the task also helps ensure that populations more at risk for memory problems will not score so low that their results cannot be meaningfully analyzed.

This marks real progress and allows the project to move forward to testing in target groups with mild cognitive impairment. In practice, this will entail further testing and many more tweaks, but it has been a crucial step towards making Mindify the diagnostic tool it is intended to be.
