Review score, expertise, and length statistics for InfoVis 2016 and 2017.

The InfoVis Review Process by the Numbers

by Niklas Elmqvist, University of Maryland, College Park

Serving as papers co-chair for IEEE InfoVis 2016 and 2017 gave me a unique insight into the review process for the conference. In this post, I will present, compare, and briefly discuss some statistics for these two years of the conference.

For context, InfoVis 2016 had 165 submissions and accepted 37, for an acceptance rate of 22.4%. There were 261 reviewers and a total of 666 reviews (scary!). InfoVis 2017, on the other hand, saw 170 submissions and 39 accepts, yielding a 22.9% acceptance rate. There were 286 reviewers writing a total of 690 reviews. (Note that the numbers of reviewers and reviews are approximate because even papers co-chairs don’t get to see all data due to conflicts of interest.)

Scores

InfoVis reviewers assign scores (or ratings) to submissions on a scale from 1 (lowest) to 5 (highest), including half points. In other words, this scale has 9 levels, labeled as follows: 5 – strong accept, 4.5 – between accept and strong accept, 4 – accept, 3.5 – between possible accept and accept, 3 – possible accept, 2.5 – between reject and possible accept, 2 – reject, 1.5 – between strong reject and reject, and 1 – strong reject. Here are two histograms showing the distribution and averages (red line) of scores for InfoVis 2016 (left) and 2017 (right).
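
As a side note for anyone who wants to tally such ratings from their own data, here is a minimal sketch of the scale encoded in Python; the names are mine and purely illustrative, not taken from the conference review system.

```python
from collections import Counter

# The 9-level InfoVis rating scale described above (illustrative encoding).
RATING_LABELS = {
    5.0: "strong accept",
    4.5: "between accept and strong accept",
    4.0: "accept",
    3.5: "between possible accept and accept",
    3.0: "possible accept",
    2.5: "between reject and possible accept",
    2.0: "reject",
    1.5: "between strong reject and reject",
    1.0: "strong reject",
}

def score_histogram(scores):
    """Count how many reviews fall on each of the nine rating levels."""
    counts = Counter(scores)
    return {level: counts.get(level, 0) for level in sorted(RATING_LABELS)}
```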

In general, the shapes of these charts (really the only thing worth comparing, since the absolute numbers differ) are quite similar. In 2017, the drop-off for higher scores is a little steeper, and there seem to be more 2.0 and 5.0 ratings than in 2016.

Expertise

The self-reported expertise for an InfoVis review indicates the reviewer’s level of expertise in the topic of the submission being reviewed. The implicit assumption is that an expert-level reviewer’s rating carries more authority and credibility than that of a reviewer with lower expertise. This scale has three levels: 1 – no or passing knowledge, 2 – knowledgeable, and 3 – expert. Here are two histograms showing the distribution and averages (red line) of expertise for InfoVis 2016 (left) and 2017 (right).

Again, the shapes are similar, even if there seem to be fewer (self-reported) experts in 2017 than in 2016.

Review Length

The number of characters in a review is not necessarily a direct indicator of review quality (a short review can still be good), but it at least reflects the amount of time spent writing the review (although not necessarily the time spent reading the submission being reviewed). Furthermore, there is probably a lower bound on what constitutes a useful review: ideally, even a positive review should list compelling arguments for accepting a paper, and a negative review should give concrete suggestions for improvement rather than merely listing the shortcomings of the submission. For this reason, review length can be a useful metric to study. Here are two histograms showing the distribution and averages (red line) of review lengths for InfoVis 2016 (left) and 2017 (right).

Once again, the shape of the two histograms is similar, although it looks like the average length has increased slightly from 2016 to 2017, and the numbers of reviews between 2,000 and 4,000 characters and between 4,000 and 6,000 characters are more even in 2017.

Disposition

Finally, the disposition of a reviewer is defined as the reviewer’s average score difference from the average score of each submission the reviewer was involved in. In other words, a negative disposition means that the reviewer consistently gives a lower score than the other reviewers of the same submission (i.e., the reviewer is “harsh” or critical), whereas a positive disposition means the reviewer gives a higher score (i.e., the reviewer is “nice” or positive). A near-zero disposition means that the reviewer tends to give scores close to the average. Of course, a reviewer who sometimes gives harsh scores and sometimes nice scores may also have a near-zero disposition. We capture this using the “distance” metric, the average of the absolute differences, which gives an indication of how divergent the reviewer’s scores are.
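
To make these definitions concrete, here is a minimal sketch in Python of how the two metrics could be computed. The input format (a list of reviewer, submission, score triples) and the function name are my own assumptions rather than the actual analysis code, and this simple version includes a reviewer’s own score when computing the per-submission average.

```python
from collections import defaultdict
from statistics import mean

def reviewer_dispositions(reviews):
    """Compute disposition and distance per reviewer.

    `reviews` is a list of (reviewer, submission, score) tuples (hypothetical format).
    """
    # Average score per submission.
    by_submission = defaultdict(list)
    for reviewer, submission, score in reviews:
        by_submission[submission].append(score)
    submission_avg = {s: mean(scores) for s, scores in by_submission.items()}

    # Signed differences from the submission average, grouped by reviewer.
    diffs = defaultdict(list)
    for reviewer, submission, score in reviews:
        diffs[reviewer].append(score - submission_avg[submission])

    disposition = {r: mean(ds) for r, ds in diffs.items()}                  # signed average
    distance = {r: mean(abs(d) for d in ds) for r, ds in diffs.items()}     # average magnitude
    return disposition, distance
```

In this formulation, the “sometimes harsh, sometimes nice” reviewer described above would show up with a near-zero disposition but a large distance.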

Anyway, here are two scatterplots showing the disposition (horizontal axis) and the standard deviation of the disposition (vertical axis) for each reviewer for InfoVis 2016 (left) and 2017 (right).

Comparing these two plots is challenging and left as an exercise for the reader, but perhaps there are more reviewers on the positive side of the disposition (to the right of 0.0 on the horizontal axis) in the 2017 plot? You be the judge.

Conclusion

Summarizing these charts, my conclusion is that the two years of the InfoVis conference more or less look the same in terms of reviewing. While this may be a little boring, it is also reassuring: it tells me that the review process is working, is consistent, and is able to handle the (slight) increase in submissions from 2016 to 2017. Do you have another take? Feel free to send me feedback.