Anime Academy Grades
LilSlugger
10-26-2006, 09:02 AM
I work as a Statistician for Central Government, but I am also a big Anime fan and I was interested in the grades given to various titles in your library.
I noted that a simple mean had been taken where more than one review related to an individual title.
I was wondering whether any work had been done to refine your rating system by allowing for the differences in the various professors' historical markings and standardising the scores obtained given the historical trends for each professor.
E.g. some professors might mark higher than others or may award a greater variation in marks.
These differences could be standardised by calculating how many SDs each mark lies from that professor's own mean, relative to the population as a whole, in order to derive an overall mark which neutralises the various marking differences.
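A minimal sketch of the standardisation idea, assuming each professor's past marks are available as a plain list. The function name and the two marking histories are invented for illustration and are not taken from the library:

```python
from statistics import mean, pstdev

def standardise(mark, history):
    """Return how many SDs `mark` lies from the professor's own historical mean."""
    mu = mean(history)
    sigma = pstdev(history)  # population SD of that professor's past marks
    if sigma == 0:
        return 0.0  # every past mark identical, so no spread to scale by
    return (mark - mu) / sigma

# Hypothetical marking histories (not real Anime Academy data)
lenient = [90, 85, 95, 80, 90]
strict = [60, 55, 65, 50, 60]

z_lenient = standardise(90, lenient)
z_strict = standardise(60, strict)
```

With these made-up histories, a 90 from the lenient marker and a 60 from the strict marker come out the same number of SDs above each marker's usual level, which is the sense in which the marking differences are neutralised.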
Happy to hear thoughts from all.
Regards
LilSlugger
7Raven7
10-26-2006, 11:47 AM
Being from the psychology field myself I actually had wondered something similar (though I am glad someone nerdier than myself brought it up).
Basically in any grading criteria a person will, over time, develop a tendency. If it is toward higher marks, that person is a lenient grader; lower marks make a strict grader. Even a person who tries to be well rounded, resulting in that neat bell curve, gets a label called a central tendency.
While I don't think any revision is needed to the actual grading or posting, as viewers often have their favorite profs to read and little of a review's value has to do with the numerical grade, it would be interesting to see which professors are "naughty or nice." Maybe our resident statistician can whip something up (seeing as us 2 are prob the only ones lame enough to view this thread :( ).
Milkymagic
10-26-2006, 12:03 PM
Can I be lame too?
Seriously though, I'd like to see a statistic that shows what averages the professors have for their reviews, that would be cool.
I guess I would be lenient on my five-star scale; I've given over half of my anime at least 3's or 4's with plenty of 5's too. But by the same token, there's still tons of bad stuff I've seen too (1's and 2's). (I use halves to adjust for my mood if I'm uncertain about a whole rating.) But then I buy according to what I like, and try to avoid something potentially bad (unless the story sounds real convincing regardless of outside criticism).
In short, don't make me a professor.
Nah, I'll view it :3
Speaking from a professor's point of view... I get what you're saying, generally speaking, but I see no good way (nor a good reason) to implement any sort of change.
For starters, on a practical end, working out a new formula would be a pain, and it would mean even more work on Mugs' end whenever a new review is put up into the library. Also, it would only impact anime that have more than one review; what if a particular Professor who tends towards high scores is the only one to review a bunch of titles? Should (s)he and/or Mugs have to go through more trouble just to create a more accurate number read-out? What if one professor used to be more lenient, but has grown harsher as time goes on? What if a professor seems to give lower scores than most professors, but in reality is only reviewing anime that (s)he finds to be sub-par?
As far as the accuracy of a numerical grade goes, there really is no standard besides a base percentage; it's only a number that the professor personally chooses to rate that particular anime with. The reviews aren't about the percentage, they're about what's written down in that review section; otherwise we would just list a whole bunch of anime titles with a number next to them, and the Deans wouldn't choose professors for their writing ability.
All in all, I don't care if someone wants to pile some statistical data on the numeric portion of AA's reviews, but I think any change or addition to our current system is impractical and unuseful.
And I hope what I say makes sense... been sick all day and my brain is kind of fuzzy.
LilSlugger
10-27-2006, 12:00 AM
Many thanks all for your interest.
Having read your comments, I agree that adjusting the grades is probably counterproductive.
However, purely because I am so nerdy, I too would be interested in seeing an average grade awarded by each professor. I'll put something together and post it onto this thread for those who may be interested.
Finally I would like to point out that I am definitely the champion nerd! :doh
LilSlugger
LilSlugger
10-27-2006, 06:02 AM
Section 1
I have finished my analysis relating to the marks awarded by various professors using the data provided by the ‘library’ ‘sort by professor’ option of the main home page.
Because this is likely to be a slightly lengthy post I will make it in four parts.
This first part simply outlines the sections and data source used.
The second part outlines methodology.
The third part gives the table of results.
The fourth part gives the conclusions.
Hope to see you in part II, but maybe not !!!!
----------------------------------------------------------------------------
Section 2
There are various issues when ensuring the consistency of data across various groups. First there is the overall level of the data, i.e. the mean or average mark awarded by each professor in this exercise. Second is the variation in marks: for example, if one professor awards 60, 55, 50, 45, 40 the average is 50. However if another professor awards 90, 80, 50, 20, 10, the average is still 50 but the variation in marks is much greater. This means an anime would be more likely to achieve very high marks or low marks with the second professor. There are also other issues such as skewness in the distribution of marks, i.e. where two professors have a similar average mark and similar overall variation, but the marks awarded above the mean have a wider spread than those awarded below. Complicated eh!
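The two sets of marks quoted above can be checked with a short sketch. The variable names are illustrative, and the population standard deviation is used here as one reasonable measure of "variation":

```python
from statistics import mean, pstdev

prof_a = [60, 55, 50, 45, 40]  # first professor's marks from the example
prof_b = [90, 80, 50, 20, 10]  # second professor's marks from the example

mean_a, mean_b = mean(prof_a), mean(prof_b)  # both means are 50
sd_a = pstdev(prof_a)  # roughly 7.1
sd_b = pstdev(prof_b)  # roughly 31.6

print(mean_a, mean_b, round(sd_a, 1), round(sd_b, 1))
```

Same average, but the second professor's spread is more than four times as wide.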
To simply present the statistical mean, variance and skewness for each professor’s marks, while being informative, is not the most useful to a layman’s understanding. Therefore with this in mind I felt the most easily understandable representation of these issues would be to present a table showing the proportion of anime marks falling into each of three bands for each professor.
Band 1 is the percentage of marks awarded by a professor in excess of 85%.
Band 2 is the percentage of marks awarded by a professor that lie between 70% and 85% inclusive.
Band 3 is the percentage of marks awarded by a professor that are less than 70%.
Needless to say the total for each professor should sum to a total of 100%, give or take 1 point due to rounding.
How do we interpret this data?
Well a professor who has many more Band 1s compared with Band 3s is a lenient marker.
A professor who has many more Band 3s than Band 1s is a harsh marker.
A professor where Bands 1 & 3 tend towards similar values is a balanced marker.
A professor who has a high proportion of marks in Band 2 has a low variation in marks and therefore will tend to undermark good anime and overmark poor anime.
A professor with a low proportion of marks in Band 2 has a high variation of marks and therefore would tend to undermark poor anime and overmark good anime.
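The banding described above can be sketched as follows. The thresholds (70 and 85) come from the band definitions; the function names and the sample marks are hypothetical, and the Band 1 minus Band 3 figure is computed here from the rounded shares, which is an assumption about how the results table was built:

```python
def band_shares(marks):
    """Return the rounded percentage of marks in (Band 1, Band 2, Band 3)."""
    n = len(marks)
    band1 = sum(m > 85 for m in marks)         # in excess of 85
    band2 = sum(70 <= m <= 85 for m in marks)  # 70 to 85 inclusive
    band3 = sum(m < 70 for m in marks)         # less than 70
    return tuple(round(100 * count / n) for count in (band1, band2, band3))

def leniency(marks):
    """Band 1 minus Band 3: positive suggests lenient, negative suggests harsh."""
    b1, _, b3 = band_shares(marks)
    return b1 - b3

# Hypothetical marks for one professor (not real Anime Academy data)
sample = [92, 88, 75, 80, 72, 65, 90, 55, 78, 83]
shares = band_shares(sample)
score = leniency(sample)
```

For the made-up sample this gives 30%, 50% and 20% across the three bands, so a Band 1 minus Band 3 figure of 10.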
On to the bread and butter. The table of results.
-------------------------------------------------------------
Section 3
Professor          Band 1   Band 2   Band 3   Band 1 - Band 3
Eek                 19%      41%      40%      -21%
Ender               30%      38%      32%       -2%
Gatts               38%      36%      26%       12%
Griveton            17%      39%      43%      -26%
Kain                24%      45%      31%       -7%
Kei                 25%      57%      18%        7%
Keitaro             32%      35%      32%        0%
Kjeldoran           26%      54%      20%        6%
Liegenschonheit     23%      52%      26%       -3%
Madoka              29%      46%      25%        4%
Mugs                33%      49%      18%       16%
Soundchazer         35%      39%      26%       10%
Taleweaver          20%      60%      20%        0%
-------------------------------------------------------------------
Section 4
As can be seen from the table, there seem to be differences in marks awarded by the various professors over and above that which can be explained away by the simple variation in the quality of the anime watched.
The harshest markers appear to be Eek and Griveton, having scored ‘Band 1-3’ scores of -21% and -26% respectively. Titles graded by these professors show evidence of being under marked and have increased chances of falling within the lower than 70% category and less chance of achieving an over 85% grade. Having said this, those titles achieving scores in excess of 85% from these professors are achieving the highest possible recommendation.
The most lenient marking professors are Gatts and Mugs, having scored ‘Band 1-3’ scores of 12% and 16% respectively. Titles graded by these professors show evidence of being over marked and have increased chances of falling within the higher than 85% category and less chance of achieving a lower than 70% grade. Having said this, those titles scoring less than 70% from these professors are achieving the strongest possible warning regarding poor quality.
The best balanced markers are Keitaro and Taleweaver, both of whom have perfectly balanced Band 1 and Band 3 scores. However, it is interesting to note that while Keitaro achieves a Band 2 score of 35%, Taleweaver's score is 60%. 60% is a high proportion to fall within the middle band and therefore, while Taleweaver's scores appear balanced, it would seem that there is a tendency to shy away from higher and lower scores and maintain a middle of the road marking.
My conclusion is that statistically the fairest marker on Anime Academy is Professor Keitaro, having awarded 32% High Grades, 35% Medium Grades and 32% low grades (rounded to the nearest percentage point).
First things first, double posting (let alone triple/quadruple/etc in a row posting) is against the rules. I'll forgive your first two because they were made at separate times with separate ideas, but the rest are unnecessary, as there are many other ways to split up what you're saying without making different posts.
As far as your research goes, I'll agree with the numbers, but I don't agree with your analysis of those numbers. Again, just because certain professors tend to grade things high or low doesn't mean that makes them lenient/harsh; that's just faulty logic. They could just review a lot of bad or good anime, or a lot of middle ground anime.
soundchazer
10-27-2006, 06:54 AM
I agree with Mana.
I tend not to watch too much trashy anime and review it, so that will always make me look more lenient than I would like to, but seriously, I'm too old to be using my precious time picking up anime I'm fairly sure I won't like at least a bit.
LilSlugger
10-27-2006, 07:11 AM
First of all I apologise for breaking the rules of the forum, it certainly was not my intention.
I certainly consider myself reprimanded both for the posts and my faulty logic.
I will try to be more thoughtful before posting in future.
Sorry again.
LilSlugger :( :whine:
PS Please feel free to delete the thread should you feel it necessary.
Roark
10-27-2006, 07:18 AM
No need to delete the thread. I, at least, find the mathematical nerdiness of this intriguing. I don't think anyone is saying "stop" or "don't do this." We're just making sure that good statistical analysis is performed.
soundchazer
10-27-2006, 08:51 AM
Nobody is reprimanding you LilSlugger. Theories work until someone debunks them. You just need to fine tune it.
I have an idea. Try using your statistical prowess to figure out what genres tend to have the most acceptance among the professors. Maybe that will lead to some interesting data.
LilSlugger
10-27-2006, 01:32 PM
Many thanks for the kind words.
As I'm sure you will have noticed, I'm a bit over sensitive at the moment. Probably something to do with the fact that I'm just recovering from clinical depression, so mood swings are the order of the day.
I think the criticism is right, however: the unstated assumption in the above analysis is that the allocation of anime titles between professors is equal, i.e. either random or, if by their own selection, with all professors tending to review the anime they like, rather than, as would appear to be the case, some professors reviewing more anime that they either don't like or find mediocre. The idea that the allocation is random, or that all professors have bought their preferred anime, but it has just worked out that some have a 2:1 ratio of good and others 2:1 bad (or mediocre) over what is, for some professors, more than 50 titles is statistically highly improbable. I could do the math to say how improbable but I'm not going to.
To be honest I'll call it a day on the stats front on this site and limit my comments to individual anime etc.
I still love the site though and as a good place to find independent and unbiased reviews, you're still second to none.
Thanks for the comments and feedback on my ideas, be they positive or the more critical ones. Without differences of opinion all the threads would be very boring.
Take care all and speak to all again soon in a different thread.
LilSlugger :hapbounce
PS I've only posted a single post this time :p
7Raven7
10-27-2006, 07:03 PM
I also agree with the above statements and offer these suggestions.
-------------
First, where did you select your margins from? The 100%-85%-70%-0% break up is extremely irregular. You might want to keep it simple, marking a tally next to each of the AA designated tiers (I.E. 100-90, 89-80) per prof giving the grade. Simple 10% sku break-ups make a lot more sense and it is easier to defend margins that are equal. You could do a 0-100 scale or, what I would recommend, 20-100 (highest and lowest score skus, shouldn't mess up your stats). Then start with a simple mean, median and mode for each prof.
Other than the way you broke up the score, I think the end comment on the overall scores of AA was the most reliable information to walk away with from your study (kudos on doing the % tallies as well, that's a pain in the butt).
I have one reservation on your "balanced grading" ideology however. Ideally, a "fair" grader is not necessarily one who scores equally in 1 & 3. On a 0-100 scale, broken into threes, we would call this end weighted tendency: your graph would have two humps at the ends, or a tendency to give either very high or very low grades. You see this a lot with customer comments on restaurant waiters for example (those little comment cards). But in your sku, I am not sure what kind of weight that gives it; seeing as the below 70% has as many "tallies" as above 85%, I would say that is an exponential tendency, or extremely lenient. Your graph would look like the graph of x^2.
That is unless those two values are lower than the middle, which would give you a central tendency. With your score breakup, the graph would look like a slow ascending line with a bubble at the end. Either way, the result of the ends cannot be calc-ed without noting the relationship to the middle. Psh, don't you remember your lectures on standard deviation and degrees of freedom?
But of your analysis of overall AA grades, seeing as they are near equal (and if the margins are equal, an equal % for all margins-WHEN NOT GRADING A HUMAN-is a balanced grade.) You could argue that ideally a perfect grader would have a bell-curve on any given subject but I disagree strongly with this school of thought.
If anything I would say this puts an emphasis on AA's fair grading (seeing as the between 70-85 bracket is near the same as the 85-100 bracket) with the stipulation that they are slightly lenient (again, because of the odd tier break up). I would go further in saying that this lenient grading trend is due to the fact that there are simply not a lot of bad anime titles being reviewed (who can blame them).
-------
Second I also agree with Mana's comment. Here's why. In all studies, usually school or work review related, the determination of strictness or leniency in a grader is ALWAYS based on a measure of the same variables. I.E. the reviewers grade the same thing. In order to compare strictness based on your method, each professor would have to review the same titles and that simply isn't going to happen.
There is another way I remember (thumbs through psych book) of how they determine this with movie critics (keeps thumbing). To keep it simple, it basically would show a prof in relation to their peers on any given title and then calc a total.
So imagine an Excel spreadsheet with all the profs across the top and all the reviewed titles along the side. You then rank the prof in relation to whether they were higher or lower than their peers in any given box (for example, if there were three reviewers and Kain gave the highest grade he would get a 1 in his box. If there were 5 reviewers and Mana gave the second lowest, she would get a 4, etc). Of course you would put an x in any box that the prof did not review or if they were the only reviewer. Essentially you tally the score and note the tendency of the reviewer overall (if there is one).
You can also run Chi Squared (I think this is the method to use) to check how reliable this score is. A reviewer that has few reviews, or few reviews in relation to their peers, should have a lower value in this test; a higher number will show how reliable the data is to support the conclusion.
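A rough sketch of the ranking spreadsheet described above, assuming the reviews are held as a dictionary of title to {professor: mark}. The professor names, titles and marks are placeholders, and ties are broken by input order, which is an arbitrary choice made only for this illustration:

```python
from collections import defaultdict

def rank_within_titles(reviews):
    """For each title with two or more reviews, rank professors (1 = highest mark)."""
    ranks = defaultdict(list)
    for title, marks in reviews.items():
        if len(marks) < 2:
            continue  # the "x" case: a sole reviewer cannot be compared with peers
        ordered = sorted(marks, key=marks.get, reverse=True)
        for position, prof in enumerate(ordered, start=1):
            ranks[prof].append(position)
    return dict(ranks)

# Placeholder data (not real Anime Academy reviews)
reviews = {
    "Title 1": {"Prof A": 90, "Prof B": 80, "Prof C": 70},
    "Title 2": {"Prof A": 85, "Prof B": 60},
    "Title 3": {"Prof C": 75},
}

ranks = rank_within_titles(reviews)
```

Each professor ends up with a list of ranks that can then be tallied. A professor who only ever appears as a sole reviewer gets no entry at all, which shows how quickly the usable data shrinks when few titles have multiple reviews.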
There are drawbacks to this method also (mainly dealing with the issue that the highest grader should not get a "1" and the fact that there is not a lot of data), and it was tweaked a bit, but that gives you an idea. We can go into more detail if anyone is still awake. :)
LilSlugger
10-28-2006, 06:31 AM
Thanks Raven
I don't really want to carry this on much further but I will provide a brief answer to some of your points.
I agree that regular bands make for easier defending of margins. However, normally statistical bands are selected according to the count of data within them rather than an even width of the bands themselves. Imagine a table of people's height. To band by 0-10 inches, 10-20 inches etc would be silly, but for the first band to be 0-60 inches and the second to be 60-63 would be more meaningful. Therefore bands are selected that better represent the data. The bands selected here are roughly equal in representation within AA grades. Admittedly not exactly, as I wanted numbers that were a bit rounder than having, for example, 72.3% as a band limit. This is very basic and can be seen in even the most rudimentary statistical work involving histograms. (Not Bar Charts! Sorry, a pet hate of mine when people get the two mixed up!)
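One way to pick bands by count rather than by width is to cut the population of marks at its tertiles. This is only a sketch: the marks below are invented, and the 70 and 85 limits used in the analysis were rounded by hand rather than computed this way:

```python
from statistics import quantiles

# Invented population of marks (not real Anime Academy data)
marks = [45, 58, 62, 68, 71, 74, 77, 80, 84, 87, 91, 95]

# Two cut points that split the marks into three roughly equal-sized groups
lower, upper = quantiles(marks, n=3)

counts = [
    sum(m < lower for m in marks),            # bottom band
    sum(lower <= m <= upper for m in marks),  # middle band
    sum(m > upper for m in marks),            # top band
]
```

The cut points rarely land on round numbers, which is why they might then be nudged to something like 70 and 85 at the cost of slightly unequal band counts.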
Your second point relating to even grading and a tailing off towards the extremes is extremely well made and demonstrates that you have a sound understanding of statistical distribution theory. However, when selecting bands in the way this exercise has been carried out, the bands are balanced by the underlying population data and therefore the very selection of these bands is carried out to negate the effect of the tailing off of the extremes. (See previous paragraph.)
Nice idea about the chi squared test, but I doubt people would be interested in it or understand it. Likewise the matrices comparing reviews on titles carried out by more than 1 professor. Unfortunately there are just not enough multiple reviews to provide conclusions of any reliability.
I am unsure about your comment relating to an 'exponential tendency or extremely lenient'. I don't understand lenient within this context. A scale given an exponential relationship which is within a % score would seem unlikely. Remember a percentage is less than 1 and to square it reduces the value, so I assume you mean x ^ 0.5 rather than x ^ 2. I just don't think I'm grasping this at all. I would however agree that you would expect to see left skew within the distribution and I think I referred to skewness in my original post. If this is the point you are driving at then we are in perfect agreement. Incidentally the bubbles you see in relation to comment cards are common and have to do with survey design and the tendency of people to mark in extremes when dealing with service related surveys. I doubt that this is a good example to use when looking at the AA markings.
To be honest I can't remember lectures on Standard Deviation and Degrees of Freedom as these were rather too basic a concept to be handled at degree level. I do remember covering Standard Deviation as part of an additional mathematics O level syllabus at 15 and degrees of freedom within an A level course in Statistics at 17. Possibly within non-mathematical degree courses these concepts may be covered to ensure basic understanding of statistics as a secondary tool in interpreting data relating to the primary study area.
We certainly agree on your thoughts relating to a bell grading curve or, as I would prefer to describe such a pattern, a normal distribution. I think we would both see this as being extremely unlikely and it is of course again dealt with by the skewness of a distribution and balanced within my analysis by the selection of the bands.
I never at any point said that I found the AA scores overall either lenient or harsh. I was only comparing different professors using bands defined by the statistical distribution within the population. Please don't misunderstand me on this. My analysis was not commenting on the leniency or harshness of AA marks overall, only the relationship between marks of various professors in relation to the population. Hence, what I described as a lenient marker was only lenient in relation to the other professors, rather than lenient being an absolute value judgement. No judgement was made in relation to the overall population of AA scores, their variation or skewness of distribution. Rather this was used as the benchmark against which the various professor grades were measured, as defined by the deliberate band selection to ensure approximate balance.
For me there are two major points where my work here has been found wanting.
First of all, a point which has been made in various comments, if not directly: I needed to be far more descriptive in defining methodology and should have gone into greater depth in explaining the conclusions. I had hoped that saying the data was being represented so as to be more easily understood by the layman explained why I had not followed a more usual statistical approach of mean, mode, median, SD, IQ Ranges etc. I stand corrected.
Secondly, the assumption that all professors had an equal chance of choosing or receiving the Anime they like or prefer. This was a big assumption to make and I suspect it may not be true. In that case all the analysis is based upon a false assumption and is irrelevant, and again I stand corrected.
Please delete the thread. I want to move on to providing comments on Gunslinger Girl!!! :nbleed:
LilSlugger:lol2: