Recently, I published a blog post titled The State of JS Survey Is A Farce, in which I argued that the State of JS survey is highly inaccurate, biased and dangerous.
@AbolitionOf calling the State of JS a “farce” was pretty unkind. I hope you get better treatment if you ever launch your own projects
Admittedly, this Tweet took me by surprise. When I wrote the post, I couldn’t have told you who runs the survey if you had asked me. And my intention wasn’t to put down someone else’s work, it was to call out what I saw as bias in a survey growing in popularity.
I was critical, but I never resorted to personal attacks or name-calling. It was strictly criticism, and valid criticism (or so I thought). As someone who actively participates in open source myself, I know all too well what unconstructive criticism looks like, but this wasn’t one of those times (at least, not intentionally).
I responded to Sasha on Twitter with the following:
Sorry, you took it personally, Sasha. It was never personal and I apologise if you think otherwise. I just have a problem with biased data being used to turn front-end development into a schoolyard popularity contest by declaring winners and losers.
I apologised and clarified that my post wasn’t personal; it was a criticism of the survey itself and of the fact it was trying to turn front-end development into a popularity contest. Sasha didn’t like my response and blocked me without replying.
A few hours later, Sasha unblocked me and sent me a few responses, one of which was the following:
Well in any case I can’t wait for part two of your post where you actually explain why you think the data is biased
I can be pretty blunt, sometimes brutally honest, but one thing I would never do is personally attack someone or their projects for no reason. I have no reason to pick fights or put down others online. I am not a bully; I am a developer too.
My blog post criticised only the survey and its data (the data of 20,000 participants), not the people collecting and sorting that data. It’s like blaming the outcome of an election on the people counting the ballot papers.
I can understand that Sasha and his team are proud of the survey, which might explain why I was met with such hostility. But honestly, as I said in my previous blog post, it’s a good idea; it just needs better data.
I thought Sasha’s comment about a follow-up where I explain why I think the data is biased was fair, so here is the follow-up where I will do my best to explain why the data is biased and how it can be fixed.
At a glance: how does data become biased?
Before we proceed: I am not a statistics expert, nor do I have professional experience in this field. However, just because this isn’t my area of expertise doesn’t mean I am unqualified to comment, because the bias in this survey is as clear as day.
Bias in data can come from a lot of things, but in the case of the State of JS survey in particular, I believe it comes down to:
- Survey questions that have been worded in a particular way to get a specific or inaccurate result
- The data is heavily skewed towards specific countries and excludes a wide variety of demographics, particularly non-English speakers
- Data has been grouped into misleading categories
- The team behind the survey mostly use React and have a vested interest in its success and market position
Let’s go from the top. While participants in the survey came from a wide variety of countries, there is obvious bias here: most of the survey participants came from the USA.
What American developers get to use is wildly different from what developers in, say, India or South America get to use. China, one of the fastest-growing economies in the world, had only 75 participants; India had 521.
I worked for a company in 2014 that was building a Netflix-style streaming video platform for the South American market. We were constrained by needing to support IE8, and AngularJS 1.3 dropped support for IE8, so we were forced to stay on the prior version. This meant we couldn’t use the latest and greatest; internet speeds were also slower and devices had lower specs.
Living in a first-world country, developers are spoiled for choice. Some of us only have to support IE11 at minimum now; some of us don’t have to support IE at all. It’s easy to forget that the entire world isn’t living in the future with the latest technology that countries like the USA are fortunate to have.
Region limitations aside, a huge piece of bias in the survey is that it is only available in one language: English. The lack of translation into other languages such as Mandarin, Spanish and Arabic is a huge barrier for participants, considering Mandarin is the world’s most spoken language by native speakers and English is only third.
As you will see further down, the exclusion of certain countries (due to only being in English) yields interesting results from underrepresented countries.
Translate the survey into more languages. The survey excludes a very large portion of the world population by only being available in English.
Marketing and Reach: Selection bias
The survey is predominantly marketed on Reddit, Twitter, Hacker News and Product Hunt. If you participated in surveys from previous years, you probably got an email. From the outside (since I don’t have the figures), most of the traffic appears to come from social media.
There is a huge problem here: countries like China are more strict in terms of what their citizens can see and do on the internet, and social media is notoriously locked down there. In fact, Twitter, Google, and Reddit are all banned in China.
This explains why China only had 75 participants: chances are, if you live in China, you don’t even know this survey exists. If you don’t speak English, you probably never heard of the survey either, or you did and could not participate.
Don’t assume that everyone uses social media or can access it. Also, don’t assume that all developers visit Hacker News or similar websites. This is a harder problem to crack, but one that partnering with a larger company (such as Google or Stack Overflow) might solve. The reach and accessibility of the survey need to be improved.
Angular v AngularJS (miscategorised and slanted questioning)
Unlike previous years (2016 and 2017), the 2018 survey really shit the bed (so to speak) in how it polled developers about Angular.
Angular is the newer version (2+) and AngularJS is the older version (< 2). Previous years made the distinction between old Angular and new Angular, however in 2018, the distinction was not made and it essentially invalidated this entire portion of the survey.
While the newer version of Angular is the recommended choice for new projects, not everyone has the luxury of throwing out what they have and starting from scratch (because it can be expensive for starters).
The survey appears to have erroneously assumed that AngularJS has been deprecated and abandoned by Google, when in fact AngularJS 1.7 has a long-term support (LTS) period of three years that only began July 1, 2018, and expires in 2021.
A lot of companies are still using AngularJS because their applications work; the wise proverb “If it ain’t broke, don’t fix it” comes into play here.
This appears to have caused confusion in the survey data. While some can discern the difference between Angular and AngularJS when presented with both options, when presented with just one, both appear to be lumped together, and this skews the data.
Why not make the distinction like the previous years? The complete analysis is worthless because of this. Of course a large number of people wouldn’t use AngularJS again, but that’s not necessarily the case for Angular. If you can’t make a non-biased analysis, don’t do it
In a further reply, Olivier goes on to say:
It’s just basic statistics: don’t compare things if you changed the referential between each data point. Being aware of it is even worse you’re admitting that the data is wrong and yet in the final conclusion about frameworks you say that it won’t be a top-end framework ever again
Once again, we have someone else calling out the bias (albeit in a specific part of the survey) and one of the creators of the survey downplaying its significance. This kind of thinking is dangerous, and it’s wrong.
The most telling sign of exclusion bias is shown in the section, Angular Usage by Country. The happiest Angular users are in the most underrepresented countries.
In Romania, 58 users (37.9%) are happy Angular users. In Egypt, 17 users (35.4%) are happy Angular users. In New Zealand, 39 users (26.7%) are happy Angular users.
Where is this going, you ask? Go back to the Participation by Country section and count how many participants from those countries there were in the survey overall.
Romania, which had the highest percentage of happy Angular users, made up just 0.76% of the survey with a total of 153 participants. That works out to 37.91% of Romanian participants using Angular and being happy with it.
Now Egypt: only 48 users participated in the survey, a tiny 0.24% of the overall participant count. Interestingly, its 17 happy Angular users, the second-highest percentage above, work out to 35.42% of Egyptian participants.
Finally, New Zealand had a total of 146 participants, making up 0.72% of the survey. New Zealand fares slightly lower, but 26.71% of its participants are happy Angular users.
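To make the arithmetic above explicit, here is a small sketch that recomputes these shares from the figures quoted in this post (the happy-user and participant counts per country, and the roughly 20,000 total participants mentioned earlier; the exact total is an assumption):

```javascript
// Recompute the per-country percentages quoted above:
// - happyShare: happy Angular users as a share of that country's participants
// - surveyShare: that country's participants as a share of the whole survey
const countries = [
  { name: "Romania", happyAngular: 58, participants: 153 },
  { name: "Egypt", happyAngular: 17, participants: 48 },
  { name: "New Zealand", happyAngular: 39, participants: 146 },
];

const totalParticipants = 20000; // approximate overall survey size

for (const c of countries) {
  const happyShare = ((c.happyAngular / c.participants) * 100).toFixed(2);
  const surveyShare = ((c.participants / totalParticipants) * 100).toFixed(2);
  console.log(
    `${c.name}: ${happyShare}% of its ${c.participants} participants are ` +
      `happy Angular users, yet the country is only ~${surveyShare}% of the survey`
  );
}
```

Running this reproduces the pattern described above: the countries with the highest proportion of happy Angular users each contribute well under 1% of the total survey, which is exactly why their happiness barely registers in the aggregate numbers.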
I know large New Zealand companies such as TradeMe.co.nz are big Angular users, among others. It seems to be used quite a bit over there, which for a small country is impressive.
There are a lot more underrepresented countries who are using Angular and quite happy with it. I only picked a couple of them, but I recommend you go check out the data yourself.
This seems to somewhat align with the Stack Overflow developer survey results for 2018. Even though Stack Overflow targets a broader audience and has a larger number of participants, we see developers still love working with Angular and are clearly using it (54.6%).
Questions about Angular and AngularJS should be kept separate at least until the LTS for AngularJS 1.7 ends in 2021. The data is also skewed because the participants happiest with Angular were among the least represented in the survey; increasing representation would help address this.
The team behind the survey
For the record, I think this is worth including, but it’s not the primary reason I believe the data in the survey is heavily biased. All three people behind the State of JS survey work with React, so, naturally, anyone who follows them and their work probably falls into the React camp.
Sasha Greif, one of the people behind the survey (and the one who called me out on Twitter over the previous blog post), runs an open-source, self-described full-stack React+GraphQL framework.
Another State of JS member, Raphaël Benitte, has a dashboard tool called Mozaïk built with Node, React and D3, as well as another project of data-viz components built using D3 and React.
Finally, Michael Rambeau runs a site called bestofjs, which seems heavily dominated by React content. On the left-hand side under the popular tags, React has 189 tagged entries and Vue has 50.
The very fact that two of the three owners of the State of JS survey are heavily invested in React introduces bias, because their followers most likely lean towards React as well. The only solution here is to introduce more data into the survey so that eventually this is no longer an issue.
My initial blog post was not personal, and it was not intended to be an attack on Sasha or anyone who runs the State of JS survey.
Reiterating what I already said in my previous blog post, there is bias in the data and there is no doubt about that. I invite all criticism and feedback, so if I made a mistake or assumption in this post, please let me know so I can correct it.
If the team behind the survey simply acknowledged some of these biases when presenting the results, I would not have published my blog post in the first place.
When you take tainted data and use it to besmirch the name and reputation of frameworks, libraries and tools, telling people to avoid frameworks like Ember and that Angular is dying, that kind of schoolyard popularity-contest bullshit is not needed in an already heavily politicised industry.
I think the State of JS survey is great, and it’s the first of its kind, but the data needs to be more random and widespread. The language used also needs to be less about “us vs them” or “avoid using this” and should instead focus on displaying the data for what it is and letting people draw their own conclusions.
I hope in 2019 we see a more representative and less exclusionary survey that yields more truthful results than what we were given in 2018. I want to see this survey succeed.