The admission comes from the author of the snippet itself, Andreas Lundblad, a Java developer at Palantir and one of the highest-ranked contributors to StackOverflow, a Q&A website for programming-related topics.
An academic paper [PDF] published in 2018 identified a code snippet Lundblad posted on the site as the most copied Java code taken from StackOverflow and then reused in open source projects. The snippet was provided as an answer to a StackOverflow question posted in September 2010. It printed byte counts (123,456,789 bytes) in a human-readable format, like 123.5 MB. Academics found that this code had been copied and embedded in more than 6,000 GitHub Java projects, more than any other StackOverflow Java snippet.
In a blog post published last week, Lundblad admitted that the code was flawed and that it incorrectly converted byte counts into human-readable formats. Lundblad said he revisited the code after learning of the academic paper and its results, and he has published a corrected version on his blog.
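For readers unfamiliar with this kind of conversion, the following is a minimal sketch of a byte-count formatter. It is not Lundblad's snippet, nor his corrected version; the class name `ByteFormat` and the method `humanReadable` are hypothetical, and the sketch assumes SI (base-1000) units and non-negative input.

```java
import java.util.Locale;

// Hypothetical illustration only; not the StackOverflow snippet discussed above.
class ByteFormat {
    // Converts a non-negative byte count to a string such as "123.5 MB",
    // assuming SI units (1 kB = 1000 bytes).
    static String humanReadable(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        if (bytes < 1000) {
            return bytes + " B";
        }
        String prefixes = "kMGTPE"; // kilo, mega, giga, tera, peta, exa
        double value = bytes;
        int index = -1;
        // Divide by 1000 until the value fits under the next unit.
        while (value >= 1000 && index < prefixes.length() - 1) {
            value /= 1000;
            index++;
        }
        // Locale.ROOT keeps the decimal separator a '.' on every machine.
        return String.format(Locale.ROOT, "%.1f %cB", value, prefixes.charAt(index));
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(123_456_789L)); // prints "123.5 MB"
    }
}
```

Even a sketch this short has an edge case: the unit is chosen before the value is rounded for display, so a count just under a unit boundary, such as 999,999 bytes, prints as "1000.0 kB" rather than "1.0 MB". Code of this kind is easy to get subtly wrong, which is why it deserves review before being copied.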
StackOverflow code sometimes contains security bugs
But while Lundblad’s code snippet contained a trivial conversion bug that only resulted in slightly inaccurate file size estimations, things could have been much worse. The code could have contained a security flaw, for example. If it had, fixing all the vulnerable applications would have taken months or years, leaving users exposed to attacks.
Even if it is universally understood that copy-pasting code from StackOverflow is a bad idea, developers still do it. The 2018 research paper showed just how widespread this practice was in the Java ecosystem, revealing that the vast majority of developers who copied popular StackOverflow answers didn’t even bother to credit their source. Software developers who copy code from StackOverflow without attribution are effectively hiding from fellow coders that they’ve introduced unvetted code into a project.
This might sound like an overly alarmist statement, but a different academic research project published in October 2019 [PDF] showed that StackOverflow code snippets do contain vulnerabilities, and that this is not just an urban myth that developers use to scare each other. The research paper found major security flaws in 69 of the most popular C++ code snippets posted on StackOverflow in the past ten years. Researchers said they found these 69 vulnerable code snippets in a total of 2,859 GitHub projects, showing how one bad StackOverflow answer could cause damage across an entire ecosystem of open source apps.