That doesn't mean that learning scientific language isn't difficult, of course. I've made the point more than once that the woo-woo misuse of terminology springs from basic intellectual laziness. The problem is, though, that because the language itself requires hard work to learn, the use of scientific vocabulary and academic syntax can cross the line from being precise and clear into deliberate obscurantism, a Freemason-like Guarding of the Secret Rituals. There is a significant incentive, it seems, to use scientific jargon as obfuscation, to prevent the uninitiated from understanding what is going on.
[image courtesy of the Wikimedia Commons]
The scientific world just got a demonstration of that unfortunate tendency with the announcement yesterday that 120 academic papers have been withdrawn by publishers, after computer scientist Cyril Labbé of Joseph Fourier University (Grenoble, France) demonstrated that they hadn't, in fact, been written by the people listed on the author line...
... they were, in fact, computer-generated gibberish.
Labbé developed software that was specifically written to detect papers produced by SciGen, a random academic paper generator produced by some waggish types at MIT. The creators of SciGen set out to prove that meaningless jargon strings would still make it into publication -- and succeeded beyond their wildest dreams. “I wasn’t aware of the scale of the problem, but I knew it definitely happens. We do get occasional emails from good citizens letting us know where SciGen papers show up,” says Jeremy Stribling, who co-wrote SciGen when he was at MIT.
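SciGen's trick is simple: it fills out a fixed grammar of academic-sounding templates with randomly chosen jargon. The sketch below is not SciGen's actual grammar — the phrase bank and rules here are invented for illustration — but it shows the context-free-grammar expansion technique that makes such a generator work.

```python
import random

# Toy context-free grammar in the spirit of SciGen (the rules and
# vocabulary are illustrative assumptions, not SciGen's real grammar).
# Uppercase keys are nonterminals; anything else is a literal word.
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUN_PHRASE", "and", "NOUN_PHRASE",
         "are", "ADJECTIVE", "."],
        ["Though", "NOUN_PHRASE", "is", "ADJECTIVE", ",",
         "NOUN_PHRASE", "VERB_3", "this", "NOUN", "."],
    ],
    "NOUN_PHRASE": [["ADJECTIVE", "NOUN"], ["the", "NOUN"]],
    "NOUN": [["location-identity split"], ["Turing machine"],
             ["consistent hashing"], ["XML"]],
    "ADJECTIVE": [["pervasive"], ["autonomous"],
                  ["interposable"], ["Zipf-like"]],
    "VERB": [["argue"], ["confirm"], ["demonstrate"]],
    "VERB_3": [["surmounts"], ["refutes"], ["visualizes"]],
}

def expand(symbol):
    """Recursively expand a grammar symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the literal word
    production = random.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part))
    return words

def sentence():
    """Generate one grammatical-but-meaningless sentence."""
    text = " ".join(expand("SENTENCE"))
    # tidy spacing around punctuation
    return text.replace(" ,", ",").replace(" .", ".")
```

Every output is syntactically well-formed English, which is precisely why it can slide past a reader who is skimming for form rather than meaning.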
The result has left a lot of folks in the academic world red-faced. Monika Stickel, director of corporate communications at IEEE, a major publisher of academic papers, said that the publisher "took immediate action to remove the papers" and has "refined our processes to prevent papers not meeting our standards from being published in the future."
More troubling, of course, is how they got past the publishers in the first place, because I think this goes deeper than substandard (worthless, actually) papers slipping by careless readers. Myself, I have to wonder if anyone can actually read some of the technical papers that are currently out there, and understand them well enough to determine if they make sense or not. Now, up front I have to say that despite my scientific background, I am a generalist through and through (some would say "dilettante," to which I say: guilty as charged, your honor). I can usually read papers on population genetics and cladistics with a decent level of understanding; but even papers in the seemingly-related field of molecular genetics zoom past me so fast they barely ruffle my hair.
Are we approaching an era when scientists are becoming so specialized, and so sunk in jargon, that their likelihood of reaching anyone who is not a specialist in exactly the same field is nearly zero?
It would be sad if this were so, but I fear that it is. Take a look, for example, at the following little quiz I've put together for your enjoyment. Below are eight quotes, of which some are from legitimate academic journals, and some were generated using SciGen. See if you can determine which are which.
- On the other hand, DNS might not be the panacea that cyberinformaticians expected. Though conventional wisdom states that this quandary is mostly surmounted by the construction of the Turing machine that would allow for further study into the location-identity split, we believe that a different solution is necessary.
- Based on ISD empirical literature, is suggested that structures like ISDM might be invoked in the ISD context by stakeholders in learning or knowledge acquisition, conflict, negotiation, communication, influence, control, coordination, and persuasion. Although the structuration perspective does not insist on the content or properties of ISDM like the previous strand of research, it provides the view of ISDM as a means of change.
- McKeown uses intersecting multiple hierarchies in the domain knowledge base to represent the different perspectives a user might have. This partitioning of the knowledge base allows the system to distinguish between different types of information that support a particular fact. When selecting what to say the system can choose information that supports the point the system is trying to make, and that agrees with the perspective of the user.
- For starters, we use pervasive epistemologies to verify that consistent hashing and RAID can interfere to realize this objective. On a similar note, we argue that though linked lists and XML are often incompatible, the acclaimed relational algorithm for the visualization of the Internet by Kristen Nygaard et al. follows a Zipf-like distribution.
- Interaction machines are models of computation that extend TMs with interaction to capture the behavior of concurrent systems, promising to bridge the fields of computation theory and concurrency theory.
- Unlike previous published work that covered each area individually (antenna-array design, signal processing, and communications algorithms and network throughput) for smart antennas, this paper presents a comprehensive effort on smart antennas that examines and integrates antenna-array design, the development of signal processing algorithms (for angle of arrival estimation and adaptive beamforming), strategies for combating fading, and the impact on the network throughput.
- The roadmap of the paper is as follows. We motivate the need for the location-identity split. Continuing with this rationale, we place our work in context with the existing work in this area. Third, to address this obstacle, we confirm that despite the fact that architecture can be made interposable, stable, and autonomous, symmetric encryption and access points are continuously incompatible.
- Lastly, we discuss experiments (1) and (4) enumerated above. Error bars have been elided, since most of our data points fell outside of 36 standard deviations from observed means. On a similar note, note that active networks have more jagged seek time curves than do autogenerated neural networks.
The answers: quotes #1, #4, #7, and #8 are SciGen gibberish. The legitimate sources are:

#2: Daniela Mihailescu and Marius Mihailescu, "Exploring the Nature of Information Systems Development Methodology: A Synthesized View Based on a Literature Review," Journal of Service Science and Management, June 2010.
#3: Robert Kass and Tom Finin, "Modeling the User in Natural Language Systems," Computational Linguistics, September 1988.
#3: Robert Kass and Tom Finin, "Modeling the User in Natural Language Systems," Computational Linguistics, September 1988.
#5: Dina Goldin and Peter Wegner, "The Interactive Nature of Computing: Refuting the Strong Church-Turing Thesis," Kluwer Academic Publishers, May 2007.
#6: Salvatore Bellofiore et al., "Smart Antenna System Analysis, Integration, and Performance on Mobile Ad-Hoc Networks (MANETs)," IEEE Transactions on Antennas and Propagation, May 2002.
How'd you do? If you're like most of us, I suspect that telling them apart was guesswork at best.
Now, to reiterate: I'm not saying that scientific terminology per se is detrimental to understanding. As I tell my students, having a uniform, standard, and precise vocabulary is critical. Put a different way, we all have to speak the same language. But this doesn't excuse murky writing and convoluted syntax, which often seem to me to be there as much to keep non-scientists from figuring out what the hell the author is trying to say as to provide rigor.
And the Labbé study illustrates pretty clearly that it is not just a stumbling block for relative laypeople like myself. That 120 computer-generated SciGen papers slipped past the eyes of the scientists themselves points to a more pervasive, and troubling, problem.
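How might a detector like Labbé's work? I don't have his code, so the following is purely an assumed approach for illustration: because SciGen draws from a fixed phrase bank, its output reuses tell-tale collocations at rates genuine papers almost never do, so even a crude marker-phrase score separates the two. The marker list and threshold below are invented for the sketch.

```python
# Hypothetical marker phrases drawn from SciGen's characteristic
# vocabulary (an illustrative list, not Labbé's actual feature set).
SCIGEN_MARKERS = [
    "location-identity split",
    "we motivate the need for",
    "zipf-like distribution",
    "cyberinformaticians",
    "consistent hashing",
]

def scigen_score(text):
    """Fraction of known marker phrases appearing in the text."""
    lowered = text.lower()
    hits = sum(1 for marker in SCIGEN_MARKERS if marker in lowered)
    return hits / len(SCIGEN_MARKERS)

def looks_generated(text, threshold=0.4):
    """Flag text whose marker density exceeds the chosen threshold."""
    return scigen_score(text) >= threshold
```

Labbé's real system was more sophisticated than phrase matching, of course, but the principle is the same: machine-generated prose has statistical fingerprints, even when human readers wave it through.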
Maybe it's time to revisit the topic of academic writing, from the standpoint of seeing that it accomplishes what it was originally intended to accomplish: informing, teaching, enhancing knowledge and understanding. Not, as it seems to have become these days, simply being a means of creating a coded message so well encrypted that sometimes not even the members of the Inner Circle can decipher its meaning.