The term “AI slop” usually describes the low-quality images and text that clutter social media feeds. It has now reached the peer review system that governs what counts as scientific knowledge.

Submissions to Organization Science, one of the top journals in management research, have risen 42% since the launch of ChatGPT. The writing has gotten worse.

A new study from the journal’s editorial team finds that AI-generated manuscripts are harder to read, more jargon-laden and more likely to be rejected than those written by humans. Meanwhile, over 30% of the expert peer reviews that journals routinely use to decide what to publish now show detectable AI use, and editors report that those reviews are essentially uninformative. The technology that was supposed to make science more productive is, for the moment, making it harder to evaluate. Undoubtedly, this is a snapshot of a system in transition, not a permanent verdict on AI in research. But the snapshot is worth a close look.

I have been writing about the structural pressures on peer review, and about the risk that AI tools could degrade the quality of scientific thinking even as they accelerate output. This paper, from the journal’s AI Task Force led by Sharique Hasan of Duke, is the first to put detailed data behind those concerns at a single journal.

When I spoke with Claudine Gartenberg, a senior editor on the team and a professor at Wharton, we ended up talking for nearly an hour. She is not an AI skeptic. “I use Claude Code and Codex all day long,” she told me. “Every aspect of my research program over the last year.” But she and her coauthors had been noticing something in the manuscripts crossing their desks, a qualitative shift that was hard to name. “We didn’t come with a point to make,” she said. “We just said, let’s put some facts to this feeling.” They analyzed 6,957 submissions and 10,389 reviews since January 2021, scoring each for AI content using Pangram, which independent evaluation has rated as the most accurate AI detection tool currently available.

That 42% rise in submissions since ChatGPT’s release in late 2022 is roughly double the bump the journal saw during COVID, and the editors attribute nearly all of it to AI. Submissions judged to be human-only actually declined. By early 2026, the majority of manuscripts submitted to the journal show some degree of AI involvement, and the fastest-growing category is papers scored at 70% or higher AI content.

The quality of that writing has deteriorated. AI scores and Flesch Reading Ease were negatively correlated across submissions. Manuscripts with high AI content require a higher grade level to parse, use more nominalizations (words like “conceptualization” and “operationalization”) and carry more jargon. Interestingly, there are a few dimensions where AI text performed better: it tended to be more specific and to hedge less. But the net effect is prose that is denser and more difficult to read. Gartenberg invoked George Orwell’s essay “Politics and the English Language,” with its examples of politicians burying meaning in abstraction. AI prose, she said, reads like those politicians: dense, vaguely impressive, hard to follow.
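For readers unfamiliar with the metric: Flesch Reading Ease is a simple function of sentence length and syllables per word, with higher scores meaning easier prose. Here is a minimal sketch of the standard formula; the vowel-group syllable counter is a crude stand-in for the dictionary-grade counting real tools use, and none of this is the study’s actual pipeline:

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch formula: higher scores = easier to read."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

plain = "We ran the test twice. It failed both times."
dense = "The operationalization of the conceptualization necessitates contextualization."
print(flesch_reading_ease(plain))   # roughly 89: easy, readable prose
print(flesch_reading_ease(dense))   # deeply negative: dense, nominalized prose
```

The second example is exactly the kind of sentence the study flags: every idea buried in an abstract noun, and the score collapses accordingly.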

The paper’s title frames the problem: “More Versus Better.” “AI, as it’s being used today, is colliding with institutional incentives to create more rather than better research,” Gartenberg said. “It’s not AI on its own. It’s AI plus publish-or-perish incentives.”

The paper includes a finding that should worry anyone who thinks this is just about a few bad actors with ChatGPT. Business schools whose faculty historically respond most to publication-count rankings disproportionately increased their AI submissions after ChatGPT became available. The ranking in question is the UTD list, maintained by the University of Texas at Dallas, which scores business schools by how many papers their faculty publish in 24 designated top journals. It is one of the most widely watched metrics in business academia, and scholars at schools that compete on it have strong incentives to maximize quantity.

The effect is statistically significant: schools that compete on the UTD rankings submitted more papers after ChatGPT, and those additional papers were disproportionately AI-written. This suggests that heavy AI use in manuscripts is not random but tracks institutional incentives. Authors at schools where counting publications matters most are the ones leaning hardest on the machines.

Gartenberg described what she sees as her central insight from the project: AI is an agnostic tool. You can point it at quality or you can point it at volume. “There’s such powerful volume incentives right now,” she said, “that it can really be destructive.”

AI is not confined to the submission pipeline. Over 30% of peer reviews at Organization Science now show detectable AI use, up from near zero before ChatGPT. Those reviews follow the same pattern as submissions: harder to read, more nominalization, more jargon. They also shift emphasis toward theory and away from data and empirical methods, a narrowing of the evaluative range that, if it persists, could reshape what kind of science gets rewarded.

The ethics of AI in peer review are unstable. If an expert uses AI to inform an opinion, is that opinion still the expert’s? Unpublished manuscripts are shared with reviewers in confidence. Uploading them to the servers of a chatbot is generally viewed as unacceptable, a disclosure to an unauthorized reader, even if that reader is a machine. But does the calculus change if the manuscript is already public on a preprint server? If the AI runs in a sandboxed environment that retains nothing? If the reviewer poses a question that never explicitly shares the text? These distinctions matter, and none of them are settled.

The most telling finding is that at Organization Science AI reviews do not appear to inform editorial decisions at all. Human reviews correlate with editorial outcomes. AI reviews do not. “It’s not like the editors know that those are AI reviews and they’re throwing them out,” Gartenberg said. “They’re reading them and they’re not informing the editor’s ultimate recommendation.” Editors are substituting their own judgment, which means the review, the core mechanism of quality control in science, is producing text that nobody acts on.

Holding the Line, for Now

The good news is that the editorial process at Organization Science is still filtering effectively. Only 3.2% of manuscripts scored at 70% or higher AI content receive a revise-and-resubmit, compared with 11.9% for low-AI papers. Published articles remain overwhelmingly human-generated. The editors are catching the bad work.

There is a significant human cost, however. The journal nearly doubled its deputy editors from six to eleven and doubled its senior editors from roughly 30 to 60. All of this is volunteer labor, unpaid academics donating time to maintain scientific quality. When those academics are weeding out AI slop, they are not using their time to teach classes, conduct research, or serve their professions.

The economist Scott Cunningham has framed scientific output as a production function with two inputs: human time and machine time. A little machine time, combined with substantial human time, raises the quality of the output. But if researchers let the machine substitute for their own engagement, they enter what Cunningham calls the “danger zone,” a region where output quality actually falls below what they would have produced without AI at all. The mechanism is simple: human time is not just labor. It is the process through which attention accumulates into knowledge and judgment. Skip the hours and you skip the learning.
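Cunningham’s argument is qualitative, but a toy version makes the shape clear. In the sketch below, the functional form and numbers are my own illustration, not his specification: machine time lifts raw output linearly while the human engagement that builds judgment decays, so quality peaks at a modest machine share and then falls below the all-human baseline.

```python
# Toy illustration of Cunningham's "danger zone" -- my functional form, not his.
# s is the share of the work delegated to the machine, from 0.0 (all human)
# to 1.0 (all machine). Machine time lifts raw output linearly; the human
# engagement that builds judgment decays as (1 - s). Quality is their product.

def quality(s: float, lift: float = 2.0) -> float:
    return (1 + lift * s) * (1 - s)

baseline = quality(0.0)  # what all-human work would have produced
for s in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    q = quality(s)
    note = "danger zone: below the no-AI baseline" if q < baseline else "at or above baseline"
    print(f"machine share {s:.1f} -> quality {q:.2f}  ({note})")
```

With these (arbitrary) parameters, quality peaks around a 25% machine share and drops below the no-AI baseline once the machine does more than half the work. The exact numbers are meaningless; the hump-then-collapse shape is the point.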

Gartenberg’s data suggest that the danger zone is not hypothetical. It is already visible in the submission statistics of a major journal. “People think as they write,” she told me, “and so if you don’t write, you’re not thinking as deeply about it.” The researchers submitting manuscripts scored at 70% or higher AI content have, in Cunningham’s framework, crossed that threshold. They have traded thinking for output. The editors can tell.

A Snapshot, Not a Verdict

These findings deserve context. The data run through early 2026, but much of the AI writing the team detected was produced with earlier models, GPT-3.5 and GPT-4, which had well-known stylistic tells and a tendency toward bloated, nominal prose. The tools are improving rapidly. There is nothing fundamental preventing a language model from being trained or prompted to write at a target reading level, to minimize jargon, to pass the same readability tests that human editors use. The quality gap this paper documents may be substantially a function of how crudely most researchers are currently deploying the tools. Two maturity curves are in play: the tools themselves and the people using them. Both are rising.

It is also worth noting what the paper does not show. Organization Science is not losing its best work. The journal’s top papers are still getting through, and its overall rejection rate is essentially unchanged across AI categories. The additional submissions are mostly mediocre, and the editors are filtering them out. One way to read the data is that the net effect on knowledge is still positive: all the good science that was being produced before, plus some fraction of new work that, while not field-changing, records facts and findings that may prove useful to someone down the line. Science has always generated a long tail of incremental work alongside the breakthroughs. If the cost of producing that tail falls, and the editorial process can still separate signal from noise, the knowledge base may grow even if the average quality of submissions declines.

There is also a more speculative possibility. If AI is increasing variance in submission quality while the editorial process trims the lower tail, the best papers in the pipeline could actually be better than before. Researchers who use AI well, as a thinking partner rather than a ghostwriter, might be producing more ambitious work than they could have managed alone. The data cannot confirm that yet. But they are consistent with it.

At one point in our conversation, Gartenberg drew an analogy to chess. AI can beat any human player, yet chess is more popular now than it has ever been. The question she keeps turning over: what becomes the goal of science when AI can produce the outputs? When I spoke to Jeff Clune, senior author of a recent Nature paper on end-to-end automation of research, he made a similar observation about rock climbing. Alex Honnold can climb El Capitan faster and better than Clune ever will. That has not made Clune give up climbing. Mountains and chess and science are not entertainment. They are things that provide people with meaning. Science may be approaching an existential moment, one where the purpose of the work matters more than the products of it.

Where AI Could Actually Help

The irony of the current situation is that the same technology creating problems on the submission side could be useful on the editorial side. The bottleneck in academic publishing is not production. It is evaluation. Journals are drowning in manuscripts and struggling to find reviewers willing to read them. AI is well suited to exactly the kind of structured assessment that could relieve that pressure.

Consider what a journal might do with AI in the editorial pipeline. Before any human reads a submission, an automated screen checks reading ease, jargon density and sentence complexity. Papers below a threshold get returned to authors with specific feedback. That alone would filter out a substantial share of the low-quality submissions currently consuming volunteer editors’ time.
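A minimal sketch of what that first pass might look like, reusing the flesch_reading_ease function from the earlier sketch. The thresholds and the nominalization heuristic are mine, chosen purely for illustration; a real deployment would calibrate against the journal’s own accepted papers:

```python
import re

# Illustrative thresholds -- not from the paper or any journal's policy.
MIN_READING_EASE = 20.0   # Flesch floor (higher = easier to read)
MAX_NOMINAL_RATE = 0.06   # share of words ending in -tion/-ization/-ment/-ity
MAX_AVG_SENTENCE = 35.0   # average words per sentence

def screen_manuscript(text: str) -> list[str]:
    """Return specific feedback items; an empty list means the screen passed."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return ["Manuscript appears to be empty."]
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    feedback = []

    avg_len = len(words) / sentences
    if avg_len > MAX_AVG_SENTENCE:
        feedback.append(f"Sentences average {avg_len:.0f} words; aim for under {MAX_AVG_SENTENCE:.0f}.")

    nominal = sum(1 for w in words if re.search(r"(tion|ization|ment|ity)$", w.lower()))
    if nominal / len(words) > MAX_NOMINAL_RATE:
        feedback.append("Heavy reliance on nominalizations; prefer verbs over abstract nouns.")

    if flesch_reading_ease(text) < MIN_READING_EASE:  # from the earlier sketch
        feedback.append("Reading-ease score falls below the floor; simplify the prose.")

    return feedback
```

The design choice worth noticing: the screen returns feedback rather than a verdict. The author gets actionable reasons, and no paper is rejected by a regular expression.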

AI might also assess dimensions that human editors evaluate intuitively but inconsistently: whether a paper’s claims are supported by its methods, whether the literature review engages the relevant prior work, whether the statistical approach matches the research design. None of these assessments would need to be definitive. They would need to be informative enough to help editors allocate their attention to the manuscripts that merit it. The human stays in the loop. The machine handles triage.

On the review side, where Gartenberg’s data show AI shifting attention toward theory and away from data, a well-designed review assistant could do the opposite: prompting reviewers to engage with specific empirical claims, flagging inconsistencies between methods and results, scaffolding the review rather than replacing it.
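No journal I know of runs anything like this yet, but a scaffolded review could be as simple as structuring the reviewer’s task around the manuscript’s empirical claims rather than soliciting a wholesale opinion. A hypothetical sketch, with the template and claims entirely my own invention:

```python
# Hypothetical review scaffold: structures the reviewer's attention around
# specific empirical claims instead of writing the review for them.
REVIEW_SCAFFOLD = """\
For each numbered claim below, answer three questions:
1. What evidence in the manuscript supports it (table, figure, or section)?
2. Does the statistical approach actually license the claim as worded?
3. What result, if reported, would have falsified it?

Claims:
{claims}
"""

def build_review_scaffold(claims: list[str]) -> str:
    """Render the scaffold with the manuscript's claims numbered in order."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(claims, start=1))
    return REVIEW_SCAFFOLD.format(claims=numbered)

print(build_review_scaffold([
    "AI-heavy submissions are harder to read than human-written ones.",
    "Schools that compete on publication counts increased AI submissions most.",
]))
```

The point of the design is that the machine never produces an opinion. It produces a checklist anchored to the data, which is precisely the part of reviewing that Gartenberg’s numbers show AI-assisted reviewers drifting away from.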

To my knowledge, the technology is not mature enough to deploy reliably at scale. Getting the implementation wrong could introduce new problems. But the binding constraint in science is shifting from production to evaluation, and AI is the most plausible tool for addressing it.

Gartenberg herself uses Claude and Codex in her own research. She is not arguing that AI should be kept out of science. Her paper is a measurement of where things stand today, not a prediction of where they will end up. As a journal editor myself, I recognize everything in it: the rising submissions, the declining reviewer engagement, the growing editorial burden. The system is holding. The open question is whether the tools that are currently straining it can be repurposed to strengthen it.