Here’s What GPT-4 Thinks Being a Journalist is All About

Charlotte Li
Generative AI in the Newsroom
8 min read · May 26, 2024

Note: This post was co-authored with Nick Diakopoulos.

As AI becomes more integrated into people’s everyday lives and work, conversations have surfaced around building agents to assist with tedious tasks in various workplaces, including newsrooms. So-called “agentic AI” systems have been defined as those that “can pursue complex goals with limited direct supervision”. In order to autonomously assist with newswork, an AI agent needs to be able to take a high-level directive, like “write me a news article about the latest social trend”, and break it down into tasks it can accomplish towards achieving that larger goal. In other words, it needs a sense of the work of journalism and how tasks can be decomposed so that it can plan out the work.
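As a toy illustration of that decomposition step, here is a minimal sketch assuming the current OpenAI Python client; the prompt wording and model choice are illustrative, not a system described in this post.

# A toy sketch of goal decomposition, assuming openai>=1.0; the prompt
# here is illustrative, not from this study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

goal = "write me a news article about the latest social trend"
plan = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Break this high-level journalistic goal into a "
                          f"numbered list of concrete tasks: {goal}"}],
)
print(plan.choices[0].message.content)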

Image Credit: Turing Commons (https://alan-turing-institute.github.io/turing-commons/resources/gallery/)

So, just how much does a model like GPT-4 know about how journalism gets done? The depth of its knowledge of the tasks inherent in producing news should be indicative of its potential to leverage that knowledge in an agentic AI system. While research has recently explored the impact of generative AI on workers such as entrepreneurs and data scientists, using methodologies like user-centric participatory studies and large-scale quantitative approaches, little has touched specifically on journalistic work, especially when it comes to how generative AI can play a role as a supportive agent.

To begin to answer this question, we prompted GPT-4 for a list of journalistic tasks to test the extent of its knowledge about what journalists do. Though news tasks differ drastically depending on a journalist’s role, the type of newsroom, and the subject of their reporting, GPT-4 surprised us by outputting a sensible array of news tasks. It appears to have reasonable coverage of what it means to “do” journalism, at least in terms of the specific tasks entailed in news production.

Our Approach

In order to understand the quality of GPT-4’s knowledge about journalism tasks, we leveraged the O*NET Resource Center for expert-validated occupational information about journalists¹. Specifically, we looked at the News Analysts, Reporters, and Journalists occupation category, which includes both a list of tasks that are specific to this category and a ranked list of work activities that tend to be more “important”² to the occupation. These generic activities include things such as “Getting Information”, “Interpreting the Meaning of Information for Others”, and “Thinking Creatively.”

With the list of work activities related to the journalist occupation category on O*NET, we then prompted GPT-4 to produce a list of tasks related to each of those activities, using the following prompt.

'For the following work activity, please give one sentence each for as many tasks as possible that a journalist might do that are related to it. Please avoid providing duplicated tasks, and the level of specificity should be consistent across all tasks, Work Activity: {activity name} — {activity description}'

A few things are worth noting in this prompt. First, we prompted only once per activity while asking GPT-4 to come up with as many tasks as possible related to that activity. When we initially prompted GPT-4 multiple times for the same activity, it often repeated some tasks across different responses, making the data hard to de-duplicate. Second, we explicitly asked GPT-4 to produce non-repeating tasks at a consistent level of specificity, in order to obtain results that are unique and comparable to each other.

Using the prompt above to query OpenAI’s “gpt-4” model on March 29th, 2024, we obtained 285 task descriptions in total across the top 19 most important activities listed, with each activity receiving 10 to 20 tasks. We then manually clustered these tasks, disregarding their original activity categories, and iterated on inductively labeling and re-clustering them until larger themes emerged, resulting in a taxonomy of GPT-generated journalistic tasks.
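For readers who want to reproduce the querying step, below is a minimal sketch assuming the current OpenAI Python client; the placeholder names and the abbreviated activity list are our illustrative choices, not the authors’ exact script.

# A minimal sketch of the querying loop, assuming openai>=1.0; the
# activity descriptions are abbreviated from O*NET, two shown of 19.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "For the following work activity, please give one sentence each for as "
    "many tasks as possible that a journalist might do that are related to "
    "it. Please avoid providing duplicated tasks, and the level of "
    "specificity should be consistent across all tasks, "
    "Work Activity: {name} — {description}"
)

activities = [
    ("Getting Information",
     "Observing, receiving, and otherwise obtaining information from all "
     "relevant sources."),
    ("Thinking Creatively",
     "Developing, designing, or creating new applications, ideas, "
     "relationships, systems, or products."),
]

responses = {}
for name, description in activities:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": PROMPT.format(name=name,
                                            description=description)}],
    )
    responses[name] = completion.choices[0].message.content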

What We Found

Through a process of iterative thematic grouping, we ended up organizing the outputs from GPT-4 into six high-level categories: gathering information, sensemaking, editing, publication and distribution, productivity, and journalism training. Each high-level category is then divided into several subcategories reflecting more specific tasks. We illustrate some codes below, with a structural sketch of the taxonomy after the list, but refer the reader to the full taxonomy for all of the examples.

  • Gathering information: This category includes tasks that aim to monitor, source, and gather information for reporting purposes. Tasks here are organized by the type of sources involved in information gathering (e.g. experts, scholarly sources, social media, etc.), the method used for information gathering (e.g. interviewing, field work, information requests, etc.), or ways to maintain sources. For example, social media is a type of source, under which a task like “A key task could be monitoring social media channels for breaking news and trending topics” would fall. Methods of information gathering, on the other hand, can include tasks such as interviewing: “A journalist may conduct in-depth interviews with key individuals related to the story.” And an example of a source maintenance task produced by GPT-4 was, “They might participate in community events to establish relationships with locals and gather potential news stories.”
  • Sensemaking: This category consists of tasks that journalists do to make sense of concepts for different purposes. It includes tasks such as ideation (e.g. “A journalist might come up with a fresh angle to approach a widely reported story.”); judging the newsworthiness of an item (e.g. “A journalist may assess the newsworthiness of a press release or tip.”); analyzing content from different source types (e.g. text or data) and domains (e.g. politics, law, health, science, etc.); conducting archival research; or making sense of multimedia content. Sensemaking tasks may thus intersect with tasks during many phases of production, such as gathering, editing, and publication.
  • Editing: This category includes tasks that involve checking and editing work in progress for readability, accuracy, legality, and other editorial judgments related to journalistic standards. An example task generated by GPT-4 related to readability is, “They might judge the readability and coherence of their own piece before publishing.” An example of verification and checking for accuracy is, “They might have to cross-check data from multiple sources to ensure consistency.”
  • Publication and Distribution: This category includes tasks that aim to publish and deliver content to an audience in ways that maximize information spread and readership. Tasks in this category represent a wide variety of ways to deliver information to readers, such as live delivery (e.g. “They may deliver information to the public through live broadcasts or social media updates.”) or publishing content in various online formats (e.g. “A journalist could send newsletters or email updates to subscribers, updating them on recent news or articles.”). This category also includes ways journalists may interact and engage with their audience in order to understand the impact of their publication and receive feedback. Of course, different technologies, such as newspapers and broadcast, have different relationships to publication and distribution.
  • Productivity: Tasks in this category have the goal of increasing productivity and ensuring the prompt delivery of the work that journalists do. Tasks here include organizing workspaces and time (e.g. “They may maintain a digital calendar of upcoming news-worthy events.”); managing content to be published (e.g. “They might utilize an online content management system to upload their articles, along with relevant photos or videos.”); planning for work and projects (e.g. “They may establish a plan for conducting background research around the topic of their story.”); and coordinating with other news employees (e.g. “A journalist may meet in person with their supervisor to discuss concerns or questions about a current assignment”).
  • Journalism Training: This category covers personal and skill development that helps journalists improve their expertise. It includes training on writing (e.g. “They could take part in training or a workshop to master a new style of writing or reporting.”); using technology (e.g. “They might learn to use a new software for data visualization to make their articles more engaging.”); or participating in training that is general to the work of journalism (e.g. “They would participate in training and development activities with colleagues for team building”).
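To make the shape of this taxonomy concrete, here is an illustrative encoding of the six categories and the subcategories mentioned above as a simple nested structure; it is a sketch of the taxonomy’s structure, not the full set of codes.

# An illustrative (not exhaustive) encoding of the taxonomy's structure,
# using only subcategories mentioned above; see the full taxonomy for all codes.
taxonomy = {
    "Gathering information": [
        "By source type (experts, scholarly sources, social media)",
        "By method (interviewing, field work, information requests)",
        "Source maintenance",
    ],
    "Sensemaking": [
        "Ideation", "Newsworthiness judgment",
        "Content analysis by source type and domain",
        "Archival research", "Multimedia sensemaking",
    ],
    "Editing": [
        "Readability", "Accuracy and verification", "Legality and standards",
    ],
    "Publication and Distribution": [
        "Live delivery", "Online formats", "Audience engagement and feedback",
    ],
    "Productivity": [
        "Organizing workspace and time", "Content management",
        "Planning", "Coordinating with colleagues",
    ],
    "Journalism Training": [
        "Writing and reporting", "Technology", "General professional development",
    ],
}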

It was surprising that among the 285 tasks output by GPT-4, writing as a task is directly mentioned only once outside of the context of journalism training. While this might be attributed to the language used to prompt GPT-4, which was adapted from O*NET and doesn’t explicitly mention writing as an activity, it might also reflect a gap in GPT-4’s understanding of how journalists go about achieving each task. Another surprising aspect of this result is the emergence of journalism training, since such learning was never explicitly relayed to GPT-4 as an activity that journalists do.

Future Directions

This taxonomy of journalism tasks generated by GPT-4 should not be considered a definitive guide to journalistic work. But it does begin to establish the range of “understanding” the model has of the work of journalism. And in the future it may serve both as a methodological approach for exploring other models’ knowledge of journalism and as a baseline for further investigations of the use of AI in journalism tasks.

Aspects of the methodology can be improved in a few ways. For one, we prompted only for the work “a journalist” might do, so the outputs we analyzed largely depended on what the model associated with the word “journalist.” However, as pointed out earlier, people serve different roles in newsrooms, and their work differs substantially depending on those roles. Investigating how the outputs from GPT-4 might differ based on different newsroom roles could reveal discrepancies in its understanding of more specific aspects of newswork (e.g. that of a news photographer, or an audience engagement editor). Additionally, the descriptions of work activities adapted from O*NET appear to strongly influence the outputs of GPT-4, especially in the action words used to describe tasks. Changing the wording of the activities in the prompts might change the outputs in ways we have not explored yet, and could shed light on different prompting strategies when adapting language models into practice.
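As a sketch of what role-conditioned prompting could look like, the hypothetical variation below swaps the generic “journalist” in the original prompt for specific newsroom roles; the role list and the template parameterization are our illustrative assumptions, not part of the original study.

# A hypothetical role-conditioned variant of the original prompt; the
# roles are illustrative, and `activities` is the list from the earlier
# querying sketch.
ROLE_PROMPT = (
    "For the following work activity, please give one sentence each for as "
    "many tasks as possible that a {role} might do that are related to it. "
    "Please avoid providing duplicated tasks, and the level of specificity "
    "should be consistent across all tasks, "
    "Work Activity: {name} — {description}"
)

roles = ["news photographer", "audience engagement editor", "copy editor"]

prompts = [
    ROLE_PROMPT.format(role=role, name=name, description=description)
    for role in roles
    for name, description in activities
]

Comparing the task lists generated for each role against the generic “journalist” baseline would make any role-specific gaps in the model’s coverage directly visible.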

One of the most significant improvements to these results, however, lies in practitioner-centered validation and adaptation of this taxonomy. We are interested in exploring whether the level of specificity and the span of coverage of this taxonomy make sense to practitioners of newswork. Given these tasks, we are also interested in exploring current strategies for and attitudes towards completing them, the suitability of AI assistance for each of these tasks, and the criteria for evaluating successful performance on them.

This leads us to conclude this blog with an invitation for you to participate in this discussion of AI-assisted journalistic work. Does the taxonomy reflect what you think of as newswork? What do you think may be missing? How would you like to use AI in your work?

See or download our data in spreadsheet format!

[1]: O*NET claims to be “the nation’s primary source of occupational information,” and we further investigated the methodologies O*NET uses to generate and update their occupation-specific tasks to ensure the validity of these task descriptions. According to a report on the instruments for O*NET data collection: “Occupation Specific Requirements are not measured using questionnaires, but are measured by job analyst observations, job holder and supervisor descriptions elicited in group discussions, and task inventories.” A later report detailed that “since many occupations change over time (e.g., due to new technology), job incumbents are given the opportunity to write in information about important tasks they believe are excluded from the current task list for their occupation.” Most recently, researchers continued to improve occupational task descriptions using internet search methods (2011) and automated evaluation of write-in task descriptions through natural language processing (2021).

[2]: Importance is defined here as a score of greater than 3 out of 5 on the O*NET rating scale of 1–5 of importance, see here: https://www.onetonline.org/help/online/scales
