I’ve been experimenting with OpenAI’s Deep Research feature, based on this tweet from BuccoCapital Bloke.
My initial impressions:
- I was blown away
- How you use it is different to how you use ‘traditional’ language models
- It helps a lot if you have a good sense of what you want to get out before you start
- I’m increasingly convinced that developing a sense of taste around using models is only going to get more important
Here is the process I went through, based on the tweet above.
Step 1: Initial prompt
Use ChatGPT o1 (the reasoning model) to build a prompt for Deep Research to do deep research on your topic of interest.
Here’s the prompt I used for this example:
Build a prompt for Deep Research to do deep research on the likely impact of student use of generative AI on typical assessment practices in the UK higher education system. I want the final report to provide a framework for how we should respond. I don’t want you to hedge; commit to a likely outcome and develop what you think is the ‘best’ response. Don’t worry about the feasibility of implementing this response in the current system; assume the system can change to meet the report recommendations. The final report should be aimed at an academic audience, focusing on peer-reviewed articles, although you can use technical reports and even high quality blog posts. Use APA style for your citations. I want the tone of the final report to be professional but conversational. I’m less concerned about the length of the report, and more with ensuring it is comprehensive and deep. Ask clarifying questions before you begin.
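An aside for anyone who prefers scripting to the ChatGPT interface: the same meta-prompting step can be sketched with the OpenAI Python SDK. Treat this as an assumption-laden sketch rather than what I actually did (I worked entirely in ChatGPT), and note that the `o1` model identifier depends on what your API account exposes.

```python
# A minimal sketch of Step 1 via the OpenAI Python SDK rather than ChatGPT.
# Assumes the v1+ SDK and an OPENAI_API_KEY environment variable; "o1" as a
# model identifier is an assumption about your API access.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

META_PROMPT = (
    "Build a prompt for Deep Research to do deep research on the likely "
    "impact of student use of generative AI on typical assessment practices "
    "in the UK higher education system. "
    # ... the rest of the Step 1 prompt above goes here, verbatim ...
    "Ask clarifying questions before you begin."
)

response = client.chat.completions.create(
    model="o1",  # assumed reasoning-model identifier
    messages=[{"role": "user", "content": META_PROMPT}],
)

# Because the meta-prompt ends with "Ask clarifying questions before you
# begin", the first reply should be questions (Step 2), not the prompt itself.
print(response.choices[0].message.content)
```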
Step 2: Answer clarifying questions
o1 then asked me the following clarifying questions, which I spent some time answering.
- Scope of coverage: Should the report address the impact of generative AI on all forms of assessment (e.g. essays, exams, presentations, creative projects), or do you want particular emphasis on certain types (such as written coursework)?
- Timeframe: Are we focusing on near-term impacts (e.g. within the next few years), or taking a broader, longer-term perspective (5–10 years or more)?
- Nature of generative AI: Should we concentrate specifically on large language models (e.g. ChatGPT) and text generation, or would you like to include other generative AI tools (e.g. image- or code-generating tools) as well?
- Depth of literature: Do you envision the final report engaging with meta-analyses and systematic reviews, or should it also incorporate conceptual/theoretical articles where empirical data may be limited?
- Framework focus: Should the framework address policy-making and institutional strategy, as well as pedagogical approaches (e.g. how faculty could adapt course design and assessment methods)?
- Use of examples or case studies: Do you want real or hypothetical examples of how institutions might adapt assessment practices in light of generative AI?
- Anticipated audience: Beyond academic staff, are there other stakeholders (such as university leadership, quality assurance bodies, professional accreditation organisations) whose perspectives or concerns we should factor in?
- Expectations around bias or neutrality: Are there any specific angles or biases you would like the report to avoid or highlight (e.g. ethical concerns, academic integrity, digital divide)?
- Length vs. comprehensiveness: You mentioned you’re not too concerned about length but do want comprehensiveness. Is there a rough maximum word count you would like to keep in mind, or should I simply prioritise depth?
[Image: o1 thinking about how it’s going to put together the prompt I asked for.]
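If you were scripting this step instead, answering the clarifying questions is just another turn in the same conversation. Continuing the sketch from Step 1 (the answers shown are placeholders, not the ones I actually gave, and the filename is my own invention):

```python
# Continuing the Step 1 sketch: append o1's questions and your answers to the
# running message list, then ask again for the finished Deep Research prompt.
messages = [
    {"role": "user", "content": META_PROMPT},
    {"role": "assistant", "content": response.choices[0].message.content},
    {
        "role": "user",
        # Placeholder answers; substitute your own.
        "content": "All assessment types; a 1-2 year timeframe; the full "
                   "spectrum of generative AI tools; include conceptual and "
                   "theoretical work; real case studies where available.",
    },
]

final = client.chat.completions.create(model="o1", messages=messages)

# Save the finished prompt for Step 3 (hypothetical filename).
with open("deep_research_prompt.txt", "w") as f:
    f.write(final.choices[0].message.content)
```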

Step 3: Give the prompt to Deep Research
Here is the prompt provided by o1, which you can copy and paste into a text file (you don’t have to, but I prefer to keep the context window neat):
Title: Likely impact of student use of generative AI on typical assessment practices in the UK higher education system
Objective
Produce a comprehensive, in-depth report that critically examines the near-future impact (1–2 years) of generative AI tools on UK higher education assessment. The report should put forward a clear standpoint on the most likely outcomes, offer robust recommendations for how institutions and stakeholders should respond, and explore how these responses might be implemented—assuming the system can adapt to accommodate them. The tone should be professional yet conversational, and the report should be aimed primarily at an academic audience, with full references in APA style.
Scope and focus
- Assessment types: Investigate a wide range of assessment formats, including but not limited to written essays, exams, presentations, group work, problem-based assessments, creative projects, and dissertations.
- Generative AI tools: Cover the full spectrum of generative AI applications (text, images, code, audiovisual content).
- Timeframe: Prioritise imminent and short-term impacts (1–2 years). Outline what needs to change now and in the near future to address these impacts effectively.
- Types of sources:
- Draw upon peer-reviewed scholarly articles, published within academic journals.
- Include conceptual and theoretical pieces, given that the topic is relatively new and empirical evidence may be limited.
- Use reputable technical reports and high-quality professional or institutional blog posts where appropriate.
- Stakeholders: Consider the ramifications for:
- Individual instructors and module leaders
- Programme leads
- Heads of schools, colleges, and institutions
- University leadership teams
- The wider higher education sector (including regulatory and quality assurance bodies)
- Public perception and employer evaluation of accredited degrees
- Ethics and perceptions:
- Acknowledge moral panic concerns and discuss ethical considerations, academic integrity issues, equity, and access.
- Explore counter-arguments and highlight the potential of AI to enhance learning and assessment.
- Real examples: Wherever possible, include real-world instances of universities or programmes that have begun to adjust their practices in response to generative AI.
- Conclusions and recommendations: Provide a definitive stance on the likely outcome of widespread student use of generative AI in assessment, and develop a ‘best’ institutional and sector-wide response strategy.
- Report structure:
- Table of Contents
- Executive Summary: Offer a concise overview of the analysis, highlighting key findings and recommendations.
- Introduction
- Methods (outline how literature was identified and evaluated)
- Main body: Use headings, subheadings, and bulleted lists where appropriate.
- Short-term implications for teaching staff, institutional leadership, and broader public/employer perceptions
- Discussion of pedagogical, institutional, and policy-level changes
- Ethical and integrity considerations
- Potential benefits and opportunities arising from AI use
- Conclusion: Summarise critical insights and commit to a likely outcome. Propose the most effective framework for responding, assuming feasible sector-wide adoption.
- References: Follow APA style consistently.
- Figures or graphs (where relevant)
- Style:
- Professional but conversational tone.
- Use British spelling.
- Include strong, direct statements and avoid excessive hedging.
- Emphasise depth and thoroughness over length restrictions.
Key deliverables
- A comprehensively researched and well-organised final report, with logically structured sections, headings, and subheadings using sentence case.
- An executive summary that clearly outlines the most critical information from the research.
- Conclusions and recommendations that decisively propose the ‘best’ institutional and sector-wide response to the challenges and opportunities posed by generative AI.
- Full references in APA style, drawing on relevant, high-quality academic sources, as well as significant technical reports and select professional blog posts where suitable.
Step 4: Use this prompt to generate the report
Make sure you turn on the Deep Research function.
Attach the prompt template (above) to a new o1 chat and provide additional detail in the context window.
I used the following prompt in this example:
I’m an Associate Professor of Digital Innovation in a UK-based university. I’m interested in the impact of AI on typical assessment tasks, mostly in undergraduate health professions programmes. I’ve attached a template I’d like you to use to inform your report. My position is that higher education institutions can take a more positive and proactive position on the use of AI, not only in support of learning, but as part of students’ assessment tasks as well. I believe that there are 3 paths open to us: 1) grade inflation to the point where grades become meaningless (we are already starting to see this), 2) a shift to curve grading, which we cannot do in standards-based assessments (which is what most regulated health professions programmes use), and 3) raise the standards. Of these 3 paths, I believe that the third is the only feasible option. I want the report to explore option 3 as a reasonable approach to addressing the impact of student use of AI on assessment. While I don’t want you to ignore the other 2, you should explore them as alternatives, and explain why they might work. But I believe that the “least worst” option (and possibly even the best option outright), is path number 3. Ask any clarifying questions.
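Deep Research itself runs inside ChatGPT, so there is nothing to call here; but if you are keeping the pieces in files, the shape of the request is worth making explicit: the saved template and your personal context travel together as one input. A sketch, reusing the hypothetical filename from earlier:

```python
# Assemble the Step 4 input: personal context first, then the Step 3 template.
# Deep Research runs inside ChatGPT, so this only prepares the text to paste
# (or attach); it does not call any Deep Research API.
with open("deep_research_prompt.txt") as f:  # hypothetical filename
    template = f.read()

context = (
    "I'm an Associate Professor of Digital Innovation in a UK-based "
    "university. "
    # ... the rest of the Step 4 context prompt above goes here, verbatim ...
    "Ask any clarifying questions."
)

# Paste the combined text into a new chat with Deep Research switched on.
print(context + "\n\n---\n\n" + template)
```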
Deep Research then asked me the following questions, which I answered briefly:
- Depth of Exploration for Alternatives (Paths 1 & 2): While you want the report to primarily focus on raising standards (Path 3), how extensively should I explore the other two options? Should I provide in-depth discussion, or simply acknowledge them and explain why they are less viable?
- Empirical vs. Theoretical Sources: Since empirical studies on AI’s impact on assessment may be limited, would you like me to prioritise theoretical/conceptual analyses, institutional reports, and expert opinions?
- Regulatory and Accreditation Considerations: Since health professions education is subject to stringent standards, should the report include perspectives from professional bodies or regulators on how AI might affect accreditation and competency-based assessments?
- Case Studies & Real-World Examples: Would you prefer case studies from within UK higher education only, or would examples from international institutions also be valuable?
- Specific Stakeholder Focus: Are there any particular stakeholder groups (e.g., programme leads, regulatory bodies, employers) that you want to be given more attention in the discussion?
Deep Research then generated a 32-page report using 30+ appropriate sources (which I haven’t checked for accuracy); you can download it here.
Note: see the images below for transcripts of o1 reasoning through the process.

[Images: o1 reasoning transcripts]
Summary of the report
Here is Claude’s summary of the report (make of that what you will):
The report examines how generative AI tools like ChatGPT impact assessment practices in undergraduate health professions education (HPE). It analyzes three potential institutional responses to students’ AI use and argues that proactively raising assessment standards is the most sustainable approach.
Key points from the report:
- Three possible institutional responses to AI use:
- Grade inflation: Allowing unchecked AI use risks academic standards and public trust
- Curve grading: Found incompatible with health professions education, which relies on fixed competency standards rather than normative ranking
- Raising standards: The recommended approach – redesigning assessments to demand higher-order thinking beyond AI’s capabilities
- Theoretical perspectives: Drawing on cognitive science, the report notes that AI can reduce cognitive load for trivial tasks but risks hindering deeper skill development if overused. Students might develop an “illusion of competence” from AI-assisted high grades.
- Accreditation challenges: Regulatory bodies need to ensure graduates still meet competency standards in an era where AI can assist or distort performance. The report recommends possibly requiring more authentic assessments to verify competence.
- International case studies: The report examines various institutional approaches, from outright AI bans (generally ineffective) to integrating AI literacy into curricula. Early findings suggest assessment redesign is more effective than detection tools alone.
- Stakeholder perspectives: The report considers viewpoints from instructors, university leadership, accreditation bodies, employers, and the public – all of whom prioritise maintaining standards and integrity.
- Recommendations include:
- Updating academic integrity policies with clear AI usage guidelines
- Redesigning assessments to focus on higher-order skills
- Strengthening authentication through more oral exams and in-class components
- Incorporating AI literacy into the curriculum
- Monitoring outcomes to allow continuous improvement
The report concludes that generative AI should be viewed as an opportunity to enhance education quality rather than just a threat. With thoughtful implementation, AI can push institutions to improve assessment practices and better prepare graduates for an AI-enhanced workplace while maintaining high standards of competence.
Conclusions
I don’t agree with everything in the report, and there are sections where it seems to have deviated from my instructions. Nonetheless, it’s a thought-provoking piece of work that mostly holds up.
Don’t think of the output as a ‘less-good Wikipedia article’ or a decent undergraduate project. Right now, the ideal use case (for me) is for Deep Research to create a good-enough report on a question with several interacting variables, one that would typically take me several days (or weeks) to prepare. Deep Research is a research assistant that can put together a comprehensive response to a complex question.
I agree that the output isn’t PhD- or even good MSc-level work. So what? This technology didn’t exist 6 months ago, and yet we’re complaining that the quality of its work isn’t at the level of a domain expert. Baffling.
- Is it perfect? No.
- Could I do better? Yes.
- How long would it take me? Depending on the question, several days. Most likely a couple of weeks.
- How long did this take me? 20 minutes.
- How confident am I in the output? Based on a preliminary skim, I’m 70% confident that the report is mostly right, even where I disagree with it.
And in case I haven’t said it enough, this is the worst that reasoning models are ever going to be.
We need to massively update our intuitions about what AI models are capable of, as well as recalibrate our expectations about what students can do with them. If not, then I’m sorry to say that we’re in a lot of trouble.