FIELD NOTE
The Problem with AI Insights, Part 3: Turning AI into a Research Partner
Why training your gen-AI on methodology leads to better results
Daniel Mai
Dec 17, 2025
Not a day passes without gen-AI-driven insights solutions being peddled as the future of applied social-scientific research. Following the popular trope of automation, they are advertised to deliver quicker results at lower costs, supposedly making researchers more efficient and productive. The disappointing truth is: AI-generated insights still suck, as Paul Hartley (2025, October 30) pointed out in the first article of our AI series. When applied in practice, popular systems such as ChatGPT, Gemini, and Claude.ai mostly produce generic insights riddled with inconsistencies and hallucinations, not to mention a lack of methodological rigor and transparency.
What I heard at the recent 2025 Ethnographic Practice in Industry (EPIC) conference in Helsinki only confirmed our own experiences. Several colleagues presented how they had experimented with gen-AI chatbots in their qualitative research workflows, using them to analyze interview transcripts, extract patterns from unstructured data, and interrogate "synthetic personas". The overall verdict mixed curiosity and experimental joy, but disappointment and frustration with the inferior quality of AI-generated insights dominated.
So what can gen-AI actually be good for in applied social-scientific research if it remains a sub-par automation solution for ready-made insights?
Advancing from insights automation tool to research partner
For some time, I have been experimenting with a primary use case in mind – training a gen-AI tool to be a research partner that participates in a dialogue with a trained expert to elevate their thinking at particular points of the human-driven insights generation process. I did not expect the AI system to automatically analyze raw customer/user data at the initial stage of analysis. Coming in later, I wanted the system to help me interpret descriptive patterns of human behavior (observations) that I had distilled manually – i.e. work with the intermediate outcomes of a human-led analytical process. I wanted to see if the system could reliably and convincingly provide the 'why' behind the 'what' of an insight by applying select concepts and theories of the social sciences.
Some training was required to achieve that goal. The real failure of default gen-AI models is not just their hallucinations or factual inaccuracies. It's that they are insufficiently attuned to the ways of knowledge-making of the social sciences. In default mode, gen-AI models simulate reasoning, but often without explicating their "understanding" and application of theory, method, or intellectual tradition. As such, they are poor sparring partners for human-centric researchers. Hence, this shortcoming needed to be fixed first.
To this end, I have been training a ChatGPT model to argue like a business anthropologist. That means it can distinguish between different theoretical approaches, ground claims in academic literature, articulate its assumptions, and apply reflexivity and – in still limited ways – contextual sensitivity. In other words, it doesn't just tell you what it thinks, but why it thinks that way. The outcome is still miles away from the vision of explainable AI, but it is a productive next step that makes the tool more useful for our applied research purposes.
Academic socialization of the AI
Training a gen-AI model for research is similar to mentoring a graduate student or training a commercial researcher. It involves more than feeding the model data; it requires acquainting it with specific schools of thought, sharpening its argumentative coherence, and refining its tone. More specifically, I needed the AI model to adhere to a range of qualities that mark a good research partner:
Fulfilling the most basic scientific standards, an AI must make transparent the knowledge models it employs and the principles by which it makes sense of information. This is where the field of epistemology comes in handy – the philosophical study of the source of knowledge and how knowledge is generated (see e.g. Kant, 1781/2009; Kuhn, 1962). It helps us interrogate how we know what we know.
An AI must not merely simulate knowledge, but unravel the conditions under which knowledge becomes meaningful to people as conscious human experience. This matters when interpreting human experience – e.g. explaining how people relate to objects and the world out there. The field of phenomenology is concerned with exactly that (see e.g. Husserl, 1936/1969; Heidegger, 1927/2005). It investigates how human perception of reality is always embodied and subjective, and how that experience is not the same as the "real thing itself" under observation.
Next, an AI must reflect how meaning is not miraculously discovered in a blank space but collaboratively created and ascribed by human actors. In this respect, social constructivism is a central theory of knowledge which acknowledges that knowledge and our view of reality are co-produced in social interaction and embedded in cultural contexts (see e.g. Berger & Luckmann, 1967).
Last but not least, an AI must be taught to situate its claims, acknowledge its limitations, and declare its positionality. This is what the overarching methodological idea of reflexivity is good for. Following Alvesson and Sköldberg (2009), all knowledge is situated in particular traditions and subjectively biased, and a good research partner should openly reflect upon that.
Training protocol
Practically, I conducted this AI training in parallel with our ongoing innovation research. To compare effort and quality, I tested the model on a live research project in tandem with our manual, anthropologist-only insights generation approach. Using the custom GPT feature of ChatGPT Plus (running GPT-5), I created an "AnthroGPT" in a few simple steps.
First, I uploaded a curated corpus of academic handbooks, concept and theory dictionaries, ethnographic texts, analytical methods literature, and our own publications – pieces amassed in my digital library that I know in detail from my own PhD training and a decade of applied research work. This corpus-based personalization allowed me to control the model: because I know the uploaded literature intimately, I could easily detect hallucinations, challenge weak reasoning, and refine its discursive practices.
Second, I instructed the AI model to prioritize this literature over its default training set, defining a hierarchy of reference that ranges from curated corpus to default training to more recent online sources. Since I could not, however, work on an untrained "blank slate" GPT model, this hierarchy of reference was intended to diminish the impact of default training, which OpenAI keeps entirely opaque.
Third, I instructed the model to employ particular epistemic models, behavioral concepts, ways of worldmaking and of interpreting experience, and discursive rules, and to automatically reflect upon their use and limitations with every user prompt. Without listing all of them here, the prioritized models and concepts included the ethnographic imagination, cultural relativism, social constructivism, abductive reflexivity, and other approaches that I personally consider core to our craft. Additionally, I outlined key principles in the general instructions for how the system should argue: e.g. maintain scholarly rigor by citing and comparing relevant academic concepts and theories, ask for clarification when prompts remain ambiguous, provide nuanced answers based on the information available, clearly reveal the limits of knowledge without making things up, plus some further instructions related to the tone and style of writing.
Fourth, I gradually tweaked the model in preview mode. This involved asking it questions about my own PhD research (that I had uploaded as part of the training corpus) to see if it could argue in similar ways and arrive at similar conclusions. Then I added more prompts in the general instructions section of the configure tab to make it more didactic, increase the amount of interdisciplinary cross-references, and reduce the formality of tone.
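To make the setup more concrete, here is a minimal, hypothetical sketch of standing instructions in this spirit, expressed against the OpenAI Python SDK rather than the ChatGPT custom GPT builder. The instruction wording, the model name, and the sample observation are illustrative assumptions, not the actual AnthroGPT configuration.

# A hypothetical sketch only: standing instructions in the spirit described above,
# sent via the OpenAI Python SDK instead of the ChatGPT custom GPT builder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANTHRO_INSTRUCTIONS = """
You argue like a business anthropologist.
Hierarchy of reference: (1) the curated corpus supplied by the user,
(2) your default training data, (3) recent online sources.
Ground every claim in named concepts and theories (e.g. social constructivism,
cultural relativism, the ethnographic imagination) and state why they apply.
Ask for clarification when a prompt is ambiguous.
State the limits of your knowledge; never invent data, quotes, or respondents.
Close every answer with a short reflexive note on your assumptions and limitations.
"""

# An invented example observation, standing in for a manually distilled pattern.
observation = (
    "Several respondents described their private car as a 'second living room' "
    "and resisted sharing it, even when sharing was clearly cheaper."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": ANTHRO_INSTRUCTIONS},
        {
            "role": "user",
            "content": (
                "Observation: " + observation + "\n"
                "Interpret the 'why' behind this pattern using the prioritized "
                "concepts, and make your reasoning and assumptions explicit."
            ),
        },
    ],
)
print(response.choices[0].message.content)

In the custom GPT builder itself, text of this kind simply lives in the instructions field of the configure tab, alongside the uploaded corpus files.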
Testing the model in practice
So how did the trained AI model fare in practice? I started by feeding it written-up patterns of human behavior and clusters of observations from a current innovation research project – the descriptive 'What' part of an insight that my team and I had manually synthesized the old-fashioned way. The AnthroGPT instantly provided good interpretations. It managed to elevate my analytical takes on the 'Why' behind what we had heard, observed, experienced, and distilled manually. It reliably referred to its prioritized training corpus and employed theories and concepts as instructed, also sticking to the tone and reflexivity I desired. More importantly, it managed to meaningfully connect and elevate strands of thought, provide inspiration for applying theories and concepts we would otherwise have overlooked, and do so within minutes. In other words, it was a success.
To test whether a partially trained GPT would do better than its default counterpart, I once again tried using the model to analyze datasets of raw, unstructured interview transcripts and semi-structured observational field notes from our project. While this was not intended to be the trained model's primary use case, I wanted to see if it could automatically distill observations that represent insightful patterns of commonality and difference in human behavior across our research sample. After all, I had uploaded a handbook on ethnographic analysis and two sociological handbooks on qualitative data analysis and interpretation, and had instructed the model to run through a particular analysis process outlined in a specific chapter. Apparently, this had not been enough.
Similar to my initial experience outlined at the beginning of this article, the trained AI model remained embarrassingly superficial, overemphasized certain sections of the data, misidentified supposed patterns, and kept hallucinating not just quotes but entire respondents who were not in the sample. Even when prompted to double-check data references and avoid hallucinations at all costs, it remained incapable of doing so reliably. Compared to what our manual analysis process had produced, the output was not convincing. In other words, methodology training could not make up for a lack of hands-on training in practical methods and techniques.
What is still missing
To become helpful as a research partner throughout the entire insights generation process, a few things are still missing in terms of methods and context.
On the research practice level of methods, an AI must learn how to make sense of unstructured data, such as field notes and transcripts, beyond just summarizing them. An AI has to be able to transparently engage in a process of data reduction, data display, case-by-case comparison, conclusion drawing, and verification (cf. Miles & Huberman, 1994). This requires organizing, structuring, focusing, refining, and coding formerly unstructured data into something coherent that can be worked with systematically. Such a process allows us to reliably and repeatedly identify patterns of commonality, variations, disruptions, and clear differences in language and observed behavior in the data – the descriptive 'what' part of an insight.
That's why it is imperative to train AI on methods of qualitative data analysis (see e.g. Kelle & Kluge, 2010; Silverman, 2006). This endeavor needn't start from scratch: developers of popular gen-AI systems would be well advised to incorporate techniques that qualitative data analysis software such as MAXQDA and ATLAS.ti has been offering for decades – digital workbenches that have recently been gaining AI-driven functionalities themselves.
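To illustrate what a transparent, traceable analysis step could look like at the most basic level, here is a toy sketch of keyword-based coding in the spirit of Miles and Huberman's data reduction and display – far cruder than what MAXQDA or ATLAS.ti offer, with an invented codebook and invented snippets used purely for illustration.

# A toy illustration of a transparent coding step: every code assignment is
# traceable back to a respondent and the original wording.
# Codebook and transcript snippets are invented placeholders.
from collections import defaultdict

codebook = {
    "autonomy": ["my own car", "on my terms", "whenever i want"],
    "cost": ["cheaper", "too expensive", "price"],
    "trust": ["strangers", "don't trust", "hygiene"],
}

transcripts = {
    "respondent_01": "I keep my own car because I want to leave whenever I want.",
    "respondent_02": "Sharing is cheaper, but I don't trust strangers with my things.",
}

# Data reduction: tag each statement with every code whose keywords it contains.
coded_segments = defaultdict(list)
for respondent, text in transcripts.items():
    for code, keywords in codebook.items():
        if any(keyword in text.lower() for keyword in keywords):
            coded_segments[code].append((respondent, text))

# Data display and case-by-case comparison: which codes occur, how often, and where.
for code, segments in sorted(coded_segments.items()):
    print(f"{code}: {len(segments)} segment(s)")
    for respondent, text in segments:
        print(f"  {respondent}: {text}")

Real qualitative coding is of course interpretive rather than keyword-driven; the point is only that each analytical move leaves an auditable trail, which is exactly what the AI's summarizing shortcut does not.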
Most of the human patterns of commonality and difference we are interested in are research context-dependent, and the most fascinating ones often remain non-obvious. Prior explanation of context – be it social, cultural, political, economic, etc. – is thus needed for an AI to make sense of the data from the perspective of the people the research investigates. But explanation of the researcher's context is equally important: in addition to academic paradigms of thinking, it is key to outline the central business questions, project goals, research questions, and frameworks that shape the course of analysis and determine the usefulness of insights from a client perspective. Hence, future training of an AI must also account for context.
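One way to operationalize this would be a structured context brief that travels with every analysis prompt. The sketch below is hypothetical; the field names and all contents are invented placeholders, not an actual project brief.

# A hypothetical "researcher's context" brief; all fields and values are placeholders.
import json

research_context = {
    "business_question": "Should the client invest in a subscription-based mobility service?",
    "project_goal": "Identify unmet mobility needs beyond the commuter paradigm.",
    "research_questions": [
        "How do people decide between owning and sharing a car?",
        "What meanings do people attach to private car ownership?",
    ],
    "frameworks": ["practice theory", "jobs-to-be-done"],
    "participant_context": "Urban and suburban households; social, cultural, and economic background noted per case.",
}

# Prepended to every analysis prompt so the model interprets data within this frame.
context_preamble = (
    "Interpret all data strictly within the following project context:\n"
    + json.dumps(research_context, indent=2)
)
print(context_preamble)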
Conclusion
Overall, the trained "AnthroGPT" model was helpful in elevating our own analysis and thinking, but it failed miserably at executing a convincing primary analysis of unstructured qualitative data by itself. In this respect, the system was still a one-trick pony. While there are other commercially available gen-AI systems that promise to do a better job at data analysis than ChatGPT (e.g. Gemini, Breyta), this experimental outcome with the leading player in the industry echoes our initial reservations toward the AI-driven automation of insights.
AI-powered market research tools that claim to automate a full suite of complex research tasks – from study design and participant recruitment through chatbot-conducted remote interviews to qualitative analysis and insight generation – are misleading at best, and potentially dangerous when relied upon for business decisions. Unsurprisingly, none of these commercial offerings explains how their AI systems have been trained to make them suited for human-centric research – most likely because they have not been.
However, the real promise of generative AI in social-scientific research lies not in automating laborious tasks and hoping for ready-made insights. Instead, it can help us inquire about people in better ways. Better, in this regard, doesn't mean faster or cheaper. It means deeper inquiry, opening new directions, and sparking new ideas, thus uncovering what the individual human eye may miss. A custom-trained gen-AI doesn't replace the human researcher. Instead, it becomes a partner, trained as a research assistant, that is capable of operating within a clearly defined approach to analysis.
But that potential only unfolds when we move beyond treating AI as a ready-made tool and engage it as a semi-autonomous collaborator that can be shaped. Much like a junior colleague or student, the machine's algorithmic process must be academically and practically socialized, critiqued, and held accountable. Since the quality of collaboration in my own tests ranged from elevating human reasoning to drawing misleading conclusions in primary analysis, the application of gen-AI can, for now, only remain a supervised experiment with limited impact on strategic decision-making. Unless something magical happens, the role of the human expert as the final arbiter and interpretive authority will not be taken over by a machine any time soon.
This calls for a shift in mindset: from users seeking quick answers and automating workflows – to researchers cultivating multi-step dialogical systems. The black box may remain opaque as long as explainable AI hasn't arrived, but through careful contextualization, reflexive training, and epistemological grounding, we can illuminate what truly matters: making sense of the complexity of the human condition.
References
Alvesson, M., & Sköldberg, K. (2009). Reflexive methodology: New vistas for qualitative research (2nd ed.). London: Sage.
Berger, P. L., & Luckmann, T. (1967). The social construction of reality: A treatise in the sociology of knowledge. New York, NY: Anchor Books.
Comte, A. (1844/1995). Discours sur l'esprit positif. Paris: Vrin.
Hartley, P. (2025, October 30). The problem with AI insights part 1: The wrong way to use the tool. Human Futures. Retrieved from https://humanfutures.com/fieldnotes/the-problem-with-ai-insights-part-1-the-wrong-way-to-use-the-tool.
Heidegger, M. (1927/2005). Die Grundprobleme der Phänomenologie. Frankfurt a. M.: Klostermann Seminar.
Husserl, E. (1936/1969). Die Krisis der europäischen Wissenschaften und die transzendentale Phänomenologie. Husserliana, Vol. VI. Den Haag: Nijhoff.
Kant, I. (1781/2009). Kritik der reinen Vernunft. Cologne: Anaconda.
Kuhn, T. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Silverman, D. (2006). Interpreting qualitative data. London: Sage.