FIELD NOTE
The Problem with AI Insights Part 1: The Wrong Way to Use the Tool
Paul Hartley
Oct 30, 2025
This is part one of a multi-part series on artificial intelligence and its problems, potential, and uses in developing insights for strategy, innovation, and design. This article starts by revealing some of the most important problems. The next articles in the series will suggest solutions and appropriate uses.
Much has been made recently in the news, marketing claims, and business chatter about the possibility of AI insights. New start-ups have popped into existence to provide this as a service. Agencies are promising AI-backed insights delivered almost instantaneously. And many new tools have been developed for companies craving quick, reliable information about what people want and how they behave. A lot of this is built on the stunning successes of ChatGPT, Google’s Gemini, and other large language models (LLMs) that can seemingly generate knowledge out of thin air, summarize a mass of dull data, or generate an image of anything you can imagine. Amazing as these LLMs are, are they able to help us understand more about ourselves—and by extension understand something about the people we are trying to serve in a business setting? Or is much of this just part of the hype cycle, with the claims of AI insights amounting to a fascination coupled with an overestimation of what is possible?
The answer is yes to both. Some carefully built and monitored LLMs can be powerful tools in the search for useful knowledge about customers, users, and patients. But much of what is being said about AI-driven insights is just hype—dangerous hype at that.
The current state of AI insights is grounded in a tendency to prefer the shiny new object over its older alternatives. We all love a new toy, and for most people who are unaware of the long history of AI research and development, these tools seem to have emerged fully formed from nothing. However, all of these tools have a longer history, and they carry the baggage of what came before. First amongst these issues is the prevailing belief that AI is somehow intelligent as we conventionally understand it, and that these tools, being machines, are superior to humans in some ways. These beliefs come from the old claims made for machines well before computing, during the industrial revolution, when mechanical machines were sold as doing the work of more than a single man. But they also come from the stories we have told in books, films, and television shows where AI beings outclass their human counterparts. The idea that these LLMs, which are software systems and server farms, can outperform people in their analytical capacity is really just an assumption underpinned by stories from the past.
The second issue is one that gets the business world buzzing: efficiency. AI tools promise efficiencies in speed and scale. Using these tools in their current forms, we are already able to churn through clean and sensible data more quickly. We can summarize meetings, take notes more easily, generate images and videos that replace graphic designers and photographers, and eliminate entire business units with cloud-based tools. All of this is undeniably true, and mostly useful. But when it comes to understanding people and trying to develop new products and services for them, we have to ask whether efficiency provides us with something more valuable than what we had before. Realistically, no. It does not. The reason is that it is not a lack of speed that makes the task difficult. Good insights about humans, their thoughts, actions, and beliefs require patience and persistence, two things not often associated with immediacy. Moreover, the efficiencies LLMs provide in a research and analysis process are actually quite limited. One can process more inputs, but only in small, tightly controlled environments. Most of the gain is in data management, and the preparation required makes true rapidity impractical. The training process to accomplish a task is long and difficult. Just jamming a bunch of interviews into ChatGPT is not an analysis process. It is just an exercise in summarization, fraught with hallucinations, mistakes, and fundamental misunderstanding. The efficiencies AI can provide are undone by the errors and by the arduous task of training it to do the job properly.
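To make concrete what "jamming a bunch of interviews into ChatGPT" usually looks like in practice, here is a minimal sketch of that naive workflow. Everything in it (the file layout, the model name, the prompt wording) is an illustrative assumption rather than a record of any real project; the point is that the entire "analysis" reduces to a single summarization request.

```python
# A sketch of the naive "AI insights" workflow: concatenate transcripts,
# ask a model for "key insights," and treat the summary as analysis.
# Model name, paths, and prompt are illustrative assumptions.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Gather every interview transcript into one undifferentiated blob of text.
transcripts = "\n\n".join(
    p.read_text() for p in sorted(Path("interviews").glob("*.txt"))
)

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical model choice
    messages=[
        {"role": "user",
         "content": "Here are our customer interviews. "
                    "Give me the five key insights:\n\n" + transcripts},
    ],
)

# Whatever comes back is a summary shaped by the prompt, not an analysis
# grounded in theory, context, or a repeatable method.
print(response.choices[0].message.content)
```

Nothing in this pipeline encodes a method, a theoretical frame, or a way to check the output, which is precisely the problem described above.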
Whether AI-driven insights are even possible actually turns on non-technological points. Fundamentally, research about humans only begins with the discovery of facts and observations; researching people and their actions is mostly an analytical procedure. Whether AI is even capable of developing insights about people that are more than mere factoids is a question answered by examining research methods and the true structure and workings of an LLM. This is more a question of design, and of AI’s use as a tool in what is still a fundamentally human-directed effort.
Starting in the 1950s, linguistic scholars like Noam Chomsky (yes, that Chomsky), Zellig S. Harris, Morris Halle, and others considered the idea that all of language, even the act of speaking, has some underlying code. They went looking for the generative grammar of language: a finite set of rules sufficient to explain how a brain can generate an incredibly vast and complex web of utterances, and to reveal the deep structure of all languages. While they revolutionized linguistics, and ultimately provided many of the underpinnings for the semantic theories that contributed to natural language processing (NLP) and LLM architectures, they never did manage to find the laws of motion for language and speech. Their project failed to locate these laws and never explained how a complex system gives rise to a nearly infinite variety of possible things to say. Despite this failure, the idea that it was possible stuck around, reappearing in the promises of big data: the notion that beneath the noise of a seemingly infinite amount of data there were some basic laws holding everything together. The methods changed, but the idea persisted, and attached itself to artificial intelligence.
This is the same idea lurking at the core of AI insights. While the desire is still there, the ability to fulfill this promise is entirely missing. There is still no strong indication of any deep logic holding human behaviour together—at least, none that serves to explain how and why people behave beyond the most superficial details. However, the belief that AI is capable of this task, or something close to it, is part of the promise of AI insights. The idea is that machines, with their superior abilities in managing masses of data, can develop an understanding unavailable to limited human researchers. What people might fail to do, the fantastical summary machines can surely do. But deep structure might be the wrong goal, and no more than wishful thinking. Things as complicated as language and human behaviour routinely resist simple explanations. If we cannot isolate and explain deep structure ourselves, we cannot train an AI system to find it. To assume AI will find it for us is rather like building a unicorn trap in the hope that it will prove unicorns exist. Nothing would please me more than to be wrong about this. A generative grammar of human behaviour would certainly make the life of an anthropologist like me a lot easier. But sadly, it seems this is only a dream.
Complicating matters is the problem that the tools, as remarkable as they are, are quite limited and error prone. This is partly because of the technical difficulty of preventing them from hallucinating. But it is also a function of their training process and the data we use to train them. First of all, training data is never clean. It is filled with contradictions, inaccuracies, and incomplete information. Even the internet, as vast and rich as it is, is not the sum total of human knowledge. It is just an index of what we have said about what we have learned during our time on this planet. So it is not even complete. The internet certainly is not a repository of human behaviour. It is just a manifestation of one kind of human activity. This means the training data we feed AI systems is not sufficient to provide insights beyond factoids about what people are talking about online.
Next, the question needs to be asked whether summarization is an analytical method. Most of the current LLMs are able to summarize large documents, or clusters of documents. However, they often fail the most basic test of their conclusions. I, and many of my colleagues, have conducted simple tests in which we input documents we know in full and then evaluate the summary. The first concerning feature of ChatGPT’s and Gemini’s efforts (these are the ones we used) is the lack of repeatability. They never produce the same summary twice. And these were not small variations, but often entirely different hierarchies, betraying an altered analytical procedure each time. Repeatability is a key component of analysis. If you cannot show your work and explain why a conclusion was reached, then you do not have control of the knowledge generated. The lack of analytical transparency means the insights are questionable from the start.
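For readers who want to try this for themselves, a check of this kind can be as simple as sending the same document to the same model twice under identical settings and comparing what comes back. The sketch below is one way to do it, assuming the OpenAI Python client and a standard-library string comparison; the model name, prompt, and file name are illustrative, not a record of our actual tests.

```python
# Minimal repeatability check: summarize the same document twice with
# identical settings and measure how much the two summaries differ.
# Model, prompt, and file name are illustrative assumptions.
from difflib import SequenceMatcher
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
document = open("known_document.txt").read()

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",   # hypothetical model choice
        temperature=0,    # even at temperature 0, outputs can vary
        messages=[{"role": "user",
                   "content": "Summarize the key points of this document:\n\n" + text}],
    )
    return response.choices[0].message.content

first = summarize(document)
second = summarize(document)

# A crude similarity score: 1.0 means the two summaries are identical.
similarity = SequenceMatcher(None, first, second).ratio()
print(f"Similarity between runs: {similarity:.2f}")
```

A surface-level string comparison obviously cannot judge whether the hierarchy of ideas changed between runs; that assessment still has to be made by a reader who knows the source document in full.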
Analysis in the social sciences is a very difficult thing. It requires a deep understanding of the history of a particular perspective and method. For example, understanding how a food trend on TikTok grows and influences people—something where the data is mostly available to an LLM—requires understanding, at a minimum, discursive principles captured in practice theory, the ethnography of performance, foodways, the social production of meaning, discursive analysis, and transnational social formation. Each of these has a long history, contradictions, good and bad examples, and analytical responses repudiating its methodologies. It involves selecting the authors in each of these analytical traditions to be emulated and those to be avoided. It involves understanding how to take inspiration from an analysis of political speech in France in the 1960s and connect it to the mechanics of identity formation amongst Bulgarian and Algerian football fans in the 1990s. Whilst these works seem unconnected, each of them contains a way to dig into a key focal point of an analysis of how influencers capture attention and elicit a response. A trained ethnographic analyst can do this work. An LLM cannot. In our experiment, we also tried training LLMs on the relevant social theories to see whether it is possible to get an LLM to do this work. The results were mixed, as you would expect. And since we could not see how it managed the training data and then applied the analysis, the results were not trustworthy. It used the language of these theories, but it did not provide adequate results.
This is just the very beginning. To produce an explanation of what is going on in the thoughts and actions of a group of people also involves an assessment of context—something an AI neither has knowledge of nor any ability to interact with. Much of what we can learn about the basics of human behaviour emerges out of a particular framing context. Without this context, true understanding is impossible. This context is often the very object of study, and is usually captured in the term “culture.” It exists in the space between what is said, or in what is not said. It becomes apparent only after a degree of fluency in the culture has been achieved. It requires “being there,” in the words of anthropologist Clifford Geertz. AI systems cannot do any of this. They cannot read between the lines. They cannot immerse themselves in a culture. They cannot be there and experience something for themselves. Consequently, what they can be fed (interviews, documents, and recordings) captures only a fraction of what is discoverable in that moment of contact with that group of people. So they are working with too little information to credibly produce anything revealing about what is going on.
The lack of transparency into the process also invalidates the results—in fact, it invalidates the entire process. If we cannot understand how an insight was developed, we cannot understand what it is saying or what it means. Real analytical work in the social sciences is conducted under the oversight of the peer-review process. Half of an insights document, whether it is written for a business or an academic audience, needs to be a statement of how the results were achieved. AI-generated insights lack this. They lack the explanation that allows us to understand what is going on and where it is to be found elsewhere, and, most importantly, they lack the answer to the question of why it is happening at all. As a consequence, they are weak factoids, not insights. The fact that they are accepted as insights at all is grounded in the misplaced trust we have in all-seeing machines and their mythical ability to do what humans cannot. Scratch the surface with the questions that aid understanding, and they fall apart completely. Ultimately, summarization cannot be an analytical procedure, and clustering observations is an insufficient analytical process.
Insights about people, their behaviours, beliefs, actions, and thoughts are nothing in themselves. What we are really doing when we deliver insights to a business audience is helping one group of people better understand another. The fundamental work is not to build a data set or to provide superficial factoids. It is to develop a connection between customers/users/patients and the audience in the business, so that those in the company can develop something for these people in the most efficient, impactful, and cost-effective way. The audience has to believe what they are being told, and they need to understand why they should change what they are doing in light of what they have learned. They need to understand what it means to think, act, and speak like the customers/users/patients. This means the insights do more than just transfer knowledge: they are the catalyst for action within a company. This is the foundation of a radically human-centric method. Insights connect two groups of people in a productive way, eliminating barriers and misunderstandings, to enable coordinated action and mitigate failure. Using an LLM to do the bulk of the analysis is simply not human-centric. It means the connection is now mediated by a machine that does not understand, and such a machine cannot be a tool for connecting the two groups of people.
Putting a machine in this position demonstrates a lack of respect for people at both ends of the process. Using these systems to generate insights virtualizes (and trivializes) people’s lives and voices, reducing them to a synthetic representation. But this is not the LLM’s problem. It is the fault of the users, as these are just tools. The failure lies in the hands of companies looking for efficiencies where they do not belong. It lies in the fetishization of technology and the desire to put it into places where it is not needed to achieve a good result. It also lies in the failure of the moment: in a rush to participate in the AI bubble, for fear of being left out, many are allowing multi-billion-dollar implementation schemes (driven by a handful of firms) to continue, with the goal of dehumanizing organizations and the economy to achieve more efficient, robotic production, performance, and consumption metrics across all sectors of society.
AI insights are, in effect, an intrusion of corporate-held technologies into a conversation between people. Their use as replacements for human effort denatures the very connection insights are meant to develop. While LLMs can be fantastic tools, and can play an important role in this process, setting them up as direct alternatives to human analysis is a profound error. Human insights are supposed to foster understanding between groups of people. They are meant to explain the seemingly unexplainable and build the bridges capable of turning two groups into a single community. This involves eliminating the objectification of people and allowing us to achieve a common understanding. It is the process of turning ‘us/them’ into just ‘us.’ Using LLMs to do this work keeps the distance and adds another barrier: the idea that people are data points rather than fellow travellers through life. The only way we can get LLMs to do the work of ethnographizing people is to turn people into something they can understand. This means alienating ourselves entirely and becoming what the machine needs. We must turn ourselves into the data to be computed.
The other fundamental mistake of automated research is the idea that the answer is the purpose. This is like elementary students wanting to shirk the work of repetition. It isn’t the answer that matters; very often the value lies in the doing of the task. The task of understanding is the process of learning. There is no cheat code, no prompt, and no efficient shortcut to this. We need humans to do the work of introducing us to ourselves. We have to learn to use these AI tools better, not assume they can or should do the work themselves. Human insights are a process, not a result. Using a machine to do the work means we eliminate the point altogether.
So, are AI insights possible? No. They are not. If the LLM is used in tactical ways within a process led by human researchers, then yes, it can contribute. But the results are not AI insights; the LLM is simply a tool in that circumstance. This is the best place for it to be. It is best to think of AI, in any situation, as a means to a very limited end. Beyond that, the rest is meaningless hype. In a business context, it is a serious liability to trust an LLM to do any form of analysis regarding people. Those who claim otherwise present a danger to an organization by making too much of a simple tool, misusing it, and then presenting it as an alternative to real, careful, creative research.
We will explore the appropriate use of AI tools in the next article.