AI has quickly become part of the everyday toolkit for marketers and analysts. From content generation to research assistance, general-purpose AI tools like ChatGPT and Grok promise to make complex work faster, easier, and more scalable.
And in many ways, they do.
After spending significant time using modern AI models, particularly ChatGPT 5.2, to analyze large datasets in Excel spreadsheets, a clear pattern has emerged: standard AI tools are incredibly powerful in some areas of data analysis, and surprisingly unreliable in others.
This article isn’t a critique of AI. It’s a firsthand account of where these tools excel, where they struggle, and how to use them more effectively in real-world marketing and data workflows.
What Do You Mean by “Standard AI Tools”?
For clarity, this discussion focuses on general-purpose, chat-based AI tools—the kind most marketers and everyday users now have easy access to.
They are perhaps best described by what they are not: purpose-built analytics platforms, statistical engines, or domain-specific machine learning models.
Where Standard AI Tools Truly Shine: Data Harvesting and Workflow Design
One of the most impressive strengths of modern AI is not just analyzing data—but figuring out how to collect it in the first place.
In one project, I was working with a large list of companies and wanted to gather keyword data directly from their websites. The original approach was manual and inefficient: visiting sites, extracting information, and compiling results by hand. That would take a lifetime, given that I was dealing with upwards of 10,000 companies.
I asked ChatGPT to crawl the sites for me. However, even with a paid account, there are token limitations to consider, so ChatGPT proposed something I hadn't even considered: just code a web crawler in Python. Honestly, a great idea. I was blown away. I said yes, please do that, and it kicked out the code in about 10 seconds: folders, file sets, dependencies, everything.
The crawler would:
- Visit each company website automatically
- Extract keyword-related content (H1, H2 headers, some text content)
- Compile the data into a structured CSV dataset
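To make that concrete, here's a minimal sketch of what such a crawler can look like. The file names, the "website" column, and the extraction details are my illustrative assumptions; the script ChatGPT actually generated was more elaborate:

```python
import csv

import requests
from bs4 import BeautifulSoup

def extract_keywords(url: str) -> dict:
    """Fetch one page and pull H1/H2 headings plus a little body text."""
    resp = requests.get(url, timeout=10,
                        headers={"User-Agent": "keyword-crawler/0.1"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return {
        "url": url,
        "h1": " | ".join(h.get_text(strip=True) for h in soup.find_all("h1")),
        "h2": " | ".join(h.get_text(strip=True) for h in soup.find_all("h2")),
        "body": " ".join(p.get_text(strip=True) for p in soup.find_all("p"))[:500],
    }

# companies.csv is assumed to have a "website" column of full URLs.
with open("companies.csv", newline="", encoding="utf-8") as f:
    urls = [row["website"] for row in csv.DictReader(f)]

rows = []
for url in urls:
    try:
        rows.append(extract_keywords(url))
    except requests.RequestException:
        rows.append({"url": url, "h1": "", "h2": "", "body": ""})  # record the miss, move on

with open("crawl_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "h1", "h2", "body"])
    writer.writeheader()
    writer.writerows(rows)
```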
The script worked extremely well. It was fast, scalable, and produced clean raw data ready for analysis. This is where standard AI tools are genuinely transformative: ChatGPT not only wrote the code, it came up with the best solution given the constraints. Brilliant stuff, seriously.
In this phase of the process, AI wasn’t just helpful—it was a force multiplier, a problem solver.
Where Things Start to Break: Contextual Classification at Scale
Ok, so I've got the raw crawl data. Great. However, problems began to emerge when I asked the AI to actually analyze the data in a contextual way. It seems that when these tools are asked to move from exploration into high-volume, judgment-based classification, they start to struggle.
Returning to the keyword dataset example, the next step was to categorize companies based on the keyword data collected from their websites.
Two approaches were tested:
- Direct keyword matching (rule-based logic)
- Contextual classification using AI reasoning
The results were interesting.
When classification relied on direct keyword matches, the results were consistent and reliable. I would provide keyword lists to use as references, and it would apply those against the crawler data to classify the companies. However, when the task shifted to using AI's "smarts" to infer categories contextually, performance degraded significantly.
The AI struggled to consistently reference and weigh the raw keyword data when making higher-level decisions. In other words, the more discretion the task required, the less dependable the output became.
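For contrast, the rule-based approach is easy to make fully deterministic yourself. A rough sketch, with hypothetical category keyword lists standing in for the reference lists I actually provided:

```python
import csv

# Hypothetical reference lists; my real ones were pre-compiled per category.
CATEGORY_KEYWORDS = {
    "construction": ["contractor", "construction", "builder", "remodeling"],
    "software": ["saas", "software", "platform", "api"],
}

def classify(text: str) -> str:
    """Return the category whose keywords appear most often in the text."""
    text = text.lower()
    scores = {cat: sum(text.count(kw) for kw in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

with open("crawl_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        text = " ".join([row["h1"], row["h2"], row["body"]])
        print(row["url"], "->", classify(text))
```

Every classification here traces back to an explicit keyword hit, so the same input always produces the same output. That is exactly the property that fell apart when the AI was asked to use judgment instead.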
The Company Matching Problem: When AI Gets Confident—and Wrong
Another revealing example involved comparing two large company databases from different sources to identify overlap.
At first, the task seemed straightforward:
- Match companies with identical names
- Match companies with identical website URLs
In these cases, AI performed exactly as expected.
Where things became interesting—and risky—was in fuzzy matching, where company names didn’t perfectly align. For example:
- “ABC Construction”
- “ABC Construction Inc.”
Here, AI provided real value. It could reasonably infer that these were likely the same entity, saving a significant amount of manual review time.
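Worth noting: this level of fuzzy matching doesn't strictly require AI. Python's standard library gets you surprisingly far. A minimal sketch using difflib, with an assumed list of legal suffixes to strip:

```python
from difflib import SequenceMatcher

COMMON_SUFFIXES = (" inc.", " inc", " llc", " ltd.", " ltd")  # assumed list

def normalize(name: str) -> str:
    """Lowercase a company name and strip common legal suffixes."""
    name = name.lower().strip()
    for suffix in COMMON_SUFFIXES:
        name = name.removesuffix(suffix)
    return name

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Both names normalize to "abc construction", so this scores 1.0
print(similarity("ABC Construction", "ABC Construction Inc."))
```

The AI's advantage over a script like this is handling messier variations; the script's advantage is that it never changes its mind about what counts as a match.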
But then came the failures.
In one instance, ChatGPT concluded—with 85% certainty—that Indiana University and Purdue University were the same organization.
They are not.
This exposed a critical limitation: AI confidence is not a proxy for correctness. In large datasets, even a small percentage of confident errors can undermine the entire analysis.
The Core Limitation: Discretionary Judgment at Scale
Across these examples, a consistent limitation emerges.
Standard AI tools struggle when judgment must be exercised repeatedly and at scale. I don't have any proof, but I suspect this has something to do with the processing and token limitations of any given prompt or thinking session. The model starts taking shortcuts to save time and token usage, and as a result, the output is simply less accurate.
And "scale" seems to have a huge impact. When I fed ChatGPT a single website URL and asked it to classify it against my pre-compiled lists, it nailed it every single time. Ask it to do the same on a list of 1,000 URLs in an Excel file? Nope. It takes shortcuts again, even when I tell it to use the same process and take its time. It just doesn't work.
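One workaround follows directly from that observation: instead of pasting 1,000 rows into one prompt, script the model through its API one row at a time, so every classification gets the same small, fresh context that worked in the single-URL case. A rough sketch using the openai Python package (the model name and prompt wording are illustrative assumptions, not what I used):

```python
import csv

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("Classify the company at this URL into exactly one of: "
          "construction, software, other. Reply with the category only.\n"
          "URL: {url}")

results = []
with open("companies.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # one small API call per row, not one giant prompt
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; use whatever model you have access to
            messages=[{"role": "user", "content": PROMPT.format(url=row["website"])}],
        )
        results.append((row["website"], resp.choices[0].message.content.strip()))

for url, category in results:
    print(url, "->", category)
```

Each call costs tokens, but the structure mirrors the single-URL setup that nailed it every time, rather than hoping the model stays diligent across a thousand rows at once.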
That failure at scale is probably because these tools are optimized for conversational reasoning, not for analyzing data in bulk. They reason probabilistically, not statistically. And they lack persistent mechanisms to validate assumptions against ground truth at scale.
As a result, they can sound persuasive while being wrong—sometimes very wrong.
Why Purpose-Built AI Tools Perform Better
In the end, it's probably unfair to ask an AI chatbot to perform these kinds of tasks at scale. They just weren't built for this type of work. This is where specialized AI tools come into play.
Outside of standard chatbots, there are AI systems specifically designed for:
- Entity resolution
- Record linkage
- Classification and clustering
- Data validation and confidence scoring
These tools are built with:
- Structured logic
- Repeatable evaluation criteria
- Transparent thresholds
- Auditable outputs
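Even without buying a specialized platform, you can approximate those last two properties in plain code. A toy sketch of a matcher with an explicit threshold and an auditable output file (the 0.85 cutoff is an assumption to tune against a hand-labeled sample, not a recommendation):

```python
import csv
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # assumed cutoff; tune it against known-good pairs

def audit_match(name_a: str, name_b: str) -> dict:
    """Score one candidate pair and record the evidence behind the decision."""
    score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return {"name_a": name_a, "name_b": name_b, "score": round(score, 3),
            "decision": "match" if score >= MATCH_THRESHOLD else "no_match"}

pairs = [
    ("ABC Construction", "ABC Construction Inc."),
    ("Indiana University", "Purdue University"),
]

with open("match_audit.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name_a", "name_b", "score", "decision"])
    writer.writeheader()
    for a, b in pairs:
        writer.writerow(audit_match(a, b))
```

Every decision lands in a file alongside the score that produced it, so a bad call like the Indiana/Purdue one becomes something you can find, explain, and correct, rather than a confident answer buried in a chat window.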
I've still got some research to do into more purpose-built tools for this (I've started, and I'll do a follow-up blog when I find one!). Feeding my data into one of those tools would have been the smarter path forward.
Final Thoughts: Powerful Tools, Clear Boundaries
Standard AI tools like chatbots represent a massive leap forward in accessibility and productivity. When used correctly, they can dramatically accelerate data analysis and uncover insights that would otherwise take far longer to reach.
But they are not universal solutions.
The future of data analysis isn’t about choosing one AI tool—it’s about knowing which tool belongs at which stage of the process.
When dealing with large Excel datasets, marketers and analysts should understand the limitations of these tools and look for something more purpose-built. Learn from my experience.
