Do humans hate being AI-ded?
Artificial Intelligence (AI) is advancing rapidly, but integrating it into our daily routines and work environments is still difficult. While there's a lot of buzz around AI, especially around large language models and generative AI, many people and businesses are struggling to see the promised benefits. This article explores where AI adoption stands today, looking at common misconceptions and practical obstacles. By understanding both the limitations and the opportunities of AI, we can work with it more effectively in a variety of settings.
Business transformation efforts involving AI are hitting roadblocks, and many people are tired of the constant talk about AI, especially since they don't see many concrete benefits. In fact, using the term "artificial intelligence" in product descriptions can actually decrease purchase intent.1 This suggests a bias against AI, and experience reinforces it: these systems often fail to perform routine tasks as expected, and even when they succeed, the results are often less than ideal. So do we hate the AI tools that are supposed to help us, or are we simply not using them correctly?
Common misconceptions
Since the release of ChatGPT, countless articles and guides have been written (or generated) on how to craft the perfect prompts - often referred to as "prompt engineering" - to get the desired output. Much of this advice begins with phrases like "What most people get wrong about AI". Given how opaque these systems are, it's no surprise that we're all trying to figure out what's going on inside the black box.
You're not alone if you haven't seen the promised productivity gains. For a while, it seemed unclear what specific problem generative AI (GAI) could actually solve. This uncertainty is similar to what Benedict Evans pointed out when he questioned whether we're facing another bubble, similar to the dot-com boom or the blockchain hype.2 Unpredictable outcomes lead to frustration and eventual abandonment: usage declines because the tools don't live up to inflated expectations.
After two years, generative AI hasn't lived up to the high expectations some of us had. And there's still no sign of AGI (artificial general intelligence). Meanwhile, some are warning against unbridled AI development, likening the competition to a digital arms race among tech giants.3 According to experts from Bain & Company: “Generative AI is not producing nearly the economic value or boost to productivity we were told it would. In fact, it’s fairly likely BigTech is wasting Billions and the energy (and water) this gamble for AI supremacy requires, a commitment of Capex that’s nearly unimaginable to previous tech cycles.”4
At a conference focused on generative AI success stories, Ethan Mollick wryly pointed out that these systems are inherently weird, while corporate America is trying to make them less so.5 As companies like Google and Microsoft work to embed AI into every possible ecosystem, we may see some surprising results. This could finally reveal what AI is truly capable of. But in building these tools, we can't avoid the hard work of figuring out where AI really fits in the market. The reason is clear: so far, large language models (LLMs) are generating huge costs, while their disruptive value is being questioned by potential users. “Generative AI is unprofitable, unsustainable, and fundamentally limited in what it can do thanks to the fact that it's probabilistically generating an answer.”6
Organizations looking to deploy enterprise products and increase productivity face a critical challenge: they often struggle to identify the best use cases for AI. On the other hand, it has been observed that “Individual workers, who are keenly aware of their problems and can experiment a lot with alternate ways of solving them, are far more likely to find powerful and targeted uses.”7 Foundation models have shown the ability to learn in context from just a handful of examples, and even to generalize with no examples at all.
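To make "in-context learning" concrete, here is a minimal sketch: a few examples are placed directly in the prompt and the model generalizes to a new case, with no fine-tuning involved. It assumes the official openai Python client (v1+) with an API key in the environment; the classification task and the model name are purely illustrative.

```python
# A minimal sketch of in-context ("few-shot") learning: examples embedded in
# the prompt steer the model, with no fine-tuning involved. Assumes the
# official `openai` Python client (v1+); task and model name are illustrative.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Classify each support ticket as 'billing', 'bug', or 'other'.

Ticket: "I was charged twice this month."        -> billing
Ticket: "The export button crashes the app."     -> bug
Ticket: "Can you recommend a good keyboard?"     -> other
Ticket: "My invoice lists a plan I never chose." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any capable chat model would do
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,        # keep the classification as stable as possible
)
print(response.choices[0].message.content.strip())  # expected: "billing"
```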
But what's most surprising is that even the creators of these models often don't fully understand what they're capable of. People who aren't LLM researchers have been able to uncover and exploit capabilities simply by spending more time with the models and applying them to domains that researchers might overlook - as Jasper did, for example, with AI copywriting.8
No one can predict exactly how much efficiency AI can bring to an organization, especially for a particular employee performing specific tasks. These tasks are often shaped not by job descriptions or best practices, but by how the individual performs their work, often finding workarounds due to software that isn't fully integrated or analog processes that haven't been digitized. This issue is further complicated by the power distance within organizations. While it's relatively easy to negotiate for the best tool on the market, the decision makers are not the ones who use those tools every day. Factors such as integration with other tools, dependencies, licensing costs, and employee training all need to be carefully considered.
This leads us to the realization that the main obstacle to adopting AI assistants in the workplace is our understanding of what they should do. It's a rational conclusion, given that we've been conditioned to expect perfect answers after two decades of using Google search. But LLMs are different. As Doug Shapiro points out, they are “primarily concept machines, not answer machines. (…) Instead, they are very well suited to the opposite: conceptual, low-stakes, iterative tasks where the quality of output is easily verifiable.”9 Their randomness and unpredictability give them almost human-like behavior. After all, they have been trained on a vast collection of human writing, complete with inherent biases and errors - a mirror image of how we think.
The roots of our perception
When smartphones first came out, the need for apps wasn't immediately obvious. But today, most people rarely use their phones to make calls. The creators of these devices launched them with the hope that users would explore and figure out the best ways to use them. This approach is common among disruptive startups, which aim to find the right market fit by tweaking their products based on user feedback. Often, this means cutting certain features or changing the product entirely to make it profitable. For example, YouTube originally started as a video dating site, and TikTok started as a tool for creating lip-sync videos.
Similarly, companies looking to reduce customer support labor costs have turned to automated solutions such as chatbots. However, these bots often struggle with unusual scenarios because they are designed to handle more common issues based on predefined scripts. As a result, when users faced unique issues, they had to request to speak with a real person. Even then, human support agents sometimes found it difficult to decide which script or policy to follow.
The idea of providing a chat interface to AI systems seems obvious now. It allows us to interact directly with a vast repository of knowledge, asking questions as if we were talking to a real person. This approach is far more effective than the earlier concept of home assistants. Early assistants like Alexa and Siri, which could do little and often failed to understand requests, made many people skeptical of AI. These early missteps gave the impression that AI was a flawed concept for quite some time - until advances proved otherwise.
It’s quite ironic that for the past 20 years we've been at the mercy of Google search, which has conditioned us to think in terms of reverse engineering our questions to find answers in search results. We became adept at crafting complex keyword strings and using Boolean syntax to navigate the web. As natural language processing advanced, however, the need for such complex search methods diminished, allowing for simpler and more intuitive queries.
Google, secure in its dominant market position and the advertising revenue it generated, seemed to lack the impetus to innovate further. It slept on the rise of its competitors (according to Eric Schmidt, because of its work-from-home policy10). However, this traditional approach to Web search is now showing signs of obsolescence. Emerging solutions like Perplexity show what Google could have become for the majority of users - providing direct, accurate, and reliable answers, rather than just a list of potential resources.
What LLMs are not
We tend to have high expectations for what a true AI assistant should do, often expecting something almost godlike - an entity that never errs, never misleads, and always delivers exactly what it promises. It's no surprise, then, that there have been many disappointments with the results these AI systems produce. Issues such as hallucinations, bias, and the constraints imposed by community guidelines have certainly not helped manage these expectations.
Language models like ChatGPT (where GPT stands for generative pretrained transformer) didn't just appear out of nowhere. They are the result of years of development in natural language processing, which has gained significant momentum since the late 2010s. The key driver behind this progress was the introduction of transformer architectures, first described in the influential paper "Attention Is All You Need" by the Google Brain team.11
Ermira Murati, CTO of OpenAI, wrote in her essay “Language & Coding Creativity”: “Darwin saw the drive to acquire language as “the instinctive tendency to acquire an art,” to communicate by some medium. No baby has ever needed a book of grammar to learn a language. They absorb what they hear and through the maze of the mind play it back. People spend their lives speaking exquisitely without understanding a subjunctive clause or taking a position on split infinitives. A child learns by experiencing patterns, learning what is most likely to make sense in a new context.”12
Similar human-like quirks - forgetfulness and a lack of consistency - show up in language models because of their limited context memory. Much like Google returns different results for the same search query based on factors such as your profile, location, and cookies, ChatGPT generates different responses based on the context of the conversation and the specific wording of your query. You can ask the same question three times and get three different answers because the tool doesn't pull from a fixed database; instead, it generates a new answer each time.
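A toy simulation helps illustrate why answers vary: generation samples from a probability distribution over possible next tokens, and any temperature above zero makes that sampling stochastic. The candidate words and scores below are invented for illustration only.

```python
import math
import random

# Toy illustration of why the "same question" can yield different answers:
# generation samples from a probability distribution over next tokens rather
# than looking anything up. The candidate words and scores are invented.
logits = {"Paris": 4.2, "France": 2.1, "Lyon": 0.8}

def sample_next(logits: dict, temperature: float = 1.0) -> str:
    # Softmax with temperature: lower values sharpen the distribution
    # (more repeatable), higher values flatten it (more varied output).
    scaled = {token: score / temperature for token, score in logits.items()}
    normalizer = sum(math.exp(v) for v in scaled.values())
    probs = {token: math.exp(v) / normalizer for token, v in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# "Asking the same question" three times can produce three different answers.
print([sample_next(logits, temperature=1.0) for _ in range(3)])
```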
It's important to remember that ChatGPT (and similar models) is not a calculator. You might assume that it should excel at tasks that machines are typically good at, such as performing precise calculations. After all, the calculator was the precursor to the computer, and the future of AI was once defined by machines like Deep Blue beating humans at chess - a feat that depended on sheer computational power. However, the core of large language models (LLMs) lies in understanding context through language, using probabilities to predict and generate text. As with any probabilistic system, the results aren't always right, but they aren't completely wrong either.
The problem with LLMs is that they weren't designed to do math: they work with tokens, which are fragments of words or numbers. Probabilities are not always what you want for tasks that require precise and reliable results. Murati clarified this issue by writing that “With this mathematical representation of patterns, GPT-3 can carry out many tasks, such as generating computer code, translating language, classifying information, searching semantically through a large corpus of text, even though it was built to do just one thing: predict the next word in a sequence of words.”13
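A quick way to see the token problem for yourself is to inspect how a tokenizer splits a long calculation. This sketch assumes the open-source tiktoken package and the cl100k_base encoding used by GPT-4-era OpenAI models.

```python
# A quick look at why arithmetic is a poor fit for a next-token predictor:
# the model never sees numbers as values, only as tokens. Requires the
# open-source `tiktoken` package; cl100k_base is the encoding used by
# GPT-4-era OpenAI models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("123456789 * 987654321")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
          for t in token_ids]
print(pieces)
# The long numbers come back split into several digit chunks, so the model
# has to "predict" the product one token at a time instead of computing it.
```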
By the same token, ChatGPT is not a search engine, and its lack of connection to real-time Internet browsing disappointed many early users. This raises the question: why use a tool that isn't in sync with the latest news? The frustration probably stems from a misunderstanding of what the tool was designed to do. In response to popular demand, OpenAI has started working on a project called SearchGPT, but it's important to remember that ChatGPT was never intended to work like a search engine.
Moreover, ChatGPT doesn't act like a human assistant - although it can mimic one, depending on how you direct the conversation. As Ethan Mollick argues14, the assistant is not designed to give you a perfect answer, because that's not the best use of its capabilities. Imagine having a conversation with brilliant minds like Adam Smith or Milton Friedman. You wouldn't ask them for simple facts (which you can easily find on your own); you'd want explanations, introductions to complex ideas, and an exchange of views. The value is in generating new ideas, not just reconstructing old ones. Conversations uncover more depth as you bring the assistant deeper into your specific context.
Product as a Service
A recent report on the O'Reilly platform confirms that the model itself is not a product. A model may have subscribers and users reporting glitches, but it only creates value once it is integrated into an existing product or service.15 A better approach, as Apple is attempting, is to embed these models into a broader ecosystem. Handling data through APIs imposes more constraints, which helps maintain the quality of the output - and it marks a different path from most of the industry.
As Benedict Evans notes, “Apple has shown a bunch of cool ideas for generative AI, but much more, it is pointing to most of the big questions and proposing a different answer - that LLMs are commodity infrastructure, not platforms or products.”16 Apple has built an LLM without a chatbot interface, abstracting the model as an API call. This approach positions the LLM as a tool for enabling new features and capabilities, with design and product management dictating its functions and user interactions, rather than treating it as a platform or an oracle that users query.
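What "an LLM as an API call rather than a chatbot" might look like in product code is sketched below. The function, prompt, and model name are hypothetical; the point is simply that design and product management, not the user, decide what the model is asked and how its output is used.

```python
# A hedged sketch of "LLM as commodity infrastructure": no chat window, just
# a narrow function the product calls. The function, prompt, and model name
# are hypothetical.
from openai import OpenAI

client = OpenAI()

def summarize_notification(raw_text: str) -> str:
    """Return a one-line summary suitable for a lock-screen notification."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any small, inexpensive model would do
        messages=[
            {"role": "system",
             "content": "Summarize the user's message in at most 12 words."},
            {"role": "user", "content": raw_text},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()

# The caller never sees a prompt or a conversation, only a feature.
print(summarize_notification(
    "Hey, running 20 minutes late - start without me, I'll bring the slides."))
```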
In the realm of generative AI, while 2023 was about exploring the potential of these models, 2024 is likely to focus on delivering tangible results and creating real business value. Given the rapid pace of AI development, building your own model from scratch seems increasingly pointless. The key takeaway seems to be: avoid building your own large language model and be cautious about fine-tuning. Instead, focus on building an ecosystem around the best existing LLMs. Emad Mostaque, the former CEO of Stability AI, recently wrote an in-depth post titled "How to Think About AI" discussing future directions. One of the key debates among tech entrepreneurs revolves around centralized versus decentralized and closed versus open AI models.17
As companies explore generative AI, they're finding that performance falls slightly short of expectations. However, five areas show promise: sales, software development, marketing, customer service, and onboarding. On the other hand, applications in legal, operations, and HR have been less effective. Companies often purchase third-party solutions when they are available, but they also invest in customization. As with many new technologies, many applications are built in-house because off-the-shelf options don't always meet expectations.18
Challenges in Adoption
The hype around ChatGPT reaching 100 million users led many people to try the free version, but the limitations of GPT-3.5 compared to the more powerful paid versions left many disappointed. These free tools are designed to provide a "safe" way to use AI in the workplace, which means they are often more limited than what cutting-edge models can actually achieve. As a result, many people underestimate AI's current capabilities and don't fully grasp how advanced it has become.19 Those who tried earlier versions concluded that they weren't good enough for their tasks and quickly stopped using them, not realizing how much these tools have improved over time.
ChatGPT's integration into Bing, as well as its role as Copilot in the Office suite and Azure, made it seem like the ultimate assistant for everything. But like any general-purpose technology, it can't specialize in everything, nor can it excel at everything. For more sophisticated tasks, you need tools that are specifically tailored to those needs. Given the high cost of ownership, it's unrealistic to expect a powerful tool for free, even though large companies may suggest otherwise to attract users. While the tool's minimalist design seems intuitive, truly understanding how it behaves and how best to use it requires some dedication and time.
Note that the AI assistant doesn't work like the Google search bar. Using it effectively requires a different mindset, but that shift can significantly improve your results. Take video creation, for example. Despite its potential, it still faces challenges like inconsistencies and a poor understanding of physics that force creators to spend hours tweaking prompts. Often, the output needs further refinement using external tools, so you can't expect everything to be perfect in one go.
In just the last three months, seven new AI models have been released, yet overall AI usage seems to be declining. Many people use these tools for entertainment - creating poems, deepfakes, sounds, or music. While asking GenAI to generate cat memes or deepfakes may seem trivial given the significant power and resources required to run the servers, it's important to recognize that these models are still evolving. The current GPT-4o model represents a significant advancement over GPT-3.5, which was released nearly two years ago. While models from Meta, Google (Gemini), and Mistral are catching up, many consider Claude 3.5 Sonnet the leading model on the market today.
Generative AI might not be suitable for every application, and its widespread integration could be a misuse of resources, given its energy demands. As Ethan Mollick has suggested20, it may be the "worst" AI we currently have, but that doesn't mean we should dismiss its potential. The models are constantly improving, and understanding their limitations is crucial to using them effectively.
With great power comes great electricity bill
Most users benefit from free access to AI provided by tech giants, but this raises a significant question about return on investment (ROI). By one estimate, the AI ecosystem needs to generate approximately $600 billion in revenue to justify the current expenditures on essential infrastructure, such as GPUs and data centers. This concern leads to critical reflections on whether the ongoing investment in AI development is truly sustainable or whether the costs outweigh the benefits.21
Despite early success in gaining traction, there are serious warnings of another potential bubble forming in the AI industry. “Sam Altman may be to AI, what Sam Bankman-Fried was to crypto. A situation where so much deception and over-promising leads to many moral and ethical failings. OpenAI is also the manipulation of Silicon Valley over-promising personified, a common VC tactic of over-promising and repetition that stretches back well before the dawn of the internet.”22 According to studies by Bain & Company, executives are most concerned about the quality and capabilities of AI. However, by 2024, many are shifting their focus to delivering real value. Bain’s research found that by early 2024, 87% of companies surveyed were already developing, piloting, or deploying generative AI in some capacity.
Still, many companies continue to develop their own generative AI solutions because existing options are either not mature enough or lack the necessary specificity. This situation is likely to change as the technology evolves. For instance, JPMorgan Chase has introduced a generative AI platform called LLM Suite for its asset and wealth management employees, which will function as a research analyst.
As the saying goes, "Don't count your chickens before they hatch." While there’s great promise in AI, it’s premature to assume success or predict its full impact, especially when some of the most impressive implementations are still hidden and not fully understood. Ethan Mollick accurately notes that “It doesn’t help that the two most impressive implementations of AI for real work - Claude’s artifacts and ChatGPT’s Code Interpreter - are often hidden and opaque.”23 Investment in AI is only increasing, suggesting that even if AGI (artificial general intelligence) remains out of reach, AI labs are determined to continue to significantly advance AI systems in the coming years. Even in the unlikely scenario that today's AI systems are the best we'll ever see, these systems would still cause significant disruption as they become more integrated into our work and daily lives.
Most tech giants are playing the long game, investing more money in hopes of greater adoption. But it's uncertain how a potential recession, even a short one, might affect enthusiasm and funding opportunities for companies like OpenAI, Anthropic, and their peers. Many of these companies may not survive the next three to five years.24
Looking ahead, it's clear that AI will continue to shape our world, but its influence may be more complex and diverse than the early buzz suggested. As the initial excitement fades, we're seeing a shift toward a more practical approach to AI adoption. Integrating AI, especially large language models and generative AI, into our daily routines and workspaces is a complicated, ongoing process. Despite AI's undeniable potential, its adoption has been met with both excitement and skepticism, successes and setbacks.
The real key to making AI work is not to see these tools as a silver bullet, but to understand what they can and can't do. Those who can use AI effectively as part of a larger set of tools, rather than relying on it alone, are likely to gain the most. Ultimately, the journey of AI adoption is far from over. While challenges remain, the potential for AI to enhance human capabilities, drive innovation, and solve difficult problems is enormous. By combining enthusiasm with critical thinking, we can harness the power of AI in ways that truly benefit everyone.
—Michael Talarek
References:
https://news.wsu.edu/press-release/2024/07/30/using-the-term-artificial-intelligence-in-product-descriptions-reduces-purchase-intentions/
https://www.ben-evans.com/benedictevans/2024/7/9/the-ai-summer
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
https://www.bain.com/insights/ai-survey-four-themes-emerging/
https://learning.oreilly.com/live-events/generative-ai-success-stories/0642572002950/
Ethan Mollick, “Co-Intelligence: Living and Working with AI”
https://fortune.com/2024/08/14/google-eric-schmidt-working-from-home-ai-openai/
James Phoenix, Mike Taylor, “Prompt Engineering for Generative AI”
https://www.amacad.org/publication/language-coding-creativity
https://www.amacad.org/publication/language-coding-creativity
Ethan Mollick, “Co-Intelligence: Living and Working with AI”
https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-iii-strategy/
https://www.ben-evans.com/benedictevans/2024/06/20/apple-intelligence
https://x.com/SchellingAI/status/1818600200232927721
https://www.bain.com/insights/ai-survey-four-themes-emerging/
Ethan Mollick, “Co-Intelligence: Living and Working with AI”
https://www.theneurondaily.com/p/ai-bubble-pop
https://www.bain.com/insights/ai-survey-four-themes-emerging/
https://www.ai-supremacy.com/p/d5beb5aa-8874-4cab-ae8e-25c8ce696996