July 19, 2024
This week in AI: OpenAI plays for keeps with GPTs

Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of recent stories in the world of machine learning, along with notable research and experiments we didn’t cover on their own.

This week in AI, OpenAI held the first of what will presumably be many developer conferences to come. During the keynote, the company showed off a slew of new products, including an improved version of GPT-4, new text-to-speech models and an API for the image-generating DALL-E 3, among others.

But without a doubt the most significant announcement was GPTs.

OpenAI’s GPTs provide a way for developers to build their own conversational AI systems powered by OpenAI’s models and publish them on an OpenAI-hosted marketplace called the GPT Store. Soon, developers will even be able to monetize GPTs based on how many people use them, OpenAI CEO Sam Altman said onstage at the conference.

“We believe that if you give people better tools, they will do amazing things,” Altman said. “You can build a GPT … and then you can publish it for others to use, and because they combine instructions, expanded knowledge and actions, they can be more helpful to you.”

OpenAI’s shift from AI model provider to platform has been an interesting one, to be sure — but not exactly unanticipated. The startup telegraphed its ambitions in March with the launch of plugins for ChatGPT, its AI-powered chatbot, which brought third parties into OpenAI’s model ecosystem for the first time.

But what caught this writer off guard was the breadth and depth of OpenAI’s GPT building — and commercializing — tools out of the gate.

My colleague Devin Coldewey, who attended OpenAI’s conference in person, tells me the GPT experience was “a little glitchy” in demos but works as advertised, more or less. GPTs don’t require coding experience and can be as simple or complex as a developer wishes. For example, a GPT can be trained on a cookbook collection so that it can answer questions about ingredients for a specific recipe. Or a GPT could ingest a company’s proprietary codebases so that developers can check their style or generate code in line with best practices.

GPTs effectively democratize generative AI app creation, at least for apps that use OpenAI’s family of models. And if I were one of OpenAI’s rivals (at least the rivals without backing from Big Tech), I’d be racing to the figurative war room to muster a response.

GPTs could kill consultancies whose business models revolve around building what are essentially GPTs for customers. And for customers with developer talent, they could make model providers that don’t offer any form of app-building tools less attractive, given the complexities of having to weave a provider’s APIs into existing apps and services.

Is that a good thing? I’d argue not necessarily — and I’m worried about the potential for monopoly. But OpenAI has first-mover advantage, and it’s leveraging it — for better or worse.

Here are some other AI stories of note from the past few days:

  • Samsung unveils generative AI: Just a few days after OpenAI’s dev event, Samsung unveiled its own generative AI family, Samsung Gauss, at the Samsung AI Forum 2023. Consisting of three models — a large language model similar to ChatGPT, a code-generating model and an image generation and editing model — Samsung Gauss is now being used internally with Samsung’s staff, the tech company said, and will be available to public users “in the near future.”
  • Microsoft gives startups free AI compute: Microsoft this week announced that it’s updating its startup program, Microsoft for Startups Founders Hub, to include a no-cost Azure AI infrastructure option for “high-end,” Nvidia-based GPU virtual machine clusters to train and run generative models. Y Combinator and its community of startup founders will be the first to gain access to the clusters in private preview, followed by M12, Microsoft’s venture fund, and startups in M12’s portfolio — and potentially other startup investors and accelerators after that.
  • YouTube tests generative AI features: YouTube will soon begin to experiment with new generative AI features, the company announced this week. As part of the premium package available to paying YouTube subscribers, users will be able to try out a conversational tool that uses AI to answer questions about YouTube’s content and makes recommendations, as well as a feature that summarizes topics in the comments of a video.
  • An interview with DeepMind’s head of robotics: Brian spoke with Vincent Vanhoucke, Google DeepMind’s head of robotics, about Google’s grand robotic ambitions. The interview touched on a range of topics, including general-purpose robots, generative AI and — of all things — office Wi-Fi.
  • Kai-Fu Lee’s AI startup unveils model: Kai-Fu Lee, the computer scientist known in the West for his bestseller “AI Superpowers” and in China for his bets on AI unicorns, is gaining impressive ground with his own AI startup, 01.AI. Seven months after its founding, 01.AI — valued at $1 billion — has released its first model, the open source Yi-34B.
  • GitHub teases customizable Copilot plan: GitHub this week announced plans for an enterprise subscription tier that will let companies fine-tune its Copilot pair-programmer based on their internal codebase. The news was one of several notable tidbits the Microsoft-owned company revealed at its annual GitHub Universe developer conference on Wednesday, including a new partner program and more clarity on when Copilot Chat (Copilot’s recently unveiled chatbot-like capability) will officially be available.
  • Hugging Face’s two-person model team: AI startup Hugging Face offers a wide range of data science hosting and development tools. But some of the company’s most impressive — and capable — tools these days come from a two-person team that was formed just in January, called H4.
  • Mozilla releases an AI chatbot: Earlier this year, Mozilla acquired Fakespot, a startup that leverages AI and machine learning to identify fake and deceptive product reviews. Now, Mozilla is launching its first large language model with the arrival of Fakespot Chat, an AI agent that helps consumers as they shop online by answering questions about products and even suggesting questions that could be useful in product research.

More machine learnings

We’ve seen in many disciplines how machine learning models are able to make really good short-term predictions for complex data structures after perusing many previous examples. For example, such a model could extend the warning period for upcoming earthquakes, giving people a crucial extra 20-30 seconds to get to cover. And Google has shown that it’s a dab hand at predicting weather patterns as well.

Several figures from the post showing how MetNet integrates data into its ML-based predictions. Image Credits: Google

MetNet-3 is the latest in a series of physics-based weather models that look at a variety of variables, like precipitation, temperature, wind, and cloud cover, and produce surprisingly high-resolution (temporal and spatial) predictions for what will likely come next. A lot of this kind of prediction is based on fairly old models, which are accurate sometimes but not others, or can be made more accurate by combining their data with other sources, which is what MetNet-3 does. I won’t get too far into the details, but the team put up a really interesting post on the topic last week that gives a great sense of how modern weather prediction engines work.

In other highly specific science news, researchers from the University of Kansas have made a detector for AI-generated text… for journal articles about chemistry. Sure, it isn’t useful to most people, but after OpenAI and others hit the brakes on detector models, it’s useful to show that at the very least, something more limited is possible. “Most of the field of text analysis wants a really general detector that will work on anything,” said co-author Heather Desaire. “We were really going after accuracy.”

Their model was trained on articles from the American Chemical Society journal, learning to write introduction sections from just the title and just the abstract. It was later able to identify ChatGPT-3.5-written intros with near-perfect accuracy. Obviously this is an extremely narrow use case, but the team points out they were able to set it up fairly quickly and easily, meaning a detector could be set up for different sciences, journals, and languages.

There isn’t one for college admission essays yet, but AI might be on the other side of that process soon, not deciding who gets in but helping admissions officers identify diamonds in the rough. Researchers from Colorado University and UPenn showed that an ML model was able to successfully identify passages in student essays that indicated interests and qualities, like leadership or “prosocial purpose.”

Students won’t be scored this way (again, yet) but it’s a much-needed tool in the toolbox of administrators, who must go through thousands of applications and could use a hand now and then. They could use a layer of analysis like this to group essays or even randomize them better so all the ones that talk about camping don’t end up in a row. And the research revealed that the language students used was surprisingly predictive of certain academic factors, like graduation rate. They’ll be looking more deeply into that, of course, but it’s clear that ML-based stylometry is going to stay important.

It wouldn’t do to lose track of AI’s limitations, though, as highlighted by a group of researchers at the University of Washington who tested out AI tools’ compatibility with their own accessibility needs. Their experiences were decidedly mixed, with summarizing systems adding biases or hallucinating details (making them inappropriate for people unable to read the source material) and inconsistently applying accessibility content rules.


At the same time, however, one person on the autism spectrum found that using a language model to generate messages on Slack helped them overcome a lack of confidence in their ability to communicate normally. Even though their coworkers found the messages somewhat “robotic,” it was a net benefit for the user, which is a start. You can find more info on this study here.

Both preceding items bring up thorny issues of bias and general AI weirdness in a sensitive area, though, so it’s not surprising that some states and municipalities are looking at establishing rules for what AI can be used for in official duties. Seattle, for instance, just released a set of “governing principles” and toolkits that must be consulted or applied before an AI model can be used for official purposes. No doubt we’ll see differing — and perhaps contradictory — such rulesets put into play at all levels of governance.

Inside VR, a machine learning model that acted as a flexible gesture detector helped create a set of really interesting ways to interact with virtual objects. “If using VR is just like using a keyboard and a mouse, then what’s the point of using it?” asked lead author Per Ola Kristensson. “It needs to give you almost superhuman powers that you can’t get elsewhere.” Good point!

You can see in the video above exactly how it works, which when you think about it makes perfect intuitive sense. I don’t want to select “copy” then “paste” from a menu using my mouse finger. I want to hold an object in one hand, then open the palm of the other and boom, a duplicate! Then if I want to cut them, I just make my hand into scissors?! This is awesome!

Image Credits: EPFL

Last, speaking of Cut/Paste, that’s the name of a new exhibition at Swiss university EPFL, where students and professors looked into the history of comics from the 1950s on and how AI might enhance or interpret them. Obviously generative art isn’t quite taking over just yet, but some artists are obviously keen to test out the new tech, despite its ethical and copyright conundra, and explore its interpretations of historic material. If you’re lucky enough to be in Lausanne, check out Couper/Coller (the catchy local version of the ubiquitous digital actions).
