5 things AI image generators still struggle with – Digital Trends

AI image generators like Dall-E, Stable Diffusion, Midjourney, and Bing Image Creator produce amazing results, but sometimes they can be incredibly frustrating. With simple prompts containing just a few words, an AI can output impressive images that appear to be professional photographs and convincing art in various styles. However, the same prompt will occasionally create some horrific creature or hilariously flawed rendering.
Negative prompts might help reduce the likelihood of these errors, but complexity can’t always save you. Even AI experts struggle with misshapen creatures and otherworldly scenes, requiring long hours of refining prompts or touching up images with a traditional photo editor. For the time being, if you look carefully in the right areas of an image, there’s a good chance you’ll be able to identify if it was made by a machine.
AI developers have made progress in the struggle to teach artificial intelligence tools how human hands should look, but there’s plenty of room for improvement. If fingers aren’t featured prominently, it’s easy to miss errors, but it’s an ongoing problem.
One of the first and best AI image generators available to the public, OpenAI’s Dall-E, created these pictures of people holding hands. At first glance, they might look fine. On closer inspection, some problems become apparent. Beware of extra fingers, weird fingernails, and merged digits.
Complicated grips and interlaced fingers are even more challenging. Don’t be surprised if your AI images come back with classic glitches referred to as “hand salad” or “balls of fingers.”
You might expect that text would be easy for a computer to generate. You see evidence of words on screens daily when you pick up the phone or open a browser. Early computers, unlike the top gaming PCs of today, couldn’t display graphics of any kind. Everything was text or numbers.
Yet displaying actual letters and symbols as printed or written words is surprisingly tricky for an AI image generator. It might sound like an easy problem to solve, but it isn’t. An app can’t just overlay plain text. To be convincing, the text style, shading, angle, and perspective must match the rest of the scene.
In the example, a relatively new AI image generator, Leonardo AI, made a valiant effort with a vintage billboard for Jack Rabbit Slim’s diner. After multiple tries, the AI managed to spell out “Jack Rabbit’s,” which is quite close to the request. The vintage photograph style was spot-on in each image, but the letters and words were mostly flawed.
It’s often said that the eyes are the windows to the soul. We rely so much on eye contact that it could be the most critical detail in creating a realistic portrait. But many AI tools have difficulty rendering human eyes.
Bing Image Creator did a decent job with the studio background and with posing a multigenerational family for a photo. However, almost every person has bizarre eyes that look like they’ve been inserted by aliens, or perhaps these smiling people are in the process of transforming into unearthly creatures.
Humans are great with tools, and not only the digital variety like AI. We quickly master any physical tool within our grasp. An AI, on the other hand, struggles to understand what tools are and how they’re used.
Midjourney is an AI image generator that’s making fantastic progress in solving problems with human faces and hands. However, when prompted to show a mechanic tightening a bolt with a wrench, it left the wrench out entirely. Fingernails are added to gloves in one case, and a light bulb somehow appears in another.
Scissors are too complicated for Bing Image Creator in this closeup render of hair being cut. They are only open in one image and never appear to be in the act of cutting.
When people smile and laugh, that usually improves a picture, making it pleasant and fun. When given a simple prompt like two students smiling and laughing, an AI can turn this into nightmare fuel with multiple rows of teeth and other strange distortions.
Leonardo AI allows you to choose between several models, and some handle teeth well. The popular Stable Diffusion 2.1 model needed some help to get teeth right. With some negative prompting, the issue was resolved. There are solutions to these AI image problems, but it still takes work to get good results.
In the early days of AI art, the results were weird and wonderful, creating beauty and horror with equal abandon. The errors are becoming less noticeable with each new update, and many problems can be overcome with some refinement.
With so many AI tools available, it’s easy to try another system. Many AI image generators allow negative prompts or other options to adjust the algorithm and get better results.
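For readers who script their image generation, here is a rough sketch of how a negative prompt can be assembled from the problem areas covered above. The helper name `build_request`, the defect phrases, and the dictionary layout are all illustrative assumptions, not any generator's actual API. The `negative_prompt` key is borrowed from the keyword argument used by Stable Diffusion pipelines in the Hugging Face diffusers library; other tools may name or handle this option differently.

```python
# Hypothetical helper: the function name, defect phrases, and request layout
# are illustrative and not tied to any specific image generator.
COMMON_DEFECTS = {
    "hands": ["extra fingers", "fused fingers", "malformed hands"],
    "teeth": ["extra teeth", "distorted teeth"],
    "eyes": ["misaligned eyes", "deformed pupils"],
    "text": ["garbled text", "misspelled words"],
}


def build_request(prompt, focus_areas):
    """Pair a prompt with a negative prompt built from the chosen problem areas."""
    unwanted = []
    for area in focus_areas:
        # Unknown areas contribute nothing instead of raising an error.
        for term in COMMON_DEFECTS.get(area, []):
            if term not in unwanted:
                unwanted.append(term)
    return {"prompt": prompt, "negative_prompt": ", ".join(unwanted)}


request = build_request("two students smiling and laughing", ["teeth", "eyes"])
print(request["negative_prompt"])
# prints: extra teeth, distorted teeth, misaligned eyes, deformed pupils
```

Keeping the unwanted terms grouped by problem area makes it easy to add or drop a category between attempts, which matches the trial-and-error approach most generators still require.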
You may need to run through several attempts to get a usable picture, particularly if there’s a focus on faces or hands. When you want to include print or written words, be prepared to spend time in an image editor erasing the AI’s nonsense letters and blending in the correct text.
The good news is that many AI image generators are free, and subscription models are relatively inexpensive. Within a year, these lingering problems could be resolved, allowing you to use an AI render as a finished art piece or a replacement for a photograph.