DeepSeek, the Chinese AI company, has unveiled Janus Pro, a family of multimodal LLMs designed to rival OpenAI's DALL-E 3 in image generation. DeepSeek reports that the Janus Pro 1B and 7B models outperform Stable Diffusion 3 Medium and DALL-E 3 on the GenEval and DPG-Bench benchmarks. The models are trained on an expanded dataset and use an architecture that decouples visual encoding while keeping a single shared transformer. DeepSeek says Janus Pro overcomes limitations of the original Janus model, such as weak performance on short prompts and unstable image generation quality, but acknowledges that room for improvement remains, particularly on fine-grained tasks, because input resolution is capped at 384x384 pixels.
Barely a week after DeepSeek's R1 LLM caused a stir in Silicon Valley, the Chinese company returns with a new release claiming to challenge OpenAI's DALL-E 3. Janus Pro 1B and 7B are a family of multimodal large language models (LLMs) designed to handle both image generation and vision processing tasks. Similar to DALL-E 3, you provide Janus Pro with an input prompt, and it generates a corresponding image.
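To make that workflow concrete, here is a minimal loading sketch, assuming the weights are pulled from Hugging Face via the generic AutoModelForCausalLM path with trust_remote_code enabled; the repository id and that loading route are our assumptions rather than DeepSeek's documented instructions, and the actual prompt-to-image sampling loop lives in the quick-start scripts covered at the end of this piece.

```python
# Minimal loading sketch -- a guess at the shape of the workflow, not
# DeepSeek's documented quick-start. The repo id and the Auto* loading
# path are assumptions; consult the official scripts for the real API.
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "deepseek-ai/Janus-Pro-7B"   # assumed Hugging Face repo id

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,      # run the modeling code bundled with the checkpoint
    torch_dtype=torch.bfloat16,
).eval()

prompt = "A watercolor painting of a lighthouse at dawn"
# From here, DeepSeek's quick-start scripts wrap the prompt in a chat
# template, sample discrete image tokens autoregressively, and decode
# them into a 384x384 RGB image -- that sampling loop is what the
# scripts provide, so it is not reproduced here.
```

The trust_remote_code flag simply lets custom modeling code shipped alongside a checkpoint run locally, which models that are not built into the transformers library typically require.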
DeepSeek achieves this by decoupling visual encoding into a separate pathway while maintaining a single transformer architecture for processing. Detailing the model and its architecture, the researchers behind the neural network note that the original Janus model showed promise but suffered from 'suboptimal performance on short prompts, image generation, and unstable text-to-image generation quality.' With Janus Pro, DeepSeek claims to have overcome many of these limitations by leveraging a larger dataset and targeting higher parameter counts.

When pitted against various multimodal and task-optimized models, the startup asserts that Janus Pro 7B narrowly outperforms both Stable Diffusion 3 Medium and OpenAI's DALL-E 3 in the GenEval and DPG-Bench benchmarks. However, it's important to note that image analysis tasks are confined to 384x384 pixels.

Much like DeepSeek V3, the model maker claims to have achieved these results using only a few hundred GPUs running the HAI-LLM framework on PyTorch. The paper outlining the process states that 'the whole training process took about 7/14 days on a cluster [of] 16/32 nodes for 1.5B/7B model, each equipped with eight Nvidia A100 (40GB) GPUs.' Training times may have been shortened by reusing earlier models rather than training an entirely new one from scratch. We have reached out to DeepSeek for clarification.

While competitive with other multimodal LLMs and diffusion models, DeepSeek acknowledges that there's still room for improvement. 'In terms of multimodal understanding, the input resolution is limited to 384x384, which affects its performance in fine-grained tasks, such as OCR,' the researchers explained. Regarding image generation, they point out that the limited resolution also leaves images lacking fine detail.

The Janus codebase is available under an MIT license, with use of the Pro models subject to DeepSeek's Model License. The company asserts that the LLMs represent 'an excellent AI advancement' and utilize technologies that are 'fully export control compliant.' If you're interested in trying out either of the Janus Pro models, DeepSeek provides a pair of quick-start scripts on its website.
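To give a feel for the decoupled visual encoding described above, the toy sketch below, which is our illustration and not DeepSeek's code, pairs one autoregressive transformer backbone with two separate visual pathways: a continuous patch encoder that feeds image features in for understanding, and a discrete codebook whose tokens the backbone samples and a small decoder turns back into 384x384 pixels for generation. All sizes are invented for the sketch.

```python
# Toy illustration of decoupled visual encoding -- NOT DeepSeek's code.
# One shared autoregressive transformer; two separate visual pathways:
#   (1) a continuous encoder for multimodal understanding, and
#   (2) a discrete codebook + pixel decoder for image generation.
# All dimensions below are invented for the sketch.
import torch
import torch.nn as nn

D = 256           # transformer width (illustrative)
VOCAB_TXT = 1000  # text vocabulary (illustrative)
VOCAB_IMG = 1024  # image codebook size (illustrative)

class JanusStyleToy(nn.Module):
    def __init__(self):
        super().__init__()
        # Pathway 1 (understanding): pixels -> continuous patch embeddings.
        self.understand_enc = nn.Conv2d(3, D, kernel_size=16, stride=16)
        # Pathway 2 (generation): discrete image codes and a pixel decoder.
        self.img_codebook = nn.Embedding(VOCAB_IMG, D)
        self.img_decoder = nn.ConvTranspose2d(D, 3, kernel_size=16, stride=16)
        # Shared backbone plus per-modality embeddings and output heads.
        self.txt_embed = nn.Embedding(VOCAB_TXT, D)
        layer = nn.TransformerEncoderLayer(D, nhead=8, dim_feedforward=512,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.txt_head = nn.Linear(D, VOCAB_TXT)  # predicts next text token
        self.img_head = nn.Linear(D, VOCAB_IMG)  # predicts next image code

    def understand(self, image, text_ids):
        """Image + question in, text logits out (understanding pathway)."""
        patches = self.understand_enc(image).flatten(2).transpose(1, 2)
        seq = torch.cat([patches, self.txt_embed(text_ids)], dim=1)
        return self.txt_head(self.backbone(seq))

    @torch.no_grad()
    def generate_image(self, text_ids, grid=24):
        """Prompt in, image out: sample discrete codes, decode to pixels."""
        seq = self.txt_embed(text_ids)
        codes = []
        for _ in range(grid * grid):  # 24x24 = 576 codes -> 384x384 pixels
            logits = self.img_head(self.backbone(seq))[:, -1]
            nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
            codes.append(nxt)
            seq = torch.cat([seq, self.img_codebook(nxt)], dim=1)
        codes = torch.cat(codes, dim=1)             # (B, 576)
        feat = self.img_codebook(codes).transpose(1, 2)
        feat = feat.reshape(-1, D, grid, grid)      # (B, D, 24, 24)
        return self.img_decoder(feat)               # (B, 3, 384, 384)

toy = JanusStyleToy().eval()
prompt_ids = torch.randint(0, VOCAB_TXT, (1, 8))    # stand-in for a tokenized prompt
with torch.no_grad():
    answer_logits = toy.understand(torch.rand(1, 3, 384, 384), prompt_ids)
image = toy.generate_image(prompt_ids)               # a few hundred sampling steps
print(answer_logits.shape, image.shape)              # (1, 584, 1000) and (1, 3, 384, 384)
```

Keeping the pathways separate is what lets a single backbone both answer questions about images and synthesize them, and the 24x24 grid of codes behind a 384x384 output in this toy also hints at why limited resolution translates directly into lost fine detail.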
Similar News: You can also read news stories similar to this one that we have collected from other news sources.
DeepSeek isn't done yet with OpenAI – image-maker Janus Pro is gunning for DALL-E 3
Crouching tiger, hidden layer(s)
DeepSeek Chatbot Overwhelmed by User Demand, Sparks AI Turmoil
DeepSeek, a free chatbot similar to ChatGPT, has experienced server overload due to a surge in users, leading to widespread complaints. The sudden popularity, attributed to media coverage following Nvidia's AI chip crash, has raised concerns about DeepSeek's potential reliance on OpenAI's models. The situation has triggered tech turmoil, with US officials investigating claims of knowledge distillation by DeepSeek. DeepSeek's responses regarding sensitive topics like human rights in Xinjiang have also sparked debate.
DeepSeek's Open-Source AI Model Sparks US Stock Market Sell-Off
A Chinese company's open-source AI model, developed at a fraction of the cost of American counterparts, has triggered a major sell-off in the US stock market, raising concerns about China's growing dominance in artificial intelligence.
DeepSeek's Efficient AI Model Shakes Up the Chip Market
DeepSeek, a Chinese AI company, released a surprisingly efficient AI model (R1) that utilizes less hardware, potentially challenging Nvidia's dominance in the AI chip market. While US tech shares initially dipped due to concerns over the model's implications, they have since recovered somewhat. Nvidia and OpenAI maintain their confidence in the need for powerful hardware for advanced AI development. Despite DeepSeek's breakthrough, OpenAI CEO Sam Altman believes that more computing power is essential for their ambitious AGI goals.
DeepSeek Shakes Up AI Landscape with Cost-Effective Model
DeepSeek, a new Chinese AI app, has sent shockwaves through the tech industry, triggering a massive sell-off of major tech stocks and raising questions about America's dominance in AI. DeepSeek's developers claim to have built the latest model for a mere $5.6 million, a fraction of the cost incurred by AI giants like OpenAI, Google, and Anthropic. This development has stunned many Silicon Valley observers, prompting discussions about the potential impact on the future of AI and the global tech landscape.
OpenAI says it has evidence China's DeepSeek used its model to train competitor
White House AI tsar David Sacks raises possibility of alleged intellectual property theft