Advancements in LLMs

Large Language Models have taken the world by storm. Open access to LLMs matters because it lets developers everywhere take full advantage of them to build cutting-edge applications for health, commerce, education, and other areas across the world.

There are many reasons why we need LLMs. Here are a few:

  • To understand and generate text: LLMs can be used to understand and generate text in a way that is more natural and human-like than previous AI technologies. This can be used for a variety of applications, such as creating chatbots, generating marketing copy, or writing creative content.
  • To classify and categorize content: LLMs can be used to classify and categorize content, such as news articles, blog posts, or social media posts. This can be used to help users find the information they are looking for, or to track trends and patterns in data.
  • To perform sentiment analysis: LLMs can be used to perform sentiment analysis, the process of identifying the emotional tone of a piece of text. This can be used to understand how people feel about a product, service, or idea (see the short sketch after this list).
  • To translate languages: LLMs can be used to translate languages, which can be helpful for businesses that operate in multiple countries or for people who want to communicate with people who speak other languages.
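
As a concrete illustration of the sentiment-analysis use case above, here is a minimal sketch using the Hugging Face transformers pipeline API. The example sentence is made up for illustration, and the pipeline downloads a default sentiment model on first use:

```python
from transformers import pipeline

# Downloads a default English sentiment model on first use.
classifier = pipeline("sentiment-analysis")

result = classifier("I love how natural this chatbot feels!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```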

Overall, LLMs are a powerful tool that can be used for a variety of applications. As they continue to develop, we can expect to see even more uses for them in the future.

Here are some specific examples of how LLMs are being used today:

  • Chatbots: LLMs are being used to power chatbots that can have conversations with humans in a natural way. This is being used by businesses to provide customer service, to sell products, or to gather feedback from customers.
  • Content generation: LLMs are being used to generate content, such as blog posts, articles, and even creative writing. This is being used by businesses to create marketing materials, by journalists to produce news stories, and by writers to create books and other works of fiction.
  • Research: LLMs are being used by researchers in a variety of fields, such as medicine, law, and finance. They are being used to analyze large amounts of data, to identify trends, and to make predictions.

LLMs have the potential to revolutionize the way we interact with computers and the way we access information.

OpenLLaMA: An Open Reproduction of LLaMA

Fully Open Source LLaMA 13B is here! 

OpenLM Research has been working on OpenLLaMA, a reproduction of the LLaMA large language model released under a fully permissive open-source license. These models share the same architecture and hyper-parameters as the original LLaMA model, but are trained on the RedPajama dataset.

The 13B model weights have just been released in both JAX and PyTorch. There is one small catch: the tokenizer merges consecutive spaces, which loses code indentation and makes the model a poor fit for code generation.

However, as Andrej Karpathy pointed out in his State of GPT talk, LLaMA remains one of the best open-source models to build on. That makes OpenLLaMA even more exciting, since it also allows commercial use.

I’d also be curious to see whether the authors consider training future releases on the SlimPajama dataset, a cleaned and deduplicated version of the RedPajama dataset that is 49% smaller.

Let’s get started by downloading the model weights; I’m keen to play with the model. A quick-start snippet follows the links below:

https://github.com/openlm-research/open_llama

https://huggingface.co/openlm-research/open_llama_13b
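
Here is a minimal sketch of loading the PyTorch weights with Hugging Face transformers, in the spirit of the project's README; it assumes a transformers version with LLaMA support and enough GPU memory, and the prompt is illustrative. For the JAX weights, see the repository's instructions.

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = "openlm-research/open_llama_13b"

# Use the slow LlamaTokenizer; note the caveat above about consecutive
# spaces being merged, which hurts code-generation prompts.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```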


Microsoft Introduces Orca 13B

Microsoft has introduced a new open-source LLM named Orca 13B. This 13-billion-parameter model is designed to learn from, and compete with, much larger models like GPT-4. Here are some key questions and answers about it:

What is Orca 13B?
Orca 13B is a large language model developed by Microsoft. It’s designed to learn from and compete with other large language models like GPT-4. It has 13 billion parameters and is (soon to be) open-source, making it accessible to the wider AI community.

How is Orca 13B different from GPT-4?
The main difference between Orca 13B and GPT-4 lies in how they learn. Orca 13B learns from complex explanation traces and step-by-step thought processes generated by GPT-4. This gives Orca 13B a better grasp of the reasoning behind an answer, helping it provide more detailed and accurate responses, and it significantly enhances Orca 13B’s reasoning and comprehension skills compared with other instruction-tuned models of similar size.

What kind of tasks is Orca 13B trained on?
To enhance Orca’s learning process, the research team uses the Flan 2022 Collection. The team samples tasks from this extensive collection to ensure a diverse mix of challenges. These tasks are then sub-sampled to generate complex prompts, which serve as queries for large foundation models. This approach creates a diverse and rich training set for Orca 13B.
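
The paper does not ship the data-generation code, so the following is a hypothetical Python sketch of that sampling-and-querying loop; the function names, system messages, and teacher callback are all illustrative assumptions, not the authors' actual implementation.

```python
import random

# Hypothetical sketch of Orca-style training-data generation: pair a
# sub-sampled FLAN task prompt with a reasoning-eliciting system message
# and collect the teacher model's explanation trace.

SYSTEM_MESSAGES = [
    "You are a helpful assistant. Explain your answer step by step.",
    "Think step-by-step and justify your response.",
]

def build_example(task_prompt: str, teacher) -> dict:
    """`teacher` stands in for a large foundation model API call (e.g. GPT-4)."""
    system = random.choice(SYSTEM_MESSAGES)
    response = teacher(system=system, user=task_prompt)
    return {"system": system, "prompt": task_prompt, "response": response}

# flan_tasks would be prompts sub-sampled from the Flan 2022 Collection:
# dataset = [build_example(p, teacher) for p in flan_tasks]
```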

How does Orca 13B perform compared to other models?
Evaluations have shown that Orca 13B outperforms state-of-the-art instruction-tuned models like Vicuna-13B, showing an improvement of over 100% on BigBench Hard (BBH). Orca 13B also exhibits competitive performance on academic exams in zero-shot settings, indicating its potential for real-world applications.

What is the significance of Orca 13B’s development?
The introduction of Orca 13B represents a significant step forward for instruction-tuned models. By learning from step-by-step explanations and scaling tasks with complex prompts, Orca 13B shows how smaller models can close the gap with much larger teachers. This holds promise for fully unlocking the potential of large language models and driving progress in natural language processing.

Full paper: https://arxiv.org/pdf/2306.02707.pdf

Stay tuned for more updates on AI developments and breakthroughs!

Technology Innovation Institute Introduces Falcon LLM

Technology Innovation Institute (TII) has publicly released the source code and the model weights for research and commercial use.
For researchers and developers, this makes Falcon 40B far more accessible and allows them to understand the model’s behavior within hours rather than weeks. For terms of use, see the TII Falcon LLM License Version 1.0.

Falcon LLM is a foundational large language model (LLM) with 40 billion parameters, trained on one trillion tokens.

 

Compute Infrastructure
Hardware

Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.

Software

Falcon-40B was trained using Gigatron, a custom distributed training codebase. It combines a 3D parallelism approach with ZeRO and high-performance Triton kernels (FlashAttention, etc.).

License
Falcon-40B is made available under the Apache 2.0 license.

Source: https://huggingface.co/tiiuae/falcon-40b

Use Cases

• Generate creative text and solve complex problems.
• Used in chatbots, customer service operations, virtual assistants, language translation, content generation, and sentiment analysis.
• Broad use cases are foreseen for Falcon; TII is most excited about applications that reduce and automate “repetitive” work.
• Falcon will help Emirati companies and startups become more efficient, streamlining internal processes and giving back time for employees to focus on what matters.
• At an individual level, chatbots embedding Falcon will be able to assist users in their daily lives.

Falcon LLM Update

You can now run the most capable open-source LLM that allows commercial use. Hugging Face just dropped a few resources to get you started in minutes:

• A comprehensive blog post on how to run and fine-tune Falcon: https://huggingface.co/blog/falcon (see the inference sketch after this list)
• A live demo to tinker with Falcon: https://huggingface.co/spaces/HuggingFaceH4/falcon-chat
• New hosted inference endpoints for the Falcon 7B and 40B models
• A sneak peek into a new iOS app from Hugging Face to host and run LLMs locally (see video)
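
For reference, here is a minimal inference sketch in the style of the Falcon model card, using the transformers text-generation pipeline. The prompt is illustrative, and running the 40B model this way assumes substantial GPU memory; tiiuae/falcon-7b is a lighter drop-in substitute.

```python
import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon shipped custom modelling code at release
    device_map="auto",
)
sequences = pipeline(
    "Write a short note explaining what the RefinedWeb dataset is:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```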

Falcon 40B is currently the reigning open-source LLM on the Open LLM Leaderboard, with the 7B version being the best in its weight class. The driver behind Falcon’s performance is its training data, which, unlike that of most other LLMs, is predominantly drawn from RefinedWeb, a novel large-scale web dataset.

Guanaco – Generative Universal Assistant for Natural-language Adaptive Context-aware Omnilingual outputs

Researchers at the University of Washington have developed an open-source chatbot called Guanaco. Guanaco is designed to rival the performance of ChatGPT, but it can be trained much faster and with fewer resources. Guanaco is named after a South American relative of the llama, and it is built on the LLaMA language model. It uses a novel fine-tuning method called QLoRA.

• Guanaco is an open-source language model that can perform on par with ChatGPT, yet can be trained in a single day using QLoRA, a language-model fine-tuning technique (see the sketch after this list).
• Guanaco and similar open-source models are challenging the notion that expensive training is necessary for state-of-the-art language models.
• However, a recent study found that these models struggle on tasks they have not been explicitly exposed to, lagging behind more advanced models.
• In short, Guanaco is a promising new language model that is more efficient and affordable to train than ChatGPT, but it is too early to say whether it can fully replace more advanced models.
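
Guanaco's own code is linked in the references below; as a rough illustration of what a QLoRA-style fine-tuning setup looks like with the Hugging Face stack (transformers, peft, bitsandbytes), here is a hedged sketch. The base model ID and LoRA hyper-parameters are illustrative assumptions, not Guanaco's published configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA's key idea: quantize the frozen base weights to 4-bit NF4 and
# train only small LoRA adapter matrices on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "huggyllama/llama-7b"  # assumption: any LLaMA-style base model
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyper-parameters; the 4-bit base weights stay frozen.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction is trainable
```

From here the adapted model can be handed to a standard trainer; because only the adapters receive gradients, a model of this class can be fine-tuned on a single GPU, which is what makes the one-day training claim plausible.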

 

References

https://github.com/Guanaco-Model/Guanaco-Model.github.io

https://guanaco-model.github.io/

https://the-decoder.com/guanaco-is-a-chatgpt-competitor-trained-on-a-single-gpu-in-one-day/

https://www.binance.com/en/feed/post/579695

By Hassan Amin

Dr. Syed Hassan Amin holds a Ph.D. in Computer Science from Imperial College London, United Kingdom, and an MS in Computer System Engineering from GIKI, Pakistan. During his Ph.D., he worked on image processing, computer vision, and machine learning. He has done research and development in many areas, including Urdu and local-language optical character recognition, retail analysis, affiliate marketing, fraud prediction, 3D reconstruction of faces from 2D images, and retinal image analysis.