Tracking Advancements in LLMs

An LLM is a type of neural network based on the Transformer architecture, which was introduced in the 2017 paper “Attention Is All You Need” by researchers at Google. The model’s primary objective is to predict the most probable next token given the text so far. The number of parameters, meaning the weights the model learns during training, is a rough indicator of a model’s capacity, although training data and fine-tuning also strongly affect how well it performs.
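
As a rough illustration of next-token prediction, the sketch below turns a set of made-up scores into probabilities and picks the most likely token. The vocabulary, the scores, and the function names are all hypothetical; a real LLM computes such scores over tens of thousands of tokens using its learned parameters.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next_token(vocab, logits):
    """Return the most probable next token and its probability."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Hypothetical vocabulary and scores for the prompt "The cat sat on the"
vocab = ["mat", "moon", "car", "idea"]
logits = [3.2, 1.1, 0.3, -1.0]

token, prob = predict_next_token(vocab, logits)
print(token, round(prob, 3))
```

Picking the single most probable token is only one decoding strategy; chat systems usually sample from the probabilities instead, which is why the same prompt can produce different answers.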


Many open-source language models can be deployed on-premise or in a private cloud, which keeps sensitive data in-house and makes business adoption easier.

Some large language models in this category are:

  • NeMo LLM
  • XLNet
  • Cohere
  • GLM-130B

Most of the leading language model developers are American, but there are successful examples from China and Europe as they work to catch up on generative AI.

New LLMs are appearing at a very fast pace, and it is important to keep track of them to make the best use of these developments. Here are a few that are setting the trend:


Bard is Google’s conversational AI chatbot, built on a large language model and trained to be informative and comprehensive. It is trained on a massive amount of text data and can generate human-like text in response to a wide range of prompts and questions. For example, it can provide summaries of factual topics or create stories.

Here is a comparison of Bard, PaLM, GPT-4, and LLaMA:

Bard
  • Parameters: 175 billion
  • Training data: Books, articles, code, and other forms of text
  • Capabilities: Generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

PaLM
  • Parameters: 540 billion
  • Training data: Books, articles, code, and other forms of text
  • Capabilities: Generate text, translate languages, write different kinds of creative content, answer your questions in an informative way, write code, and follow your instructions thoughtfully.

GPT-4
  • Parameters: 175 billion
  • Training data: Books, articles, code, and other forms of text
  • Capabilities: Generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

LLaMA
  • Parameters: 13 billion
  • Training data: Books, articles, code, and other forms of text
  • Capabilities: Generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

As the comparison shows, Bard, PaLM, GPT-4, and LLaMA are all large language models with broadly similar capabilities: each can generate text, translate languages, and write different kinds of creative content. They differ mainly in scale and emphasis. PaLM, the largest of the four by parameter count, is also listed for coding and instruction following, while LLaMA is by far the smallest. Ultimately, the best model for you will depend on your specific needs.

GPT-3 and GPT-4 Models

GPT-3 and GPT-4 are both large language models (LLMs) developed by OpenAI. GPT-3, released in 2020, is considered a trendsetter in the field of LLMs due to its massive size and impressive performance on a variety of natural language processing tasks. With 175 billion parameters, it was at the time the largest LLM ever created. Its ability to generate coherent and human-like responses to prompts has been widely praised and has led to a surge in interest in the field of natural language processing.

GPT-4, the successor to GPT-3, was released by OpenAI in March 2023. OpenAI has not disclosed the model’s size, so figures such as the often-quoted one trillion parameters remain speculation. GPT-4 continues the trend of pushing the boundaries of what is possible with LLMs and further advances the field of natural language processing.


ChatGPT is a conversational application from OpenAI built on models from the GPT-3.5 series (and, for paying subscribers, GPT-4) that have been fine-tuned to generate human-like replies to user prompts. It is not a smaller, separate architecture; the difference lies in the fine-tuning for dialogue. ChatGPT is designed specifically for conversational use, while the underlying GPT models, available through OpenAI’s API, can be used for a wider range of applications such as language translation, text summarization, and more.

LLaMA Model

The LLaMA model is a foundation language model from Meta that predicts the next token in a sequence. Given a sequence of words, it predicts the most probable next word, much as ChatGPT does when it completes your sentences.

What sets LLaMA apart is that its 13-billion-parameter version, despite being about 13 times smaller than GPT-3, performs better on most benchmarks. This is particularly impressive considering the high performance of models like GPT-3 and ChatGPT. The smaller size also makes it possible to run a ChatGPT-like model on a local computer or even a Raspberry Pi, as demonstrated by one user. However, LLaMA was initially intended for research purposes only, and Meta required researchers to request model checkpoints. Although the model’s weights have been leaked and are now accessible to anyone, the model is still subject to a non-commercial license.

Memory requirements for the different sizes of LLaMA models are:

  • 7B: full model takes 31.17 GB; fully quantized (compressed) model takes 4.21 GB
  • 13B: full model takes 60.21 GB; compressed model takes 8.14 GB
  • 30B: full model takes 150.48 GB; compressed model takes 20.36 GB
  • 65B: full model takes 432.64 GB; compressed model takes 40.88 GB
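
A back-of-the-envelope way to see why compression helps so much is to multiply the number of parameters by the bits stored per parameter. The sketch below does this for the four LLaMA sizes, assuming nominal parameter counts (7, 13, 30, and 65 billion) and counting weights only. It is an estimate under those assumptions, so expect it to differ from the measured figures above.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9


# Nominal sizes (an assumption); the real parameter counts differ slightly.
sizes = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}

for name, params in sizes.items():
    fp16 = weight_memory_gb(params, 16)  # 16-bit floating point weights
    int4 = weight_memory_gb(params, 4)   # 4-bit quantized weights
    print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Going from 16 bits to 4 bits per weight cuts the weight storage by a factor of four, which is what brings the smaller models within reach of laptops and single-board computers.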


Alpaca Model

The Alpaca model is essentially a fine-tuned version of the LLaMA model, designed to follow instructions in a way similar to ChatGPT. What makes Alpaca remarkable is that the entire fine-tuning process cost less than $600, whereas training GPT-3 in 2020 cost an estimated $5,000,000. While LLaMA provided much of the foundation, fine-tuning it into a ChatGPT-like model at such a low cost is a striking result.

So how did Taori et al. manage this? Interestingly enough, they received some inadvertent assistance from OpenAI. They started with only 175 human-written self-instruct seed tasks, then used OpenAI’s text-davinci-003 model to expand them into 52,000 instruction-following examples for supervised fine-tuning. This let them fine-tune a model that behaves much like ChatGPT without collecting human feedback, significantly reducing the cost of production.
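
To make the supervised fine-tuning step more concrete, the sketch below shows how one instruction-following example might be turned into a prompt and a target response. The template wording, the field names, and the sample record are illustrative assumptions in the style of such datasets, not a copy of the authors' pipeline.

```python
# Illustrative prompt template (an assumption, not the authors' exact file).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_training_pair(record):
    """Turn one instruction record into (prompt, target) text for fine-tuning."""
    prompt = PROMPT_TEMPLATE.format(instruction=record["instruction"].strip())
    target = record["output"].strip()
    return prompt, target


# Hypothetical record; a real dataset would hold tens of thousands of these.
record = {
    "instruction": "Give three tips for staying healthy.",
    "output": "Eat a balanced diet, exercise regularly, and get enough sleep.",
}

prompt, target = build_training_pair(record)
print(prompt + target)
```

During fine-tuning, the model is shown the prompt and trained to produce the target, so the quality of the generated examples directly shapes how well the resulting model follows instructions.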




Vicuna Model

Vicuna-13B is an open-source chatbot trained by fine-tuning the LLaMA model on user-shared conversations collected from ShareGPT. In a preliminary evaluation using GPT-4 as a judge, Vicuna-13B achieved more than 90% of the quality of OpenAI ChatGPT and Google Bard, and it outperformed other models such as LLaMA and Stanford Alpaca in more than 90% of cases. Training Vicuna-13B cost approximately $300, making it significantly less expensive than other models. The Vicuna team has released the training and serving code, along with an online demo, for non-commercial use.
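
The two percentages above measure different things, and a small sketch may help separate them. It assumes a judge that gives each answer a score from 1 to 10; the scores and function names below are made up for illustration and are not taken from the Vicuna evaluation.

```python
def relative_quality(candidate_scores, baseline_scores):
    """Candidate's total judge score as a percentage of the baseline's total."""
    if len(candidate_scores) != len(baseline_scores):
        raise ValueError("both models must be scored on the same questions")
    return 100.0 * sum(candidate_scores) / sum(baseline_scores)


def win_rate(candidate_scores, other_scores):
    """Percentage of questions where the candidate scores strictly higher."""
    wins = sum(c > o for c, o in zip(candidate_scores, other_scores))
    return 100.0 * wins / len(candidate_scores)


# Hypothetical judge scores on five questions (not real evaluation data).
candidate = [8, 7, 9, 8, 7]
baseline = [9, 8, 9, 8, 8]

print(f"relative quality: {relative_quality(candidate, baseline):.1f}%")
print(f"win rate: {win_rate(candidate, baseline):.1f}%")
```

A model can reach a high relative quality against a strong baseline while rarely beating it outright, which is why the two figures are reported against different sets of models.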


Mobile LLM

Enabling large language models to run on mobile devices matters because these are the devices we interact with every day. With over 2.5 billion active devices running the Android operating system, LLM support on these devices would have enormous reach.

MLC LLM for Android is a solution that enables large language models to be natively deployed on Android devices. It also provides a productive framework for everyone to optimize model performance according to their use cases. The best part is that everything runs locally and is accelerated by the phone’s native GPU.

MLC LLM is a universal solution that allows any language model to be natively deployed on a wide range of hardware backends and native applications, along with a framework that lets everyone optimize model performance for their own use cases. The MLC team even ran the Vicuna 7B model on a Samsung Galaxy S23 powered by the Snapdragon 8 Gen 2 Mobile Platform, and reports bringing support to this new platform in just one week, thanks to Apache TVM Unity and ML compilation.


MLC | Bringing Hardware Accelerated Language Models to Android Devices

mlc-llm/android at main · mlc-ai/mlc-llm · GitHub


IMAGEBIND: One Embedding Space To Bind Them All.

ImageBind learns a joint embedding across six different modalities: images, text, audio, depth, thermal, and IMU data.

It is an open-source project by Meta-FAIR.
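
The practical benefit of a joint embedding space is that items from different modalities can be compared directly. The sketch below illustrates the idea with made-up three-dimensional vectors and plain cosine similarity; it does not use the ImageBind library, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical text embeddings in the shared space.
text_embeddings = {
    "a dog barking": [0.9, 0.1, 0.0],
    "ocean waves": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an audio clip, in the same space.
audio_embedding = [0.8, 0.2, 0.1]

# Cross-modal retrieval: find the caption closest to the audio clip.
best_caption = max(
    text_embeddings,
    key=lambda caption: cosine_similarity(text_embeddings[caption], audio_embedding),
)
print(best_caption)
```

Because every modality lands in the same space, the same nearest-neighbor lookup works for audio-to-text, image-to-audio, or any other pairing.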

Blog post:


MPT Model

MPT is a new family of open-source, commercially usable LLMs from MosaicML. Trained on 1T tokens of text and code, MPT models match and, in many ways, surpass LLaMA-7B. The release includes four models: MPT-Base, Instruct, Chat, and StoryWriter.

For full technical details on the models, datasets, and training regimes, and for links to all of the released artifacts, see the MosaicML blog.

Why did MosaicML do this? The models are demonstrations of its tools for training, finetuning, and serving custom LLMs. Replit used the same tools to train its state-of-the-art code generation model the week before.

  • MPT-7B-Base is a decoder-style transformer with 6.7B parameters, designed to be finetuned and customized for your use case.
  • MPT-7B-Instruct is a commercially usable instruction-following model finetuned on Dolly and HH-RLHF data.
  • MPT-7B-Chat is a chatbot finetuned on Alpaca and similar datasets.
  • MPT-7B-StoryWriter-65k+ is finetuned on books with a 65k-token context and is aimed at writing fiction.

To highlight StoryWriter: its final training stage uses a 65k-token context, 32x that of LLaMA and 2x that of GPT-4. According to MosaicML, this length works out of the box with its LLM Foundry on standard GPUs.

On the technical side, MosaicML started with its own custom variant of the transformer architecture, modified for speed and efficiency, and then trained it on a large amount of data using 440 A100 GPUs for 9.5 days.

MosaicML describes this as the culmination of a two-year journey: building infrastructure (the MosaicML platform), tools for training (Composer, StreamingDataset), and model code and checkpoints (LLM Foundry).
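
The training figures above translate into a compute budget with simple arithmetic. The sketch below works it out, and also checks the context-length ratios under the assumption that “65k” means 65,536 tokens, that LLaMA’s context is 2,048 tokens, and that the GPT-4 comparison refers to a 32,768-token variant. Those three token counts are assumptions, not figures stated in this article.

```python
# Training compute, from the figures quoted above.
gpus = 440
days = 9.5
gpu_hours = gpus * days * 24
print(f"training compute: {gpu_hours:,.0f} A100 GPU-hours")

# Context-length comparison (token counts are assumptions, see lead-in).
storywriter_context = 65_536
llama_context = 2_048
gpt4_context = 32_768

print(f"vs LLaMA: {storywriter_context / llama_context:.0f}x")
print(f"vs GPT-4: {storywriter_context / gpt4_context:.0f}x")
```

Expressing the run in GPU-hours makes it easier to compare against other training runs or to price it at a given hourly rate.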

Gorilla from UC Berkeley and Microsoft Research

The world of LLMs is witnessing a new milestone with the introduction of Gorilla by researchers from UC Berkeley and Microsoft Research. Gorilla is a fine-tuned LLaMA model designed explicitly for API calls, and it surpasses the performance of GPT-4 in writing them.

Despite the recent advancements in LLMs, their potential to effectively use tools via API calls has remained largely untapped. This is where Gorilla steps in: given a natural language query, Gorilla can generate semantically and syntactically correct calls to more than 1,600 APIs across Hugging Face, Torch Hub, and TensorFlow Hub.

Gorilla’s strength lies in its ability to adapt to API changes during inference. It also significantly reduces the issue of hallucination by using a retrieval system that pulls the relevant API call from a large database.

Integrating the retrieval system with Gorilla showcases the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs.
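
The retrieval idea can be sketched in a few lines: look up the documentation entry that best matches the user's request, then place it in the prompt so the model writes its call against current documentation. The example below uses simple word overlap as the retriever, and the API entries, function names, and prompt wording are hypothetical. It is a toy illustration of the pattern, not Gorilla's implementation.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def retrieve_api_doc(query, api_docs):
    """Pick the documentation entry sharing the most words with the query."""
    query_tokens = tokenize(query)
    return max(
        api_docs,
        key=lambda doc: len(query_tokens & tokenize(doc["description"])),
    )


def build_prompt(query, api_docs):
    """Append the retrieved documentation to the user's request."""
    doc = retrieve_api_doc(query, api_docs)
    return (
        f"{query}\n"
        f"Use this API documentation for reference: "
        f"{doc['name']}: {doc['description']}"
    )


# Hypothetical documentation database.
api_docs = [
    {"name": "image_classifier", "description": "Classify an image into object categories."},
    {"name": "speech_to_text", "description": "Transcribe speech audio into text."},
    {"name": "translator", "description": "Translate text from English to French."},
]

query = "I want to transcribe audio from a meeting into text"
print(build_prompt(query, api_docs))
```

Because the documentation is fetched at inference time, updating the database is enough to reflect an API change; the model itself does not need to be retrained.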

To evaluate Gorilla’s capabilities, a comprehensive dataset called APIBench was introduced, consisting of HuggingFace, TorchHub, and TensorHub APIs.

Together with other agent-like libraries such as Hugging Face’s Transformers Agent, Gorilla can significantly extend an LLM’s abilities, automating whole workflows with minimal human intervention.

GitHub repo

By Hassan Amin

Dr. Syed Hassan Amin holds a Ph.D. in Computer Science from Imperial College London, United Kingdom, and an MS in Computer System Engineering from GIKI, Pakistan. During his Ph.D., he worked on Image Processing, Computer Vision, and Machine Learning. He has done research and development in many areas, including Urdu and local language Optical Character Recognition, Retail Analysis, Affiliate Marketing, Fraud Prediction, 3D reconstruction of face images from 2D images, and Retinal Image analysis.