Ghost 8B Beta

A large language model developed with a focus on excellent multilingual support, superior knowledge capabilities, and cost efficiency.

Introduction

Ghost 8B Beta is a large language model developed with goals that include excellent multilingual support, superior knowledge capabilities, and cost-effectiveness. The model comes in two context-length versions, 8k and 128k, and supports multilingual function tools by default.
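For orientation, here is a minimal chat sketch using the Hugging Face transformers library. The repo id "ghost-x/ghost-8b-beta" and the generation settings are assumptions for illustration, not details confirmed by this post; check the published model card before running.

```python
# A minimal sketch, assuming the model is published on Hugging Face under
# the repo id "ghost-x/ghost-8b-beta" (verify the actual repo id first).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ghost-x/ghost-8b-beta"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the standard distribution is BF16
    device_map="auto",
)

# Multilingual chat: the same prompt format works across supported languages.
messages = [
    {"role": "user", "content": "Xin chào! Bạn có thể giới thiệu về bản thân không?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```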

The Ghost 8B Beta model outperforms prominent models such as Llama 3 8B Instruct and GPT 3.5 Turbo on the lc_winrate score. It also outperforms Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large on the AlpacaEval 2.0 winrate score (a length-uncontrolled metric; see the Evaluation section).

Thoughts

We believe it is possible to optimize language models that are not too large to achieve strong cross-linguistic understanding and the ability to solve complex tasks. Such models are often cited as cost-effective to deploy and operate at the production level, for both large businesses and startups. By doing this well, we can partly remove the GPU-cost concerns that hinder the development of useful AI ideas and products.

Specifications

  • Name: Ghost 8B Beta.
  • Version: disl-0x5 (aka: d0x5).
  • Model size: 8 billion parameters.
  • Context length: 8K (8,192 tokens) / 128K (131,072 tokens).
  • Languages: 🇺🇸 English, 🇫🇷 French, 🇮🇹 Italian, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇩🇪 German, 🇻🇳 Vietnamese, 🇰🇷 Korean, 🇨🇳 Chinese.
  • Main tasks: chat, multi-tasking, and function tools (built on a pretrained base model).
  • Distributions: Standard (BF16), GGUF, AWQ (see the loading sketch after this list).
  • Developed by: Ghost X, Hieu Lam.
  • Status: Moderating / Previewing.
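For the quantized distributions, a common route is llama-cpp-python with a GGUF file. A minimal sketch; the repo id and quant filename below are hypothetical, and the actual file names depend on how the GGUF builds are published.

```python
# A minimal sketch of loading a GGUF distribution with llama-cpp-python.
# Repo id and filename are hypothetical placeholders.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ghost-x/ghost-8b-beta-gguf",  # assumed repo id
    filename="*q4_k_m.gguf",               # hypothetical quant filename (glob)
    n_ctx=8192,                            # 8K context version
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```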

Techniques

Ghost 8B Beta has been fine-tuned with a recipe given the playful name “Teach the little boy how to cook Saigon Pho”.

This recipe organizes training into three main stages:

  • In stages 1 and 2, fine-tuning uses what we call, in short, the “multilingual buffer” method. It achieves strong language understanding and knowledge sharing across languages at low training cost, without requiring much sample data.
  • In stage 3, the model is refined based on human feedback.

Note that this recipe allows us to reproduce the model exactly: all training source code (both forks of and patches to libraries) is archived and reproducible.
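The post does not name the preference-optimization algorithm used in stage 3. As one plausible shape of such a step, here is a minimal sketch of a DPO-style refinement with the trl library; the dataset name, hyperparameters, and repo id are placeholders, not the actual recipe.

```python
# A minimal sketch of a stage-3-style preference-tuning step, assuming a
# DPO setup with trl. All names and values below are illustrative only.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "ghost-x/ghost-8b-beta"  # assumed repo id
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data: rows with "prompt", "chosen", and "rejected" fields.
dataset = load_dataset("your-org/preference-pairs", split="train")  # placeholder

config = DPOConfig(output_dir="ghost-8b-beta-dpo", beta=0.1)  # illustrative values
trainer = DPOTrainer(
    model=model,                 # a reference model is created internally if omitted
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` on older trl versions
)
trainer.train()
```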

Evaluation

At a high level, judging whether a model is effective and high quality requires rigorous, community-trusted evaluations such as AlpacaEval 2, MT-Bench, and MMLU-Pro.

Ghost 8B Beta will be run through these evaluations as well. In addition, because it is a multilingual model and supports function tools, its evaluation suite will be broader than usual. We will update the evaluation scores here so everyone can follow them easily.

AlpacaEval 2.0

An overview of the results from AlpacaEval 2.0:

  • On the length-controlled win rate (lc_winrate), Ghost 8B Beta outperformed strong models such as Llama 3 8B Instruct, GPT 3.5 Turbo (06/13), and GPT 3.5 Turbo (11/06). Its score is also quite close to larger models such as Mixtral 8x7B v0.1 and Gemini Pro.
  • On the win rate (winrate), Ghost 8B Beta outperformed Claude 3 Opus (02/29), Claude 3 Sonnet (02/29), GPT-4, GPT-4 (03/14), and Mistral Large (02/24).

Note that AlpacaEval 2.0 emphasizes the “Length-controlled (LC) win rates” score, so the “win rates” here are for reference only.

| Model Name | avg_length | lc_winrate | winrate | standard_error |
|---|---:|---:|---:|---:|
| GPT-4 Preview (11/06) | 2049 | 50.00 | 50.00 | 0.00 |
| Claude 3 Opus (02/29) | 1388 | 40.51 | 29.11 | 1.39 |
| Claude 3 Sonnet (02/29) | 1420 | 34.87 | 25.56 | 1.34 |
| Llama 3 70B Instruct | 1919 | 34.42 | 33.18 | 1.39 |
| Gemini Pro | 1456 | 24.38 | 18.18 | 1.16 |
| Mixtral 8x7B v0.1 | 1465 | 23.69 | 18.26 | 1.19 |
| Ghost 8B Beta (d0x5) | 2430 | 23.12 | 29.14 | 1.32 |
| Llama 3 8B Instruct | 1899 | 22.92 | 22.57 | 1.26 |
| GPT 3.5 Turbo (06/13) | 1331 | 22.35 | 14.09 | 0.00 |
| GPT 3.5 Turbo (11/06) | 796 | 19.30 | 9.17 | 0.00 |
| Mistral 7B Instruct v0.2 | 1676 | 17.11 | 14.72 | 1.08 |

A quick talk about AlpacaEval 2.0

AlpacaEval: An Automatic Evaluator for Instruction-following Language Models

AlpacaEval 2.0 with length-controlled win rates (paper) has a Spearman correlation of 0.98 with ChatBot Arena while costing less than $10 of OpenAI credits and running in less than 3 minutes. Our goal is to have a benchmark for chat LLMs that is: fast (< 5 min), cheap (< $10), and highly correlated with humans (0.98). Here’s a comparison with other benchmarks:

LC AlpacaEval is the most highly correlated benchmark with Chat Arena.
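For completeness, a run against this benchmark is typically invoked through the documented alpaca_eval CLI (installed via pip install alpaca-eval). A minimal sketch, assuming your model's generations are already saved to a file in the expected format ("outputs.json" is a placeholder) and OPENAI_API_KEY is set:

```python
# Score a file of model generations with AlpacaEval 2.0's default
# length-controlled annotator (requires OPENAI_API_KEY in the environment).
import subprocess

subprocess.run(
    [
        "alpaca_eval",
        "--model_outputs", "outputs.json",  # placeholder path to generations
        "--annotators_config", "weighted_alpaca_eval_gpt4_turbo",  # 2.0 default
    ],
    check=True,
)
```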

There’s more

Further evaluations will be run soon, and the results will be added here.

Notes

Contact

Follow Ghost X to stay updated with the latest information.