New work from Tsinghua's Tang Jie team: WebGLM, 10 billion parameters, built around online search, outperforming OpenAI's WebGPT
Source: Qubit
The latest work from Tang Jie's team at Tsinghua is here:
WebGLM, an internet-connected question-answering chatbot with 10 billion parameters (the paper has been accepted to KDD 2023).
For example:
It can give reasonable answers.
In performance comparisons, WebGLM already surpasses OpenAI's 13.5-billion-parameter WebGPT, and in human evaluation it is even comparable to the 175-billion-parameter model.
WebGLM, the internet-connected model from Tsinghua
The goal of WebGLM is to augment a pre-trained large language model with web search and retrieval capabilities while remaining efficient enough for real-world deployment.
To this end, the authors build the system around three strategies.
The first is the LLM-augmented retriever.
It retrieves web content relevant to a given query, gathering references that later help the model answer questions accurately.
It has two stages: coarse-grained web search and fine-grained LLM-enhanced dense retrieval.
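As a rough illustration of that coarse-to-fine structure, a retriever of this kind could be organized as in the sketch below. This is a minimal sketch under stated assumptions: web_search, fetch_passages, and dense_score are hypothetical stand-ins, not WebGLM's actual APIs.

```python
# Minimal sketch of a coarse-to-fine retriever (hypothetical helper functions).
from typing import List

def web_search(query: str, k: int = 20) -> List[str]:
    """Coarse stage: return candidate page URLs from an ordinary web search (stub)."""
    raise NotImplementedError  # stand-in for a search-engine API call

def fetch_passages(url: str) -> List[str]:
    """Split a fetched page into candidate passages (stub)."""
    raise NotImplementedError

def dense_score(query: str, passage: str) -> float:
    """Fine stage: similarity from an LLM-distilled dense retriever (stub)."""
    raise NotImplementedError

def retrieve_references(query: str, top_k: int = 5) -> List[str]:
    # 1) coarse-grained web search for candidate pages and passages
    candidates = [p for url in web_search(query) for p in fetch_passages(url)]
    # 2) fine-grained dense re-ranking; keep the best passages as references
    ranked = sorted(candidates, key=lambda p: dense_score(query, p), reverse=True)
    return ranked[:top_k]
```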
Next comes the bootstrapped generator.
It leverages the capabilities of GLM (such as GLM-130B, the open-source bilingual pre-trained model released by Tsinghua) to generate detailed answers to questions.
Using this generator, the authors obtain WebGLM-QA, an LLM-bootstrapped dataset for quoted, long-form QA.
It is cleaned and filtered through strategies such as in-context learning, and ultimately comprises 45k high-quality filtered samples along with 83k noisy samples.
The backbone of WebGLM is a GLM model trained on this dataset.
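To make the bootstrapping idea concrete, the loop below shows one way such data could be generated and filtered. It is only a sketch: glm_generate, the prompt format, and the citation check are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative bootstrapping loop: prompt an LLM with numbered references and
# keep only answers whose citations can be verified (assumed filtering heuristic).
import re
from typing import List, Optional

def glm_generate(prompt: str) -> str:
    """Stand-in for a call to a GLM-family model (e.g. GLM-130B)."""
    raise NotImplementedError

def build_prompt(question: str, references: List[str]) -> str:
    refs = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(references))
    return f"References:\n{refs}\n\nQuestion: {question}\nAnswer with citations like [1]:"

def bootstrap_sample(question: str, references: List[str]) -> Optional[dict]:
    answer = glm_generate(build_prompt(question, references))
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    # Drop samples whose citations are missing or point to non-existent references.
    if not cited or any(i < 1 or i > len(references) for i in cited):
        return None
    return {"question": question, "references": references, "answer": answer}
```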
Finally, there is a scorer based on human preferences.
It evaluates the quality of generated responses by prioritizing human preferences over costly expert feedback, ensuring the system produces useful and engaging content.
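Conceptually, such a scorer can be implemented as a reward model trained on preference comparisons. The sketch below is an assumed architecture, not the released scorer: an encoder with a scalar head, trained so that preferred answers score higher than rejected ones.

```python
# Sketch of a preference-based scorer: an encoder with a scalar head,
# trainable with a pairwise ranking loss on "preferred vs. rejected" answers.
import torch
import torch.nn as nn

class PreferenceScorer(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder            # any text encoder producing [batch, hidden]
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, encoded_qa: torch.Tensor) -> torch.Tensor:
        # One scalar quality score per question-answer pair.
        return self.head(self.encoder(encoded_qa)).squeeze(-1)

def pairwise_loss(score_preferred: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Encourage the preferred answer to score higher than the rejected one.
    return -torch.log(torch.sigmoid(score_preferred - score_rejected)).mean()
```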
These three components form WebGLM's pipeline, in order:
The LLM-augmented retriever takes the top five most relevant pages as reference sources, the bootstrapped generator produces multiple candidate answers, and the scorer selects the one most likely to match human preferences as the final output.
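Wiring the three pieces together, the flow looks roughly like the sketch below. The helper functions are hypothetical stand-ins for the components described above; the actual implementation lives in the WebGLM repository.

```python
# End-to-end sketch: retrieve top-5 references, sample several candidate answers,
# and return the one the preference scorer ranks highest.
from typing import List

def retrieve_references(question: str, top_k: int = 5) -> List[str]:
    """Stand-in for the LLM-augmented retriever."""
    raise NotImplementedError

def generate_candidates(question: str, references: List[str], n: int = 4) -> List[str]:
    """Stand-in for sampling n answers from the bootstrapped generator."""
    raise NotImplementedError

def score(question: str, answer: str) -> float:
    """Stand-in for the human-preference scorer."""
    raise NotImplementedError

def webglm_answer(question: str) -> str:
    references = retrieve_references(question, top_k=5)       # retriever
    candidates = generate_candidates(question, references)    # generator
    return max(candidates, key=lambda a: score(question, a))  # scorer picks the best
```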
Performance exceeds OpenAI WebGPT
Beyond WebGLM itself, Tang Jie's team also proposed an evaluation standard for web-enhanced question-answering systems, covering both the retrieved references and the final answers.
References are measured along five dimensions: relevance, information density, truthfulness (no factual errors), toxicity (absence of content such as violence and pornography), and social bias; answers are measured on fluency, correctness, citation accuracy, objectivity, and redundancy.
For the comparison they used the 272 questions provided on the demo website of WebGPT (OpenAI's model fine-tuned from GPT-3) and recruited 15 volunteers with master's degrees to score the outputs.
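To make the scoring procedure concrete, one simple way to aggregate such human ratings is to average each system's scores per dimension. The snippet below is purely illustrative; the numbers are toy data, not results from the paper, and the paper's exact aggregation may differ.

```python
# Illustrative aggregation of human ratings: mean score per (system, dimension).
from collections import defaultdict
from statistics import mean

# Each rating: (system, dimension, score from one volunteer on one question) - toy data.
ratings = [
    ("SystemA", "fluency", 2.0), ("SystemA", "correctness", 1.5),
    ("SystemB", "fluency", 1.8), ("SystemB", "correctness", 1.2),
]

buckets = defaultdict(list)
for system, dimension, value in ratings:
    buckets[(system, dimension)].append(value)

for (system, dimension), values in sorted(buckets.items()):
    print(f"{system:>10} | {dimension:<12} | {mean(values):.2f}")
```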
The final result is as follows:
It can be seen that although WebGLM's retrieved references are slightly inferior to WebGPT-175B's, they are far better than those of Perplexity.ai and WebGPT-13B (reference evaluation, left).
It is worth noting that WebGLM's retrieval process uses only traditional word-based algorithms plus two Contriever models whose combined parameters do not exceed 300M.
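For reference, the snippet below shows roughly how a Contriever-style dense retriever scores query-passage relevance with mean-pooled embeddings. It uses the public facebook/contriever checkpoint from Hugging Face; WebGLM's own fine-tuned weights and preprocessing may differ.

```python
# Scoring query-passage relevance with Contriever (mean pooling over token embeddings).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_emb = model(**inputs).last_hidden_state           # [batch, seq, hidden]
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (token_emb * mask).sum(dim=1) / mask.sum(dim=1)      # mean pooling

query_emb = embed(["When was GLM-130B released?"])
passage_embs = embed(["GLM-130B is an open bilingual pre-trained model.",
                      "An unrelated passage about cooking."])
scores = query_emb @ passage_embs.T                             # dot-product relevance
print(scores)
```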
In addition, WebGLM clearly outperforms WebGPT-13B in computational efficiency and time cost, and is comparable to the 175B model.
On the final answers, WebGLM achieved the highest scores for fluency, truthfulness, and redundancy, and its correctness came close to WebGPT-175B, far exceeding Perplexity.ai and WebGPT-13B.
According to the authors, this shows that WebGLM can achieve higher performance at a lower cost.
Deployment and Training
WebGLM is released as open source.
The weights of the retriever can be downloaded from Tsinghua Cloud.
There are two ways to run the model: via a command-line interface or as a web service, and two model sizes are available, WebGLM-2B and WebGLM-10B.
You can also train WebGLM yourself; the official training data for the generator and retriever is available for download.
Paper address:
GitHub homepage: