New work from Tsinghua's Tang Jie team: WebGLM, 10 billion parameters, built around online search, outperforming OpenAI's WebGPT
Source: Qubit
The latest work from Tang Jie's team at Tsinghua is here:
WebGLM, an internet-connected question-answering chatbot with 10 billion parameters (the paper has been accepted to KDD 2023).
For example:
It can give reasonable answers.
In performance comparisons, WebGLM already surpasses OpenAI's 13.5-billion-parameter WebGPT, and in human evaluation it is even comparable to the 175-billion-parameter model.
WebGLM, the internet-connected model from Tsinghua
The goal of WebGLM is to augment a pre-trained large language model with web search and retrieval capabilities while remaining efficient enough for real-world deployment.
To this end, the authors build the system around three strategies.
The first is the LLM-augmented retriever.
It retrieves web content relevant to a given query, gathering references that later help the model answer questions accurately.
It has two stages: coarse-grained web search and fine-grained LLM-enhanced dense retrieval.
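As a rough illustration of that coarse-to-fine structure, a retriever of this kind could be organized as in the sketch below. This is a minimal sketch under stated assumptions: web_search, fetch_passages, and dense_score are hypothetical stand-ins, not WebGLM's actual APIs.

```python
# Minimal sketch of a coarse-to-fine retriever (hypothetical helper functions).
from typing import List

def web_search(query: str, k: int = 20) -> List[str]:
    """Coarse stage: return candidate page URLs from an ordinary web search (stub)."""
    raise NotImplementedError  # stand-in for a search-engine API call

def fetch_passages(url: str) -> List[str]:
    """Split a fetched page into candidate passages (stub)."""
    raise NotImplementedError

def dense_score(query: str, passage: str) -> float:
    """Fine stage: similarity from an LLM-distilled dense retriever (stub)."""
    raise NotImplementedError

def retrieve_references(query: str, top_k: int = 5) -> List[str]:
    # 1) coarse-grained web search for candidate pages and passages
    candidates = [p for url in web_search(query) for p in fetch_passages(url)]
    # 2) fine-grained dense re-ranking; keep the best passages as references
    ranked = sorted(candidates, key=lambda p: dense_score(query, p), reverse=True)
    return ranked[:top_k]
```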
Next comes the bootstrapped generator.
It leverages the capabilities of GLM (such as GLM-130B, the open-source bilingual pre-trained model released by Tsinghua) to generate detailed answers to questions.
Using this generator, the authors obtain WebGLM-QA, an LLM-bootstrapped dataset for quoted, long-form QA.
It is cleaned and filtered through strategies such as in-context learning, and ultimately comprises 45k high-quality filtered samples along with 83k noisy samples.
The backbone of WebGLM is a GLM model trained on this dataset.
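To make the bootstrapping idea concrete, the loop below shows one way such data could be generated and filtered. It is only a sketch: glm_generate, the prompt format, and the citation check are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative bootstrapping loop: prompt an LLM with numbered references and
# keep only answers whose citations can be verified (assumed filtering heuristic).
import re
from typing import List, Optional

def glm_generate(prompt: str) -> str:
    """Stand-in for a call to a GLM-family model (e.g. GLM-130B)."""
    raise NotImplementedError

def build_prompt(question: str, references: List[str]) -> str:
    refs = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(references))
    return f"References:\n{refs}\n\nQuestion: {question}\nAnswer with citations like [1]:"

def bootstrap_sample(question: str, references: List[str]) -> Optional[dict]:
    answer = glm_generate(build_prompt(question, references))
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    # Drop samples whose citations are missing or point to non-existent references.
    if not cited or any(i < 1 or i > len(references) for i in cited):
        return None
    return {"question": question, "references": references, "answer": answer}
```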
Finally, there is a scorer based on human preferences.
It evaluates the quality of generated responses by prioritizing human preferences over costly expert feedback, ensuring the system produces useful and engaging content.
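Conceptually, such a scorer can be implemented as a reward model trained on preference comparisons. The sketch below is an assumed architecture, not the released scorer: an encoder with a scalar head, trained so that preferred answers score higher than rejected ones.

```python
# Sketch of a preference-based scorer: an encoder with a scalar head,
# trainable with a pairwise ranking loss on "preferred vs. rejected" answers.
import torch
import torch.nn as nn

class PreferenceScorer(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder            # any text encoder producing [batch, hidden]
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, encoded_qa: torch.Tensor) -> torch.Tensor:
        # One scalar quality score per question-answer pair.
        return self.head(self.encoder(encoded_qa)).squeeze(-1)

def pairwise_loss(score_preferred: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Encourage the preferred answer to score higher than the rejected one.
    return -torch.log(torch.sigmoid(score_preferred - score_rejected)).mean()
```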
These three components form WebGLM's pipeline, in order:
The LLM-augmented retriever takes the top five most relevant pages as reference sources, the bootstrapped generator produces multiple candidate answers, and the scorer selects the one most likely to match human preferences as the final output.
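Wiring the three pieces together, the flow looks roughly like the sketch below. The helper functions are hypothetical stand-ins for the components described above; the actual implementation lives in the WebGLM repository.

```python
# End-to-end sketch: retrieve top-5 references, sample several candidate answers,
# and return the one the preference scorer ranks highest.
from typing import List

def retrieve_references(question: str, top_k: int = 5) -> List[str]:
    """Stand-in for the LLM-augmented retriever."""
    raise NotImplementedError

def generate_candidates(question: str, references: List[str], n: int = 4) -> List[str]:
    """Stand-in for sampling n answers from the bootstrapped generator."""
    raise NotImplementedError

def score(question: str, answer: str) -> float:
    """Stand-in for the human-preference scorer."""
    raise NotImplementedError

def webglm_answer(question: str) -> str:
    references = retrieve_references(question, top_k=5)       # retriever
    candidates = generate_candidates(question, references)    # generator
    return max(candidates, key=lambda a: score(question, a))  # scorer picks the best
```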
Performance exceeds OpenAI WebGPT
Beyond WebGLM itself, Tang Jie's team also proposed an evaluation standard for web-enhanced question-answering systems, covering both the retrieved references and the final answers.
References are measured along five dimensions: relevance, information density, truthfulness (no factual errors), toxicity (absence of content such as violence and pornography), and social bias; answers are measured on fluency, correctness, citation accuracy, objectivity, and redundancy.
For the comparison they used the 272 questions provided on the demo website of WebGPT (OpenAI's model fine-tuned from GPT-3) and recruited 15 volunteers with master's degrees to score the outputs.
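To make the scoring procedure concrete, one simple way to aggregate such human ratings is to average each system's scores per dimension. The snippet below is purely illustrative; the numbers are toy data, not results from the paper, and the paper's exact aggregation may differ.

```python
# Illustrative aggregation of human ratings: mean score per (system, dimension).
from collections import defaultdict
from statistics import mean

# Each rating: (system, dimension, score from one volunteer on one question) - toy data.
ratings = [
    ("SystemA", "fluency", 2.0), ("SystemA", "correctness", 1.5),
    ("SystemB", "fluency", 1.8), ("SystemB", "correctness", 1.2),
]

buckets = defaultdict(list)
for system, dimension, value in ratings:
    buckets[(system, dimension)].append(value)

for (system, dimension), values in sorted(buckets.items()):
    print(f"{system:>10} | {dimension:<12} | {mean(values):.2f}")
```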
The final result is as follows:
It can be seen that although WebGLM's retrieved references are slightly inferior to WebGPT-175B's, they are far better than those of Perplexity.ai and WebGPT-13B (reference evaluation, left).
It is worth noting that WebGLM's retrieval process uses only traditional word-based algorithms plus two Contriever models whose combined parameters do not exceed 300M.
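For reference, the snippet below shows roughly how a Contriever-style dense retriever scores query-passage relevance with mean-pooled embeddings. It uses the public facebook/contriever checkpoint from Hugging Face; WebGLM's own fine-tuned weights and preprocessing may differ.

```python
# Scoring query-passage relevance with Contriever (mean pooling over token embeddings).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_emb = model(**inputs).last_hidden_state           # [batch, seq, hidden]
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (token_emb * mask).sum(dim=1) / mask.sum(dim=1)      # mean pooling

query_emb = embed(["When was GLM-130B released?"])
passage_embs = embed(["GLM-130B is an open bilingual pre-trained model.",
                      "An unrelated passage about cooking."])
scores = query_emb @ passage_embs.T                             # dot-product relevance
print(scores)
```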
In addition, WebGLM clearly outperforms WebGPT-13B in computational efficiency and time cost, and is comparable to the 175B model.
On the final answers, WebGLM achieved the highest scores for fluency, truthfulness, and redundancy, and its correctness came close to WebGPT-175B, far exceeding Perplexity.ai and WebGPT-13B.
According to the authors, this shows that WebGLM can achieve higher performance at a lower cost.
Deployment and Training
WebGLM is released as open source.
The weights of the retriever can be downloaded from Tsinghua Cloud.
There are two ways to run the model: via a command-line interface or as a web service, and two model sizes are available, WebGLM-2B and WebGLM-10B.
You can also train WebGLM yourself; the official training data for the generator and retriever is available for download.
Paper address:
GitHub homepage: