When will the new giant be born? Large models await a "watershed"

Source| Zero One Finance

Author| Shen Zhuoyan

Image source: Generated by Unbounded AI tool

Since the start of 2023, the hottest topics in the technology world have been ChatGPT and the large-model technology behind it.

Earlier there were Baidu's Wenxin Yiyan, Alibaba Cloud's Tongyi Qianwen, Huawei's Pangu, iFLYTEK's Xinghuo, and others. More recently, Kai-Fu Lee entered the arena by founding Zero One Wanwu (01.AI), and Volcano Engine launched "Volcano Ark." In just a few months, developing and releasing large-model applications has become a trend across the industry.

More than 80 domestic large models now have at least 1 billion parameters, and the number is still growing rapidly. The atmosphere of preparation for a commercial war around large models is already palpable.

Whether a true giant or a "little giant," every company needs moves like this to demonstrate its sensitivity to cutting-edge technology and its long-term accumulation. Launching an application earlier means collecting valuable data on how users interact with the large model a day earlier, stockpiling provisions now for a bid at the crown in the competition to come.

The keys to large models are the core elements of the AI field: algorithms, computing power, data, and scenarios/applications. Algorithms represent strategy; computing power sets both the ceiling and the entry threshold; data is the army's rations and also marks the difference between good and bad models. Beyond these three elements, the scenario or application determines where the troops are sent.

The "Hundred Models War" is about to break out. Will a giant company with all the elements evolve into an infinite involution of technological capabilities? Can the small giants on the vertical track consolidate their leading position with the help of large models? Among the new players getting tickets, who could be a serious contender for industry dominance?

The "strength watershed" for general-purpose large models has not yet appeared

Large-model players fall mainly into three categories: the first is Internet giants (Baidu, Alibaba, Tencent, etc.) and industry giants (China Telecom, China Unicom, etc.); the second is AI companies (SenseTime, Yuncong, Guangyuewai, etc.); and the third is research institutions represented by the Shanghai Artificial Intelligence Laboratory, Fudan University, Harbin Institute of Technology, and others.

According to public data, as of early July 2023 China had more than 80 large models with over 1 billion parameters, and the number is still growing rapidly. The more models there are at this parameter scale, the higher the threshold for competition becomes.

Most of the large models released so far are general-purpose models, for two main reasons. First, the competitive landscape is still unsettled: purely technical gaps have not yet widened into a generational divide, so every participant still has a chance to come out on top. Second, a breakout public-facing application built on a large model has not yet appeared, leaving no clear direction to follow. Until the domestic large-model industry has its own "ChatGPT moment," betting on general-purpose models is both an active and a passive choice.

What's more, it is very likely that a new giant will emerge from the field of large models.

Zhou Hongyi believes large models must be "general-purpose": only general-purpose models can reach millions of households, empower hundreds of industries, and lead the new AI revolution.

What he leaves unsaid is how much investment and cooperation it takes to lead that revolution. Whether the large-model market turns out to be a blue ocean or a red ocean, an ecosystem will form in which big fish lead and small fish cooperate. But the watershed separating the big fish from the small has not yet appeared.

Judging from the current situation, a model with 1 billion parameters counts as the entry threshold, and one with 10 billion parameters can be considered competitive, but even a 100-billion-parameter model is far from leaving the rest of the field in the dust.

Parameter count alone is not an overwhelming force that decides the battlefield. Resource scheduling capability, long-term accumulation of experience, and heavy research investment are the enduring core differentiators in large-model competition.

Anyone benchmarking against OpenAI needs to see that behind ChatGPT's breakout was Microsoft's comprehensive support in data, computing power, and massive funding: a long accumulation that finally paid off.

Large models are a long-term investment, which in plain terms means "burning money." Computing power, algorithms, and data are not accumulated overnight, and after a model is released it still needs repeated training and rapid iteration before it matures.

In the real world, are large-model players driven by technology or by profit? OpenAI is the most famous large-model company in the world, yet even with a hit product like ChatGPT its ability to commercialize remains worrying. A technology company valued at nearly US$30 billion and standing at the center of the 2023 AI wave, OpenAI has so far booked less than US$200 million in revenue.

The initial investment is only the starting cost; every subsequent round of training costs real money. How many companies can stomach such a meager return on investment in the large-model race? ChatGPT's success proves that large models have found a viable product path, but it does not yet mean success at the commercial level.

At least in terms of input-output ratio, the Internet giants hold the relative advantage: they have both the motivation and the resources to absorb strategic losses in the early stage, just as Alibaba Cloud once did.

As for how long the money-burning will last and when a gratifying return will appear, neither the big companies nor the VCs backing the startups know. It is a gamble any player might be forced out of at any moment, with chips worth billions of dollars.

For large-model players, each with its own strengths, the priority is to explore the application layer and open testing as early as possible. Whoever accumulates more precious interaction data will hold the breakthrough point in the next round of competition.

Vertical needs and vertical difficulties

Competition among general-purpose large models is largely a contest over who gets to define the infrastructure, whereas vertical large models build on open-source base models or API access to form differentiated competitiveness in specific scenarios within segmented industries, focusing more on application; a rough sketch of that pattern follows below.
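To make the pattern concrete, here is a minimal sketch of one common way a vertical product wraps a general-purpose model exposed through an API: the application injects domain-specific instructions and context before calling the base model. The endpoint URL, model name, and response field below are placeholders for illustration, not any specific vendor's API.

```python
import json
import urllib.request

# Placeholder endpoint and credentials -- not a real vendor API.
API_URL = "https://api.example-llm.com/v1/chat"
API_KEY = "YOUR_API_KEY"

# The domain knowledge lives in the prompt/context layer,
# not in the general-purpose base model itself.
FINANCE_SYSTEM_PROMPT = (
    "You are an assistant for equity research analysts. "
    "Answer only from the supplied filings excerpt; otherwise reply 'insufficient data'."
)

def ask_vertical_assistant(question: str, context: str) -> str:
    """Send a domain-scoped request to a general-purpose model behind an API."""
    payload = {
        "model": "general-base-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": FINANCE_SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Filings excerpt:\n{context}\n\nQuestion: {question}"},
        ],
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["reply"]  # placeholder response field
```

In this layering, the differentiation comes from the proprietary domain data fed into the context (or used to fine-tune an open-source base model), not from the general model itself.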

On the general-purpose battlefield, weaker players will gradually fall behind as time goes on, leaving only a handful of general-purpose models to serve as infrastructure. Even then, these models will still face the problem of homogeneity, and the application layer will still depend on vertical models.

A general-purpose large model is, in effect, a collection of many vertical models: the more scenarios it is trained on, the stronger its "generality."

As the first company in China to release a ChatGPT-like product, Baidu has an urgent need for a vertical application layer on top of its large model. Li Yanhong put it this way: "More important than how many large models there are is what applications they enable and what breakthroughs they bring in vertical fields. The key to the new international competition is not how many large models a country has, but how many native applications run on top of them and how much those applications improve production efficiency."

In Li Yanhong's metaphor, large models, especially general-purpose ones, are the operating system of the AI era: every application will be developed around the large model, with an application layer of AI-native apps sitting on top.

In the end, "general-purpose" is only a relative concept; no general model applies fully to every field with sufficient industry depth. Take ChatGPT: the industries where it is genuinely in wide use are those with a high tolerance for error, where even a wrong answer causes only limited damage. In scenarios such as heavy industry, aerospace, and healthcare, the cost of a single mistake is immeasurable, which is to say ChatGPT cannot meet the vertical, professional requirements of those scenarios.

For models that must be both vertical and professional, data is the weak point: few industries offer both sufficient data depth and a stable moat. Whether an industry's data is easy to obtain, and whether the data obtained can keep up with the industry's ever-changing requirements, is hard to assess in advance.

Internet giants hold vast amounts of e-commerce, social, and search data, but the data types are not comprehensive and the quality is not guaranteed; the Chinese corpus usable for training still requires a great deal of mining.

Recently, vertical large models have been landing one after another in government affairs, public security, and healthcare. In smart healthcare, for example, Yunzhisheng's self-developed "Shanhai" large model, combined with its full-stack voice interaction technologies (front-end acoustic signal processing, voiceprint recognition, speech recognition, and speech synthesis), is expected to raise doctors' efficiency in entering electronic medical records by more than 400%, cut per-patient consultation time by more than 40%, and improve outpatient efficiency by more than 66%.

TRS, using its own official documents, policy papers, government service guides, and similar materials as professional training data, has built a large model for government affairs.

In finance, Hang Seng Electronics began planning and designing large-model financial products at the end of March 2023. At the end of June, Hang Seng Electronics and its subsidiary Hang Seng Juyuan unveiled new digital-intelligence financial products built on large language model technology: the financial intelligent assistant Photon and an upgraded intelligent investment research platform, WarrenQ.

Tencent, an Internet giant with resources across many industries, is placing bets on multiple fronts: in late June it announced MaaS offerings covering 10 industries, including finance, culture and tourism, government affairs, and education, with more than 50 solutions in total.

At the same time, the data a vertical model needs is often not confined to its own industry; some businesses require data from one or more other industries, so model training and deployment depend on cross-industry cooperation among enterprises or on the resource integration of Internet giants.

Computing power: can brute force make the brick fly?

In the 19th-century gold rush in the American West, whether a prospector actually struck it rich was a matter of probability, but the shovel sellers making money was a certainty.

In the AI gold rush, the large-model battlefield is still unsettled and the players are still pushing forward, but the "shovel seller" has already won. Riding the wave of AI chips and large models, Nvidia has widened its lead over rival AMD and joined the trillion-dollar market-cap club.

OpenAI CEO Sam Altman has proposed a new version of Moore's Law: the total amount of AI compute in the world doubles every 18 months. Sustaining that growth requires AI training chips, a market in which Nvidia's share exceeds 90%.
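As a rough illustration of what that doubling rule implies (the 18-month period is the only input taken from Altman's claim; the baseline of 1.0 is an arbitrary placeholder, not a real figure), a back-of-the-envelope calculation looks like this:

```python
# Back-of-the-envelope sketch of the "compute doubles every 18 months" claim.
DOUBLING_PERIOD_MONTHS = 18

def compute_multiple(months: int, baseline: float = 1.0) -> float:
    """Compute multiple after `months`, assuming a doubling every 18 months."""
    return baseline * 2 ** (months / DOUBLING_PERIOD_MONTHS)

for years in (1, 3, 5):
    print(f"After {years} year(s): ~{compute_multiple(12 * years):.1f}x baseline compute")
# After 1 year: ~1.6x; after 3 years: ~4x; after 5 years: ~10x.
```

At that pace, demand for training chips compounds into roughly an order of magnitude every five years, which is the backdrop for the buying spree described below.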

Nvidia's AI chips are being snapped up frantically by major technology companies worldwide. In March 2023, Microsoft announced that it had helped OpenAI build a new computing center with tens of thousands of A100s; in May, Google launched its Compute Engine A3 computing cluster built with 26,000 H100s. In addition, according to China National Finance Securities, ByteDance has ordered more than US$1 billion worth of GPUs this year, an estimated 100,000 A100 and H800 chips either delivered or on order, and tens of thousands of H800s are also going into the new version of Tencent Cloud's high-performance computing service center.

Nvidia CFO Kress said that the current market demand for AI computing power has exceeded the company's expectations for the next few quarters, and there are too many orders to fulfill.

Of course, it is useless for us to envy the money Nvidia makes.

China's domestic GPU track is also catching up. Internet giants have self-developed AI chips, such as Baidu's Kunlun AI chip and Tencent's "Canghai" video-processing chip and "Zixiao" AI chip, while emerging companies such as Suiyuan Technology, Tianshu Zhixin, and Moore Threads develop general-purpose GPUs, which provide highly parallel computing and large arrays of compute cores for a wide range of workloads. Great progress has been made in recent years, and the gap with leading high-performance GPUs is gradually narrowing.

Wu Hequan, an academician of the Chinese Academy of Engineering, has suggested that, under the coordination of national science-and-technology and industrial plans, a reasonable division of labor be established to pool computing power, and that the computing platforms of national laboratories be opened up to support large-model training. He has also proposed forming a computing-power alliance that concentrates existing high-end GPU resources to supply the compute needed for large-model training.

Beyond high-performance GPUs, lower-cost computing platforms are also seen as a new market opportunity. Jiuzhang Yunji recently revealed that it will keep working with state-owned cloud vendors and bring a large number of intelligent computing centers on board as partners, offering customers an integrated software-and-hardware platform for AI model development and tying customer costs to computing power to give them cost certainty.

Computing power is the foundation of large-model development, a necessary condition but not a sufficient one. How much of it can actually be brought to bear depends on how it is used: only when algorithmic innovation, data resource construction, and training-framework iteration advance together can brute force truly make the brick fly.

Policy: Guidance and regulation at critical moments

The AI boom coincides with a critical moment for algorithm governance and algorithm filing in China.

As early as 2021, the "Guiding Opinions on Strengthening the Comprehensive Governance of Internet Information Service Algorithms" made algorithm filing an important part of improving the supervisory system, and the subsequent management regulations explicitly stipulate that "algorithm recommendation service providers with public opinion attributes or social mobilization capabilities shall complete filing procedures."

In April 2023, the Cyberspace Administration of China released the "Administrative Measures for Generative Artificial Intelligence Services (Draft for Comment)" for public consultation. In June, the State Council's "2023 Legislative Work Plan" indicated that a draft artificial intelligence law would be submitted to the Standing Committee of the National People's Congress for deliberation.

The "Generative Artificial Intelligence Service Management Measures (Draft for Comment)" mentioned that before using generative artificial intelligence products to provide services to the public, it should be reported to the State Network in accordance with the "Regulations on the Security Evaluation of Internet Information Services with Public Opinion Attributes or Social Mobilization Capabilities". The information department shall apply for a security assessment, and perform the algorithm filing, modification, and cancellation filing procedures in accordance with the "Internet Information Service Algorithm Recommendation Management Regulations".

This is also one of the reasons why no large-model product has yet been fully opened to the public.

Professor Chen Bing, deputy dean of the Law School of Nankai University and a special researcher at the China New Generation Artificial Intelligence Development Strategy Research Institute, believes that ex-ante regulation does not necessarily harm technological innovation, but notes that prior review inevitably raises enterprises' compliance burden to some extent. If its scope is set improperly, it could suppress the R&D and training efficiency of generative AI products and, in practice, slow the development of generative AI.

Because the risks of artificial intelligence cannot be fully estimated in advance, and waiting for after-the-fact supervision may mean enormous damage has already been done, China currently applies whole-process supervision to AI development.

Under whole-process regulation, the compliance costs of large-model players will undoubtedly rise, while the filing system pushes players to seek filing first so they can bring products to market sooner, objectively accelerating the winnowing of the field. As laws and regulations gradually improve, the industry will go through a reshuffle in which the weak are left behind, and that may bring the moment when the clouds part and the sun comes through that much earlier.
