A Study Based on Large Language Model (2024)

¹ School of Software, Zhejiang University
² College of Computer Science and Technology, Zhejiang University
Email: {huangzww, lijuan18, longjin, wangjj2018, mingchentz, 22351088, zhiqiangliu, mjw.cs, zhang.wen}@zju.edu.cn

Zhiwei Huang¹, Juan Li¹, Long Jin¹, Junjie Wang¹, Mingchen Tu¹, Yin Hua¹, Zhiqiang Liu¹, Jiawei Meng², Wen Zhang¹ (corresponding author)

Abstract

As academic conferences foster global scholarly communication, researchers consistently need accurate and up-to-date information about them. Since this information is scattered across many sources, an intelligent question-answering system is needed to handle researchers' queries efficiently and keep them aware of the latest developments. Recently, Large Language Models (LLMs) have demonstrated impressive question-answering capabilities and have been enhanced with external knowledge retrieval to cope with outdated knowledge. However, these methods fail on conference queries because the latest conference knowledge is missing from their knowledge sources. To address this challenge, we develop the ConferenceQA dataset, covering seven diverse academic conferences. Specifically, for each conference we first organize the conference data into a tree-structured format through a semi-automated method. We then annotate question-answer pairs and classify them into four types to better distinguish their difficulty. With the constructed dataset, we further propose a novel method, STAR (STructure-Aware Retrieval), which leverages the inherent structural information during retrieval to improve the question-answering abilities of LLMs. Experimental results on the ConferenceQA dataset demonstrate the effectiveness of our retrieval method. The dataset and code are available at https://github.com/zjukg/ConferenceQA.

Keywords:

Conference dataset, Large language model, Retrieval augmentation.

1 Introduction

The rapid advancement of computer science has led to an increase in research presented at academic conferences, which are crucial for academic exchange. Given the vast and dispersed nature of conference information, querying is a more efficient method for information retrieval than navigating multiple sources.

Recent advancements in Large Language Models (LLMs) [21, 2, 7] have significantly impacted various NLP tasks, including question answering. LLMs demonstrate capabilities like chain-of-thought reasoning [3] and in-context learning [6], enhanced by increasing model parameters and extensive training data. After instruction fine-tuning [5], LLMs excel in conversational tasks and information retrieval [4].

Despite their success, LLMs suffer from incompleteness, untimeliness, and unfaithfulness, and they are limited in incorporating up-to-date and domain-specific expertise. This has motivated research on integrating LLMs with external knowledge sources such as knowledge bases (KBs) [8], search engines [9], and databases [10]. For academic conference queries, however, the external conference knowledge is missing, so LLMs cannot access the latest conference information, e.g., about conferences held in 2022 and later. Existing retrieval methods are efficient but primarily target plain text [11], triples [15], and tables [16]; this does not align well with the structured nature of conference websites, complicating their direct application to conference-specific queries.

In this paper, we introduce ConferenceQA, a benchmark comprising seven recent top-tier academic conferences. These conferences span research domains such as web science, natural language processing, machine learning, databases, artificial intelligence, and the semantic web, providing a comprehensive dataset that organizes information across all stages of each conference. To construct the dataset, we first employ a semi-automatic method to convert the conference information into a tree structure. We then use ChatGPT to simulate roles with diverse backgrounds and generate role-specific questions, which are carefully filtered and annotated with answers to ensure realism and reliability. We also document the source of each answer to further enhance the dataset's credibility. Finally, we categorize the questions into four types according to the complexity of deriving their answers.

On the constructed ConferenceQA dataset, we introduce STAR (STructure-Aware Retrieval), a method that adapts LLM-based retrieval to hierarchical data, and conduct a study on conference QA. Our method generates a textual description for each path based on both its surrounding structural information and its own textual content. We conduct experiments with various LLMs and with different retrievers. Compared to plain path retrieval, structure-aware retrieval yields an average relative F1 improvement of 15.50% across different LLMs and 17.03% across different retrievers, highlighting the effectiveness of STAR on the tree-structured ConferenceQA dataset.

[Fig. 1: Overview of the ConferenceQA dataset construction process.]

Our contributions can be summarized as follows:

  1. We construct a benchmark called ConferenceQA, which organizes conference information in a tree structure to support the evaluation of question answering about academic conferences.

  2. We introduce a novel method called STAR. By using the structural information around nodes to generate textual descriptions and retrieving over these descriptions, it effectively improves answering performance.

  3. We conduct experiments on the ConferenceQA dataset, showing that LLMs enhanced with retrieval can successfully answer questions about academic conferences and that our STAR method consistently outperforms the path retrieval method, offering meaningful insights.

2 Dataset Construction

In this section, we introduce the construction of the ConferenceQA dataset. We select seven representative academic conferences held in 2022 or 2023 and build the dataset from their official websites, where the most accurate information about each conference is stored. Each conference is assigned to one data annotator with relevant experience in academic conferences. We construct each conference dataset in three steps: hierarchical data transformation, QA pair generation, and question classification. The overview of the construction process is shown in Fig. 1.

2.1 Hierarchical Data Transformation

Data transformation in the ConferenceQA dataset involves standardizing the diverse formats of academic conference data sourced from official conference websites into a unified tree structure. Each conference page combines unstructured text, like conference introductions and paper submission guidelines, with structured data such as payment and schedule details. To manage this format variability, we employ a semi-automated method to create tree-structured data for each conference.

Specifically, the automated component converts structured table data into a tree format using ChatGPT, as shown in Fig. 1, where registration information is transformed. For other structured data, such as accepted papers with consistent schemas (title, authors, abstract), we employ web crawlers to fetch HTML pages and convert them into corresponding tree-structured data based on the HTML tags. The manual component involves annotating inter-page relationships. Annotators assign page titles to tree nodes based on the linkage among pages, evident in navigation bars and subpage links like ‘calls’, ‘proceedings’ and ‘programs’. Additionally, subtitles within pages are identified and designated as child nodes under the relevant page titles. These manual steps are essential to maintain the dataset’s quality and coherence.
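To make the automated part concrete, below is a minimal sketch, not the authors' released pipeline, of how an accepted-papers HTML table with a consistent schema (title, authors, abstract) could be converted into tree-structured data; the tag layout and sample HTML are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code): convert a crawled HTML
# table of accepted papers into tree-structured data keyed by paper title.
import json
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Authors</th><th>Abstract</th></tr>
  <tr><td>Paper A</td><td>Alice; Bob</td><td>We study ...</td></tr>
</table>
"""

def papers_table_to_tree(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    tree = {"Accepted Papers": {}}
    for row in soup.select("table tr")[1:]:  # skip the header row
        title, authors, abstract = (td.get_text(strip=True) for td in row.find_all("td"))
        # Each paper title becomes an inner node; its fields become leaf nodes.
        tree["Accepted Papers"][title] = {"authors": authors, "abstract": abstract}
    return tree

print(json.dumps(papers_table_to_tree(SAMPLE_HTML), indent=2))
```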

Ultimately, we obtain seven conference datasets organized in a tree-structured format, which serve as accurate and rigorous knowledge sources.

2.2 QA Pair Generation

This step involves generating reliable question-answer pairs through role creation, LLM-generated questions, and manual annotation. For each conference, we utilize ChatGPT to simulate the roles of conference participants, generating relevant questions which are then manually filtered and annotated with answers and their sources to ensure realism and reliability.

We use ChatGPT to create 20 roles characterized by specific attributes such as age, research direction, position, publication history, and conference attendance experience, mimicking real-life researchers with diverse backgrounds who are interested in the conferences. With these roles, we prompt ChatGPT to engage in role-playing scenarios, generating five varied questions per conference. These questions cover different areas of interest or uncertainty relevant to the roles' diverse backgrounds. To avoid redundancy and enhance question diversity, we prompt the model iteratively: the questions generated in one round are used as examples in the next round, and ChatGPT is encouraged to generate more diverse questions. In the final step, we manually review and filter the questions to eliminate duplicates and unrealistic queries. We then annotate the answers based on our tree-structured data, documenting the source of each answer within the constructed academic conference data to ensure the dataset's reliability.
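A minimal sketch of one iteration of this role-conditioned question generation loop is shown below; the prompt wording and function names are our own assumptions rather than the authors' exact prompts.

```python
# Minimal sketch of the iterative, role-conditioned question generation step.
# Prompt wording and names are assumptions about the described procedure.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_questions(role: str, conference: str, prior_questions: list[str]) -> list[str]:
    prompt = (
        f"You are {role} and you are interested in {conference}.\n"
        "Questions generated in earlier rounds (do not repeat them):\n- "
        + "\n- ".join(prior_questions or ["(none)"])
        + "\nAsk five new, realistic questions about this conference, one per line."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [q.strip("- ").strip() for q in lines if q.strip()]

# Example for one hypothetical role; the output seeds the next iteration.
role = "a PhD student working on knowledge graphs who has never attended the conference"
questions = generate_questions(role, "WWW2023", prior_questions=[])
```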

2.3 Question Classification

To assess the model’s capability in handling questions of varying difficulty, we design a scheme to classify the question-answer pairs based on two criteria: the method used to generate the answer and the complexity of paths required to arrive at the correct answer.

Extraction vs. Reasoning: This category evaluates the process of answer generation. Answers directly pulled from the dataset are labeled as extraction, whereas answers that necessitate reasoning beyond the dataset content are labeled as reasoning. Reasoning questions are more challenging than extraction questions because, unlike direct extraction, reasoning questions require the model to have the capability to infer the relationship between the retrieved paths and the question.

Atomic vs. Complex: This category assesses the complexity of paths needed to generate the answer. Answers that depend on a single path are termed atomic, while those requiring multiple paths are termed complex. Complex questions are more difficult than atomic questions because, instead of a single path, complex questions require recalling multiple paths to derive an answer.

Combining these dimensions results in four levels of difficulty: extraction-atomic, extraction-complex, reasoning-atomic, and reasoning-complex. This classification is vital for analyzing the model’s performance across different complexities and reasoning demands.

2.4 Dataset Validation

Following data construction, a thorough validation process is conducted by three independent assessors who evaluate each QA pair across three critical dimensions. The first dimension assesses the alignment between each question and its answer, ensuring the answer accurately addresses the question. The second dimension examines the reliability of the answer source, ensuring it provides the necessary information for the question. The third dimension evaluates the practical relevance of each question, ensuring it reflects real-world needs and concerns. If a QA pair fails to meet the criteria in any dimension, as agreed upon by at least two assessors, it is marked for removal and redesign. This rigorous process ensures each QA pair is validated comprehensively, maintaining the quality and reliability of the dataset. Detailed statistics for each conference are shown in Table 1.

Table 1: Statistics of the ConferenceQA dataset (#Paths: number of root-to-leaf paths; #Depth: average path depth; EA/EC/RA/RC: number of extraction-atomic, extraction-complex, reasoning-atomic, and reasoning-complex questions).

Conference | #Paths | #Depth | #EA | #EC | #RA | #RC
WWW2023    | 15127  | 7.01   | 32  | 27  | 17  | 36
ACL2023    | 14306  | 9.05   | 29  | 21  | 30  | 25
ICML2023   | 4715   | 8.52   | 26  | 27  | 28  | 19
SIGMOD2023 | 6338   | 7.46   | 39  | 27  | 23  | 34
IJCAI2023  | 15800  | 6.13   | 28  | 26  | 13  | 33
ICDE2023   | 9736   | 9.14   | 28  | 24  | 22  | 21
ISWC2022   | 3594   | 7.53   | 33  | 42  | 25  | 18
Avg        | 9916   | 7.83   | 31  | 28  | 23  | 27

3 Method

In this section, we discuss LLM-based methods for academic conference question answering. The prevalent approach uses an external knowledge source for retrieval [15, 10, 13]: the user's query $q$ is used to extract relevant content $c$ from a domain-specific knowledge base, and this content is then combined with the query for the LLM to generate an answer. This retrieval-based method can be formalized as $a = \mathrm{LLM}(q, c)$, where $c = \mathrm{Retriever}(q, \mathcal{KB})$. It optimizes the retriever such that, for each question $q$, the model gives an answer $a$ with high accuracy or relevance to the correct answer. Our approach adheres to this retrieval-based paradigm but is adapted to our tree-structured conference dataset. We preprocess the structured data to facilitate content retrieval and introduce a novel method named STAR (STructure-Aware Retrieval), which effectively integrates structural and semantic information for improved retrieval performance.

3.1 Tree-structured Data Processing

The tree-structured data is hierarchically arranged, with each internal node representing a page or a section heading and each leaf node holding the corresponding content. For retrieval, we pair each leaf node with its root node to provide additional context to the LLM. Paths in the tree use '>>' to denote hierarchical relationships and carry both structural and semantic information. An example path is: WWW2023>>Attendees>>Registration>>Register Fee>>Virtual Conference>>ACM Members>>$300. After this processing, the knowledge source for retrieval can be represented as a set of paths $\mathcal{P} = \{p_1, p_2, \ldots, p_m\}$, where $m$ is the number of paths in the dataset.
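As an illustration, the following minimal sketch flattens such a tree (stored as a nested dictionary) into root-to-leaf paths joined by '>>', mirroring the example path above; the function name is our own.

```python
# Minimal sketch: flatten a tree-structured conference dict into root-to-leaf
# paths joined by '>>', mirroring the example path in the text.
def flatten_to_paths(node, prefix=""):
    paths = []
    if isinstance(node, dict):
        for key, child in node.items():
            new_prefix = f"{prefix}>>{key}" if prefix else str(key)
            paths.extend(flatten_to_paths(child, new_prefix))
    else:  # leaf value, e.g. "$300"
        paths.append(f"{prefix}>>{node}")
    return paths

tree = {
    "WWW2023": {
        "Attendees": {
            "Registration": {
                "Register Fee": {"Virtual Conference": {"ACM Members": "$300"}}
            }
        }
    }
}
print(flatten_to_paths(tree))
# ['WWW2023>>Attendees>>Registration>>Register Fee>>Virtual Conference>>ACM Members>>$300']
```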

3.2 Path Retrieval

Upon receiving a query $q$, the retriever selects a subset of paths from $\mathcal{P} = \{p_1, p_2, \ldots, p_m\}$ that are relevant to $q$. Following established methods [20], we use a dense retriever based on a dual-encoder framework, which employs an encoder to transform both the query $q$ and each path $p \in \mathcal{P}$ into embeddings. The similarity between the query and path embeddings is measured by cosine similarity, and the top-$k$ paths with the highest scores are retrieved, as expressed in Eq. (1), where $\mathbf{E}$ denotes the embedding function.

$c = \mathrm{topk}(\{\cos(\mathbf{E}(q), \mathbf{E}(p)) \mid p \in \mathcal{P}\})$   (1)
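A minimal sketch of this dense path retrieval follows, assuming a sentence-transformers encoder; the specific model name is an illustrative choice, not necessarily the one used in the paper.

```python
# Minimal sketch of the dense path retrieval in Eq. (1): embed the query and
# every path, score by cosine similarity, keep the top-k paths.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice

def retrieve_paths(query: str, paths: list[str], k: int = 5) -> list[str]:
    # With normalized embeddings, the dot product equals cosine similarity.
    q_emb = encoder.encode([query], normalize_embeddings=True)   # shape (1, d)
    p_emb = encoder.encode(paths, normalize_embeddings=True)     # shape (m, d)
    scores = (p_emb @ q_emb.T).squeeze(-1)                       # shape (m,)
    top_idx = np.argsort(-scores)[:k]
    return [paths[i] for i in top_idx]
```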
[Fig. 2: Illustration of the STAR structure-aware retrieval method.]

3.3 Structure-aware Retrieval

The limitation of treating a single path as the retrieval object is that it disconnects the structural relationships among paths. For example, the relationship between an author’s name and their affiliated institution is lost when paths are retrieved independently.

To overcome this, we introduce a novel method called STAR (STructure-Aware Retrieval). As shown in Fig. 2, STAR employs ChatGPT to iteratively generate a textual description $des_p$ for each path, from the root to individual nodes, in a top-down manner. We enhance the retrieval process by incorporating structural information into the generation input, which includes the sibling paths, the parent path's description, and the query path itself. This helps maintain the contextual relevance of each path, which is crucial for recognizing relationships such as those between an author and their institution. When generating path descriptions, we consider not only a node's immediate context but also the structural significance of related nodes, including its siblings and their parent nodes, ensuring a comprehensive representation of each path's context. To avoid losing information about the siblings of leaf nodes, we append the text of their parent node to each sibling of the leaf nodes. This method effectively preserves and utilizes structural relationships, enhancing the retrieval process.

Thus we construct a knowledge source of path descriptions $\mathcal{P}_{des} = \{(p, des_p) \mid p \in \mathcal{P}\}$, containing pairs of paths and their descriptions. For retrieval, we use the similarity between the query and each path description as the score of that path, and retrieve the top-$k$ paths with the highest similarity to the query $q$. With $\mathbf{E}$ denoting the embedding function, this process is formalized in Eq. (2).

$c = \mathrm{topk}(\{\cos(\mathbf{E}(q), \mathbf{E}(des_p)) \mid (p, des_p) \in \mathcal{P}_{des}\})$   (2)
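Below is a minimal sketch of STAR's top-down description generation and of building the $(p, des_p)$ pairs; the prompt wording, helper names, and example sibling path are our own assumptions, not the authors' exact prompts. Retrieval then reuses a dense retriever (as in the sketch for Eq. (1)) over the descriptions instead of the raw paths.

```python
# Minimal sketch of STAR's description generation: each path's description is
# produced from its own text, its parent path's description, and its sibling
# paths, so structural context is preserved (top-down, root to leaves).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_path(path: str, parent_desc: str, siblings: list[str]) -> str:
    prompt = (
        f"Path: {path}\n"
        f"Description of the parent path: {parent_desc}\n"
        f"Sibling paths: {'; '.join(siblings) if siblings else '(none)'}\n"
        "In one sentence, describe what this path records, keeping the "
        "structural context implied by its parent and siblings."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

# Example: build one (path, description) pair of Eq. (2).
path = "WWW2023>>Attendees>>Registration>>Register Fee>>Virtual Conference>>ACM Members>>$300"
parent_desc = "Registration fees for attending WWW2023 virtually."
siblings = ["WWW2023>>Attendees>>Registration>>Register Fee>>Virtual Conference>>Non-members>>..."]  # hypothetical sibling
path_descriptions = {path: describe_path(path, parent_desc, siblings)}
# Retrieval scores the query against these descriptions rather than raw paths.
```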

4 Experiments

In this section, we conduct question-answering experiments on the conference datasets to explore: 1) How does STAR perform with different LLMs? 2) How does STAR perform with different retrievers? 3) How does STAR perform across different academic conferences?

4.1 Experimental Details

Based on the constructed ConferenceQA, we use currently popular LLMs, including Bloom (7B) [31], GPT-J (6B) [30], Flan-T5 (xl and xxl) [29], LLaMA2 (7B and 13B) [7], Mistral (7B) [25], and ChatGPT, as the main evaluation backbones. For ChatGPT, we employ GPT-3.5-turbo and access it via the API (https://api.openai.com/). We employ BM25 [1], SentenceBERT [26], DPR [27], ANCE [28], and text-embedding-ada-002 as retrievers. In addition, we use Chroma (https://github.com/chroma-core/chroma) as our vector database and employ cosine similarity for matching. In all experiments, we select the top 5 retrieved paths.
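For illustration, here is a minimal sketch of how path descriptions could be stored and queried in Chroma with cosine similarity, as in this setup; the collection name and sample data are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code): store path descriptions
# in a Chroma collection and retrieve the top matches by cosine similarity.
import chromadb

client = chromadb.Client()
collection = client.create_collection(
    name="conferenceqa_paths",
    metadata={"hnsw:space": "cosine"},  # use cosine similarity for matching
)

# Hypothetical (path, description) pairs; in practice these come from STAR.
path_descriptions = {
    "WWW2023>>Attendees>>Registration>>Register Fee>>Virtual Conference>>ACM Members>>$300":
        "The virtual-conference registration fee for ACM members at WWW2023 is $300.",
}
collection.add(
    ids=[str(i) for i in range(len(path_descriptions))],
    documents=list(path_descriptions.values()),
    metadatas=[{"path": p} for p in path_descriptions],
)

results = collection.query(
    query_texts=["How much does virtual registration cost for ACM members?"],
    n_results=min(5, collection.count()),  # top 5 paths in the real setup
)
retrieved_paths = [m["path"] for m in results["metadatas"][0]]
print(retrieved_paths)
```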

4.2 Evaluation Metrics

In line with prior studies, we assess the QA capabilities of LLMs using the F1 score and the exact match (EM) score. Specifically, we employ GPT-4 to compute the EM, referred to as EM-GPT4.

The F1 score quantifies the overlap between the predicted and correct answers by calculating the harmonic mean of precision and recall.
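A minimal sketch of this token-level F1 computation (our own implementation of the standard metric, not necessarily the authors' evaluation script):

```python
# Minimal sketch of the token-level F1 metric: harmonic mean of precision and
# recall over the overlapping tokens of the predicted and gold answers.
from collections import Counter

def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the fee is $300 for ACM members", "$300"))  # ≈ 0.25
```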

The EM-GPT4 score evaluates the proportion of instances where the LLM’s predicted answer exactly matches the correct answer. Given the generative nature of LLMs, slight textual variations in responses might still represent the same answer. We use GPT-4, a highly advanced LLM known for its semantic understanding capabilities, to precisely assess if the LLM’s response matches the golden answers.

Table 2: F1 and EM-GPT4 scores of different LLMs on the four question types (EA, EC, RA, RC). Each cell shows the path-retrieval score, with the change obtained by STAR in parentheses.

LLMs | F1 EA | F1 EC | F1 RA | F1 RC | EM-GPT4 EA | EM-GPT4 EC | EM-GPT4 RA | EM-GPT4 RC
Bloom-7B1 | 19.60 (-0.36) | 11.19 (+1.96) | 17.01 (+0.36) | 11.58 (-0.20) | 30.27 (+0.42) | 15.03 (+4.41) | 41.70 (-3.43) | 17.58 (-1.36)
GPT-J-6B | 14.53 (+1.76) | 8.81 (+3.11) | 15.52 (+2.46) | 8.42 (-0.25) | 19.11 (+7.04) | 12.93 (+5.53) | 34.16 (+5.03) | 13.08 (+1.94)
Flan-T5-xl | 27.74 (+7.85) | 14.68 (+2.77) | 36.03 (+0.96) | 19.00 (+2.89) | 35.50 (+9.56) | 20.78 (+0.86) | 59.38 (+3.97) | 25.74 (+2.27)
Flan-T5-xxl | 32.31 (+8.97) | 14.01 (+9.69) | 37.08 (+4.3) | 20.86 (+0.98) | 40.81 (+10.69) | 18.58 (+12.64) | 55.76 (+11.71) | 25.36 (+3.58)
LLaMA2-7B | 14.05 (+2.23) | 12.09 (+1.25) | 12.47 (+3.00) | 8.48 (+0.12) | 21.32 (+0.15) | 9.22 (+2.77) | 23.81 (+5.83) | 9.89 (-1.15)
LLaMA2-13B | 29.57 (+2.82) | 20.92 (+4.00) | 25.71 (+4.20) | 13.64 (+2.83) | 41.16 (+6.26) | 24.02 (+2.21) | 55.00 (+6.53) | 20.23 (+4.46)
Mistral-7B | 30.75 (+4.31) | 23.67 (+4.69) | 25.87 (+4.11) | 15.91 (-0.37) | 43.33 (+10.53) | 27.58 (+13.95) | 59.90 (+6.89) | 29.23 (+1.55)
GPT-3.5-turbo | 28.35 (+7.5) | 21.54 (+4.83) | 24.66 (+9.62) | 16.21 (+0.78) | 40.53 (+13.43) | 25.10 (+9.03) | 49.97 (+11.34) | 25.75 (+1.45)

4.3 Experimental Results Analysis

Effect of Different LLMs We analyzed the performance of various LLMs on different types of questions to understand their perception capabilities and limitations. The results, shown in Table 2, provide several insights: (1) Our STAR method significantly improves the answering performance across various LLMs. For instance, on models like Bloom-7B1, GPT-J-6B, and GPT-3.5-turbo, F1 scores increased by 4%, 14.9%, and 25.04% respectively, while EM-GPT4 scores improved by 0.04%, 24.65%, and 24.94%. The least improvement was on Bloom-7B1, suggesting its inherent limitations. However, substantial gains on other models demonstrate our method's effectiveness. (2) There is an inconsistency between F1 and EM-GPT4 scores; lower F1 scores sometimes align with higher EM-GPT4 scores. This may be because LLMs generate longer textual responses, which lowers F1 but not EM-GPT4, which better evaluates semantic similarity. (3) The complexity of question types affects performance; atomic questions are simpler than complex ones. Atomic questions, akin to single-hop queries, generally show higher accuracy than multi-hop complex questions. Despite this, LLMs perform comparably or better on reasoning questions than on extraction questions, likely due to their robust contextual learning and reasoning capabilities. (4) Different LLMs show varied understanding of paths. For example, under the same retrieval conditions, Mistral-7B outperforms GPT-3.5-turbo. Generally, models with more parameters, like LLaMA2-13B and Flan-T5-xxl, achieve higher accuracy, supporting the notion that larger LLMs perform better.

Table 3: F1 and EM-GPT4 scores of different retrievers with GPT-3.5-turbo as the generator. Each cell shows the path-retrieval score, with the change obtained by STAR in parentheses.

Retrievers | F1 EA | F1 EC | F1 RA | F1 RC | EM-GPT4 EA | EM-GPT4 EC | EM-GPT4 RA | EM-GPT4 RC
BM25 | 21.02 (+9.14) | 16.81 (+10.29) | 25.77 (-2.79) | 14.72 (+1.37) | 25.90 (+4.9) | 14.12 (+8.1) | 36.50 (+2.94) | 5.16 (+5.55)
SentenceBERT | 38.62 (-3.73) | 23.70 (-0.01) | 12.97 (+2.05) | 16.23 (-0.18) | 39.83 (-2.57) | 14.81 (+11.12) | 29.79 (+17.66) | 22.88 (-1.47)
DPR | 30.56 (+3.72) | 23.72 (+0.59) | 27.35 (+2.60) | 21.16 (-0.43) | 30.95 (+5.91) | 15.17 (-0.14) | 52.70 (+0.31) | 10.66 (+1.38)
ANCE | 28.24 (+8.72) | 17.41 (+4.75) | 16.52 (+9.11) | 8.00 (+2.26) | 41.66 (+6.14) | 30.12 (+4.07) | 50.84 (+0.49) | 15.25 (+1.21)
ada-002 | 28.35 (+7.5) | 21.54 (+4.83) | 24.66 (+9.62) | 16.21 (+0.78) | 40.53 (+13.43) | 25.10 (+9.03) | 49.97 (+11.34) | 25.75 (+1.45)

Effect of Different Retrievers We evaluated four retrievers (BM25 [1], SentenceBERT [26], DPR [27], and ANCE [28]) using GPT-3.5-turbo as the generator across the four question types within the ConferenceQA dataset. The results, detailed in Table 3, reveal: (1) BM25 showed weak performance, especially with extraction-atomic and reasoning-complex questions. In contrast, dense retrievers like SentenceBERT, DPR, and ANCE significantly outperformed BM25, underscoring the advantages of dense retrieval methods. (2) Performance varied among dense retrievers: SentenceBERT was effective in extraction-atomic questions but less so in reasoning-atomic questions. DPR excelled in reasoning-atomic questions, while ANCE showed consistent performance across all question types. This indicates that selecting an appropriate retriever can significantly impact question-answering effectiveness. (3) While STAR occasionally had negative effects in some configurations, it generally enhanced performance across most settings, demonstrating its utility and reliability.

[Fig. 3: Performance of path retrieval and STAR across the seven conferences.]

Effect of Different Conferences Fig. 3 shows the performance across the seven conferences, using text-embedding-ada-002 as the retriever and GPT-3.5-turbo as the generator. Key observations: (1) Question difficulty varies notably across conferences, highlighting the diversity of our dataset. (2) The differences can be substantial; for example, the average EM-GPT4 score on ICML is 94.9% higher than on ACL, underscoring the importance of accounting for conference-specific characteristics in question-answering research. (3) Except for reasoning-atomic questions on SIGMOD and reasoning-complex questions on ISWC, our STAR method consistently outperforms plain path retrieval, demonstrating its versatility and effectiveness across different conferences and question types.

5 Related Work

In academic data science, foundational resources such as CiteSeerX [19], a digital library for scientific literature, and unarXive [18], which hosts over a million documents from arXiv.org, are crucial for scholarly communication. Zhang et al. [17] developed Maple, a benchmark for tagging scientific literature across 19 disciplines. However, there remains a notable gap in benchmarks specifically designed for academic conference QA, despite the increasing diversity and volume of literature datasets.

Simultaneously, augmenting language models with data from various knowledge sources has significantly improved performance on many NLP tasks [22, 23]. Techniques such as Atlas [11], which fine-tunes an encoder-decoder model together with a retriever, and RETRO [12], which integrates retrieved texts into a decoder-only model, utilize large volumes of unstructured text. Other approaches like REPLUG [13] and FLARE [14] dynamically retrieve information based on context, treating LLMs as black boxes. For structured knowledge, methods include extracting triples from knowledge graphs for KGQA tasks [15, 10] and converting them into textual prompts for LLMs [24]. However, the use of hierarchical data, such as tree-structured data, in retrieval augmentation is still limited.

6 Conclusion

In this work, we developed the ConferenceQA dataset, which organizes recent academic conference information into a tree-structured format to support question answering. We also introduced STAR, a novel approach that enhances question-answering performance by generating a textual description for each path within the tree, effectively utilizing both structural and textual information. Together, the ConferenceQA dataset and the STAR method advance the development of robust and adaptable academic conference question-answering systems. Future efforts will focus on integrating LLMs with tree-structured data to improve domain-specific knowledge access and reasoning.

Acknowledgements

This work is funded by the National Natural Science Foundation of China (NSFC 62306276), the Zhejiang Provincial Natural Science Foundation of China (No. LQ23F020017), the Yongjiang Talent Introduction Programme (2022A-238-G), the Ningbo Natural Science Foundation (2023J291), and the Fundamental Research Funds for the Central Universities (226-2023-00138).

References

  • [1] S. Robertson, H. Zaragoza et al., "The probabilistic relevance framework: BM25 and beyond," Foundations and Trends® in Information Retrieval, 2009.
  • [2] OpenAI, "GPT-4 technical report," 2023.
  • [3] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., "Chain-of-thought prompting elicits reasoning in large language models," Advances in Neural Information Processing Systems, 2022.
  • [4] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, "Large language models are zero-shot reasoners," Advances in Neural Information Processing Systems, 2022.
  • [5] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, E. Li, X. Wang, M. Dehghani, S. Brahma et al., "Scaling instruction-finetuned language models," arXiv preprint arXiv:2210.11416, 2022.
  • [6] S. Min, X. Lyu, A. Holtzman, et al., "Rethinking the role of demonstrations: What makes in-context learning work?" arXiv preprint arXiv:2202.12837, 2022.
  • [7] H. Touvron, L. Martin, K. Stone et al., "Llama 2: Open foundation and fine-tuned chat models," 2023.
  • [8] A. Modarressi, A. Imani, M. Fayyaz, and H. Schütze, "RET-LLM: Towards a general read-write memory for large language models," 2023.
  • [9] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, et al., "Toolformer: Language models can teach themselves to use tools," 2023.
  • [10] C. Hu, J. Fu, C. Du, S. Luo, J. Zhao, and H. Zhao, "ChatDB: Augmenting LLMs with databases as their symbolic memory," 2023.
  • [11] G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave, "Atlas: Few-shot learning with retrieval augmented language models," 2022.
  • [12] S. Borgeaud, A. Mensch, J. Hoffmann, et al., "Improving language models by retrieving from trillions of tokens," in International Conference on Machine Learning. PMLR, 2022.
  • [13] W. Shi, S. Min, M. Yasunaga, M. Seo, R. James, M. Lewis, L. Zettlemoyer, and W.-t. Yih, "REPLUG: Retrieval-augmented black-box language models," arXiv preprint arXiv:2301.12652, 2023.
  • [14] Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y. Yang, J. Callan, and G. Neubig, "Active retrieval augmented generation," arXiv preprint arXiv:2305.06983, 2023.
  • [15] P. Sen, S. Mavadia, and A. Saffari, "Knowledge graph-augmented language models for complex question answering," 2023.
  • [16] V. Zhong, C. Xiong, and R. Socher, "Seq2SQL: Generating structured queries from natural language using reinforcement learning," arXiv preprint arXiv:1709.00103, 2017.
  • [17] Y. Zhang, B. Jin, Q. Zhu, Y. Meng, and J. Han, "The effect of metadata on scientific literature tagging: A cross-field cross-model study," in Proceedings of the ACM Web Conference 2023, 2023.
  • [18] T. Saier and M. Färber, "unarXive: a large scholarly data set with publications' full-text, annotated in-text citations, and links to metadata," Scientometrics, 2020.
  • [19] C. L. Giles, K. D. Bollacker, and S. Lawrence, "CiteSeer: An automatic citation indexing system," in Proceedings of the Third ACM Conference on Digital Libraries, 1998.
  • [20] J. Ni, C. Qu, J. Lu, Z. Dai, G. H., et al., "Large dual encoders are generalizable retrievers," arXiv preprint arXiv:2112.07899, 2021.
  • [21] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, 2020.
  • [22] K. Guu, K. Lee, Z. Tung, P. Pasupat, and M. Chang, "Retrieval augmented language model pre-training," in International Conference on Machine Learning. PMLR, 2020.
  • [23] P. Lewis, E. Perez, A. Piktus, F. Petroni, et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks," Advances in Neural Information Processing Systems, 2020.
  • [24] Y. Wu, N. Hu, G. Qi, S. Bi, J. Ren, A. Xie, and W. Song, "Retrieve-rewrite-answer: A KG-to-text enhanced LLMs framework for knowledge graph question answering," arXiv preprint arXiv:2309.11206, 2023.
  • [25] A. Q. Jiang, A. Sablayrolles, A. Mensch, et al., "Mistral 7B," arXiv preprint arXiv:2310.06825, 2023.
  • [26] N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," arXiv preprint arXiv:1908.10084, 2019.
  • [27] V. Karpukhin, B. Oğuz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, and W.-t. Yih, "Dense passage retrieval for open-domain question answering," arXiv preprint arXiv:2004.04906, 2020.
  • [28] L. Xiong, C. Xiong, Y. Li, K.-F. Tang, J. Liu, P. Bennett, J. Ahmed, and A. Overwijk, "Approximate nearest neighbor negative contrastive learning for dense text retrieval," arXiv preprint arXiv:2007.00808, 2020.
  • [29] S. Longpre, L. Hou, T. Vu, A. Webson, et al., "The Flan collection: Designing data and methods for effective instruction tuning," in International Conference on Machine Learning. PMLR, 2023.
  • [30] B. Wang and A. Komatsuzaki, "GPT-J-6B: A 6 billion parameter autoregressive language model," 2021.
  • [31] T. Le Scao, A. Fan, C. Akiki, E. Pavlick, et al., "BLOOM: A 176B-parameter open-access multilingual language model," 2022.