6️⃣ Prompt Compression with LLMLingua

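In RAG (Retrieval-Augmented Generation), input tokens account for most of the resource consumption. The user query is typically sent to a vector store, which returns the stored entries most similar to it. Depending on the context or related documents retrieved for the query, the prompt (input) can reach thousands of tokens.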
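To get a feel for how retrieved context dominates the input, here is a minimal sketch that assembles a prompt from retrieved chunks and estimates its size. The helpers `estimate_tokens` and `build_prompt`, the placeholder chunks, and the "about 4 characters per token" rule of thumb are assumptions for illustration; a real tokenizer would give exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def build_prompt(query: str, chunks: list[str]) -> str:
    """Concatenate the retrieved chunks and the user query into one prompt."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\n"


query = "Why were English clubs banned from European competitions?"
# Three hypothetical retrieved chunks of 4,000 characters each.
chunks = ["x" * 4000 for _ in range(3)]

prompt = build_prompt(query, chunks)
query_tokens = estimate_tokens(query)
prompt_tokens = estimate_tokens(prompt)
print(f"query ~{query_tokens} tokens, full prompt ~{prompt_tokens} tokens")
```

Under these assumptions the query itself is a tiny fraction of the prompt; almost all of the input cost comes from the retrieved context.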

Prompt compression is a technique that shortens the original prompt while retaining its most important information, which in turn speeds up response generation by the language model.

The idea underlying prompt compression is that natural language often contains redundancy that can be removed without losing the meaning.
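As a toy illustration of that redundancy, the sketch below drops a few function words from a sentence and shows that the meaning is still recoverable. The stop-word list and the `drop_stop_words` helper are hypothetical, and this is deliberately much cruder than what LLMLingua does.

```python
# A small hand-picked stop-word list; purely illustrative.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "was", "that", "and", "it", "for", "from"}


def drop_stop_words(text: str) -> str:
    """Remove common function words while keeping the original word order."""
    kept = [w for w in text.split() if w.lower().strip(".,;") not in STOP_WORDS]
    return " ".join(kept)


original = "UEFA lifted the five-year ban on English clubs playing in European competitions in 1990."
shortened = drop_stop_words(original)
print(shortened)
print(f"{len(original.split())} words -> {len(shortened.split())} words")
```

A reader (or a language model) can still reconstruct the sentence from the shortened version, which is the property prompt compression relies on.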

LLMLingua

GitHub - microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

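LLMLingua uses a well-trained compact language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens from the prompt. This makes inference with large language models (LLMs) more efficient and can achieve up to 20x compression with minimal performance loss.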
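The general idea of scoring tokens by how predictable they are and dropping the most predictable ones can be sketched as follows. This is a toy under stated assumptions: a unigram frequency table stands in for the compact language model, and `self_information` and `compress` are hypothetical helpers, not part of the LLMLingua API.

```python
import math
from collections import Counter


def self_information(tokens: list[str], reference: list[str]) -> list[float]:
    """Score each token by -log p(token) under a unigram model with add-one smoothing."""
    counts = Counter(reference)
    total = len(reference)
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    return [-math.log((counts[t] + 1) / (total + vocab)) for t in tokens]


def compress(tokens: list[str], reference: list[str], target_tokens: int) -> list[str]:
    """Keep the target_tokens most informative tokens, preserving their original order."""
    scores = self_information(tokens, reference)
    # Highest score first; ties are broken by position in the prompt.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:target_tokens])
    return [tokens[i] for i in keep]


# Toy statistics: frequent words are predictable, rare or unseen words are not.
reference = "the of in and to the a the of in the and to a the of on".split()
prompt = "the ban on English clubs in the European competitions".split()

compressed = compress(prompt, reference, target_tokens=5)
print(compressed)
```

With a real compact model, the score would come from the model's token probabilities given the preceding context rather than from raw word frequencies.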

import os
from dotenv import load_dotenv  

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env # Write your OpenAI API key here
load_dotenv()

True

from llama_index.core import ServiceContext, set_global_service_context
from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
import tiktoken

OPENAI_MODEL_NAME = "gpt-3.5-turbo-16k"

llm = OpenAI(model=OPENAI_MODEL_NAME)

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model(OPENAI_MODEL_NAME).encode
)
callback_manager = CallbackManager([token_counter])

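# Register the LLM and the token counter globally so later index and query calls use them.
# Note: ServiceContext is deprecated in recent llama-index releases in favor of `Settings`.
service_context = ServiceContext.from_defaults(
    llm=llm, callback_manager=callback_manager
)
set_global_service_context(service_context)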

#%pip install wikipedia

from llama_index.core import VectorStoreIndex, download_loader

WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=['Premier League'])

print(len(documents))

retriever = VectorStoreIndex.from_documents(documents).as_retriever(similarity_top_k=3)

# English: "Why were English clubs banned from European competitions?"
question = "잉글랜드 클럽이 유럽 대회에서 금지된 이유는 무엇인가요??"

relevant_documents = retriever.retrieve(question)

contexts = [n.get_content() for n in relevant_documents]
contexts

['=== Performance in international competition ===\n\nWith 48 continental trophies won, English clubs are the third-most successful in European football, behind Italy (49) and Spain (65). In the top-tier UEFA Champions League, a record six English clubs have won a total of 15 titles and lost a further 11 finals, behind Spanish clubs with 19 and 11, respectively. In the second-tier UEFA Europa League, English clubs are also second, with nine victories and eight losses in the finals. In the former second-tier UEFA Cup Winners\' Cup, English teams won a record eight titles and had a further five finalists. In the non-UEFA organized Inter-Cities Fairs Cup, English clubs provided four winners and four runners-up, the second-most behind Spain with six and three, respectively. In the newly created third-tier UEFA Europa Conference League, English clubs have won a joint-record one title so far.']
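What a retriever configured with `similarity_top_k=3` does conceptually can be sketched in plain Python: score every stored vector against the query vector and keep the three best. The 3-dimensional embeddings, the document labels, and the `top_k` helper are hypothetical, and cosine similarity is assumed here as a common choice of metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
doc_vecs = {
    "history": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.1],
    "stadiums": [0.7, 0.2, 0.3],
    "broadcasting": [0.0, 0.3, 0.9],
}
query_vec = [1.0, 0.0, 0.1]

results = top_k(query_vec, doc_vecs, k=3)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```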

from llama_index.core.prompts import PromptTemplate

template = (
    "Given the context information below: \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Please answer the question: {query_str}\n"
)

qa_template = PromptTemplate(template)
prompt = qa_template.format(context_str="\n\n".join(contexts), query_str=question)

response = llm.complete(prompt)

response.text

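'잉글랜드 클럽이 유럽 대회에서 금지된 이유는 1985년 헤이젤 스타디움 재앙 이후 발생한 폭력적인 행동과 안전 기준을 충족하지 못한 경기장 상태 때문입니다. 이 사건으로 인해 잉글랜드 클럽은 5년 동안 유럽 대회에서 경기를 할 수 없었습니다.'

(English translation: 'English clubs were banned from European competition because of the violent behavior that followed the 1985 Heysel Stadium disaster and stadium conditions that failed to meet safety standards. As a result of this incident, English clubs could not play in European competition for five years.')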

print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)

Embedding Tokens:  13615 
 LLM Prompt Tokens:  0 
 LLM Completion Tokens:  0 
 Total LLM Token Count:  0

#%pip install llmlingua accelerate -q -U
#%pip install llama-index-postprocessor-longllmlingua

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor

node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",
        "dynamic_context_compression_ratio": 0.8,
    },
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
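The combination of `target_token`, `context_budget` and `reorder_context` in the configuration above suggests a coarse selection step in which contexts are ranked and kept until a token budget is used up. The sketch below is a hypothetical simplification of that idea, not the actual LongLLMLingua algorithm; `select_contexts`, the scores, and the token counts are made up for illustration.

```python
def select_contexts(
    contexts: list[str],
    scores: list[float],
    token_counts: list[int],
    target_token: int,
    extra_budget: int,
) -> tuple[list[str], int]:
    """Greedy coarse filter: take contexts by descending score while they fit the budget."""
    budget = target_token + extra_budget
    order = sorted(range(len(contexts)), key=lambda i: scores[i], reverse=True)
    kept, used = [], 0
    for i in order:
        if used + token_counts[i] > budget:
            continue  # this context would exceed the budget, so skip it
        kept.append(contexts[i])
        used += token_counts[i]
    return kept, used


# Hypothetical relevance scores and token counts for three retrieved contexts.
kept, used = select_contexts(
    contexts=["A", "B", "C"],
    scores=[0.2, 0.9, 0.5],
    token_counts=[250, 180, 200],
    target_token=300,
    extra_budget=100,
)
print(kept, used)
```

In this toy run the two highest-scoring contexts fit into the 400-token budget and the lowest-scoring one is dropped; the surviving contexts come out sorted by relevance.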

retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()

retrieved_nodes

[NodeWithScore(node=TextNode(id_='6ec9417e-decb-4290-a9bf-62630e844bb4', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='28ebd3bd-9540-4e8a-9bff-6976c3aae981', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='49f783bdfd0ac70d7235553dac9a4b23a646a54a3fa319a87aceffd06b3c1310'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='5f5926b3-c364-4754-b171-a3620c0827d9', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='6a2a1d44721d8e629023dcd71977bcc3bdba201f052ded1ccd93271ba73f4178'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='25597b8b-b1c9-4769-bc2d-ee1bada663af', node_type=<ObjectType.TEXT: '1'>, metadata={}]

from llama_index.core import QueryBundle

new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])
original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)

print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")

 ===Desite significant European success in the 197 and the late9s marked a low for footballs andters poor facilities,ism wasife, and English had been European in5 The First level of English since, wass A attendues, top English moved abroad.By9 theend was reverse9,s; UEFA European, lifted the five-year on English playing in European competitions in19 in Cupinners1 Report on safety, proposed to stadiums the, in09s, major English begun to commercial club administrationvenue Edwards Unitedolarur and were the this transformation The commercial imperative the clubs seeking to increase their power the away from League in so, to increase their voting and gain more arrangement50 ofship income in18. They demanded companies for of football re from in receivedyear in98, but by, deal price leading clubs taking ofash. Sch, whoations ofals each First5 rights800 in6,08.188 the clubs form a " but were eventually to, top taking theion deal Theations also in receive needed take whole of First Division of a By the90 the big again that had of stad.In the man, withbig) meetingaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League's position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.

Original Tokens: 2768
Compressed Tokens: 443
Compressed Ratio: 6.25x
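The figures printed above can be checked with a few lines of arithmetic. The two token counts are taken from the output; the "tokens saved" percentage is an additional derived number.

```python
original_tokens = 2768
compressed_tokens = 443

ratio = original_tokens / compressed_tokens
saved_tokens = original_tokens - compressed_tokens
saved_fraction = 1 - compressed_tokens / original_tokens

print(f"Compression ratio: {ratio:.2f}x")
print(f"Tokens saved: {saved_tokens} ({saved_fraction:.1%})")
```

The compressed context (443 tokens) is somewhat larger than the configured `target_token=300`, so in this run the target appears to act as an approximate goal rather than a hard limit.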

token_counter.reset_counts()

response = synthesizer.synthesize(question, new_retrieved_nodes)

response

Response(response='English clubs were banned from European competition due to poor facilities, hooliganism, and a history of violence.', source_nodes=[NodeWithScore(node=TextNode(id_='d1b09aff-6ac7-40f0-9ef6-797a03e872e3', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={})

response.response

'English clubs were banned from European competition due to poor facilities, hooliganism, and a history of violence.'

print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)

Embedding Tokens:  0 
 LLM Prompt Tokens:  525 
 LLM Completion Tokens:  23 
 Total LLM Token Count:  548
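The bookkeeping behind these counters is simple: prompt and completion tokens are accumulated per LLM call, the total is their sum, and a reset clears them before the next measurement. The class below is a hypothetical stand-in that mirrors this behavior; it is not the LlamaIndex `TokenCountingHandler`.

```python
from dataclasses import dataclass


@dataclass
class SimpleTokenCounter:
    """Hypothetical stand-in for a token-counting callback."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the token usage of one LLM call."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def reset(self) -> None:
        """Clear the counters before the next measurement."""
        self.prompt_tokens = 0
        self.completion_tokens = 0


counter = SimpleTokenCounter()
counter.record(prompt_tokens=525, completion_tokens=23)  # values from the output above
print(counter.total_tokens)
counter.reset()
print(counter.total_tokens)
```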

retrieved_nodes = retriever.retrieve(question)

new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

contexts = [n.get_content() for n in new_retrieved_nodes]
prompt = qa_template.format(context_str="\n\n".join(contexts), query_str=question)
response = llm.complete(prompt)

new_retrieved_nodes

[NodeWithScore(node=TextNode(id_='bfc9ecb3-ad7d-4d50-a47f-86ab58af44db', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='ea6d1aeb9d0b0098d02bbc88f3338265ea944b0ec66744e065e8296a7dfe1068', text='\n ===Desite significant European success in the 197 and the late9s marked a low for footballs andters poor facilities,ism wasife, and English had been European in5 The First level of English since, wass A attendues, top English moved abroad.By9 theend was reverse9,s; UEFA European, lifted the five-year on English playing in European competitions in19 in Cupinners1 Report on safety, proposed to stadiums the, in09s, major English begun to commercial club administrationvenue Edwards Unitedolarur and were the this transformation The commercial imperative the clubs seeking to increase their power the away from League in so, to increase their voting and gain more arrangement50 ofship income in18. They demanded companies for of football re from in receivedyear in98, but by, deal price leading clubs taking ofash. Sch, whoations ofals each First5 rights800 in6,08.188 the clubs form a " but were eventually to, top taking theion deal Theations also in receive needed take whole of First Division of a By the90 the big again that had of stad.In the man, withbig) meetingaway from The Football League. Dyke believed that it would be more lucrative for LWT if only the larger clubs in the country were featured on national television and wanted to establish whether the clubs would be interested in a larger share of television rights money. The five clubs agreed with the suggestion and decided to press ahead with it; however, the league would have no credibility without the backing of The Football Association, and so David Dein of Arsenal held talks to see whether the FA were receptive to the idea. 
The FA did not have an amicable relationship with the Football League at the time and considered it as a way to weaken the Football League\'s position. The FA released a report in June 1991, Blueprint for the Future of Football, that supported the plan for the Premier League with the FA as the ultimate authority that would oversee the breakaway league.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=None)]

response.text

'English clubs were banned from European competition due to a series of incidents and issues in the 1980s. These included poor facilities, hooliganism, and a lack of safety measures in stadiums. The ban was imposed by UEFA, the governing body for European football, in an effort to address these problems and improve the overall image and safety of the sport. The ban lasted for five years, from 1985 to 1990.'
