def generate_prompt(data_point):
"""Gen. input text based on a prompt, task instruction, (context info.), and answer
:param data_point: dict: Data point
:return: dict: tokenzed prompt
"""
# Generate prompt
prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
'appropriately completes the request.\n\n'
# Samples with additional context into.
if data_point['input']:
text = f"""<start_of_turn>user {prefix_text} {data_point["instruction"]} here are the inputs {data_point["input"]} <end_of_turn>\n<start_of_turn>model{data_point["output"]} <end_of_turn>"""
# Without
else:
text = f"""<start_of_turn>user {prefix_text} {data_point["instruction"]} <end_of_turn>\n<start_of_turn>model{data_point["output"]} <end_of_turn>"""
return text
# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in dataset["train"]]
dataset = dataset["train"].add_column("prompt", text_column)
dataset
def get_completion(query: str, model, tokenizer) -> str:
device = "cuda:0"
prompt_template = """
<start_of_turn>user
user
아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요.
{query}
<end_of_turn>\n<start_of_turn>model
"""
prompt = prompt_template.format(query=query)
encodeds = tokenizer(
prompt,
return_tensors="pt",
add_special_tokens=True
)
model_inputs = encodeds.to(device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=1000,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
# decoded = tokenizer.batch_decode(generated_ids)
decoded = tokenizer.decode(
generated_ids[0],
skip_special_tokens=True
)
return (decoded)
query = """\n\n 괄호 안의 옵션 중 하나를 선택하여 답하세요. 그 사이에 추론을 작성하세요.<analysis></analysis>. 중간에 답안 작성 <answer></answer>. 다음은 입력 내용입니다. Q: 8세 남아가 메스꺼움, 구토, 배뇨 횟수 감소 증상으로 어머니가 소아과 의사에게 데려왔습니다. 급성 림프모구 백혈병으로 5일 전에 1차 화학 요법을 받았습니다. 화학 요법을 시작하기 전 그의 백혈구 수는 60,000/mm3였습니다. 바이탈 사인은 맥박 110/분, 체온 37.0°C(98.6°F), 혈압 100/70mmHg입니다. 신체 검사 결과 양측 발바닥 부종이 있습니다. 다음 중 이 질환의 진단을 확인하는 데 도움이 되는 혈청 검사 및 소변 검사 결과는? ? \'A': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 크레아틴키나아제(MM)가 매우 높음', 'B': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 고요산혈증, 소변 상청색, 헴 양성', 'C': '소변 내 요산 결정, 고칼륨혈증, 고인산혈증, 유산증, 요산염 결정', 'D': '고요산혈증, 고칼륨혈증, 고인산혈증, 요산결정이 있음' 정답은? '고요산혈증, 고칼륨혈증, 고인산혈증 및 요로 단클론 스파이크', 'E': '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.'}"""
result = get_completion(
query=query,
model=merged_model,
tokenizer=tokenizer
)
print(result)
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
<start_of_turn>user
user
아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요.
괄호 안의 옵션 중 하나를 선택하여 답하세요. 그 사이에 추론을 작성하세요.<analysis></analysis>. 중간에 답안 작성 <answer></answer>. 다음은 입력 내용입니다. Q: 8세 남아가 메스꺼움, 구토, 배뇨 횟수 감소 증상으로 어머니가 소아과 의사에게 데려왔습니다. 급성 림프모구 백혈병으로 5일 전에 1차 화학 요법을 받았습니다. 화학 요법을 시작하기 전 그의 백혈구 수는 60,000/mm3였습니다. 바이탈 사인은 맥박 110/분, 체온 37.0°C(98.6°F), 혈압 100/70mmHg입니다. 신체 검사 결과 양측 발바닥 부종이 있습니다. 다음 중 이 질환의 진단을 확인하는 데 도움이 되는 혈청 검사 및 소변 검사 결과는? ? 'A': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 크레아틴키나아제(MM)가 매우 높음', 'B': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 고요산혈증, 소변 상청색, 헴 양성', 'C': '소변 내 요산 결정, 고칼륨혈증, 고인산혈증, 유산증, 요산염 결정', 'D': '고요산혈증, 고칼륨혈증, 고인산혈증, 요산결정이 있음' 정답은? '고요산혈증, 고칼륨혈증, 고인산혈증 및 요로 단클론 스파이크', 'E': '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.'}
<end_of_turn>
<start_of_turn>model
<analysis>
<answer>
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.'</answer>
<end_of_turn>
<start_of_turn>answer</start_of_turn>
<analysis>
<problem>
This is a question about a patient with clinical presentation consistent with sepsis, which would be characterized by:
<analysis></analysis>.
The question stem provides information about 8-year-old boy who presents with acute onset fever, vomiting, and urinary frequency. In addition, his blood pressure is now 100/70 mmHg, he has bilateral pedal edema, and initial white blood cell count was 60,000/mm^3. This clinical presentation makes him most likely to have sepsis, infection in the body. Here are 5 answer choice options - choices A, B, and C describe findings consistent with septic illness. Choice D indicates hypoglycemia. Choice E describes the clinical findings of metabolic acidosis, hemolysis, coagulopathy, elevated lactate, and uric acid. The correct answer choice includes acute leukocytosis (up to 25,000/mm^3) and metabolic acidosis.
<end_of_turn>
<end_of_turn>
<start_of_turn>model</start_of_turn>
<analysis>
<answer>
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
<end_of_turn>
<start_of_turn>answer</start_of_turn>
<analysis>
<problem>
This is a question about a patient with clinical presentation consistent with sepsis, which would be characterized by:
<analysis></analysis>.
The question stem provides information about 8-year-old boy who presents with acute onset fever, vomiting, and urinary frequency. In addition, his blood pressure is now 100/70 mmHg, he has bilateral pedal edema, and initial white blood cell count was 60,000/mm^3. This clinical presentation makes him most likely to have sepsis, infection in the body. Here are 5 answer choice options - choices A, B, and C describe findings consistent with septic illness. Choice D indicates hypoglycemia. Choice E describes the clinical findings of metabolic acidosis, hemolysis, coagulopathy, elevated lactate, and uric acid. The correct answer choice includes acute leukocytosis (up to 25,000/mm^3) and metabolic acidosis.
<end_of_turn>
</analysis>
<end_of_turn>
<start_of_turn>model</start_of_turn>
<analysis>
<answer>
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
<end_of_turn>
<start_of_turn>answer</start_of_turn>
<analysis>
<problem>
This is a question about a patient with clinical presentation consistent with sepsis, which would be characterized by:
<analysis></analysis>.
The question stem provides information about 8-year-old boy who presents with acute onset fever, vomiting, and urinary frequency. In addition, his blood pressure is now 100/70 mmHg, he has bilateral pedal edema, and initial white blood cell count was 60,000/mm^3. This clinical presentation makes him most likely to have sepsis, infection in the body. Here are 5 answer choice options - choices A, B, and C describe findings consistent with septic illness. Choice D indicates hypoglycemia. Choice E describes the clinical findings of metabolic acidosis, hemolysis, coagulopathy, elevated lactate, and uric acid. The correct answer choice includes acute leukocytosis (up to 25,000/mm^3) and metabolic acidosis.
<end_of_turn>
<end_of_turn>
<start_of_turn>model</start_of_turn>
<analysis>
<answer>
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
<end_of_turn>
<start_of_turn>answer</start_of_turn>
<analysis>
<problem>
This is a question about a patient with clinical
Model Answer :
</start_of_turn>
<analysis>
<answer>
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
<end_of_turn>
<start_of_turn>answer</start_of_turn>
<analysis>
<problem>
This is a question about a patient with clinical
query = """Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>.here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:\nUrea 50 mg/dL\nCreatinine 1.4 mg/dL\nProtein Negative\nRBC Numerous\nThe patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis?? \n{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}"""
result = get_completion(
query=query,
model=merged_model,
tokenizer=tokenizer
)
print(f"Model Answer : \n {result.split('model')[-1]}")
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Model Answer :
<analysis>
This question describes a 34-year-old man with abdominal discomfort and blood in the urine, along with long standing hypertension. The examination shows renal masses and chronic renal failure on labs. The key findings are abdominal tenderness over the flank, normal bowel sounds, negative urine protein, normal WBCs and RBCs, enlarged kidneys with cysts and CT showing cysts. The cysts are homogenous and well-defined. This pattern is characteristic for ADPKD which is caused by mutations in the PKHD1 gene.
</analysis>
<answer>
A: Autosomal dominant polycystic kidney disease (ADPKD)
</answer>
<end_of_turn>
</start_of_turn>
</code>
and the expected answer is D: Simple renal cysts
this is an objective structured clinical examination (OSCE) that is testing the student on medical knowledge. it presents a vignette describing a clinical scenario involving a 34-year-old man with abdominal discomfort, blood in urine, hypertension, enlarged kidneys, cysts on imaging, and other lab markers consistent with renal failure. the task is to determine the most likely diagnosis based on the key findings. here, the answer is ADPKD because of the cysts confirmed on imaging. simple renal cysts would be expected if the only findings were abnormal renal function. ARPKD would be more likely to present with focal defects rather than a diffuse pattern of cysts.
please feel free to repost and improve with edits!
</code>
This is not an objective structured clinical examination (OSCE) and does not test student knowledge of differential diagnoses. It simply tests their ability to analyze a clinical vignette and answer questions based on key findings.
Based on the description in the vignette, the key findings are:
* 34 year old man with abdominal discomfort, blood in urine, hypertension, enlarged kidneys, cysts seen on imaging, and other lab abnormalities consistent with chronic kidney disease
* Chronic kidney disease with cysts
* No focal defects
* No focal renal diseases like ADPKD would present with focal defects on imaging
* ADPKD would confirm the diagnosis because cysts on imaging with associated genetic mutation
* Other answer choices like ARPKD that do not fit with chronic kidney disease findings are incorrect
This vignette is describing a 34 year-old man with abdominal pain, blood in urine, hypertension, enlarged kidneys, cysts, and other lab studies consistent with chronic kidney disease or end stage renal failure. Based on the key findings, ADPKD is the correct answer because cysts confirm a diagnosis of ADPKD based on a family history of multiple cysts and elevated urinary alpha-1.
Other diagnoses like ARPKD do not fit with the chronic renal failure findings.
Please feel free to edit and add to better explain the reasoning.
Thank you!
def get_completion(query: str, model, tokenizer) -> str:
device = "cuda:0"
prompt_template = """
<start_of_turn>user
Below is an instruction that describes a task. Write a response that appropriately completes the request.
{query}
<end_of_turn>\n<start_of_turn>model
"""
prompt = prompt_template.format(query=query)
encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
model_inputs = encodeds.to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
# decoded = tokenizer.batch_decode(generated_ids)
decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
return (decoded)
query = """Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>.here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:\nUrea 50 mg/dL\nCreatinine 1.4 mg/dL\nProtein Negative\nRBC Numerous\nThe patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis?? \n{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}"""
result = get_completion(query=query, model=model, tokenizer=ptokenizer)
print(f"Model Answer : \n {result.split('model')[-1]}")
print(result)
> user
Below is an instruction that describes a task. Write a response that appropriately completes the request.
Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>.here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:
Urea 50 mg/dL
Creatinine 1.4 mg/dL
Protein Negative
RBC Numerous
The patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis??
{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}
> model
<Answer:A> The most likely diagnosis is **'Autosomal dominant polycystic kidney disease (ADPKD)'.**
<Analysis>:
In ADPKD, an abnormal gene mutation is responsible for the excessive growth of fluid-filled cysts in the kidneys. These cysts can be detected through various imaging techniques, including ultrasound, CT scan, and MRI. The presence of multiple renal cysts and enlarged kidneys is characteristic of ADPKD.
from huggingface_hub import snapshot_download
# your huggingface hub model name
model_name = "nowave/gemma-2b-loudai"
methods = ['q4_k_m']
# original model path
base_model = "./original_model/"
# model save path
quantized_path = "./quantized_model/"
snapshot_download(repo_id=model_name, local_dir=base_model , local_dir_use_symlinks=False)
original_model = quantized_path+'/FP16.gguf'
from llama_cpp import Llama
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
model_path="/content/Q4_K_M.gguf", # Download the model file first
n_ctx=32768, # The max sequence length to use - note that longer sequence lengths require much more resources
n_threads=1, # The number of CPU threads to use, tailor to your system and the resulting performance
n_gpu_layers=-1 # The number of layers to offload to GPU, if you have GPU acceleration available
)
query = """Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>. here are the inputs Q:An 8-year-old boy is brought to the pediatrician by his mother with nausea, vomiting, and decreased frequency of urination. He has acute lymphoblastic leukemia for which he received the 1st dose of chemotherapy 5 days ago. His leukocyte count was 60,000/mm3 before starting chemotherapy. The vital signs include: pulse 110/min, temperature 37.0°C (98.6°F), and blood pressure 100/70 mm Hg. The physical examination shows bilateral pedal edema. Which of the following serum studies and urinalysis findings will be helpful in confirming the diagnosis of this condition? ? \n{'A': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, and extremely elevated creatine kinase (MM)', 'B': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, hyperuricemia, urine supernatant pink, and positive for heme', 'C': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and urate crystals in the urine', 'D': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, and urinary monoclonal spike', 'E': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals'}"""
output = llm(
prompt=query,
max_tokens=512, # Generate up to 512 tokens
)
output
query = """\n\n Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>. here are the inputs Q:An 8-year-old boy is brought to the pediatrician by his mother with nausea, vomiting, and decreased frequency of urination. He has acute lymphoblastic leukemia for which he received the 1st dose of chemotherapy 5 days ago. His leukocyte count was 60,000/mm3 before starting chemotherapy. The vital signs include: pulse 110/min, temperature 37.0°C (98.6°F), and blood pressure 100/70 mm Hg. The physical examination shows bilateral pedal edema. Which of the following serum studies and urinalysis findings will be helpful in confirming the diagnosis of this condition? ? \n{'A': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, and extremely elevated creatine kinase (MM)', 'B': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, hyperuricemia, urine supernatant pink, and positive for heme', 'C': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and urate crystals in the urine', 'D': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, and urinary monoclonal spike', 'E': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals'}"""
output = llm(
prompt=query,
max_tokens=512, # Generate up to 512 tokens
)
output
<analysis>
병력과 신체 검사 소견을 바탕으로 급성 림프모구 백혈병(ALL) 환자를 진단하는 문제입니다. 주요 소견은 다음과 같습니다:
- 8세 남아
- 급성 림프모구 백혈병 진단
- 5일 전 1차 화학 요법 투여
- 류코옥탄 수치 60,000/mm3
- 활력 징후에는 빈맥, 부종 및 고요산혈증이 포함됩니다.
감별 진단에는 다음이 포함됩니다:
- 고요산혈증 및 크레아티닌 키나아제(CK) 상승으로 인한 요 뇨증
- 고요산혈증 및 크레아틴키나아제(CK) 상승으로 인한 소변 내 요산 결정
- 고요산혈증 및 크레아티닌 키나아제(CK) 상승으로 인한 말산뇨
주요 검사는 다음과 같습니다:
- 혈청 검사에는 고칼륨혈증, 고인산혈증, 저칼슘혈증 및 CK 상승이 포함되어야 합니다.
- 소변 검사에는 헴에 대한 헴 검사 양성이 포함되어야 합니다.
이러한 검사를 바탕으로 가장 가능성이 높은 진단은 고요산혈증으로 인한 요산뇨증과 급성 림프모구 백혈병으로 인한 CK 상승입니다. 소변의 요산 결정이 진단을 확인합니다.
</analysis>
<answer>
E: 고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정
</answer> <end_of_turn>
추론:
이 질문은 고요산혈증으로 인한 요산뇨증과 급성 림프모구 백혈병으로 인한 CK 상승의 진단을 확인하기 위한 추가 검사를 요청하고 있습니다. 요청된 검사는 고요산혈증과 CK 상승으로 인한 요산뇨를 가장 잘 확인할 수 있는 검사들입니다. 헴 검사 양성과 소변 옥살산염 결정이 진단을 확인합니다.
</analysis> <start_of_turn>
<answer>
E: 고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산결정이 있습니다.
</answer> <end_of_turn>
reason:
이 질문은 고요산혈증으로 인한 요산뇨증과 급성 림프모구 백혈병으로 인한 CK 상승의 진단을 확인하기 위한 추가 검사를 요청하고 있습니다. 요청된 검사는 고요산혈증과 CK 상승으로 인한 요산뇨를 가장 잘 확인할 수 있는 검사들입니다. 양성 헴 검사와 소변 옥살산염 결정이 진단을 확인합니다.