<aside> 💡

Summary

Investigated LLMs that can run in resource-constrained environments (such as on-device) and analyzed the accuracy and inference time of each model through various evaluation sets.

</aside>

ํŒ€ ๋งํฌ : โ€ฃ (์™ธ๋ถ€ ๋น„๊ณต๊ฐœ)

Tiny LLM


https://github.com/hoonably/TinyLLM

<aside> 💡

์‹œ์ž‘์€ Jetson Nano

However, ๋ฒ„์ „๊ณผ ์„ฑ๋Šฅ์ด ๋„ˆ๋ฌด ๋‚ฎ์•„ ๋น„๊ตํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์ด ๋ณ„๋กœ ์—†์–ด์„œ Orin-nano๋กœ ์ง„ํ–‰

์ถ”๊ฐ€๋กœ NVIDIA A100-SXM4-80GB๋กœ๋„ ์ง„ํ–‰ํ•ด latency ๋น„๊ต

</aside>
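The latency comparison described above can be sketched as a simple timing harness. This is a minimal sketch, not the project's actual benchmarking code; `generate_fn` stands in for whatever zero-argument callable wraps a model's generation step (e.g. a lambda around `model.generate`).

```python
import statistics
import time

def measure_latency(generate_fn, n_warmup=3, n_runs=10):
    """Time a generation callable: a few warmup calls, then timed runs.

    generate_fn: any zero-argument callable (hypothetical stand-in for a
    model's generation step). Returns (mean, stdev) latency in seconds.
    """
    for _ in range(n_warmup):  # warm up caches / lazy init / GPU kernels
        generate_fn()
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)
```

On a GPU device (Orin Nano or A100) you would also synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the timer, so asynchronous kernels are actually included in the measurement.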

Result

Models

| Model Name | Affiliation | Model Size | Release Date | 🔗 Link |
| --- | --- | --- | --- | --- |
| Bloom | BigScience | 560M | 2022.11 | Bloom |
| Bloomz | BigScience | 560M | 2022.11 | Bloomz |
| Cerebras-GPT | Cerebras | 590M | 2023.03 | Cerebras-GPT |
| Cerebras-GPT | Cerebras | 256M | 2023.03 | Cerebras-GPT |
| Cerebras-GPT | Cerebras | 111M | 2023.03 | Cerebras-GPT |
| Danube3 | H2O | 500M | 2024.07 | Danube3 |
| Flan-T5 | Google | Base | 2023.01 | Flan-T5 |
| LaMini-GPT | MBZUAI | 774M | 2023.04 | LaMini-GPT |
| LaMini-GPT | MBZUAI | 124M | 2023.04 | LaMini-GPT |
| LiteLlama | ahxt | 460M | N/A | LiteLlama |
| OPT | Meta | 350M | 2022.05 | OPT |
| OPT | Meta | 125M | 2022.05 | OPT |
| Pythia | EleutherAI | 410M | 2023.03 | Pythia |
| Pythia | EleutherAI | 160M | 2023.03 | Pythia |
| PhoneLM | mllmTeam | 0.5B | 2024.11 | PhoneLM |
| Qwen1.5 | Alibaba | 0.5B | 2024.02 | Qwen1.5 |
| Qwen2.5 | Alibaba | 0.5B | 2024.09 | Qwen2.5 |
| SmolLM | Hugging Face | 360M | 2024.07 | SmolLM |
| SmolLM | Hugging Face | 135M | 2024.07 | SmolLM |
| TinyLlama | TinyLlama | 1.1B | 2023.12 | TinyLlama |

Evaluation Datasets

| Dataset Name | Explanation | 🔗 Link |
| --- | --- | --- |
| ARC | Science question dataset for QA.<br>ARC-e: ARC-easy (the easy subset) | ai2_arc |
| OBQA | QA dataset modeled after open-book exams, designed to test multi-step reasoning, commonsense knowledge, and deep text comprehension | openbookqa |
| BoolQ | QA dataset of yes/no questions | boolq |
| PIQA | QA dataset for physical commonsense reasoning | piqa |
| SIQA | QA dataset designed to evaluate social commonsense reasoning about people's actions and their social implications | social_i_qa |
| WinoGrande | Fill-in-the-blank problems testing commonsense reasoning | winogrande |
| HellaSwag | Commonsense natural language inference (sentence completion) | hellaswag |
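These benchmarks are largely multiple-choice, and a common way to evaluate them is to score each candidate answer with the model (e.g. by summed log-likelihood) and pick the highest-scoring option. A minimal sketch of that accuracy computation, where `score_fn` is an assumed model-scoring callable (not part of this project's code):

```python
def multiple_choice_accuracy(examples, score_fn):
    """Accuracy over multiple-choice examples.

    examples: iterable of (question, options, gold_index) tuples.
    score_fn(question, option) -> float, higher is better
    (hypothetical stand-in for e.g. a log-likelihood scorer).
    """
    correct = 0
    total = 0
    for question, options, gold in examples:
        # Predict the option the scorer ranks highest.
        pred = max(range(len(options)),
                   key=lambda i: score_fn(question, options[i]))
        correct += int(pred == gold)
        total += 1
    return correct / total
```

For yes/no datasets like BoolQ the same scheme applies with the two options "yes" and "no".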

Environment

- Jetson Orin Nano 8GB RAM (Link)
- Python: 3.10.2
- CUDA: (TBD)

Evaluation Result