Prompt Engineering Process for RAN Resource Allocation

gksyb4235 2025. 12. 13. 17:02

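The next step is to decide how the PRBs of the RAN built in the Python environment should be divided between the eMBB and URLLC slices.

I first had ChatGPT write a general-purpose prompt and ran it, but the LLM did not produce usable answers.

Working with small open-source models in particular made the importance of the prompt very clear, so this post records how I revised the prompt step by step, starting from scratch.

The open-source models used in this prompt engineering process are Phi-3-mini-4k-instruct (3.8B) and Llama-3.1-8B-instruct (8B).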

Checking the open-source models' network domain knowledge

Do the models know what RAN is?

prompt_template = f"""What is RAN in Mobile Network"""
llm_response = llm_model.invoke(prompt_template)
print(llm_response.content)

1. Llama-3.1-8B-instruct (8B)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is RAN in Mobile Network<|eot_id|><|start_header_id|>assistant<|end_header_id|>

RAN stands for Radio Access Network. 
It is a critical component of a mobile network that connects mobile devices to the core network and the internet. 
The RAN is responsible for managing the radio frequency (RF) signals that are transmitted between the mobile device and the base station, which is also known as the eNodeB (eNB) or cell tower.

The RAN consists of several key components:

1. **Base Station (eNodeB or eNB):** This is the tower or antenna that transmits and receives RF signals to and from mobile devices.
2. **Radio Network Controller (RNC):** This is a centralized unit that manages the base stations and controls the radio resources, such as frequency allocation and power control.
3. **NodeB:** This is a type of base station that is used in 3G and 4G networks.
4. **Small Cells:** These are small, low-power base stations that are used to provide coverage in areas with high demand or where traditional base stations are not feasible.
5. **Distributed Unit (DU) and Centralized Unit (CU):** These are components of the RAN that are used in 5G networks to improve performance and efficiency.

The RAN performs several functions, including:

The Llama 8B model has learned the concept of RAN and also how the base station changed across 3G, 4G, and 5G.

The answer stops partway through because the maximum token size was limited to 256.

2. Phi-3-mini-4k-instruct (3.8B)

<|user|>
What is RAN in Mobile Network<|end|>
<|assistant|>
 RAN stands for Radio Access Network. It is a part of a mobile telecommunication system that implements a radio access technology. A typical RAN consists of the radio transceivers as well as the antennas at cell sites. The RAN connects individual devices, such as mobile phones or laptops, to other parts of a mobile network through radio connections.

The main purpose of the RAN is to provide wireless communication between the user's device and the core network (CN). The RAN can be divided into two main types:

1. GSM/EDGE (GSM Radio Access Network): This is the most widely used RAN technology, which uses Time Division Multiple Access (TDMA) and Enhanced Data rates for GSM Evolution (EDGE) for data transmission.

2. UMTS/HSPA (Universal Mobile Telecommunications System Radio Access Network): This technology uses Wideband Code Division Multiple Access (WCDMA) for data transmission and supports higher data rates compared to GSM/EDGE.

3. LTE (Long-Term Evolution) and 5G NR (New Radio): These are the latest RAN technologies, which use Orth

The Phi-3-mini model has also learned the concept of RAN and 5G NR.

Compared with the 8B model, however, its answer feels somewhat rough.

Next, I asked about Network Slicing, one of the core concepts of 5G, along with eMBB and URLLC.

prompt_template = f"""What is Network Slicing and eMBB, URLLC?"""

Because this question calls for a longer answer, I raised the model's maximum_token_size to 2048 before asking.

1. Llama-3.1-8B-instruct (8B)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is Network Slicing and eMBB, URLLC?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Network slicing and eMBB, URLLC are key concepts in the context of 5G (Fifth Generation) wireless networks. Here's a brief overview of each:

1. **Network Slicing**:
   - **Definition**: Network slicing is a technology that allows multiple virtual networks to be created on top of a shared physical infrastructure. Each virtual network, or slice, is designed to support specific use cases or services with unique requirements.
   - **Purpose**: Network slicing enables service providers to offer customized services to different customers or industries, such as enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (URLLC), mission-critical communications, and massive machine-type communications (mMTC).
   - **Key Features**:
     - **Isolation**: Each slice is isolated from other slices, ensuring that they do not interfere with each other.
     - **Customization**: Slices can be customized to meet specific requirements, such as bandwidth, latency, and security.
     - **Flexibility**: Slices can be created, modified, or deleted dynamically as needed.

2. **eMBB (Enhanced Mobile Broadband)**:
   - **Definition**: eMBB is a use case in 5G networks that provides high-speed, high-capacity mobile broadband services to support various applications, such as video streaming, online gaming, and social media.
   - **Key Features**:
     - **High-speed data rates**: eMBB supports data rates of up to 20 Gbps.
     - **Low latency**: eMBB aims to provide latency as low as 1 ms.
     - **Massive capacity**: eMBB can support a large number of users and devices.

3. **URLLC (Ultra-Reliable Low-Latency Communications)**:
   - **Definition**: URLLC is a use case in 5G networks that provides ultra-reliable and low-latency communications for mission-critical applications, such as industrial automation, remote healthcare, and autonomous vehicles.
   - **Key Features**:
     - **Ultra-reliability**: URLLC ensures that data packets are delivered with extremely high reliability, with a target packet loss rate of 10^-5 or lower.
     - **Low latency**: URLLC aims to provide latency as low as 1 ms.
     - **Predictable performance**: URLLC ensures predictable performance, even in challenging network conditions.

In summary, network slicing is a technology that enables the creation of multiple virtual networks on top of a shared physical infrastructure, while eMBB and URLLC are use cases in 5G networks that provide high-speed, high-capacity mobile broadband and ultra-reliable low-latency communications, respectively.

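Overall, the Llama 8B model has correctly learned the idea of creating multiple virtual slices on top of a shared physical infrastructure.

It also distinguishes the purposes of eMBB and URLLC reasonably well.

However, its claim that eMBB aims for latency as low as 1 ms is a clear error.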

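3GPP TR 38.913 sets the eMBB user-plane latency target at 4 ms, and the latency actually experienced by eMBB users is usually described as being on the order of 10 to 20 ms. The 1 ms-class target belongs to URLLC.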

2. Phi-3-mini-4k-instruct (3.8B)

<|user|>
What is Network Slicing and eMBB, URLLC?<|end|>
<|assistant|>
 Network slicing is a technology that allows multiple virtual networks to be created on top of a common physical infrastructure. It enables the efficient allocation of resources and customization of network services to meet specific requirements of different applications or use cases.

eMBB (enhanced Mobile Broadband) refers to a type of mobile communication service that provides high data rates and low latency for applications such as video streaming, online gaming, and other bandwidth-intensive activities. It aims to deliver a seamless and high-quality user experience for users who require large amounts of data transfer.

URLLC (Ultra-Reliable Low Latency Communication) is another type of mobile communication service designed to support critical applications that require extremely low latency and high reliability, such as remote surgery, autonomous vehicles, and industrial automation. URLLC ensures that data is transmitted with minimal delay and high accuracy, enabling real-time decision-making and control in these applications.

In summary, network slicing allows the creation of tailored virtual networks to meet the specific needs of different applications, while eMBB and URLLC are two types of mobile communication services that cater to different requirements, such as high data rates and low latency.

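The Phi model returns each paragraph as one very long line, so the answer runs far off to the right.

Looking at the content, it has likewise learned the concept of Network slicing and the different purposes of eMBB and URLLC reasonably well.

(Although it never mentions 5G.)

However, it describes eMBB as low latency, and

its explanation of how Network slicing relates to eMBB and URLLC leaves something to be desired.

Initial prompt built with ChatGPT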

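In my current RAN setup, a total of 200 PRBs is initially allocated to eMBB and URLLC as [100, 100].

Because eMBB demands more resources, the problem is how well the over-provisioned URLLC PRBs can be handed over to eMBB.

If too many URLLC PRBs are handed over, however, the URLLC SLA collapses at some point and the overall SLA satisfaction rate drops, so the key is to hand over URLLC PRBs only up to an appropriate level.

In addition, as the number of UEs grows, the number of PRBs that URLLC can give up to eMBB becomes more limited,

so the LLM has to take this into account as well when performing resource allocation.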

prompt_template = f"""You are a intelligent 5G network slicing resource allocation manager 
responsible for dynamically allocating Physical Resource Blocks (PRBs) between eMBB and URLLC slices 
based on observed latency, SLA satisfaction, and recent system history, with the goal of maximizing overall SLA compiance while wtrictly proceting URLLC reliability

**SYSTEM INFO:**
- eMBB users: 50, URLLC users: 50
- Total PRB: 200

**SLA REQUIREMENTS:**
- eMBB: ≤100ms
- URLLC: ≤5ms

**GENERAL RESOURCE ALLOCATION POLICY**
- The system generally prioritizes improving eMBB performance by gradually
  shifting PRBs from URLLC to eMBB, since eMBB is typically the performance bottleneck.
- PRB adjustments must be conservative: only a ±1 PRB change is allowed per decision step.

**URLLC SAFETY CONSIDERATION (CRITICAL)*
- URLLC traffic exhibits a cliff-like behavior:
  once under-provisioned beyond a certain point, its SLA satisfaction
  can degrade sharply and abruptly.
- Therefore, URLLC must be continuously monitored and protected.

GUARDED DECISION LOGIC
- If URLLC SLA satisfaction is consistently high (above
  {int(num_urllc_users * 9.5)}/{num_urllc_users * 10}),
  URLLC may be considered safely over-provisioned, and 1 PRB may be shifted
  from URLLC to eMBB.
- If URLLC SLA satisfaction approaches or falls below this threshold,
  this indicates a high risk of SLA collapse.
  In this case, you must immediately reverse direction:
  shift 1 PRB from eMBB back to URLLC.

HARD SAFETY CONSTRAINT (HIGHEST PRIORITY)
- If URLLC SLA satisfaction is below
  {int(num_urllc_users * 9.5)}/{num_urllc_users * 10},
  you must allocate:
  PRB_embb -= 1
  PRB_urllc += 1

- This constraint overrides all optimization objectives.

**PAST 5-STEP DATA:**
Format: [ [PRB_embb, PRB_urllc], eMBB_Latency_avg(ms), URLLC_Latency_avg(ms), eMBB_Satisfied/Total, URLLC_Satisfied/Total ]
---
{history_str}
---

**OUTPUT REQUIREMENT:**
Return ONLY a single line of pure JSON. NO markdown, NO code blocks, NO explanations.

**OUTPUT FORMAT:**
{{"PRB_embb": X, "PRB_urllc": Y}}"""

Next, I carried out prompt engineering on the smallest model, Phi-3-mini-4k-instruct.

Prompt engineering process

Problem 1: Small models are weak at arithmetic constraints

================================================================================
📋 LLM OUTPUT (Parsed JSON):
================================================================================
PRB_embb: 101
PRB_urllc: 100
================================================================================

================================================================================
📋 LLM OUTPUT (Parsed JSON):
================================================================================
PRB_embb: 104
PRB_urllc: 97
================================================================================

Even though the task is to allocate a total of 200 PRBs, the two values often did not sum to 200.

I addressed this by stating it explicitly as a constraint in the prompt.

- The sum of PRBs allocated to eMBB and URLLC must always be exactly 200.

Problem 2: Adding reasoning makes debugging easier, but it does not guarantee better performance

**OUTPUT FORMAT:**
{{"PRB_embb": X, "PRB_urllc": Y, "reasoning": brief explanation in one sentence}}"""

**PAST 5-STEP DATA:**
Format: [ [PRB_embb, PRB_urllc], eMBB_Latency_avg(ms), URLLC_Latency_avg(ms), eMBB_Satisfied/Total, URLLC_Satisfied/Total ]
---
[ [103, 97], 126.37, 1.14, 61/100, 100/100 ]
[ [103, 97], 158.34, 1.17, 60/100, 100/100 ]
[ [103, 97], 96.71, 1.15, 71/100, 100/100 ]
[ [103, 97], 124.18, 1.12, 63/100, 100/100 ]
[ [103, 97], 112.17, 1.16, 69/100, 100/100 ]
---

What is the new PRB allocation and why?

**OUTPUT REQUIREMENT:**
Return ONLY a single line of pure JSON. NO markdown, NO code blocks, NO explanations.

**OUTPUT FORMAT:**
{"PRB_embb": X, "PRB_urllc": Y, "reasoning": brief explanation in one sentence}<|end|>
<|assistant|>
 ```json
{"PRB_embb": 102, "PRB_urllc": 98, "reasoning": "URLLC SLA satisfaction is consistently high, allowing a safe PRB shift from URLLC to eMBB."}
```
================================================================================

================================================================================
📋 LLM OUTPUT (Parsed JSON):
================================================================================
PRB_embb: 102
PRB_urllc: 98
================================================================================

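This is the result from Phi-3-mini (3.8B).

In the history above, the current eMBB and URLLC PRB allocation is [103, 97].

The reasoning correctly notes that the URLLC SLA is well satisfied and therefore says it will shift URLLC PRBs to eMBB, yet the final output is [102, 98].

In other words, the LLM confuses which way PRBs should move between PRB_eMBB and PRB_URLLC.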
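The raw response above is wrapped in a markdown json fence even though the prompt forbids it, and the raw text also echoes the prompt, including the placeholder {"PRB_embb": X, ...}, which is not valid JSON. The following is a minimal sketch of a tolerant parser for such responses. parse_allocation is a hypothetical helper name and not the parser used in this experiment, and the shortened reasoning string is for illustration only.

```python
import json


def parse_allocation(raw_response: str) -> dict:
    """Return the first valid JSON object found in a raw LLM response.

    Hypothetical helper: it skips echoed prompt text, markdown fences, and
    placeholders such as {"PRB_embb": X, ...} that are not valid JSON.
    """
    decoder = json.JSONDecoder()
    pos = raw_response.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores the rest.
            obj, _ = decoder.raw_decode(raw_response, pos)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace (e.g. the echoed template); try the next one.
            pos = raw_response.find("{", pos + 1)
    raise ValueError("no valid JSON object found in the response")


FENCE = "`" * 3  # built at runtime so this example contains no literal fence
raw = (
    '{"PRB_embb": X, "PRB_urllc": Y, "reasoning": brief explanation in one sentence}<|end|>\n'
    "<|assistant|>\n"
    f" {FENCE}json\n"
    '{"PRB_embb": 102, "PRB_urllc": 98, "reasoning": "URLLC SLA satisfaction is consistently high."}\n'
    f"{FENCE}"
)
allocation = parse_allocation(raw)
print(allocation["PRB_embb"], allocation["PRB_urllc"])  # 102 98
```

Scanning brace by brace instead of stripping the fence with a fixed pattern also covers responses where the closing fence is missing.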

**PAST 5-STEP DATA:**
Format: [ [PRB_embb, PRB_urllc], eMBB_Latency_avg(ms), URLLC_Latency_avg(ms), eMBB_Satisfied/Total, URLLC_Satisfied/Total ]
---
[ [103, 97], 96.71, 1.15, 71/100, 100/100 ]
[ [103, 97], 124.18, 1.12, 63/100, 100/100 ]
[ [103, 97], 112.17, 1.16, 69/100, 100/100 ]
[ [102, 98], 103.03, 1.16, 73/100, 100/100 ]
[ [102, 98], 115.89, 1.13, 63/100, 100/100 ]
---

What is the new PRB allocation and why?

**OUTPUT REQUIREMENT:**
Return ONLY a single line of pure JSON. NO markdown, NO code blocks, NO explanations.

**OUTPUT FORMAT:**
{"PRB_embb": X, "PRB_urllc": Y, "reasoning": brief explanation in one sentence}<|end|>
<|assistant|>
 ```json
{"PRB_embb": 102, "PRB_urllc": 98, "reasoning": "URLLC SLA satisfaction is consistently high, allowing a safe PRB shift to eMBB."}

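Again, the inference that URLLC PRBs can be handed to eMBB because the URLLC SLA is fully satisfied is reasonable.

However, the LLM failed to recognize that the most recent allocation was already [102, 98] and output [102, 98] again.

This leads to the next problem.

Problem 3: Row-wise history is easy for people to read, but unstable for small models

I asked the Phi-3-mini-4k-instruct (3.8B) model the following.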

# 2. Improved English prompt (5-step history version)
prompt_template = f"""Explain the temporal flow of the step data in detail.

**PAST 5-STEP DATA:**
Format: [ [PRB_embb, PRB_urllc], eMBB_Latency_avg(ms), URLLC_Latency_avg(ms), eMBB_Satisfied/Total, URLLC_Satisfied/Total ]
---
[ [102, 98], 90.88, 1.14, 61/100, 100/100 ]
[ [103, 97], 126.37, 1.14, 61/100, 100/100 ]
[ [103, 97], 158.34, 1.17, 60/100, 100/100 ]
[ [103, 97], 96.71, 1.15, 71/100, 100/100 ]
[ [103, 97], 124.18, 1.12, 63/100, 100/100 ]
---

"""

The output was as follows.

<|user|>
Explain the temporal flow of the step data in detail.

**PAST 5-STEP DATA:**
Format: [ [PRB_embb, PRB_urllc], eMBB_Latency_avg(ms), URLLC_Latency_avg(ms), eMBB_Satisfied/Total, URLLC_Satisfied/Total ]
---
[ [102, 98], 90.88, 1.14, 61/100, 100/100 ]
[ [103, 97], 126.37, 1.14, 61/100, 100/100 ]
[ [103, 97], 158.34, 1.17, 60/100, 100/100 ]
[ [103, 97], 96.71, 1.15, 71/100, 100/100 ]
[ [103, 97], 124.18, 1.12, 63/100, 100/100 ]
---

<|end|>
<|assistant|>
 The provided data represents a series of measurements taken over five steps concerning the performance of two types of services: Enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low Latency Communications (URLLC). Each entry contains four pieces of information:

1. **PRB_embb**: This is likely an identifier for the particular PRB (Physical Resource Block) being used for eMBB services. In this case, it remains constant at 102 across all entries.

2. **PRB_urllc**: Similarly, this is an identifier for the PRB allocated to URLLC services. It also remains constant at 98 throughout the dataset.

3. **eMBB_Latency_avg(ms)**: This value represents the average latency experienced by eMBB services in milliseconds. The values recorded are 90.88 ms, 126.37 ms, 158.34 ms, 96.71 ms, and 124.18 ms respectively.

4. **URLLC_Latency_avg(ms)**: This value indicates the average latency for URLLC services, which is consistently low, at 1.14 ms, 1.14 ms, 1.17 ms, 1.15 ms, and 1.12 ms respectively.

5. **eMBB_Satisfied/Total**: This ratio shows the percentage of eMBB service requests that were satisfied out of the total number of requests. The satisfaction rates are 61%, 61%, 60%, 71%, and 63% respectively.

6. **URLLC_Satisfied/Total**: This ratio indicates the percentage of URLLC service requests that were satisfied, which is perfect at 100% for all five steps.

The temporal flow of the data suggests that there may be fluctuations in the average latency experienced by eMBB services, with some steps showing higher latency than others. However, the URLLC services maintain a consistent low latency, indicating high reliability and efficiency. The satisfaction rates for eMBB services show slight variations but remain relatively stable, while the URLLC services have a perfect satisfaction rate, reflecting their critical nature and the importance of meeting stringent latency requirements.

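The Phi model mentioned the PRB, latency, and satisfaction fields and clearly described the performance gap between eMBB and URLLC.

Its qualitative summary of how the values change, such as "The values recorded are 90.88 ms, 126.37 ms, 158.34 ms, 96.71 ms, and 124.18 ms respectively.", was also relatively accurate.

However, it treated PRB_embb and PRB_urllc as identifiers and claimed they stayed constant at 102 and 98, even though the history moves to [103, 97] after the first step. Expecting a 4B or 8B model to take all of these fields into account for the PRB allocation task is unrealistic.

Considering that a single inference completes within 3 seconds, the past data has to be delivered in a form that a small model can understand more easily.

I therefore changed the format as follows.

1. Replace the row-wise history with an explicitly labeled structure

2. Change fractional forms such as 61/100 (small LLMs handle fractions poorly)

3. Split the PRB pair into PRB_eMBB_history and PRB_URLLC_history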

PRB_eMBB_history:   [102, 103, 103, 103, 103]
PRB_URLLC_history: [98,  97,  97,  97,  97]

eMBB_latency:   [90.88ms, 126.37ms, 158.34ms, 96.71ms, 124.18ms]
URLLC_latency:  [1.14ms, 1.14ms, 1.17ms, 1.15ms, 1.12ms]

eMBB_SLA_ratio:    [0.61, 0.61, 0.60, 0.71, 0.63]
URLLC_SLA_ratio:   [1.00, 1.00, 1.00, 1.00, 1.00]

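This makes the meaning of every value explicit and lets the model read the time order directly from the list structure.

I also attached the unit (ms) to remove any ambiguity about the latency values, and expressed the SLA as a float so that it is easier to judge.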
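The conversion from the row-wise format to the labeled, feature-wise format can be sketched as below. This is a minimal illustration rather than the code used in the experiment: to_feature_wise and the tuple layout of each row are assumptions, while the numbers are the five history rows shown earlier.

```python
def to_feature_wise(rows):
    """Convert row-wise history into labeled, feature-wise lines.

    Each row is ([prb_embb, prb_urllc], embb_latency_ms, urllc_latency_ms,
    "satisfied/total" for eMBB, "satisfied/total" for URLLC).
    """

    def ratio(fraction):
        # Pre-compute the SLA ratio so the model never sees a fraction.
        satisfied, total = fraction.split("/")
        return int(satisfied) / int(total)

    def fmt(values, spec="", suffix=""):
        return "[" + ", ".join(format(v, spec) + suffix for v in values) + "]"

    return "\n".join([
        "PRB_eMBB_history: " + fmt([r[0][0] for r in rows]),
        "PRB_URLLC_history: " + fmt([r[0][1] for r in rows]),
        "eMBB_latency: " + fmt([r[1] for r in rows], ".2f", "ms"),
        "URLLC_latency: " + fmt([r[2] for r in rows], ".2f", "ms"),
        "eMBB_SLA_ratio: " + fmt([ratio(r[3]) for r in rows], ".2f"),
        "URLLC_SLA_ratio: " + fmt([ratio(r[4]) for r in rows], ".2f"),
    ])


history = [
    ([102, 98], 90.88, 1.14, "61/100", "100/100"),
    ([103, 97], 126.37, 1.14, "61/100", "100/100"),
    ([103, 97], 158.34, 1.17, "60/100", "100/100"),
    ([103, 97], 96.71, 1.15, "71/100", "100/100"),
    ([103, 97], 124.18, 1.12, "63/100", "100/100"),
]
print(to_feature_wise(history))
```

The printed block can be dropped into the prompt in place of {history_str}.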

Problem 4: Restating the question at the end improved performance slightly

At the very end of the long prompt, I repeated the question once more:

What is the new PRB allocation and why?

Restating the LLM's objective one more time in this way gave a slight improvement in performance.

Conclusion: even so, the 4B-class model has severe performance limits.

**PAST 5-STEP DATA:**
---
PRB_eMBB_history:   [158, 159, 160, 161, 162]
    PRB_URLLC_history: [42, 41, 40, 39, 38]
    eMBB_latency:   ['96.79ms', '66.05ms', '83.61ms', '84.84ms', '96.90ms']
    URLLC_latency:  ['1.38ms', '1.32ms', '1.34ms', '1.37ms', '1.44ms']
    eMBB_SLA_ratio:    ['0.80', '0.87', '0.76', '0.77', '0.76']
    URLLC_SLA_ratio:   ['1.00', '1.00', '1.00', '1.00', '0.99']
---

What is the new PRB allocation and why?

**OUTPUT REQUIREMENT:**
Return ONLY a single line of pure JSON. NO markdown, NO code blocks, NO explanations.

**OUTPUT FORMAT:**
{"PRB_embb": X, "PRB_urllc": Y, "reasoning": brief explanation in one sentence}<|end|>
<|assistant|>
 ```json
{"PRB_embb": 163, "PRB_urllc": 37, "reasoning": "URLLC SLA satisfaction is below 95%, so PRBs are reallocated to maintain URLLC performance."}

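In this case, the allocation moved through [158, 42], [159, 41], [160, 40], [161, 39], [162, 38], and the model again reassigned one URLLC PRB to eMBB.

Yet the reasoning claims that the URLLC SLA satisfaction is below 95% and that PRBs were reallocated to maintain URLLC performance.

The URLLC SLA ratio is in fact between 0.99 and 1.00, not below 0.95, and

moving PRBs from PRB_URLLC to PRB_eMBB in order to maintain URLLC performance makes no sense.

(Even if the reallocation itself turned out to be appropriate.)

Moving the reasoning field made performance worse

Suspecting that the model first decides the reallocation and only then attaches a reasoning to it, I swapped the order of the fields.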

================================================================================
📤 LLM OUTPUT (Raw Response):
================================================================================
<|user|>
You are a intelligent 5G network slicing resource allocation manager responsible for dynamically allocating Physical Resource Blocks (PRBs) between eMBB and URLLC slices based on observed latency, SLA satisfaction, and recent system history, with the goal of maximizing overall SLA compiance while wtrictly proceting URLLC reliability

**SYSTEM INFO:**
There are 10 eMBB users and 10 URLLC users.
The total number of available PRBs is 200.

**SLA REQUIREMENTS:**
- eMBB average latency must be 100 ms or lower.
- URLLC average latency must be 5 ms or lower.

**GENERAL RESOURCE ALLOCATION POLICY**
- The system generally prioritizes improving eMBB performance by gradually
  shifting PRBs from URLLC to eMBB, since eMBB is typically the performance bottleneck.
- PRB adjustments must be conservative: only a ±1 PRB change is allowed per decision step.

**URLLC SAFETY CONSIDERATION (CRITICAL)*
- URLLC traffic exhibits a cliff-like behavior:
  once under-provisioned beyond a certain point, its URLLC_SLA_ratio
  can degrade sharply and abruptly.
- Therefore, URLLC must be continuously monitored and protected.

GUARDED DECISION LOGIC
- If URLLC_SLA_ratio is consistently high (above 0.95),
  URLLC may be considered safely over-provisioned, and 1 PRB may be shifted
  from URLLC to eMBB.
- If URLLC_SLA_ratio approaches or falls below below 0.95,
  this indicates a high risk of SLA collapse.
  In this case, you must immediately reverse direction:
  shift 1 PRB from eMBB back to URLLC.

- The sum of PRBs allocated to eMBB and URLLC must always be exactly 200.
- This constraint overrides all optimization objectives.

**PAST 5-STEP DATA:**
---
PRB_eMBB_history:   [102, 101, 100, 100, 100]
    PRB_URLLC_history: [98, 99, 100, 100, 100]
    eMBB_latency:   ['112.76ms', '111.61ms', '124.88ms', '153.13ms', '120.67ms']
    URLLC_latency:  ['1.16ms', '1.18ms', '1.15ms', '1.12ms', '1.11ms']
    eMBB_SLA_ratio:    ['0.62', '0.66', '0.62', '0.64', '0.64']
    URLLC_SLA_ratio:   ['1.00', '0.99', '1.00', '1.00', '1.00']
---

What is the new PRB allocation and why?

**OUTPUT REQUIREMENT:**
Return ONLY a single line of pure JSON. NO markdown, NO code blocks, NO explanations.

**OUTPUT FORMAT:**
{"reasoning": brief explanation in one sentence, "PRB_embb": X, "PRB_urllc": Y}<|end|>
<|assistant|>
 ```json
{"reasoning": "URLLC_SLA_ratio is at the critical threshold, indicating potential SLA collapse; therefore, PRBs are reallocated to maintain URLLC safety.", "PRB_embb": 100, "PRB_urllc": 101}
```
================================================================================

================================================================================
📋 LLM OUTPUT (Parsed JSON):
================================================================================
PRB_embb: 100
PRB_urllc: 101
================================================================================

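As a result, performance actually collapsed: in most of 100 calls the model insisted on the allocation [100, 101], violating the constraint that the PRBs must sum to 200.

Moreover, although URLLC_SLA_ratio stays at 0.99 or higher, the model claims that URLLC_SLA_ratio is at a critical level and that it is returning PRBs to keep URLLC safe. (And it does so without producing a valid allocation that actually increases PRB_URLLC.)

That the output changes this drastically with something as minor as the position of the reasoning field shows how unstable small LLMs are.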

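Prompt engineering lessons learned

The lessons from this experiment are as follows.

1. Explaining a complex policy to a small model in long natural language is not effective

Detailed explanations help people, but they can confuse a small model.

In particular, errors increase when the following kinds of information are mixed in a single prompt.

role description / network background knowledge / SLA conditions / safety constraint
history data / action rule / output format

For small models, structure matters more than explanation.

2. Constraints must be stated explicitly and, where possible, repeated

Even a simple condition such as Total PRB = 200 is not followed on its own.

The prompt therefore has to state it forcefully, as in the following.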

PRB_embb + PRB_urllc must be exactly 200.
Only one of the following actions is allowed:
1. [current_embb + 1, current_urllc - 1]
2. [current_embb - 1, current_urllc + 1]
3. [current_embb, current_urllc]

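Going further, rather than having the LLM generate the numbers itself,

it may be more stable to supply the possible action candidates in advance and have it choose one of them.

This is one way to structurally constrain the LLM's output.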
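The candidate-selection idea can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation from the experiment: candidate_actions, render_options, and resolve_choice are hypothetical helpers, and falling back to option 3 (keep the current allocation) on an unrecognized reply is an assumed default.

```python
def candidate_actions(current_embb, current_urllc, total=200):
    """Enumerate the three allowed next allocations, keyed by option label."""
    if current_embb + current_urllc != total:
        raise ValueError("current allocation does not sum to the total PRB count")
    options = {
        "1": (current_embb + 1, current_urllc - 1),  # shift 1 PRB from URLLC to eMBB
        "2": (current_embb - 1, current_urllc + 1),  # shift 1 PRB from eMBB to URLLC
        "3": (current_embb, current_urllc),          # keep the current allocation
    }
    # Drop options that would give either slice a negative PRB count.
    return {label: alloc for label, alloc in options.items() if min(alloc) >= 0}


def render_options(options):
    """Format the options as prompt text so the model only has to pick a label."""
    return "\n".join(
        f"{label}. PRB_embb={embb}, PRB_urllc={urllc}"
        for label, (embb, urllc) in options.items()
    )


def resolve_choice(reply, options, default="3"):
    """Map the model's reply to an allocation; unknown labels fall back to default."""
    label = reply.strip()[:1]
    return options.get(label, options[default])


options = candidate_actions(103, 97)
print(render_options(options))
print(resolve_choice("1", options))                   # (104, 96)
print(resolve_choice("I think 104 and 97", options))  # (103, 97)
```

Because every option already sums to 200 and moves at most one PRB, the model can no longer emit an allocation such as [101, 100].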

3. Floats are better than fractions

61/100 is intuitive to a person, but it places an unnecessary arithmetic burden on a small model.

SLA satisfaction is therefore better given as follows.

eMBB_SLA_ratio = 0.61
URLLC_SLA_ratio = 1.00

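Rather than leaving calculations to the LLM, it is better to compute the features needed for the decision in advance and provide them.

4. Feature-wise lists were more stable than a row-wise table

Compared with the following structure,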

[ [103, 97], 124.18, 1.12, 63/100, 100/100 ]

the following structure was more stable.

PRB_eMBB_history: [102, 103, 103, 103, 103]
PRB_URLLC_history: [98, 97, 97, 97, 97]
URLLC_SLA_ratio: [1.00, 1.00, 1.00, 1.00, 1.00]

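For small models, what each number means should be labeled as explicitly as possible.

5. Reasoning is useful for assessing reliability, but it cannot replace action verification

Attaching reasoning makes the model's errors easier to spot.

But correct reasoning does not mean the action is correct, and conversely the action can be correct while the reasoning is wrong.

LLM output must therefore always be verified in post-processing.

For example, the following conditions should be enforced outside the LLM.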

PRB_embb + PRB_urllc == 200
abs(PRB_embb_new - PRB_embb_current) <= 1
abs(PRB_urllc_new - PRB_urllc_current) <= 1
If URLLC_SLA_ratio < 0.95, increase PRB_urllc

The LLM can suggest a policy, but hard constraint enforcement should be handled by a separate validator.
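A minimal sketch of such a validator, covering the four conditions listed above, might look like this. validate_allocation and the violation labels are hypothetical names; the total of 200, the one-PRB step limit, and the 0.95 threshold are taken from the prompt.

```python
def validate_allocation(new_embb, new_urllc, cur_embb, cur_urllc,
                        urllc_sla_ratio, total=200, max_step=1,
                        sla_threshold=0.95):
    """Return the list of violated hard constraints (empty list means valid)."""
    violations = []
    if new_embb + new_urllc != total:
        violations.append("sum")
    if abs(new_embb - cur_embb) > max_step:
        violations.append("embb_step")
    if abs(new_urllc - cur_urllc) > max_step:
        violations.append("urllc_step")
    # Below the SLA threshold, the only acceptable move is to give URLLC more PRBs.
    if urllc_sla_ratio < sla_threshold and new_urllc <= cur_urllc:
        violations.append("urllc_unprotected")
    return violations


# The [100, 101] output from the reasoning-first experiment breaks the sum constraint.
print(validate_allocation(100, 101, 100, 100, 1.00))  # ['sum']
# The [163, 37] output passes even though its reasoning was wrong.
print(validate_allocation(163, 37, 162, 38, 0.99))    # []
```

The second call illustrates the point of this lesson: the validator only checks the action, so a valid action with incorrect reasoning still passes, and the reasoning has to be judged separately.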

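Using a small LLM for resource allocation requires structure

The most important takeaway from this experiment is the following.

Using a small LLM directly as the resource allocation decision maker is risky.

In particular, 4B-class models such as Phi-3-mini showed the following problems.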

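Violation of the total PRB constraint
Misreading the current allocation
Confusing the direction between eMBB and URLLC
Errors in comparing against the SLA threshold
Mismatch between reasoning and action
Decision instability under changes in prompt format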

So to use a small LLM, prompt engineering alone is not enough.

A more realistic structure is the following.

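1. The Python environment computes the current state features
2. A rule-based layer generates the possible action candidates
3. The LLM selects one of the candidates
4. A validator checks the hard constraints
5. If the output is invalid, a fallback policy is applied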

That is, rather than having the LLM compute everything itself, the LLM handles the high-level decision or preference selection,

while numeric constraints and safety constraints are guaranteed by external logic.
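The five-step structure can be sketched end to end as below. This is only an illustration under stated assumptions: rule_based_candidates and decide are hypothetical names, choose stands in for the LLM call and is expected to return the index of one candidate, and the fallback policy (keep the current allocation, or return one PRB to URLLC when its SLA ratio is below 0.95) is an assumption rather than something defined in this post.

```python
def rule_based_candidates(cur_embb, cur_urllc, urllc_sla_ratio, sla_threshold=0.95):
    """Step 2: build the allowed actions from pre-computed features (step 1)."""
    to_embb = (cur_embb + 1, cur_urllc - 1)
    to_urllc = (cur_embb - 1, cur_urllc + 1)
    hold = (cur_embb, cur_urllc)
    # Hard safety rule: below the threshold, the only allowed action is to protect URLLC.
    if urllc_sla_ratio < sla_threshold and cur_embb > 0:
        return [to_urllc]
    return [c for c in (to_embb, to_urllc, hold) if min(c) >= 0]


def decide(cur_embb, cur_urllc, urllc_sla_ratio, choose, total=200):
    """Steps 3 to 5: let the LLM pick, validate the pick, fall back if it is invalid."""
    candidates = rule_based_candidates(cur_embb, cur_urllc, urllc_sla_ratio)
    # Assumed fallback: the forced safety action if there is one, otherwise hold.
    fallback = candidates[0] if len(candidates) == 1 else (cur_embb, cur_urllc)
    try:
        index = int(choose(candidates))  # step 3: the LLM returns a candidate index
    except (TypeError, ValueError):
        return fallback                  # step 5: unparsable output
    if not 0 <= index < len(candidates):
        return fallback                  # step 5: index outside the candidate list
    chosen = candidates[index]
    # Step 4: re-check the sum constraint before accepting the choice.
    return chosen if sum(chosen) == total else fallback


print(decide(162, 38, 0.99, lambda c: 0))               # (163, 37)
print(decide(100, 100, 1.00, lambda c: "not a number"))  # (100, 100)
print(decide(150, 50, 0.90, lambda c: 0))               # (149, 51)
```

In this layout the model never produces PRB numbers directly, so a malformed or inconsistent reply can only ever result in the fallback allocation.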
