Solving the 2D Bin Packing Problem with Actor-Critic (Generalization)

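The reinforcement learning approach to 2D Bin Packing covered earlier solved the problem well, but it has an important limitation.

It can only solve problems in which the types and order of the items are the same as those used during training.

For example, if training used the items below, the problem was solved under the assumption that items generated in this same order are supplied during both training and evaluation.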

def _generate_items(self):
    items = []
    items.append((1, 9))
    items.append((10, 1))
    items.append((5, 2))
    items.append((4, 4))
    items.append((3, 3))
    items.append((3, 3))
    items.append((3, 3))
    items.append((3, 3))
    items.append((2, 3))
    items.append((2, 3))
    return items

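In a real-world setting, however, the types and order of items change constantly, so a Bin Packing solution has to cope with those changes.

I therefore looked for a more generalized approach that can still solve the problem when the item order changes.

Training will take longer, so I also wanted to find out through testing how long training has to run to be effective.

For this, I decided to use the Actor-Critic method, one of the reinforcement learning approaches.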

Actor

In the bin packing problem, the actor looks at the current state of the bin and decides at which position (x, y) to place the next item. In other words, it produces the action policy.

Critic

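The critic predicts whether placing an item at a given position will lead to a better or worse outcome in the future, and assigns it a score (value).

The relationship between the two can be compared to a student (actor) and a teacher (critic).

When the student solves a problem, the teacher gives feedback on how good the solution is and how close it is to the correct answer.

The student uses that feedback to improve how the next problem is solved. By repeating this process, the student gradually becomes a better problem solver.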
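The teacher's feedback can be made concrete with a small numeric sketch. The training loop in the full source computes a one-step advantage as `reward + GAMMA * next_state_value - state_value`; the helper name `td_advantage` and the sample numbers below are made up for illustration only.

```python
GAMMA = 0.99  # discount factor, same value as in the full source


def td_advantage(reward, state_value, next_state_value, done):
    """One-step TD advantage: how much better the outcome was than the critic expected."""
    # Once the episode has ended there is no future value to bootstrap from.
    bootstrap = 0.0 if done else next_state_value
    return reward + GAMMA * bootstrap - state_value


# Hypothetical numbers: the critic expected 20.0, the move earned 9.0
# and led to a state the critic values at 15.0.
adv = td_advantage(reward=9.0, state_value=20.0, next_state_value=15.0, done=False)
print(round(adv, 2))  # 9.0 + 0.99 * 15.0 - 20.0
```

A positive advantage means the move turned out better than the critic expected, so the actor is nudged toward it; a negative advantage nudges the actor away from it.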

State

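We need to define the state on which the agent bases its decisions.

This post uses the most intuitive representation: the current state of the bin as a matrix. In the full source, two more channels filled with the current item's width and height are stacked on top of this matrix.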
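As a rough sketch of this state layout, the snippet below builds the three channels described in the docstring of `get_state_and_valid_actions` using plain Python lists instead of tensors. `build_state` is a hypothetical helper, and the 3x4 bin is a toy example rather than the 10x10 bin used in the post.

```python
def build_state(bin_grid, item_w, item_h):
    """Return a 3-channel state: [bin occupancy, item width plane, item height plane]."""
    height = len(bin_grid)
    width = len(bin_grid[0])
    occupancy = [[float(cell) for cell in row] for row in bin_grid]
    width_plane = [[float(item_w)] * width for _ in range(height)]
    height_plane = [[float(item_h)] * width for _ in range(height)]
    return [occupancy, width_plane, height_plane]


# A 3x4 toy bin where item 1 already occupies the top-left 2x2 corner.
bin_grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
state = build_state(bin_grid, item_w=3, item_h=1)
for channel in state:
    print(channel)
```

Filling whole planes with the item's width and height lets a convolutional network see the item size at every cell of the bin.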

Action

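The actor takes the current state (an image of the bin) as input and outputs the position (x, y) at which the next item should be placed.

It selects the position it judges to be best among all possible positions.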
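In the full source the action is a single integer in a space of size `bin_width * bin_height * 2`: the first half covers every cell in the original orientation and the second half covers every cell with the item rotated. The sketch below decodes such an index; `decode_action` is a hypothetical helper that mirrors the arithmetic in `step`.

```python
def decode_action(action, bin_width, bin_height):
    """Split a flat action index into (is_rotated, y, x)."""
    base = bin_width * bin_height      # number of cells = actions per orientation
    is_rotated = action >= base        # the second half of the range means "rotated"
    cell = action - base if is_rotated else action
    y, x = divmod(cell, bin_width)     # row-major layout: cell = y * bin_width + x
    return is_rotated, y, x


print(decode_action(45, 10, 10))   # (False, 4, 5)
print(decode_action(153, 10, 10))  # (True, 5, 3)
```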

Neural network architecture

The Actor-Critic model is a single neural network that shares its input layers and splits only the final output into an Actor part and a Critic part.
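To get a feel for how much of the network is shared, the sketch below counts parameters using the layer sizes from the `ActorCriticCNN` class in the full source (two 3x3 convolutions, one 256-unit fully connected layer, then the two heads). The helper functions are illustrative and not part of the original code.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features


H = W = 10              # bin size used in the full source
N_ACTIONS = H * W * 2   # every cell, with and without rotation

# 3x3 convolutions with stride 1 and padding 1 keep the 10x10 spatial size.
flat_features = H * W * 32

shared = (
    conv2d_params(3, 16, 3)
    + conv2d_params(16, 32, 3)
    + linear_params(flat_features, 256)
)
actor_head = linear_params(256, N_ACTIONS)
critic_head = linear_params(256, 1)

print(shared, actor_head, critic_head)
```

Under these sizes almost all parameters sit in the shared trunk, so the actor and the critic learn from the same features and differ only in their final linear layer.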

Full source

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.distributions import Categorical
import numpy as np
import time
import random  # changed: added for random item generation

# Select the device depending on whether CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


## Environment class (Bin Packing Environment)
class BinPackingEnv:
    def __init__(self, bin_width, bin_height):
        self.bin_width = bin_width
        self.bin_height = bin_height
        self.action_space_size = bin_width * bin_height * 2
        self.items_list = []  # changed: start with an empty list
        self.reset()

    ## changed: now generates random items on every call
    def _generate_items(self):
        """Randomly generate the list of items to pack."""
        items = []
        # Generate between 8 and 15 items per episode
        num_items = random.randint(8, 15)
        for _ in range(num_items):
            # Draw each item's width and height at random from 1 to 5
            w = random.randint(1, 5)
            h = random.randint(1, 5)
            items.append((w, h))
        return items

    def reset(self):
        """Reset the environment to its initial state and generate new items."""
        self.bin = np.zeros((self.bin_height, self.bin_width))
        self.items = self._generate_items()  # changed: generate new items on every reset
        self.current_item_idx = 0
        # get_state_and_valid_actions returns the state, so it only needs to be called here
        state, _ = self.get_state_and_valid_actions()
        return state

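    def _print_status(self):
        """Print the current bin state and the next item's size."""
        print(self.bin)
        if self.current_item_idx < len(self.items):
            w, h = self.items[self.current_item_idx]
            print(f"Next item size: {w}x{h} (or {h}x{w} when rotated)")

    ## changed: the state representation is now a (3, H, W) tensor
    def get_state_and_valid_actions(self):
        """
        Return the current state and the valid-action mask.
        The state is a 3-channel tensor:
        - channel 0: current state of the bin
        - channel 1: matrix filled with the current item's width
        - channel 2: matrix filled with the current item's height
        """
        # If every item has been placed, return the bin with zeroed item channels and an all-False mask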
        if self.current_item_idx >= len(self.items):
            bin_state = torch.from_numpy(self.bin).float()
            item_w_state = torch.zeros_like(bin_state)
            item_h_state = torch.zeros_like(bin_state)
            state = torch.stack([bin_state, item_w_state, item_h_state]).unsqueeze(0)
            return state, torch.zeros(self.action_space_size, dtype=torch.bool)

        item_w, item_h = self.items[self.current_item_idx]

        # Build the 3-channel state tensor
        bin_state = torch.from_numpy(self.bin).float()
        item_w_state = torch.full_like(bin_state, float(item_w))
        item_h_state = torch.full_like(bin_state, float(item_h))
        state = torch.stack([bin_state, item_w_state, item_h_state]).unsqueeze(0)  # [1, 3, H, W]

        valid_actions_mask = torch.zeros(self.action_space_size, dtype=torch.bool)

        # 1. Place the item in its original orientation
        for y in range(self.bin_height - item_h + 1):
            for x in range(self.bin_width - item_w + 1):
                if np.all(self.bin[y:y + item_h, x:x + item_w] == 0):
                    action_index = y * self.bin_width + x
                    valid_actions_mask[action_index] = True

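        # 2. Place the item rotated by 90 degrees (only when width and height differ)
        if item_w != item_h:
            # Swap width and height so the mask matches the rotated footprint used in step()
            item_w_rot, item_h_rot = item_h, item_w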
            for y in range(self.bin_height - item_h_rot + 1):
                for x in range(self.bin_width - item_w_rot + 1):
                    if np.all(self.bin[y:y + item_h_rot, x:x + item_w_rot] == 0):
                        action_index = (self.bin_width * self.bin_height) + (y * self.bin_width + x)
                        valid_actions_mask[action_index] = True

        return state, valid_actions_mask

    def _check_if_any_valid_moves_exist(self):
        """Check whether there is any space where the current item can be placed."""
        if self.current_item_idx >= len(self.items):
            return True  # every item has already been placed, so treat this as valid

        # Reuse the logic in get_state_and_valid_actions to check for at least one valid action
        _, valid_mask = self.get_state_and_valid_actions()
        return torch.any(valid_mask)

    def step(self, action):
        """Apply the selected action and return the next state, reward, and done flag."""
        # Handle the case where no items remain at the start of the step (episode already over)
        if self.current_item_idx >= len(self.items):
            next_state, _ = self.get_state_and_valid_actions()
            return next_state, 0.0, True

        item_w, item_h = self.items[self.current_item_idx]
        base_action_space = self.bin_width * self.bin_height
        is_rotated = action >= base_action_space

        if is_rotated:
            item_w, item_h = item_h, item_w
            coord_action = action - base_action_space
        else:
            coord_action = action

        y, x = np.unravel_index(coord_action, (self.bin_height, self.bin_width))

        if y + item_h > self.bin_height or x + item_w > self.bin_width or not np.all(
                self.bin[y:y + item_h, x:x + item_w] == 0):
            reward = -100.0
            done = True
            next_state, _ = self.get_state_and_valid_actions()
            return next_state, reward, done

        self.bin[y:y + item_h, x:x + item_w] = self.current_item_idx + 1
        reward = float(item_w * item_h)
        self.current_item_idx += 1
        done = False

        if not self._check_if_any_valid_moves_exist():
            if self.current_item_idx >= len(self.items):
                reward += 10
            else:
                reward -= 10
            done = True

        if self.current_item_idx >= len(self.items) and not done:
            reward += 10
            done = True

        next_state, _ = self.get_state_and_valid_actions()
        return next_state, reward, done


## changed: switched from an MLP to a CNN-based Actor-Critic
class ActorCriticCNN(nn.Module):
    def __init__(self, h, w, outputs):
        super(ActorCriticCNN, self).__init__()
        # Three input channels (bin state, item width, item height)
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)

        # Compute the feature map size after the convolution layers
        def conv2d_size_out(size, kernel_size=3, stride=1, padding=1):
            return (size + 2 * padding - kernel_size) // stride + 1

        convw = conv2d_size_out(conv2d_size_out(w))
        convh = conv2d_size_out(conv2d_size_out(h))
        linear_input_size = convw * convh * 32

        # Fully connected layer for shared feature extraction
        self.fc1 = nn.Linear(linear_input_size, 256)

        # Actor head that produces the policy
        self.actor_head = nn.Linear(256, outputs)
        # Critic head that estimates the value of the state
        self.critic_head = nn.Linear(256, 1)

    def forward(self, x):
        # Shape of x: [batch, 3, H, W]
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(x.size(0), -1)  # Flatten
        x = F.relu(self.fc1(x))

        action_logits = self.actor_head(x)
        state_value = self.critic_head(x)
        return action_logits, state_value


## Hyperparameters and model initialization
BIN_SIZE = 10
N_ACTIONS = BIN_SIZE * BIN_SIZE * 2
EPISODES = 100000
LEARNING_RATE = 0.0007
GAMMA = 0.99

env = BinPackingEnv(BIN_SIZE, BIN_SIZE)
model = ActorCriticCNN(BIN_SIZE, BIN_SIZE, N_ACTIONS).to(device)  # changed: use the new model
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

## Training loop
print("===== Actor-Critic (CNN) training started =====")
total_rewards = []
for i_episode in range(EPISODES):
    state = env.reset()
    done = False
    episode_reward = 0

    while not done:
        state = state.to(device)

        _, valid_actions_mask = env.get_state_and_valid_actions()
        valid_actions_mask = valid_actions_mask.to(device)

        if not valid_actions_mask.any():
            break

        action_logits, state_value = model(state)
        action_logits[0][~valid_actions_mask] = -float('inf')

        action_probs = F.softmax(action_logits, dim=-1)
        dist = Categorical(action_probs)
        action = dist.sample()

        next_state, reward, done = env.step(action.item())
        episode_reward += reward

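        next_state = next_state.to(device)
        # The next-state value is only a bootstrap target, so keep it out of the gradient
        with torch.no_grad():
            _, next_state_value = model(next_state)

        if done:
            next_state_value = torch.tensor([0.0], device=device)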

        advantage = reward + GAMMA * next_state_value - state_value
        critic_loss = advantage.pow(2)
        actor_loss = -dist.log_prob(action) * advantage.detach()
        entropy_loss = -dist.entropy()

        loss = actor_loss + critic_loss + 0.01 * entropy_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state

    total_rewards.append(episode_reward)

    if (i_episode + 1) % 100 == 0:
        avg_reward = np.mean(total_rewards[-100:])
        print(f"Episode {i_episode + 1}/{EPISODES} | Average reward over last 100 episodes: {avg_reward:.2f}")

print("\n===== Training complete! =====\n")

## Packing test
print("===== Packing test started =====")
state = env.reset()  # the test also starts from random items
env._print_status()
time.sleep(1)

done = False
while not done:
    state, valid_actions_mask = env.get_state_and_valid_actions()
    valid_actions_mask = valid_actions_mask.to(device)
    if not valid_actions_mask.any():
        print("\nFailed! There is no space left to place the item.")
        break

    with torch.no_grad():
        action_logits, _ = model(state.to(device))
        action_logits[0][~valid_actions_mask] = -float('inf')
        action = action_logits.argmax().item()

    base_action_space = BIN_SIZE * BIN_SIZE
    is_rotated = action >= base_action_space
    rotation_text = "(Rotation)" if is_rotated else ""
    coord_action = action - base_action_space if is_rotated else action
    coords = np.unravel_index(coord_action, (BIN_SIZE, BIN_SIZE))

    print(f"\n-> AI's choice: place at {coords} {rotation_text}")
    _, _, done = env.step(action)

    env._print_status()
    time.sleep(1)

placed_items = env.current_item_idx
total_items = len(env.items)
print("\n===== Test finished! =====")
print(f"Final result: packed {placed_items} of {total_items} items.")
print("Final bin state:")
print(env.bin)
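
The training loop above sets the logits of invalid actions to `-inf` before the softmax, so the policy can only sample placements that fit. The framework-free sketch below shows the same idea on a four-action toy problem; `masked_softmax`, the logits, and the mask are illustrative assumptions, and at least one action is assumed to be valid (the loop in the source exits before this step otherwise).

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax over valid actions only; invalid actions get probability 0."""
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]
    peak = max(masked)                           # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, 3.0]
valid_mask = [True, False, True, False]  # hypothetical mask for a 4-action toy problem
probs = masked_softmax(logits, valid_mask)
print(probs)
```

Even though the last action has the highest raw logit, it receives zero probability because the mask marks it as invalid.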

The results are as follows.

Using device: cuda
===== Actor-Critic (CNN) training started =====
Episode 100/100000 | Average reward over last 100 episodes: -4.18
Episode 200/100000 | Average reward over last 100 episodes: 31.51
Episode 300/100000 | Average reward over last 100 episodes: 41.18
...
Episode 99800/100000 | Average reward over last 100 episodes: 65.43
Episode 99900/100000 | Average reward over last 100 episodes: 66.55
Episode 100000/100000 | Average reward over last 100 episodes: 66.49

===== Training complete! =====

===== Packing test started =====
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Next item size: 4x4 (or 4x4 when rotated)

-> AI's choice: place at (np.int64(4), np.int64(5)) 
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Next item size: 2x5 (or 5x2 when rotated)

-> AI's choice: place at (np.int64(5), np.int64(3)) 
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 0. 0. 0. 0. 0.]
 [0. 0. 0. 2. 2. 0. 0. 0. 0. 0.]]
Next item size: 4x2 (or 2x4 when rotated)

-> AI's choice: place at (np.int64(8), np.int64(6)) 
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 0. 3. 3. 3. 3.]
 [0. 0. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 3x3 (or 3x3 when rotated)

-> AI's choice: place at (np.int64(0), np.int64(6)) 
[[0. 0. 0. 0. 0. 0. 4. 4. 4. 0.]
 [0. 0. 0. 0. 0. 0. 4. 4. 4. 0.]
 [0. 0. 0. 0. 0. 0. 4. 4. 4. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 1. 1. 1. 1. 0.]
 [0. 0. 0. 2. 2. 0. 3. 3. 3. 3.]
 [0. 0. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 2x5 (or 5x2 when rotated)

-> AI's choice: place at (np.int64(5), np.int64(0)) 
[[0. 0. 0. 0. 0. 0. 4. 4. 4. 0.]
 [0. 0. 0. 0. 0. 0. 4. 4. 4. 0.]
 [0. 0. 0. 0. 0. 0. 4. 4. 4. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 2x5 (or 5x2 when rotated)

-> AI's choice: place at (np.int64(0), np.int64(0)) 
[[6. 6. 0. 0. 0. 0. 4. 4. 4. 0.]
 [6. 6. 0. 0. 0. 0. 4. 4. 4. 0.]
 [6. 6. 0. 0. 0. 0. 4. 4. 4. 0.]
 [6. 6. 0. 0. 0. 0. 0. 0. 0. 0.]
 [6. 6. 0. 0. 0. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 4x3 (or 3x4 when rotated)

-> AI's choice: place at (np.int64(0), np.int64(2)) 
[[6. 6. 7. 7. 7. 7. 4. 4. 4. 0.]
 [6. 6. 7. 7. 7. 7. 4. 4. 4. 0.]
 [6. 6. 7. 7. 7. 7. 4. 4. 4. 0.]
 [6. 6. 0. 0. 0. 0. 0. 0. 0. 0.]
 [6. 6. 0. 0. 0. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 1x4 (or 4x1 when rotated)

-> AI's choice: place at (np.int64(0), np.int64(9)) 
[[6. 6. 7. 7. 7. 7. 4. 4. 4. 8.]
 [6. 6. 7. 7. 7. 7. 4. 4. 4. 8.]
 [6. 6. 7. 7. 7. 7. 4. 4. 4. 8.]
 [6. 6. 0. 0. 0. 0. 0. 0. 0. 8.]
 [6. 6. 0. 0. 0. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 4x1 (or 1x4 when rotated)

-> AI's choice: place at (np.int64(3), np.int64(2)) 
[[6. 6. 7. 7. 7. 7. 4. 4. 4. 8.]
 [6. 6. 7. 7. 7. 7. 4. 4. 4. 8.]
 [6. 6. 7. 7. 7. 7. 4. 4. 4. 8.]
 [6. 6. 9. 9. 9. 9. 0. 0. 0. 8.]
 [6. 6. 0. 0. 0. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 1. 1. 1. 1. 0.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]
 [5. 5. 0. 2. 2. 0. 3. 3. 3. 3.]]
Next item size: 1x3 (or 3x1 when rotated)

-> AI's choice: place at (np.int64(5), np.int64(9)) 
[[ 6.  6.  7.  7.  7.  7.  4.  4.  4.  8.]
 [ 6.  6.  7.  7.  7.  7.  4.  4.  4.  8.]
 [ 6.  6.  7.  7.  7.  7.  4.  4.  4.  8.]
 [ 6.  6.  9.  9.  9.  9.  0.  0.  0.  8.]
 [ 6.  6.  0.  0.  0.  1.  1.  1.  1.  0.]
 [ 5.  5.  0.  2.  2.  1.  1.  1.  1. 10.]
 [ 5.  5.  0.  2.  2.  1.  1.  1.  1. 10.]
 [ 5.  5.  0.  2.  2.  1.  1.  1.  1. 10.]
 [ 5.  5.  0.  2.  2.  0.  3.  3.  3.  3.]
 [ 5.  5.  0.  2.  2.  0.  3.  3.  3.  3.]]
Next item size: 4x5 (or 5x4 when rotated)

===== Test finished! =====
Final result: packed 10 of 11 items.
Final bin state:
[[ 6.  6.  7.  7.  7.  7.  4.  4.  4.  8.]
 [ 6.  6.  7.  7.  7.  7.  4.  4.  4.  8.]
 [ 6.  6.  7.  7.  7.  7.  4.  4.  4.  8.]
 [ 6.  6.  9.  9.  9.  9.  0.  0.  0.  8.]
 [ 6.  6.  0.  0.  0.  1.  1.  1.  1.  0.]
 [ 5.  5.  0.  2.  2.  1.  1.  1.  1. 10.]
 [ 5.  5.  0.  2.  2.  1.  1.  1.  1. 10.]
 [ 5.  5.  0.  2.  2.  1.  1.  1.  1. 10.]
 [ 5.  5.  0.  2.  2.  0.  3.  3.  3.  3.]
 [ 5.  5.  0.  2.  2.  0.  3.  3.  3.  3.]]
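
As a sanity check on the test run above, the episode reward can be recomputed by hand from the reward rules in `step`: each placement earns the item's area, and 10 is subtracted when the next item no longer fits. The item sizes below are read off the test log; the variable names are illustrative.

```python
# Items in the order they were placed in the test run above (width, height).
placed = [(4, 4), (2, 5), (4, 2), (3, 3), (2, 5),
          (2, 5), (4, 3), (1, 4), (4, 1), (1, 3)]
BIN_AREA = 10 * 10

area_reward = sum(w * h for w, h in placed)  # each placement earns its area
stuck_penalty = -10                          # the 11th item (4x5) no longer fits
episode_reward = area_reward + stuck_penalty
fill_ratio = area_reward / BIN_AREA

print(area_reward, episode_reward, fill_ratio)
```

This gives an episode reward of 76 with 86% of the bin filled, which matches the 14 empty cells in the final bin and is somewhat above the average reward of about 66 reported at the end of training.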

Reference:

[1] The Bin Packing Problem by Google, https://developers.google.com/optimization/pack/bin_packing, Creative Commons Attribution 4.0 License