5️⃣Depth Estimation

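Depth estimation is the task of predicting the depth of the objects in an image. It is important for applications such as 3D reconstruction, augmented reality, autonomous driving, and robotics. Depth estimation models are trained to determine, for every pixel in the image, its relative distance from the camera, which is commonly called depth. These models estimate depth from either monocular (single image) or stereo (multiple image) input.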
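To make the stereo case concrete, here is a minimal sketch of the standard pinhole relation for a rectified camera pair, Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity in pixels. The function name `depth_from_disparity` and the numbers are illustrative assumptions and are not taken from this page. The monocular models used below predict depth directly from a single image and do not rely on this formula.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Hypothetical helper: depth in meters for a rectified stereo pair.

    focal_px     -- focal length in pixels (illustrative value below)
    baseline_m   -- distance between the two cameras in meters
    disparity_px -- horizontal pixel shift of the same point between views
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only: a larger disparity means a closer point.
near = depth_from_disparity(focal_px=700.0, baseline_m=0.12, disparity_px=42.0)
far = depth_from_disparity(focal_px=700.0, baseline_m=0.12, disparity_px=7.0)
print(f"near: {near:.2f} m, far: {far:.2f} m")
```

The inverse relationship is the point to take away: halving the disparity doubles the estimated distance.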

from transformers import pipeline

estimator = pipeline(
    task="depth-estimation", 
    model="Intel/dpt-large"
)
result = estimator(images="http://images.cocodataset.org/val2017/000000039769.jpg")
result

Some weights of DPTForDepthEstimation were not initialized from the model checkpoint at Intel/dpt-large and are newly initialized: ['neck.fusion_stage.layers.0.residual_layer1.convolution1.bias', 'neck.fusion_stage.layers.0.residual_layer1.convolution1.weight', 'neck.fusion_stage.layers.0.residual_layer1.convolution2.bias', 'neck.fusion_stage.layers.0.residual_layer1.convolution2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



{'predicted_depth': tensor([[[ 6.3199,  6.3629,  6.4148,  ..., 10.4104, 10.5109, 10.3847],
          [ 6.3850,  6.3615,  6.4166,  ..., 10.4540, 10.4384, 10.4554],
          [ 6.3519,  6.3176,  6.3575,  ..., 10.4247, 10.4618, 10.4257],
          ...,
          [22.3772, 22.4624, 22.4227,  ..., 22.5207, 22.5593, 22.5293],
          [22.5073, 22.5148, 22.5115,  ..., 22.6604, 22.6345, 22.5871],
          [22.5177, 22.5275, 22.5218,  ..., 22.6282, 22.6216, 22.6108]]]),
 'depth': <PIL.Image.Image image mode=L size=640x480>}

result["depth"]
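The `depth` entry is a PIL image, so it can be converted to a NumPy array for further processing. The sketch below uses a synthetic 640x480 grayscale image as a stand-in for `result["depth"]` (an assumption, so that the snippet runs without downloading a model) and shows that PIL reports the size as (width, height) while NumPy reports the shape as (height, width).

```python
import numpy as np
from PIL import Image

# Stand-in for result["depth"]: a synthetic mode-"L" image of the same size.
# This is an assumption for illustration; substitute the real pipeline output.
depth_image = Image.new("L", (640, 480), color=128)

# Convert the PIL image to a NumPy array of 8-bit gray values.
depth_array = np.asarray(depth_image)

print(depth_image.size)    # PIL order: (width, height)
print(depth_array.shape)   # NumPy order: (height, width)
print(depth_array.dtype)
```

Keeping this ordering difference in mind helps when passing a target size from a PIL image to a function that expects (height, width).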

from PIL import Image
import requests

url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image

from transformers import pipeline

checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline(
    "depth-estimation", 
    model=checkpoint
)
predictions = depth_estimator(image)

predictions["depth"]

Alternative: full code on CPU, without the pipeline

from PIL import Image
import numpy as np
import requests
import torch

from transformers import DPTImageProcessor, DPTForDepthEstimation


# Load a pretrained image processor and model.
# low_cpu_mem_usage=True is optional; it lowers peak memory use while the
# weights are loaded, which helps on systems with limited resources.

image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained(
    "Intel/dpt-hybrid-midas", 
    low_cpu_mem_usage=True
)

url = "https://images.unsplash.com/photo-1536048810607-3dc7f86981cb?q=80&w=1000&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8Mnx8dmFsbGV5fGVufDB8fDB8fHww"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image with DPTImageProcessor so it is in the format the
# model expects. return_tensors="pt" requests PyTorch tensors.
inputs = image_processor(
    images=image, 
    return_tensors="pt"
)

# Run inference inside torch.no_grad() so no gradients are tracked, which
# saves memory and computation. Then read the depth map from the outputs.
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth


# Interpolate to the original size.
# image.size is (width, height), so [::-1] gives the (height, width) order
# that interpolate expects. Bicubic interpolation keeps the resized map smooth.
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)

depth
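The visualization step above scales the map by its maximum only, so the darkest pixel is not necessarily 0. As a possible variant, the sketch below applies min-max scaling so the output spans the full 0 to 255 range. The helper name `normalize_depth` and the sample values are hypothetical and are not part of the original code.

```python
import numpy as np


def normalize_depth(depth):
    """Hypothetical helper: min-max scale a depth map to uint8 in [0, 255]."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # A constant map has no contrast; return zeros instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)          # values now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# Illustrative values only; in the example above this would be `output`.
example = np.array([[6.0, 10.0], [14.0, 22.0]])
gray = normalize_depth(example)
print(gray)
```

The result can be passed to `Image.fromarray` in the same way as `formatted` in the code above.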
