Text-to-Speech: TTS
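from transformers import pipeline
from IPython.display import Audio

TEXT = "And you know what they call a... a... a Quarter Pounder with Cheese in Seoul?"

# Load Bark (small) through the text-to-speech pipeline; the weights download on first use.
pipe = pipeline("text-to-speech", model="suno/bark-small")
output = pipe(TEXT)  # dict holding the waveform ("audio") and its "sampling_rate"
print(output)

# In a notebook, play the generated speech inline.
Audio(output["audio"], rate=output["sampling_rate"])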
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
{'audio': array([[-0.00355643, -0.00254505, -0.00216578, ..., 0.00740955,
0.00741918, 0.00732478]], dtype=float32), 'sampling_rate': 24000}
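The printed result is a dictionary with a float waveform under 'audio' and its 'sampling_rate' (24000 here). As a minimal sketch of how that could be written to disk, the helper below saves mono float samples as a 16-bit WAV file using only the standard library. The name `save_wav`, the file name, and the synthetic sine tone standing in for the model output are all assumptions made for illustration; none of them come from the original notebook.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, sampling_rate, path):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the clip duration in seconds.
    """
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)  # e.g. 24000 for the output above
        # WAV stores little-endian signed shorts ("<h").
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    # Duration = number of samples / samples per second.
    return len(pcm) / sampling_rate


# Stand-in for the pipeline result: one second of a 440 Hz tone at 24 kHz.
# With the real result, pass output["audio"].squeeze().tolist() instead.
fake_output = {
    "audio": [0.1 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)],
    "sampling_rate": 24000,
}

path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
duration = save_wav(fake_output["audio"], fake_output["sampling_rate"], path)
print(f"Wrote {path} ({duration:.2f} s)")
```

The array in the real output has shape (1, N), so it needs to be flattened to one dimension before being passed to a helper like this one.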
