
Fine-Tuning BERT for Sentiment Analysis with Hugging Face Transformers

Sebastian Schkudlara · May 23, 2024 · 5 mins read

Fine-tuning BERT for sentiment analysis is a powerful technique for extracting insights from text data. Here’s a comprehensive guide to help you get started:

Getting Started

To begin, set up your environment by installing the necessary libraries: transformers, datasets, and torch.

Data Preparation

Use a sample dataset of movie reviews and tokenize it using BERT’s tokenizer. This step converts text data into a format suitable for model training.

Model Training

Load a pre-trained BERT model and fine-tune it using the Hugging Face Trainer class. This class simplifies the training process, allowing you to focus on optimizing the model for your specific dataset.

Evaluation and Prediction

After training, evaluate the model’s performance and try it on new text inputs to predict sentiment. With a sufficiently large labeled dataset, the fine-tuned model should classify sentiments reliably.

This approach is ideal for:

  • Customer Feedback Analysis: Automatically classify feedback to understand customer satisfaction.
  • Social Media Monitoring: Track sentiment trends to gauge public opinion.
  • Market Research: Analyze product reviews for actionable business insights.

Fine-tuning BERT has the potential to transform your text analysis projects. Happy coding, and feel free to reach out with any questions!

For the complete code and detailed guide, check out my GitHub gist.

For the original article, visit KDnuggets: Fine-Tuning BERT for Sentiment Analysis.

[Image: Fine-tuning BERT for accurate sentiment classification in various applications. Credit: KDnuggets]

Environment Setup

Install the necessary libraries (recent versions of transformers also require accelerate to use the Trainer):

!pip install transformers datasets torch accelerate

Data Preparation

Prepare a sample dataset of movie reviews:

import pandas as pd
from datasets import Dataset

# Sample dataset
data = {
    "text": [
        "I loved this movie! The acting was great and the story was so touching.",
        "Absolutely terrible. The plot made no sense and the acting was subpar.",
        "A masterpiece! The visuals, the storyline, everything was perfect.",
        "Not my cup of tea. The pacing was slow and I couldn't connect with the characters.",
        "An average movie with some good moments but overall forgettable."
    ],
    "label": [1, 0, 1, 0, 0]  # 1 for positive, 0 for negative
}
df = pd.DataFrame(data)
dataset = Dataset.from_pandas(df)
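
With only five examples, this demo trains and evaluates on the same data. For a real corpus you would hold out an evaluation split first; a minimal sketch using the datasets library’s built-in train_test_split (the 80/20 ratio and seed are illustrative):

# Hold out an evaluation split (ratio and seed are illustrative)
split = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset, eval_dataset = split["train"], split["test"]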

Tokenization

Tokenize the dataset using BERT tokenizer:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
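
To sanity-check the result, inspect one tokenized example. Because padding="max_length" is used without an explicit max_length, every review is padded to BERT’s 512-token maximum:

# Optional sanity check on the first tokenized example
sample = tokenized_dataset[0]
print(sample.keys())             # includes input_ids, token_type_ids, attention_mask
print(len(sample["input_ids"]))  # 512: padded to the model's maximum length
print(sample["input_ids"][:8])   # begins with the [CLS] token (id 101 for bert-base-uncased)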

DataLoaders

The Trainer used below builds its own DataLoaders internally, so this step is only needed if you want a manual training loop. In that case, drop the raw text column, rename label to labels (the name the model expects), and format the dataset as PyTorch tensors first:

import torch
from torch.utils.data import DataLoader

# Keep only tensor-valued columns and use the label name the model expects
loader_dataset = tokenized_dataset.remove_columns(["text"]).rename_column("label", "labels")
loader_dataset.set_format("torch")

train_dataloader = DataLoader(loader_dataset, batch_size=2, shuffle=True)
eval_dataloader = DataLoader(loader_dataset, batch_size=2)

Model Initialization

Load the pre-trained BERT model:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
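
Optionally, you can attach human-readable label names to the model config so downstream tools report names instead of raw indices. A variant of the same call (the label names here are this example’s convention, not part of the checkpoint):

# Optional: bake label names into the config (names are this example's convention)
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2,
    id2label={0: "Negative", 1: "Positive"},
    label2id={"Negative": 0, "Positive": 1}
)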

Training Setup

Define training parameters and create a Trainer:

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    logging_dir='./logs'
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    eval_dataset=tokenized_dataset
)

Model Training and Evaluation

Train and evaluate the model:

trainer.train()
results = trainer.evaluate()
print(results)
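
As written, trainer.evaluate() reports only the evaluation loss. To also track accuracy, pass a compute_metrics function when constructing the Trainer; a minimal sketch (the function body follows the standard Hugging Face pattern):

import numpy as np

def compute_metrics(eval_pred):
    # The Trainer supplies a (logits, labels) pair
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Then build the Trainer with: Trainer(..., compute_metrics=compute_metrics)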

Making Predictions

Use the fine-tuned model for sentiment predictions:

text = "The movie was fantastic!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(f"Sentiment: {'Positive' if prediction == 1 else 'Negative'}")

Conclusion

Fine-tuning BERT for sentiment analysis can significantly enhance your text analysis capabilities, making it a valuable tool for various applications. By following this guide, you can create a robust sentiment analysis model tailored to your needs. Happy coding!

Written by Sebastian Schkudlara
Hi, I am Sebastian Schkudlara, the author of Jevvellabs. I hope you enjoy my blog!