A Step-by-Step Guide to Using Object Storage in Ceph

Reza Mohammadi
Nov 23, 2024

This article introduces the use of Ceph for object storage, covering essential concepts and practical steps. It explains storage fundamentals, the role of Ceph and RADOS, and the distinction between structured and unstructured data. Readers are guided through deploying a RADOS Gateway (RGW), creating users, setting up buckets, uploading files, and checking bucket usage. This concise guide equips users to efficiently manage Ceph object storage for scalable and reliable data handling.

What is Storage?

In a networked environment, storage is a shared data repository that the systems and nodes on the network can access quickly and easily. It typically consists of multiple drives grouped together to hold data securely and to provide redundancy and backup. Storage devices can also connect remotely to the network's main servers. They usually have less processing power than those servers, but they are built to organize, store, and serve data efficiently.

What is Ceph?

Ceph is one of the most widely adopted software-defined storage (SDS) technologies. It is an open-source project that provides unified software-defined solutions for block, file, and object storage. The core idea of Ceph is to provide a distributed storage system that is massively scalable and high-performing, with no single point of failure. From the start, it was designed to scale to the exabyte level and beyond while running on general-purpose commodity hardware.

Take N servers, install an operating system on each, and then install Ceph on them. You now have a Ceph cluster that can provide object storage, block storage, and file storage. All three can run simultaneously on the same cluster.

RADOS

The Reliable Autonomic Distributed Object Store (RADOS) is the foundation of the Ceph storage cluster. Everything in Ceph is stored as objects, and the RADOS object store is responsible for storing these objects regardless of their data types. The RADOS layer ensures that the data always remains consistent.

RADOS is the layer on which the RBD, RADOS Gateway, and CephFS services are built. Each of these services stores its data in RADOS.

MON (Monitor): a core Ceph daemon that tracks cluster state, manages configuration, and maintains consensus on the cluster map.

RADOS is Ceph's storage backend. OSDs (Object Storage Daemons) store the actual data, while the MON nodes support RADOS by maintaining the cluster maps and keeping the cluster state consistent.

RBD → For block storage

RADOS GW → For object storage

CephFS → For Filesystem

RBD, RADOS Gateway (RGW), and CephFS provide block, object, and file storage, respectively. These services interact with the RADOS layer, which handles the distributed object storage. MON nodes manage cluster metadata and maintain the cluster state but do not directly manage these services. Instead, RADOS ensures data distribution, replication, and management across the storage daemons (OSDs).
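To make "everything is stored as objects" concrete, consider block storage. An RBD image is not kept as one large blob. It is split across many fixed-size RADOS objects. The sketch below assumes the defaults of RBD image format 2: 4 MiB objects, no custom striping, and data objects named `rbd_data.<image-id>.<object number as 16 hex digits>`. The image id `10226b8b4567` is a hypothetical example, and real images may use other object sizes or striping settings.

```python
# Assumes RBD image format 2 defaults: 4 MiB objects and no custom striping.
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB (RBD "order" 22)


def rbd_object_for_offset(image_id: str, offset: int,
                          object_size: int = DEFAULT_OBJECT_SIZE) -> tuple[str, int]:
    """Return (RADOS object name, offset inside that object) for a byte
    offset within an RBD image. image_id is RBD's internal image id."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    object_number = offset // object_size      # which 4 MiB chunk
    offset_in_object = offset % object_size    # position inside that chunk
    # Data objects are named rbd_data.<image id>.<object number in 16 hex digits>
    name = f"rbd_data.{image_id}.{object_number:016x}"
    return name, offset_in_object


if __name__ == "__main__":
    # Byte 10 MiB of a (hypothetical) image lands in object number 2, 2 MiB in.
    print(rbd_object_for_offset("10226b8b4567", 10 * 1024 * 1024))
```

A block device is therefore just a naming and offset convention layered on top of RADOS objects. RGW and CephFS apply the same idea with their own conventions.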

Structured & Unstructured Data

Structured data is data organized in a predefined format, such as tables with rows and columns. It is easy to search and organize, and it is typically stored in a database. Unstructured data, on the other hand, has no predefined format. It can be text-heavy, such as emails and social media posts, or binary, such as images and videos. Unstructured data is usually harder to analyze and search than structured data. Both types of data are important in data analysis and decision-making.

Object Storage

Object Storage is relatively new when compared with more traditional storage systems such as file or block storage.

So, what is object storage, exactly?

In short, it is storage for unstructured data that avoids the scaling limits of traditional file storage. This near-limitless scalability is why object storage has become the default storage of the cloud. All major public cloud providers, including Amazon, Google, and Microsoft, offer object storage as a core service.
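The following toy, in-memory sketch illustrates the data model behind object storage. It is not how Ceph implements it. Each bucket is a flat mapping from a key to the object's bytes plus metadata, and "folders" are only key prefixes. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory model: each bucket is a flat key -> object map."""

    def __init__(self) -> None:
        self.buckets: dict[str, dict[str, StoredObject]] = {}

    def make_bucket(self, bucket: str) -> None:
        self.buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, data: bytes, **metadata: str) -> None:
        # No directories are created: "videos/intro.mp4" is simply one key.
        self.buckets[bucket][key] = StoredObject(data, dict(metadata))

    def get(self, bucket: str, key: str) -> StoredObject:
        return self.buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        # "Folders" are emulated by filtering keys on a prefix.
        return sorted(k for k in self.buckets[bucket] if k.startswith(prefix))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.make_bucket("data")
    store.put("data", "videos/intro.mp4", b"...", content_type="video/mp4")
    store.put("data", "emails/2024/01.eml", b"From: ...")
    print(store.list_keys("data", prefix="videos/"))  # ['videos/intro.mp4']
```

Because there is no directory tree to maintain, adding more objects only means adding more keys. This is a large part of why object stores scale out so easily.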

Object storage systems often expose their functionality through RESTful APIs that use HTTP/HTTPS.

RADOS Gateway (RGW) is a component of the Ceph distributed storage system that provides a RESTful API interface for accessing Ceph’s object storage. It enables object storage functionality similar to services like Amazon S3 or OpenStack Swift. RGW acts as a bridge between the applications or users and the underlying Ceph storage cluster.

RADOS Gateway can run on the Ceph cluster's own nodes or on a separate node. Either way, it handles API requests and turns them into operations on the Ceph cluster.
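Because RGW speaks HTTP, an S3 upload is ultimately just an HTTP request. The sketch below only builds a path-style `PUT` request with Python's standard library. It does not send it. `http://rgw.example.com:80` is a placeholder endpoint. The sketch also omits the AWS signature (the `Authorization` header) that RGW requires for authenticated requests; tools such as s3cmd compute that signature for you.

```python
import urllib.parse
import urllib.request


def build_put_request(endpoint: str, bucket: str, key: str, body: bytes,
                      content_type: str = "application/octet-stream",
                      ) -> urllib.request.Request:
    """Build (but do not send) a path-style S3 PUT Object request.

    A real request to RGW also needs an AWS signature in the Authorization
    header; that step is intentionally left out of this sketch.
    """
    # Path-style addressing: http://<endpoint>/<bucket>/<key>
    url = f"{endpoint.rstrip('/')}/{bucket}/{urllib.parse.quote(key)}"
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": content_type}
    )


if __name__ == "__main__":
    req = build_put_request("http://rgw.example.com:80", "data", "file.fs", b"hello")
    print(req.get_method(), req.full_url)  # PUT http://rgw.example.com:80/data/file.fs
```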

After you set up your Ceph Cluster, you can create an RGW.

Create a RADOS Gateway (RGW)

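# Open a shell with the Ceph CLI (run on a cephadm admin host)
$ cephadm shell

# Syntax: ceph orch apply rgw <service-name>
$ ceph orch apply rgw test-rgw

# Optional: run two RGW daemons on each host labeled "rgw" and set the port.
# Quote the placement string because it contains a space.
# (Label hosts first with: ceph orch host label add <host> rgw)
$ ceph orch apply rgw test-rgw '--placement=label:rgw count-per-host:2' --port=80
# 80 is the default port.

# Check that the service is listed
$ ceph orch ls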
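`ceph orch apply` also accepts a declarative service specification through `ceph orch apply -i <file>`. The helper below is a sketch that renders a YAML spec intended to match the placement command above. The function name and the idea of saving it as `rgw.yaml` are illustrative. Before applying it, compare the result with what `ceph orch ls --export` prints on your own cluster.

```python
def render_rgw_spec(service_id: str, label: str, count_per_host: int,
                    port: int = 80) -> str:
    """Render a cephadm RGW service spec as YAML text (hand-built, no PyYAML)."""
    lines = [
        "service_type: rgw",
        f"service_id: {service_id}",
        "placement:",
        f"  label: {label}",                    # deploy only on hosts with this label
        f"  count_per_host: {count_per_host}",  # daemons per matching host
        "spec:",
        f"  rgw_frontend_port: {port}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save this output as rgw.yaml, then (where the file is visible to the
    # ceph CLI) run: ceph orch apply -i rgw.yaml
    print(render_rgw_spec("test-rgw", label="rgw", count_per_host=2))
```

A spec file has one advantage over a long command line: it can be kept in version control and re-applied unchanged.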

Create a RADOS Gateway user

# Syntax
$ radosgw-admin user create --uid=<username> --display-name=<Name-to-show> --email=<Admin-Email>

# Example
$ radosgw-admin user create --uid=first-rados --display-name="First Rados" --email=info@rados.com

# The command prints the user as JSON, including an S3 key pair
# (access_key and secret_key) that clients such as s3cmd use to connect.

# List RGW users
$ radosgw-admin user list

# Show a user's details
$ radosgw-admin user info --uid=first-rados
# op_mask     ---> the operations the user may perform (e.g. read, write, delete).
# max_size    ---> quota on total stored size in bytes (-1 means unlimited).
# max_objects ---> quota on object count (-1 means unlimited); under bucket_quota it
#                  applies per bucket, under user_quota to all of the user's buckets.
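If you script user setup, you will usually want the keys and quota values from that JSON output. The sketch below is a minimal example that assumes the usual top-level fields of `radosgw-admin user info` (`user_id`, `keys`, `op_mask`, `user_quota`). The function name is hypothetical, and the sample document is trimmed, with placeholder keys.

```python
import json


def summarize_rgw_user(user_info_json: str) -> dict:
    """Extract S3 keys and user-quota settings from `radosgw-admin user info` JSON."""
    info = json.loads(user_info_json)
    keys = info.get("keys", [])
    if not keys:
        raise ValueError(f"user {info.get('user_id')!r} has no S3 keys")
    first = keys[0]  # a user may have several key pairs; take the first one
    quota = info.get("user_quota", {})
    return {
        "user_id": info.get("user_id"),
        "access_key": first["access_key"],
        "secret_key": first["secret_key"],
        "op_mask": info.get("op_mask"),
        "quota_enabled": quota.get("enabled", False),
        # -1 means "no limit" for both of these fields
        "max_size_bytes": quota.get("max_size", -1),
        "max_objects": quota.get("max_objects", -1),
    }


if __name__ == "__main__":
    # Trimmed, hypothetical example of `radosgw-admin user info --uid=first-rados`
    sample = """{
      "user_id": "first-rados",
      "keys": [{"user": "first-rados",
                "access_key": "EXAMPLEACCESSKEY",
                "secret_key": "EXAMPLESECRETKEY"}],
      "op_mask": "read, write, delete",
      "user_quota": {"enabled": false, "max_size": -1, "max_objects": -1}
    }"""
    print(summarize_rgw_user(sample))
```

On a node with the admin keyring, you could capture the real JSON with `subprocess.run(["radosgw-admin", "user", "info", "--uid=first-rados"], capture_output=True, text=True, check=True).stdout`.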

In RADOS Gateway, objects are stored in buckets.

Install s3cmd (its source is hosted on GitHub, and it is also published on PyPI, so pip can install it), then configure it.

$ pip install s3cmd

# Get the access and secret keys from: radosgw-admin user info --uid=<username>
# Enter them when prompted, and set the S3 endpoint to your RGW host and port.
$ s3cmd --configure
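If you would rather script the setup than answer the interactive prompts, you can generate a minimal `~/.s3cfg` instead. This sketch assumes path-style access to a plain-HTTP RGW endpoint (`rgw.example.com:80` is a placeholder). The keys `access_key`, `secret_key`, `host_base`, `host_bucket`, and `use_https` are standard s3cmd settings. Even so, compare the output with a file produced by `s3cmd --configure` before relying on it.

```python
import configparser
import io


def render_s3cfg(access_key: str, secret_key: str, endpoint: str,
                 use_https: bool = False) -> str:
    """Return the text of a minimal s3cmd config for an RGW endpoint."""
    # interpolation=None so '%' characters in secrets are written literally
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["default"] = {
        "access_key": access_key,
        "secret_key": secret_key,
        "host_base": endpoint,
        # No %(bucket)s template here, so s3cmd should use path-style URLs
        "host_bucket": endpoint,
        "use_https": "True" if use_https else "False",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder keys: copy the real ones from `radosgw-admin user info`.
    print(render_s3cfg("ACCESS_KEY", "SECRET_KEY", "rgw.example.com:80"))
```

Write the result to `os.path.expanduser("~/.s3cfg")` (or pass it with `s3cmd -c <file>`), and keep the file private because it contains the secret key.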

Create a bucket

# Creation
$ s3cmd mb s3://<bucket-name>
$ s3cmd mb s3://data

# list buckets
$ s3cmd ls

# See inside the bucket
$ s3cmd ls s3://<bucket-name>
$ s3cmd ls s3://data
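s3cmd addresses everything with `s3://<bucket>/<key>` URIs. The part after the bucket name is the object's key, and "directories" are only key prefixes. The small helper below (hypothetical, not part of s3cmd) splits such a URI to show the structure:

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3cmd-style URI into (bucket, key); key is '' for a bare bucket."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the key,
    # which may itself contain '/' characters ("folders" are just prefixes).
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, key


if __name__ == "__main__":
    print(parse_s3_uri("s3://data"))              # ('data', '')
    print(parse_s3_uri("s3://data/dir/file.fs"))  # ('data', 'dir/file.fs')
```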

Put a file into the bucket

# Upload a file to a bucket
$ s3cmd put <file-name> s3://<bucket-name>
$ s3cmd put file.fs s3://data
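To double-check an upload, compare a local checksum with what the server reports, for example with `s3cmd info s3://data/file.fs`, which shows an MD5 sum. For a single-part upload, the S3 ETag is normally the hex MD5 of the object. Multipart uploads, which s3cmd uses for larger files, produce a different ETag format, so treat this as a sketch for small files. The function names are illustrative.

```python
import hashlib
import os
import tempfile


def file_md5_hex(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hex MD5 of a local file, read in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_single_part_etag(path: str, etag: str) -> bool:
    """Compare a local file with an ETag; S3 ETags are often wrapped in quotes."""
    return file_md5_hex(path) == etag.strip('"').lower()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        print(file_md5_hex(tmp.name))  # 5d41402abc4b2a76b9719d911017c592
    finally:
        os.remove(tmp.name)
```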

List the Bucket Disk Usage

$ s3cmd du -H s3://<bucket-name>
$ s3cmd du -H s3://data/
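Conceptually, `s3cmd du` adds up the sizes of the objects in a bucket. The sketch below performs the same calculation with human-readable binary units. The listing data and function names are hypothetical, and the output format is not s3cmd's exact one.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units (1K = 1024 bytes)."""
    if num_bytes < 1024:
        return f"{num_bytes}B"
    size = float(num_bytes)
    for unit in ["K", "M", "G"]:
        size /= 1024
        if size < 1024:
            return f"{size:.1f}{unit}"
    return f"{size / 1024:.1f}T"


def bucket_usage(objects: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (total bytes, object count) for a list of (key, size) pairs."""
    return sum(size for _, size in objects), len(objects)


if __name__ == "__main__":
    listing = [("file.fs", 3 * 1024 * 1024), ("notes.txt", 2048)]  # hypothetical
    total, count = bucket_usage(listing)
    print(f"{human_size(total)} in {count} objects")  # 3.0M in 2 objects
```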
