Protocol Buffers - Data exchange

Protocol Buffers is a data definition language created by Google that can be compared to IDL, but is much simpler. Its C-like syntax resembles that of JSON, except that the fields are typed.
Google defined this language for use on its own servers, which store and exchange large quantities of structured data, and in 2008 decided to make it open source.
Data described in a proto file has two representations: a human-readable text form and a binary form that machines can process quickly. It is an alternative to XML that is much more compact and takes considerably less time to process.

Glossary

Protocol Buffers: name of the language, and name of the units of data described in a proto file.
Proto: a data definition file in the PB language, with the .proto extension.
Protoc: name of the compiler that generates classes from a proto file and can also encode data into binary.

Features

- Object oriented: in the generated code, each message becomes a class derived from the Message class.
- Typed data language.
- Textual and binary formats.
- The Protoc compiler generates, from the data definition, a class in the chosen language.
- The compiler generates C++, Java or Python classes, and the format is intended to be usable from any language.
- An instance of the class is serialized into binary data. Protoc can also produce the binary data from the text form of a message.
- A unit is called "message". A .proto file can contain several messages.
- Supports namespaces.
- The structure of a message is recursive: a message may have fields that are themselves messages.
- Repeated fields: as with elements in XML, a field declared repeated can occur any number of times in the same message.
- Extensible: fields can be added to a message without breaking programs built with the older definition.
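
As an illustration of three of the features above (namespaces, recursive structure and repeated fields), here is a minimal sketch. The package name demo and the message and field names are hypothetical, chosen only for this example. The syntax line is what recent versions of protoc expect for this dialect; without it, proto2 is assumed.

```protobuf
// Hypothetical example: every name below is illustrative.
syntax = "proto2";

package demo;                    // namespace for the definitions in this file

message Node
{
  required string label = 1;
  repeated Node children = 2;    // recursive and repeated: zero or more Nodes
}
```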

Why use Protocol Buffers?

It is a means of storing structured data and exchanging it between programs, possibly written in different programming languages, and between a server and a client. A proto file can also declare RPC services, for which the compiler can generate interface code.
It allows cross-language serialization of classes. The serialization produces compact binary data that is easy to process.

Syntax

Each source has the form:

message name
{
  ...list of data fields...
}

The modifiers of a field are: required, optional, repeated.

A unique number is assigned to each field. It identifies the field in the binary format and is not a value of the field.

A default value, used when an optional field is absent from the message, may be given with the default directive:

optional string x = 1 [default="Some text"];

The main scalar types are string, int32, int64, float, double, bool.
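
The modifiers, field numbers, default values and scalar types described above can be combined as in the following sketch. The message and its field names are hypothetical and serve only as an illustration.

```protobuf
// Hypothetical message: names and values are illustrative.
message person
{
  required string name = 1;                   // must be present
  optional int32 age = 2;                     // may be omitted
  optional bool active = 3 [default = true];  // value read when the field is omitted
  repeated string email = 4;                  // zero or more values
  optional double score = 5;
}
```

The numbers 1 to 5 only identify the fields; they must be unique within the message.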

In addition, nested types are created by defining a message inside another message:

message container
{
  required int32 number = 1;
  message contained
  {
    repeated string x = 1;
  }
}

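Outside the enclosing message, the nested type is referred to by its dotted name, container.contained. Defining a nested type does not by itself add a field: a field of that type must be declared before x can be reached.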
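
A sketch of how the nested type might be used as the type of a field, both inside and outside the enclosing message. The field name item and the message other are assumptions added for this example; they are not part of the definition above.

```protobuf
message container
{
  required int32 number = 1;
  message contained
  {
    repeated string x = 1;
  }
  optional contained item = 2;             // hypothetical field of the nested type
}

message other
{
  optional container.contained item = 1;   // dotted name used outside container
}
```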

Enumerations (enum) can be defined inside messages.
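
For instance, an enumeration could be declared and used as a field type as follows. The names document, format, TEXT, BINARY and kind are hypothetical.

```protobuf
// Hypothetical example of an enumeration nested in a message.
message document
{
  enum format
  {
    TEXT = 0;
    BINARY = 1;
  }
  optional format kind = 1 [default = TEXT];   // field whose type is the enum
}
```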

Once the structure of a message is defined, a program uses it by creating an instance of the corresponding class. The methods specific to that class are produced by the compiler in the generated C++ or Java files.

container myinstance;
myinstance.set_number(18);

Sample code

Simple message.

message hello
{
    // The field names text and count are illustrative.
    optional string text = 1 [default="the message"];
    optional int32 count = 2;
}

(c) 2008-2009 Scriptol.com