Cross-language serialization with protobuf: Java and Python examples
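First, download and install protoc
- Prebuilt binaries are available from GitHub releases
- On OS X it can be installed directly with brew:
brew install protobuf
- After installing, run
protoc --version
to check the installed version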
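Creating the proto files: the version with the Any type
With an Any field, only the pb2-style Python files could be generated; generating the Python3-style files did not work, and I have not found a way around this yet. (The _pb2 modules are still run under Python 3 in the examples below; "Python3-style" refers to the output of the --python3_out generator used later in this post.)
A field of type Any can play the role of a generic field in Java: it can carry any message type.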
MessageDto.proto
syntax = "proto3";
import "google/protobuf/any.proto";
message MessageDto {
    string action = 1;
    int32 state = 2;
    google.protobuf.Any data = 3;
}

RpcCmd.proto
syntax = "proto3";
import "MessageDto.proto";
message RpcCmd {
    MessageDto message = 1;
    string randomKey = 2;
    string remoteAddressKey = 3;
}

Point2PointMessage.proto
syntax = "proto3";
message Point2PointMessage {
    string targetAddressKey = 1;
    string message = 2;
}

BytesData.proto
syntax = "proto3";
message BytesData {
    bytes content = 1;
}
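The field numbers in these definitions are what ends up in the serialized data: every encoded field starts with a key derived from its field number and a wire type, and field names are never written. A minimal sketch of that key computation, using only the standard library (the helper name `field_key` is made up for this illustration):

```python
# Wire types from the protobuf encoding rules
WIRE_VARINT = 0  # int32, int64, bool, enum, ...
WIRE_LEN = 2     # string, bytes, embedded messages


def field_key(field_number: int, wire_type: int) -> int:
    """Key that precedes each encoded field: (field_number << 3) | wire_type."""
    return (field_number << 3) | wire_type


# Keys for the three MessageDto fields
print(hex(field_key(1, WIRE_LEN)))     # action: string
print(hex(field_key(2, WIRE_VARINT)))  # state: int32
print(hex(field_key(3, WIRE_LEN)))     # data: embedded Any message
```

The three keys are 0x0a, 0x10 and 0x1a, which appear as `\n`, `\x10` and `\x1a` in the serialized bytes printed later in this post. Since only the numbers are encoded, the Java and Python sides must agree on them.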
Generating the corresponding class definition files
Python
protoc --python_out=./gen_pb2 RpcCmd.proto MessageDto.proto Point2PointMessage.proto BytesData.proto
The generated files are named XXX_pb2.py
Java
protoc --java_out=./gen_java RpcCmd.proto MessageDto.proto Point2PointMessage.proto BytesData.proto
The generated files are named XXXOuterClass.java
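The two commands can be wrapped in a small script. The sketch below assumes that protoc is on the PATH and that the .proto files are in the current directory; the function names are illustrative. It creates the output directory first, because protoc expects it to exist already.

```python
import os
import subprocess

PROTO_FILES = [
    "RpcCmd.proto",
    "MessageDto.proto",
    "Point2PointMessage.proto",
    "BytesData.proto",
]


def protoc_command(language, out_dir, proto_files=PROTO_FILES):
    """Build the protoc argument list for one target ("python" or "java")."""
    return ["protoc", f"--{language}_out={out_dir}", *proto_files]


def generate(language, out_dir):
    """Create out_dir if needed, then run protoc (protoc must be installed)."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(protoc_command(language, out_dir), check=True)
```

Calling `generate("python", "./gen_pb2")` and `generate("java", "./gen_java")` would reproduce the two commands above.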
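Using it in Python
First copy the generated files into a package of your choice, then fix the import paths inside them. For example, in RpcCmd_pb2.py change the MessageDto import to
import com.tony.proto.py2.MessageDto_pb2 as MessageDto__pb2
After that the classes are ready to use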
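Editing the import by hand has to be repeated every time the files are regenerated. One possible way to automate it is a small rewrite step like the sketch below; the function name is made up, and it assumes the generated modules reference each other with flat imports of the form `import X_pb2 as X__pb2`, as in the line shown above.

```python
import re

# Flat imports between generated modules,
# e.g. "import MessageDto_pb2 as MessageDto__pb2"
_FLAT_IMPORT = re.compile(r"^import (\w+_pb2) as (\w+)$", re.MULTILINE)


def qualify_pb2_imports(source: str, package: str) -> str:
    """Prefix flat *_pb2 imports with the package the files were moved into."""
    return _FLAT_IMPORT.sub(rf"import {package}.\1 as \2", source)


generated = "import sys\nimport MessageDto_pb2 as MessageDto__pb2\n"
print(qualify_pb2_imports(generated, "com.tony.proto.py2"))
```

Imports that start with `from google.protobuf import ...` are left untouched, since the pattern only matches lines that begin with `import` and name a `_pb2` module.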
from com.tony.proto.py2 import RpcCmd_pb2, Point2PointMessage_pb2, BytesData_pb2, MessageDto_pb2


def serialize_to_file(file_path):
    # Payload that gets packed into the Any field
    p2p_msg = Point2PointMessage_pb2.Point2PointMessage()
    p2p_msg.message = "Hello, p2p from python"
    p2p_msg.targetAddressKey = "/127.0.0.1:38211"
    # bytes_data = BytesData_pb2.BytesData()
    # bytes_data.content = b"Hello, bytes data from python"
    rpc_cmd = RpcCmd_pb2.RpcCmd()
    rpc_cmd.randomKey = "random-key-key-random"
    rpc_cmd.remoteAddressKey = "/127.0.0.1:1234"
    rpc_cmd.message.action = "p2p"
    rpc_cmd.message.state = 100
    rpc_cmd.message.data.Pack(p2p_msg)
    # rpc_cmd.message.data.Pack(bytes_data)
    bytes_write = rpc_cmd.SerializeToString()
    with open(file_path, mode="wb") as fw:
        fw.write(bytes_write)
    print("write bytes to file:", bytes_write)


def deserialize_from_file(file_path):
    with open(file_path, mode="rb") as fo:
        bytes_read = fo.read()
    print("read bytes from file:", bytes_read)
    rpc_cmd = RpcCmd_pb2.RpcCmd()
    rpc_cmd.ParseFromString(bytes_read)
    print(rpc_cmd)
    # Unpack the Any field into the concrete message type
    p2p_msg = Point2PointMessage_pb2.Point2PointMessage()
    # bytes_data = BytesData_pb2.BytesData()
    # rpc_cmd.message.data.Unpack(bytes_data)
    rpc_cmd.message.data.Unpack(p2p_msg)
    print("msg_content:", p2p_msg.message)
    print("msg_target:", p2p_msg.targetAddressKey)
    # print("bytes_data:", str(bytes_data.content, 'utf-8'))


if __name__ == "__main__":
    serialize_file_path = "./trans-data-pb2.dat"
    serialize_to_file(serialize_file_path)
    deserialize_from_file(serialize_file_path)

The output is shown below. To test serialization and deserialization of the BytesData type instead, uncomment the bytes_data lines and comment out the p2p_msg ones.
write bytes to file: b'\n]\n\x03p2p\x10d\x1aT\n&type.googleapis.com/Point2PointMessage\x12*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
read bytes from file: b'\n]\n\x03p2p\x10d\x1aT\n&type.googleapis.com/Point2PointMessage\x12*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/Point2PointMessage"
value: "\n\020/127.0.0.1:38211\022\026Hello, p2p from python"
}
}
randomKey: "random-key-key-random"
remoteAddressKey: "/127.0.0.1:1234"
msg_content: Hello, p2p from python
msg_target: /127.0.0.1:38211
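The raw bytes in this output can be read by hand. The sketch below is a minimal wire-format decoder written for this post (not part of the protobuf library; the function names are made up) that handles only the two wire types occurring here, varint and length-delimited. It walks through the nesting RpcCmd, MessageDto, Any, Point2PointMessage.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(buf):
    """Return {field_number: value} for one message level.

    Varint fields become ints; length-delimited fields stay raw bytes.
    """
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[field_number] = value
    return fields


raw = (b'\n]\n\x03p2p\x10d\x1aT\n&type.googleapis.com/Point2PointMessage'
       b'\x12*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python'
       b'\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234')

rpc_cmd = decode_fields(raw)             # 1: message, 2: randomKey, 3: remoteAddressKey
message = decode_fields(rpc_cmd[1])      # 1: action, 2: state, 3: data (Any)
any_data = decode_fields(message[3])     # 1: type_url, 2: value
payload = decode_fields(any_data[2])     # 1: targetAddressKey, 2: message

print(message[1], message[2])
print(any_data[1])
print(payload[1], payload[2])
```

The Any field is an ordinary embedded message with a type_url string and a value holding the packed payload, so Pack adds the type URL on top of the payload bytes.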
Using it in Java
Likewise, copy the generated files into a package of your choice and adjust the package names
import com.google.protobuf.Any;
import lombok.extern.slf4j.Slf4j;

import java.io.FileInputStream;
import java.io.FileOutputStream;

// Assumes the generated *OuterClass files are in the same package as this class.
// `log` is assumed to come from Lombok's @Slf4j; the original listing does not show its declaration.
@Slf4j
public class ProtobufSerializeDemo {
    public void serializeToFile() throws Exception {
        Point2PointMessageOuterClass.Point2PointMessage.Builder p2pMsgBuilder = Point2PointMessageOuterClass.Point2PointMessage.newBuilder();
        p2pMsgBuilder.setTargetAddressKey("/127.0.0.1:1233");
        p2pMsgBuilder.setMessage("hello from java");
        // The BytesData variant additionally needs com.google.protobuf.ByteString and java.nio.charset.StandardCharsets
        // BytesDataOuterClass.BytesData.Builder bytesBuilder = BytesDataOuterClass.BytesData.newBuilder();
        // bytesBuilder.setContent(ByteString.copyFrom("bytes data from java".getBytes(StandardCharsets.UTF_8)));
        MessageDtoOuterClass.MessageDto.Builder messageBuilder = MessageDtoOuterClass.MessageDto.newBuilder();
        messageBuilder.setAction("p2p");
        messageBuilder.setState(100);
        messageBuilder.setData(Any.pack(p2pMsgBuilder.build()));
        // messageBuilder.setData(Any.pack(bytesBuilder.build()));
        RpcCmdOuterClass.RpcCmd.Builder builder = RpcCmdOuterClass.RpcCmd.newBuilder();
        builder.setRandomKey("RANDOM_KEY_JAVA");
        builder.setRemoteAddressKey("/127.0.0.1:1234");
        builder.setMessage(messageBuilder.build());
        // try-with-resources closes the stream even if writing fails
        try (FileOutputStream out = new FileOutputStream("java_protobuf.dat")) {
            builder.build().writeTo(out);
        }
    }

    public void deserializeFromFile() throws Exception {
        RpcCmdOuterClass.RpcCmd rpcCmd;
        try (FileInputStream in = new FileInputStream("java_protobuf.dat")) {
            rpcCmd = RpcCmdOuterClass.RpcCmd.parseFrom(in);
        }
        Point2PointMessageOuterClass.Point2PointMessage p2pMsg = rpcCmd.getMessage().getData().unpack(Point2PointMessageOuterClass.Point2PointMessage.class);
        // BytesDataOuterClass.BytesData bytesData = rpcCmd.getMessage().getData().unpack(BytesDataOuterClass.BytesData.class);
        log.info("deserialize rpcCmd: \n{}", rpcCmd);
        log.info("deserialize p2pMsg: \n{}", p2pMsg);
        // log.info("deserialize bytesData: \n{}", bytesData);
    }
}

The output is shown below. To test serialization and deserialization of the BytesData type instead, uncomment the bytesData lines and comment out the p2pMsg ones.
10:22:03.118 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize rpcCmd:
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/Point2PointMessage"
value: "\n\017/127.0.0.1:1233\022\017hello from java"
}
}
randomKey: "RANDOM_KEY_JAVA"
remoteAddressKey: "/127.0.0.1:1234"
10:22:03.168 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize p2pMsg:
targetAddressKey: "/127.0.0.1:1233"
message: "hello from java"
Next, serializing and deserializing between Java and Python
- Only the file path needs to change to run the test
Python deserializing Java
java_serialize_file_path = $path_to_java_serialized$
The output; this run demonstrates the BytesData type
read bytes from file: b'\n@\n\x03p2p\x10d\x1a7\n\x1dtype.googleapis.com/BytesData\x12\x16\n\x14bytes data from java\x12\x0fRANDOM_KEY_JAVA\x1a\x0f/127.0.0.1:1234'
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/BytesData"
value: "\n\024bytes data from java"
}
}
randomKey: "RANDOM_KEY_JAVA"
remoteAddressKey: "/127.0.0.1:1234"
bytes_data: bytes data from java
Java deserializing Python
The output, again for the BytesData type
10:33:03.360 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize rpcCmd:
message {
action: "p2p"
state: 100
data {
type_url: "type.googleapis.com/BytesData"
value: "\n\035Hello, bytes data from python"
}
}
randomKey: "random-key-key-random"
remoteAddressKey: "/127.0.0.1:1234"
10:33:03.402 [main] INFO com.tony.proto.ProtobufSerializeDemo - deserialize bytesData:
content: "Hello, bytes data from python"
On the Java platform there is an even more convenient tool that does not require hand-written proto files
The tool is io.protostuff
Add the dependencies with Maven
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-core</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-runtime</artifactId>
    <version>1.6.0</version>
</dependency>
<!-- used to instantiate objects -->
<dependency>
    <groupId>org.objenesis</groupId>
    <artifactId>objenesis</artifactId>
    <version>2.2</version>
</dependency>
Create the serialization utility class
import io.protostuff.LinkedBuffer;
Create the objects to serialize
import lombok.Data;
Serialization test
import lombok.extern.slf4j.Slf4j;
Test output
11:02:45.646 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize cmd:
The data field in MessageDto can be used generically
/**
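When serialization and deserialization do not need to cross platforms, the Serializable-typed field can be used directly. Otherwise the data has to be stored as a byte array, which means a second round of serialization and deserialization. In the utility class ProtobufSerializer, ProtobufIOUtil can also be replaced with ProtostuffIOUtil; the setData method then dispatches to the matching operation
public <T extends Serializable> void setData(T object, boolean isStuff) {
    if (isStuff) {
        // Java-only case: keep the object in the Serializable field
        setSerialData(object);
    } else {
        // Cross-language case: store the payload as a separately serialized byte array
        setBytesData(object);
    }
}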
Deserializing in Python across languages
Create the proto files
MessageDto.proto
syntax = "proto3";
message MessageDto {
    string action = 1;
    int32 state = 2;
    bytes data = 3;
}

RpcCmd.proto
syntax = "proto3";
import "MessageDto.proto";
message RpcCmd {
    MessageDto message = 1;
    string randomKey = 2;
    string remoteAddressKey = 3;
}

Point2PointMessage.proto
syntax = "proto3";
message Point2PointMessage {
    bytes java_class = 127;
    string targetAddressKey = 1;
    string message = 2;
}

BytesData.proto
syntax = "proto3";
message BytesData {
    bytes content = 1;
}
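With data declared as bytes, the payload is serialized on its own first and the resulting bytes are then stored in MessageDto. The sketch below shows this two-pass idea with a hand-written encoder that uses only the standard library; the helper names are made up, and real code would use the generated classes instead.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def len_field(field_number, payload):
    """Length-delimited field (wire type 2): key, length, payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def varint_field(field_number, value):
    """Varint field (wire type 0): key, value."""
    return encode_varint(field_number << 3) + encode_varint(value)


# First pass: serialize the payload message on its own
p2p = len_field(1, b"/127.0.0.1:38211") + len_field(2, b"Hello, p2p from python")

# Second pass: the payload bytes become MessageDto.data (field 3, type bytes)
message_dto = len_field(1, b"p2p") + varint_field(2, 100) + len_field(3, p2p)
rpc_cmd = (len_field(1, message_dto)
           + len_field(2, b"random-key-key-random")
           + len_field(3, b"/127.0.0.1:1234"))
print(rpc_cmd)
```

The result is the same byte string that the Python run later in this section writes to the file. A bytes field and an embedded message are both length-delimited on the wire, which is why the receiver can parse data as whatever message type it expects. Unlike Any, no type_url is stored, so the receiver has to know the type from context.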
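Generating the Python3 class definition files
- Since the Any type is no longer used, the files can be generated directly as Python3 py files (--python3_out is not built into protoc; it needs a protoc-gen-python3 plugin on the PATH)
protoc --python3_out=./gen RpcCmd.proto MessageDto.proto Point2PointMessage.proto BytesData.proto
- The difference from pb2 is that the serialization and deserialization methods have different names
- pb2 uses ParseFromString and SerializeToString
- the Python3 version uses parse_from_bytes and encode_to_bytes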
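If code has to work with both kinds of generated classes, the naming difference can be hidden behind two small helpers. This is only a sketch: the helper names are made up, and it relies solely on the four method names listed above.

```python
def to_bytes(message):
    """Serialize with whichever API the generated class exposes."""
    if hasattr(message, "SerializeToString"):  # pb2-style class
        return message.SerializeToString()
    return message.encode_to_bytes()           # --python3_out class


def from_bytes(message_cls, data):
    """Instantiate message_cls and fill it from data with the matching parse method."""
    message = message_cls()
    if hasattr(message, "ParseFromString"):    # pb2-style class
        message.ParseFromString(data)
    else:
        message.parse_from_bytes(data)         # --python3_out class
    return message
```

For example, `from_bytes(RpcCmd_pb2.RpcCmd, raw)` and `from_bytes(RpcCmd.RpcCmd, raw)` would both return a populated message.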
Using it in Python
Again, copy the files into a package of your choice and fix the package names; the details are not repeated here
#!/usr/bin/python3
# -*-coding: utf-8 -*-
from com.tony.proto.py3 import RpcCmd, Point2PointMessage, MessageDto, BytesData


def serialize_to_file(file_path):
    p2p_msg = Point2PointMessage.Point2PointMessage()
    p2p_msg.message = "Hello, p2p from python"
    p2p_msg.targetAddressKey = "/127.0.0.1:38211"
    # bytes_data = BytesData.BytesData()
    # bytes_data.content = b"bytes data from python"
    rpc_cmd = RpcCmd.RpcCmd()
    rpc_cmd.randomKey = "random-key-key-random"
    rpc_cmd.remoteAddressKey = "/127.0.0.1:1234"
    rpc_cmd.message.action = "p2p"
    rpc_cmd.message.state = 100
    # Second-level serialization: the payload is stored as raw bytes
    rpc_cmd.message.data = p2p_msg.encode_to_bytes()
    # rpc_cmd.message.data = bytes_data.encode_to_bytes()
    bytes_write = rpc_cmd.encode_to_bytes()
    with open(file_path, mode="wb") as fw:
        fw.write(bytes_write)
    print("write bytes to file:", bytes_write)


def deserialize_from_file(file_path):
    with open(file_path, mode="rb") as fo:
        bytes_read = fo.read()
    print("read bytes from file:", bytes_read)
    rpc_cmd = RpcCmd.RpcCmd()
    rpc_cmd.parse_from_bytes(bytes_read)
    print(rpc_cmd)
    msg_bytes = rpc_cmd.message.data
    print("message bytes", msg_bytes)
    p2p_msg = data_of(rpc_cmd.message, Point2PointMessage.Point2PointMessage)
    # bytes_data = data_of(rpc_cmd.message, BytesData.BytesData)
    print("msg_content:", p2p_msg.message)
    print("msg_target:", p2p_msg.targetAddressKey)
    # print("bytes_data:", bytes_data.content)


def data_of(message: MessageDto.MessageDto, message_identify):
    # message_identify is the class to decode the data bytes into
    content = message_identify()
    content.parse_from_bytes(message.data)
    return content


if __name__ == "__main__":
    serialize_file_path = "./trans-data.dat"
    serialize_to_file(serialize_file_path)
    deserialize_from_file(serialize_file_path)

The output is shown below. As before, to test the BytesData type, uncomment the bytes_data lines and comment out the p2p_msg ones.
write bytes to file: b'\n3\n\x03p2p\x10d\x1a*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
read bytes from file: b'\n3\n\x03p2p\x10d\x1a*\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python\x12\x15random-key-key-random\x1a\x0f/127.0.0.1:1234'
<Message(RpcCmd)>
<MessageField(id=1, optional)>:
<Message(MessageDto)> <StringField(id=1, optional)>: p2p <Int32Field(id=2, optional)>: 100 <BytesField(id=3, optional)>: b'\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python'
<StringField(id=2, optional)>:
random-key-key-random
<StringField(id=3, optional)>:
/127.0.0.1:1234
message bytes b'\n\x10/127.0.0.1:38211\x12\x16Hello, p2p from python'
msg_content: Hello, p2p from python
msg_target: /127.0.0.1:38211
Converting between Python and Java
- Again, only the path of the serialized file needs to change
Python deserializing Java
java_serialize_file_path = $path_to_java_serialized$
deserialize_from_file(java_serialize_file_path)
- The output
read bytes from file: b'\n-\n\x03p2p\x10d\x1a$\n\x0f/127.0.0.1:1233\x12\x11message from java\x12\x0fRANDOM_KEY_JAVA'
<Message(RpcCmd)>
<MessageField(id=1, optional)>:
<Message(MessageDto)> <StringField(id=1, optional)>: p2p <Int32Field(id=2, optional)>: 100 <BytesField(id=3, optional)>: b'\n\x0f/127.0.0.1:1233\x12\x11message from java'
<StringField(id=2, optional)>:
RANDOM_KEY_JAVA
message bytes b'\n\x0f/127.0.0.1:1233\x12\x11message from java'
msg_content: message from java
msg_target: /127.0.0.1:1233
Java deserializing Python
@Test
public void deserializeFromPythonFile() throws Exception {
    RpcCmd rpcCmd = ProtobufSerializer.getInstance()
            .deSerialize(new FileInputStream($python_serialize_path$), RpcCmd.class);
    log.info("deserialize cmd:\n{}", rpcCmd);
    log.info("deserialize p2p msg:\n{}", rpcCmd.getMessage().dataOfClazz(Point2PointMessage.class, false));
}
The output
13:15:17.821 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize cmd:
RpcCmd(message=MessageDto(action=p2p, state=100, bytesData=[10, 16, 47, 49, 50, 55, 46, 48, 46, 48, 46, 49, 58, 51, 56, 50, 49, 49, 18, 22, 72, 101, 108, 108, 111, 44, 32, 112, 50, 112, 32, 102, 114, 111, 109, 32, 112, 121, 116, 104, 111, 110], serialData=null), randomKey=random-key-key-random, remoteAddressKey=null)
13:15:17.828 [main] INFO com.tony.simple.JavaProtostuffSerializeDemo - deserialize p2p msg:
Point2PointMessage(targetAddressKey=/127.0.0.1:38211, message=Hello, p2p from python)
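The bytesData array in this log is the same payload that the Python side produced, printed as Java byte values. A quick way to check this from Python is sketched below; the helper name is made up. Java bytes are signed, so values above 127 would print as negative numbers and are masked back to 0..255 here.

```python
def java_bytes_to_python(values):
    """Convert a Java byte[] printout (signed, -128..127) to Python bytes."""
    return bytes(v & 0xFF for v in values)


bytes_data = [10, 16, 47, 49, 50, 55, 46, 48, 46, 48, 46, 49, 58, 51, 56, 50,
              49, 49, 18, 22, 72, 101, 108, 108, 111, 44, 32, 112, 50, 112, 32,
              102, 114, 111, 109, 32, 112, 121, 116, 104, 111, 110]
payload = java_bytes_to_python(bytes_data)
print(payload)
```

The printed value is the "message bytes" line from the Python run: field 1 carries /127.0.0.1:38211 and field 2 carries the greeting.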
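Summary of using io.protostuff
- On the Java platform you can define ordinary POJOs directly, with no need to hand-write proto files and generate the corresponding classes. Serialization and deserialization only require the provided ProtobufIOUtil or ProtostuffIOUtil.
- For cross-language serialization and deserialization, the other languages still need matching proto files and generated classes, and the generic instance variable on the Java side has to be changed into a byte array holding the separately serialized payload, so that languages such as Python can parse it. The Java side should then serialize with ProtobufIOUtil. Python can deserialize the payload into a specific class depending on the business type, and Java serializes from the same class; the reverse direction works the same way. The goal is that once MessageDto is defined, extending it never requires changing MessageDto itself: you only define more data types and assign them to MessageDto.$data.
- Compared with the pure protobuf implementation, the code is simpler, because there is no need to write lots of Any.pack() and Any.unpack() calls.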
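Choosing the payload class from the business type can be sketched as a small registry on the Python side. Everything here is hypothetical: the stub classes stand in for the generated ones, and using the action string as the lookup key (including the "bytes" action) is an assumption, since the post does not say which field identifies the business type.

```python
class Point2PointMessageStub:
    """Stand-in for the generated Point2PointMessage class."""

    def parse_from_bytes(self, data):
        self.raw = data


class BytesDataStub:
    """Stand-in for the generated BytesData class."""

    def parse_from_bytes(self, data):
        self.raw = data


# Hypothetical mapping from business type to payload class
PAYLOAD_TYPES = {
    "p2p": Point2PointMessageStub,
    "bytes": BytesDataStub,
}


def decode_payload(action, data, registry=PAYLOAD_TYPES):
    """Look up the payload class for action and parse data into it."""
    try:
        payload_cls = registry[action]
    except KeyError:
        raise ValueError(f"no payload type registered for action {action!r}") from None
    payload = payload_cls()
    payload.parse_from_bytes(data)
    return payload
```

Adding a new payload type then means generating one more class and registering it, while MessageDto stays unchanged.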