A Framework for Building Application-Level End-to-End 5G Networking Datasets.
As 5G network deployments advance, there remain opportunities to embed intelligence in the network to support network slicing, fault management, resource management, and related functions. The adoption of machine learning and artificial intelligence techniques in 5G networks would be accelerated by realistic, open, and freely available datasets, but such datasets are often unavailable. Beyond this scarcity, operators must protect the privacy of their users and businesses, so they cannot easily grant access to 5G networking datasets. This gap can be bridged by deploying small-scale 5G networks as private test networks. The availability of open-source, configurable software for 5G experimentation makes it possible to run data collection experiments and build datasets. In this work, we propose a framework and a detailed methodology for building application-level datasets in 5G networks. An application-level dataset tracks the characteristics of an application from the Radio Access Network to the Core Network and exposes the features that best describe that specific application. The primary focus of this work is the user plane traffic on the 5G network. We implement the framework on a physical testbed deployed using srsRAN for the gNB, transmitting over an Ettus Research X310 Software Defined Radio. Our core network is based on Open5GS. We write custom packet filtering Quality of Service (QoS) rules that map QoS flows in the Core Network to Data Radio Bearers in the Radio Access Network, isolating Spotify traffic on a logical end-to-end QoS pipe. We use this logical end-to-end traffic isolation to capture packets and frames with Wireshark.
The packet captures are parsed by a custom script that wraps tshark filters to expose 5G user plane network features, whose values are used to populate an end-to-end 5G networking dataset.
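As an illustration of the parsing step described above, the following is a minimal sketch of a script that wraps tshark, not the script used in this work. It assumes tshark is installed and on the PATH. The function names, the display filter "gtp", and the field list are hypothetical choices made for the example; the features actually extracted for the dataset are not specified here.

```python
import csv
import io
import subprocess

# Illustrative field list (an assumption): standard Wireshark field names,
# not necessarily the features used to build the dataset.
FIELDS = ("frame.time_epoch", "frame.len", "ip.src", "ip.dst", "gtp.teid")


def build_tshark_command(pcap_path, display_filter, fields=FIELDS):
    """Return the tshark argument list that exports `fields` as tab-separated text."""
    cmd = [
        "tshark",
        "-r", pcap_path,        # read packets from a capture file
        "-Y", display_filter,   # keep only packets matching the display filter
        "-T", "fields",         # print selected fields instead of a packet summary
        "-E", "separator=/t",   # tab-separated columns
        "-E", "occurrence=f",   # first occurrence when a field repeats in a packet
    ]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def parse_tshark_output(text, fields=FIELDS):
    """Turn tshark's tab-separated output into one dict per packet."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(fields, row)) for row in reader if row]


def extract_features(pcap_path, display_filter="gtp", fields=FIELDS):
    """Run tshark on a capture and return the parsed rows (requires tshark)."""
    result = subprocess.run(
        build_tshark_command(pcap_path, display_filter, fields),
        capture_output=True, text=True, check=True,
    )
    return parse_tshark_output(result.stdout, fields)
```

The rows returned by `extract_features` are plain dictionaries keyed by field name, so they can be written to a CSV file with `csv.DictWriter` or loaded into a data frame for labeling. Splitting command construction and output parsing into separate functions keeps both testable without a capture file.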