Virtual Cloud Service System for Building Research and Educational Applications over Inter-Cloud Environments
Atsuko Takefusa, National Institute of Informatics (NII)
This work was partly supported by JST CREST Grant Number JPMJCR1501, Japan.
Inter-Cloud Computing
- Inter-cloud is a new computing paradigm that utilizes geographically distributed resources provided by multiple cloud data centers.
- It enables performance-assured, highly available, and fault-tolerant applications.
- Typical use cases: data locality, performance assurance, disaster recovery, avoiding cloud vendor lock-in, and load balancing.
Issues for Inter-Cloud Computing
- Security and performance of the inter-cloud network
  - It is difficult to ensure the security of networks between multiple clouds over the public Internet.
  - The performance of the public Internet is unstable and limited.
  - A high-performance R&E network can improve both.
- Difficulty of building an application environment
  - It requires knowledge not only of the target applications but also of cloud APIs and cloud resource optimization (instance types, cost, OS/kernel/library versions, network configuration, etc.).
  - Performance tuning across multiple cloud environments is not easy for general researchers or faculty.
SINET, Japanese R&E Network
- SINET is a Japanese R&E network operated by NII.
- SINET5 allows construction of a high-performance inter-cloud environment:
  - fully meshed 100 Gbps lines among all prefectures;
  - high-performance international links;
  - secure L2/L3 VPN services;
  - direct connections to cloud data centers (AWS, Azure, etc.).
[Figure: SINET5 topology map with SINET routers in cities including Sapporo, Tokyo, Osaka, and Fukuoka, and international links to the US, Europe, and Asia. Legend: domestic line (100 Gbps or more), international line (100 Gbps), international line (10 Gbps).]
Virtual Cloud Service System (VCSS)
- We propose the Virtual Cloud Service System (VCSS) [CloudCom2017].
- VCSS aims to help academic users run research and educational applications over clouds via R&E networks.
- It allows users to build and operate an effective application environment using Jupyter Notebook.
[Figure: example Galaxy VC. A VCSS user (the VC administrator) operates the Galaxy Template, which drives the VCP system (VCP Manager and VC Controller) to deploy the units galaxy, galaxy-compute, and compute for Galaxy users. The Galaxy + Slurmctld node with the NFS server and the Slurmd compute nodes run in base containers (BC) on bare-metal (BM) nodes in a private cloud (OpenStack@NII Chiba); the VC scales out to Slurmd nodes on VMs in a public cloud (AWS@Tokyo) over a VPN on SINET5, with the compute nodes acting as NFS clients.]
Virtual Cloud Service System (VCSS)
[Figure: VCSS architecture. Templates run on top of the Virtual Cloud Provider (VCP), which builds clusters with L2/L3 networks and L2 overlays over bare-metal (BM) and virtual-machine (VM) resources from multiple cloud providers (e.g., OpenStack, AWS, Azure) interconnected by a high-performance R&E network (e.g., SINET5).]
Virtual Cloud Service System (VCSS)
- Virtual Cloud Provider (VCP) middleware
  - Enables easy deployment of a computing infrastructure, called a virtual cloud (VC), using Docker containers.
  - Provides secure connections between remote cloud data centers.
  - Abstracts different cloud APIs and provides a service interface (I/F) to control a VC deployed over multiple clouds.
- Templates
  - Provide Jupyter-based procedure manuals and a management environment for typical research and educational applications.
Virtual Cloud (VC) Deployed by VCP
- VCP composes a VC in a Docker-in-Docker manner.
  - Base containers configure the VM/BM and provide monitoring functions for the VC.
  - Users can deploy application containers using existing images provided by each community.
- VCP constructs secure inter-datacenter networks.
[Figure: node stack of hardware/VM, OS (kernel), VCP base container, and application containers with their own bins/libs. Data-center LANs/VLANs are connected either by an L2VPLS over SINET5/R&E network or through gateways (GW) linked by an IPsec tunnel over the Internet.]
VCP Performance: iperf3 Throughput
- Setup: two m4.large or two r4.large nodes in AWS (m4.large: 450 Mbps EBS bandwidth; r4.large: 10G NIC). EN: Enhanced Networking.
[Figure, left: throughput [Mbps] (higher is better) on the host and with Docker network modes net=host, net=ipvlan, net=bridge, and net=overlay; results are capped by VM NIC performance.]
[Figure, right: throughput [Mbps] (higher is better) for m4.large and r4.large w/o EN, w/ EN, and w/ VCP and EN.]
- Without EN, performance is low; with EN, performance is good and the VCP overhead is slight.
- EN is disabled by default, and images with EN are not provided for some instance types.
- VCP allows users to use optimized communication environments easily.
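Comparisons like these come down to measuring receiver-side throughput in each configuration and expressing the VCP case relative to a baseline. Below is a minimal sketch, assuming the results are collected with `iperf3 -c <server> -J`, whose TCP JSON output carries the receiver total under `end.sum_received.bits_per_second`. The helper names and sample numbers are illustrative and are not part of VCP.

```python
import json


def received_mbps(iperf3_json: str) -> float:
    """Receiver-side TCP throughput in Mbit/s from the output of `iperf3 -J`."""
    result = json.loads(iperf3_json)
    # For TCP tests, iperf3 reports the receiver's totals under end.sum_received.
    return result["end"]["sum_received"]["bits_per_second"] / 1e6


def relative_overhead(baseline_mbps: float, measured_mbps: float) -> float:
    """Fraction of the baseline throughput lost (0.05 means 5% lower)."""
    return 1.0 - measured_mbps / baseline_mbps


if __name__ == "__main__":
    # Synthetic JSON and baseline for illustration only; not the measured values above.
    sample = json.dumps({"end": {"sum_received": {"bits_per_second": 9.5e9}}})
    with_vcp = received_mbps(sample)
    print(f"w/ VCP: {with_vcp:.0f} Mbps, "
          f"overhead vs. 9800 Mbps baseline: {relative_overhead(9800.0, with_vcp):.1%}")
```

In practice, you would save one JSON file per configuration (host, net=host, net=ipvlan, and so on) and compare each one against the host baseline.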
VCP Performance: HPL Performance
- Setup: four m4.large or four r4.large nodes with EN in AWS.
[Figure: HPL performance [Gflops] (higher is better) for matrix sizes 50000 (m4.large, r4.large), 57000 (m4.large, r4.large), and 87000 (r4.large), comparing host, 1-tier containers, and VCP with net=host, net=ipvlan, and net=overlay.]
- Computing performance decreases because of container overhead, but the decrease is acceptable.
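The matrix sizes are bounded by memory. HPL factors a dense N x N matrix of double-precision values, so it needs roughly 8N² bytes across the cluster. The quick check below assumes 8 GiB of RAM per m4.large and 15.25 GiB per r4.large (EC2 specifications, not stated on the slide) and ignores OS and MPI overhead. It suggests why N = 87000 appears only for r4.large nodes.

```python
GIB = 1024 ** 3

# Memory per instance in GiB: EC2 specifications, assumed here rather than taken from the slide.
INSTANCE_MEMORY_GIB = {"m4.large": 8.0, "r4.large": 15.25}


def hpl_matrix_gib(n: int) -> float:
    """Size of HPL's dense N x N matrix of 8-byte doubles, in GiB."""
    return 8 * n * n / GIB


def memory_fraction(n: int, instance: str, nodes: int = 4) -> float:
    """Share of the cluster's total RAM needed just to hold the HPL matrix."""
    return hpl_matrix_gib(n) / (INSTANCE_MEMORY_GIB[instance] * nodes)


if __name__ == "__main__":
    for n, instance in [(50000, "m4.large"), (57000, "m4.large"),
                        (50000, "r4.large"), (57000, "r4.large"),
                        (87000, "r4.large"), (87000, "m4.large")]:
        frac = memory_fraction(n, instance)
        status = "fits" if frac < 1.0 else "does not fit"
        print(f"N={n:>6} on 4x {instance}: {hpl_matrix_gib(n):5.1f} GiB, "
              f"{frac:6.1%} of RAM ({status})")
```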
VCP Performance: Inter-Cloud iperf3 Throughput
- Setup: a base container on a BM node in the NII Cloud (Chiba) and a base container on an r4.large instance in AWS Tokyo. The two are connected either over SINET5 via the VCP IGW in Tokyo or by IPsec over the Internet.
[Figure: throughput [Mbps] (higher is better) versus transfer data size (1, 10, 100, 1000 MB) for IPsec over the Internet and for SINET.]
- SINET5 provides a high-performance communication environment.
- IPsec throughput saturates at around 400 Mbps.
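To put the roughly 400 Mbps IPsec ceiling in perspective, transfer time is size × 8 / throughput. The sketch below assumes that megabytes and megabits are both decimal (10^6). It uses 2000 Mbps as an illustrative round figure for SINET5, not a value read from the figure.

```python
def transfer_seconds(size_mbytes: float, throughput_mbps: float) -> float:
    """Time to move size_mbytes megabytes at throughput_mbps megabits per second."""
    return size_mbytes * 8 / throughput_mbps


IPSEC_CAP_MBPS = 400.0   # approximate IPsec saturation point reported on the slide
SINET_MBPS = 2000.0      # illustrative assumption for SINET5, not a measured value

if __name__ == "__main__":
    for size in (1, 10, 100, 1000):  # the transfer sizes used in the experiment
        print(f"{size:>5} MB: IPsec ~{transfer_seconds(size, IPSEC_CAP_MBPS):6.2f} s, "
              f"SINET5 ~{transfer_seconds(size, SINET_MBPS):6.2f} s")
```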
VCP Service Interfaces
- VCP abstracts different cloud provider APIs.
  - It uses Terraform to control the resources provided by each provider.
  - It supports the major cloud provider APIs and allows a provider plug-in to be implemented.
- VCP provides a REST API.
  - Users can describe the configuration of a VC in a YAML-based configuration file (CCI).
- VCP also provides user-friendly interfaces: a Python-based VCP SDK and a Jupyter Notebook environment.
[Figure: a Jupyter Notebook template calls the VCP SDK, which sends a POST with a CCI to the REST I/F of the VC Controller. The VC Controller uses Terraform to drive the APIs (A, B, C) of Clouds A, B, and C (OpenStack, AWS, Azure, etc.).]
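The plug-in idea can be sketched as one small class per provider. Each class renders the units of a CCI into Terraform's JSON configuration syntax (the `.tf.json` form). All class and field names here are hypothetical, including the `ami`, `flavor`, and `image` parameters, and this is not VCP's actual plug-in interface. Only `aws_instance`, `openstack_compute_instance_v2`, and their arguments are real resource types from the Terraform AWS and OpenStack providers.

```python
import json
from abc import ABC, abstractmethod


class ProviderPlugin(ABC):
    """Hypothetical plug-in interface: turn one CCI unit into Terraform JSON resources."""

    @abstractmethod
    def resources(self, unit: dict) -> dict: ...


class AWSPlugin(ProviderPlugin):
    def resources(self, unit: dict) -> dict:
        params = unit["cloud_params"]
        # One aws_instance per requested node; "ami" is an assumed cloud_params key.
        return {"aws_instance": {
            f"{unit['name']}_{i}": {"ami": params["ami"],
                                    "instance_type": params["instance-type"]}
            for i in range(unit["num"])}}


class OpenStackPlugin(ProviderPlugin):
    def resources(self, unit: dict) -> dict:
        params = unit["cloud_params"]
        # "flavor" and "image" are assumed cloud_params keys for illustration.
        return {"openstack_compute_instance_v2": {
            f"{unit['name']}_{i}": {"name": f"{unit['name']}-{i}",
                                    "flavor_name": params["flavor"],
                                    "image_name": params["image"]}
            for i in range(unit["num"])}}


# Registry keyed by the CCI's cloud_provider field; a new cloud adds one entry.
PLUGINS = {"aws": AWSPlugin(), "openstack": OpenStackPlugin()}


def to_terraform(units: list) -> dict:
    """Merge every unit's resources into a single Terraform JSON document."""
    resource: dict = {}
    for unit in units:
        plugin = PLUGINS[unit["cloud_provider"]]
        for rtype, blocks in plugin.resources(unit).items():
            resource.setdefault(rtype, {}).update(blocks)
    return {"resource": resource}


if __name__ == "__main__":
    unit = {"name": "sample_server", "num": 1, "cloud_provider": "aws",
            "cloud_params": {"instance-type": "t2.small", "ami": "ami-PLACEHOLDER"}}
    print(json.dumps(to_terraform([unit]), indent=2))
```

With this design, supporting another cloud means registering one more class in `PLUGINS`. Provider blocks and credentials are omitted here for brevity.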
VCP SDK
Pseudo-code using the VCP SDK:
```python
# VcpSDK is provided by the VCP SDK (import omitted in this pseudo-code)

# initialize a VC
vcp_accesskey = "KEY_FOR_TEST001"
my_vc_name = "my_vc"
vc = VcpSDK(vcp_accesskey, my_vc_name)

# specify the node flavor (performance)
spec = vc.spec.find("aws", "small")

# provision a node
nodes = vc.unit.create("sample_server", spec)

# clean up the VC
# vc.cleanup()
```
Example of a VCP SDK property file:
```python
# VC Controller address
vcc_host = "172.17.0.1"

### Default AWS settings
aws_access_key = "AWS_access_key"
aws_secret_key = "aws_secret_key"
aws_region = "ap-northeast-1"

### Default Azure settings
azure_subscription_id = "Azure_subscription_ID"
# ...
```
Node flavors for each cloud are defined in advance in the VCP SDK, and users can override each parameter.
Example of the generated CCI:
```yaml
- vc:
    name: my_vc
- unit:
    name: sample_server
    image: vcp/base:1.0
    num: 1
    cloud_provider: aws
    cloud_params:
      access_key: AWS_access_key
      instance-type: t2.small
      region: ap-northeast-1
      secret_key: AWS_secret_key
    disk_size: 20
```
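The slide does not show how the SDK turns `vc.spec.find(...)` and `vc.unit.create(...)` into the CCI above. The sketch below is a hypothetical illustration of the described layering, in which predefined flavors are combined with property-file defaults and user overrides. `FLAVORS` and `build_cci` are invented names and are not part of the VCP SDK.

```python
import json
from typing import Optional

# Hypothetical flavor table; the slide only shows that ("aws", "small") yields t2.small.
FLAVORS = {("aws", "small"): {"instance-type": "t2.small"}}


def build_cci(vc_name: str, unit_name: str, provider: str, flavor: str,
              defaults: dict, overrides: Optional[dict] = None,
              image: str = "vcp/base:1.0", num: int = 1) -> list:
    """Assemble a CCI-like document.

    Later sources win: property-file defaults < predefined flavor < user overrides.
    """
    cloud_params = {**defaults, **FLAVORS[(provider, flavor)], **(overrides or {})}
    return [
        {"vc": {"name": vc_name}},
        {"unit": {"name": unit_name, "image": image, "num": num,
                  "cloud_provider": provider, "cloud_params": cloud_params}},
    ]


if __name__ == "__main__":
    # Defaults as they might be read from the property file (placeholder credentials).
    defaults = {"access_key": "AWS_access_key", "secret_key": "AWS_secret_key",
                "region": "ap-northeast-1"}
    cci = build_cci("my_vc", "sample_server", "aws", "small", defaults)
    # JSON is, for practical purposes, valid YAML 1.2, so this could serve as a CCI file.
    print(json.dumps(cci, indent=2))
```

Passing `overrides={"instance-type": "t2.medium"}` would replace the flavor's instance type while keeping the property-file defaults, which mirrors how a user could override individual parameters.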
Templates
- Templates provide Jupyter-based procedure manuals and a management environment for typical research and educational applications.
- Jupyter is a Python-based coding and documentation environment accessed through a web interface.
- NII has been developing Jupyter extensions for collaborative operation of infrastructure: https://github.com/nii-cloud-operation
- We have been developing the templates in cooperation with each application community.
Templates
- HPC Template: supports construction of an HPC cluster system using OpenHPC.
- LMS Template: builds and maintains a Moodle-based learning management system.
- VDI Template: builds a scalable virtual desktop infrastructure (VDI) using Apache Guacamole.
- Genome Analysis Template: constructs a genome analysis environment using the Galaxy workflow tool.
- Courseware Template: provides courseware and content such as TensorFlow and Elasticsearch.
An Example: HPC Template
Outline of the HPC Template:
1. Parameter settings
2. Resource allocation
3. Network configuration of application containers
4. Installation of HPC libraries
5. User account settings
6. Installation of optional packages
7. Execution of benchmark programs
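Each template is a Jupyter Notebook, so a procedure manual is essentially a sequence of explanation cells and executable cells. The minimal sketch below builds such a skeleton directly in the nbformat 4 JSON layout using only the standard library. It assumes nothing about the real templates beyond the outline above, and the function names are illustrative.

```python
import json

# Steps taken from the HPC Template outline above.
HPC_STEPS = [
    "Parameter settings",
    "Resource allocation",
    "Network configuration of application containers",
    "Installation of HPC libraries",
    "User account settings",
    "Installation of optional packages",
    "Execution of benchmark programs",
]


def markdown_cell(text: str) -> dict:
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(text: str) -> dict:
    return {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": text}


def procedure_notebook(title: str, steps: list) -> dict:
    """Build an nbformat 4 notebook: a title, then one explanation and one code cell per step."""
    cells = [markdown_cell(f"# {title}")]
    for i, step in enumerate(steps, start=1):
        cells.append(markdown_cell(f"## {i}. {step}"))
        cells.append(code_cell(f"# placeholder: commands for '{step}'"))
    # nbformat_minor 4 does not require per-cell ids (those arrived in 4.5).
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    # Redirect stdout to an .ipynb file to open the skeleton in Jupyter.
    print(json.dumps(procedure_notebook("HPC Template", HPC_STEPS), indent=1))
```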
Demonstration of Scale-out for Galaxy VC
Summary
- We propose the Virtual Cloud Service System (VCSS).
  - It consists of the inter-cloud middleware, Virtual Cloud Provider (VCP), and Templates.
  - It aims to help academic users run research and educational applications over clouds via R&E networks.
  - It allows users to build and operate an effective application environment using Jupyter Notebook.
- Future work: international collaboration.
  - Deployment of an experimental environment over international inter-cloud resources using VCP and the NSI testbed.
Software-Defined Experimental Environment by VCP
- We are working on on-demand construction of an international experimental environment over SINET, JGN, and Pacific Wave.
[Figure: testbed topology. Sites include NII Chiba, NII Tokyo, AWS Tokyo, NAIST, Osaka Univ., RISE Dojima, RISE Tokyo, and UCSD (US), with MinIO containers (C) at several sites. The sites are linked through SINET (Dojima, Tokyo), JGN (Dojima, Tokyo), Pacific Wave (Seattle, LA), and PRAGMA-ENT. VLANs are either provisioned on demand through an NSI-enabled network (NSI/L2OD) or static (PRAGMA-ENT). The VCP system deploys base containers (BC) running OVS and BGP, with application containers on top.]