Design and develop a tool for computational resource monitoring (GPU, CPU, RAM, Disk, ...)

Unit: Computer Network Service (IT)

Franck Tison

Duration: 3 Months
Start Date: 2016

Provide XRCE researchers and developers with tools to quickly find available computational resources that match their job requirements. Searches will be possible through a summary dashboard built on "Ganglia" monitoring metrics, through dedicated URLs (API), or from the command line.

The search process will match the user's request against both the static resources declared for each server and the real-time availability reported by Ganglia.
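The two-stage matching described above could be sketched as follows. This is only an illustration under stated assumptions: the `Server` class, its field names, and `find_server` are hypothetical, and the live values are passed in directly rather than read from Ganglia.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    # Static resources, as declared in the server's configuration file.
    total_ram_gb: int
    gpus: list
    # Live values, which the real tool would obtain from Ganglia (assumed here).
    free_ram_gb: float = 0.0
    busy_gpus: set = field(default_factory=set)


def find_server(servers, gpus_needed, ram_gb_needed):
    """Return (server name, free GPU ids) for the first match, else None."""
    for s in servers:
        # Step 1: static filter. Skip servers that could never fit the job.
        if s.total_ram_gb < ram_gb_needed or len(s.gpus) < gpus_needed:
            continue
        # Step 2: live filter. Check what is actually free right now.
        free_gpus = [g for g in s.gpus if g not in s.busy_gpus]
        if s.free_ram_gb >= ram_gb_needed and len(free_gpus) >= gpus_needed:
            return s.name, free_gpus[:gpus_needed]
    return None
```

Checking the declared capacity first lets the search discard unsuitable servers without consulting any live metric.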

This tool will be integrated with other Ganglia-based monitoring tools used for other platforms (pools of CPU servers). It must be easy to deploy and configure (text files for resources, see examples below). It will run on a CentOS server with Apache and an SQLite database.

  • Example to declare a server:

$cat server5.conf












  • Researcher request - Example 1:

I want a GPU on a server with at least 64 GB of RAM and 200 GB of local HDD

URL         http://server/ganglia/request/GPU=1-CPU=1-RAM=64G-HDD=local+200G

                {server12, gpu4}

BASH       $request --master server:10100 -r GPU=1,CPU=1,RAM=64G,HDD=local+200G

                {server3, gpu2}
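As a rough sketch of how such a request string might be turned into structured requirements, the parser below splits fields on `-` (URL form) or `,` (command-line form). It assumes that `+` joins several values inside one field, as in the URL form above. The function names and the shape of the returned dictionary are hypothetical, not part of the specification.

```python
def parse_size(text):
    """Convert a size such as '64G' to 64 (gigabytes)."""
    if text.endswith("G") and text[:-1].isdigit():
        return int(text[:-1])
    raise ValueError(f"unsupported size: {text!r}")


def parse_request(spec, sep="-"):
    """Parse e.g. 'GPU=1-CPU=1-RAM=64G-HDD=local+200G' into a dict."""
    request = {}
    for item in spec.split(sep):
        key, _, value = item.partition("=")
        values = value.split("+")  # assumption: '+' joins multiple values
        if key in ("GPU", "CPU"):
            request[key] = int(value)
        elif key == "RAM":
            request[key] = parse_size(value)
        elif key == "HDD":
            # 'local+200G' -> disk kind and minimum size
            request[key] = {"kind": values[0], "size_gb": parse_size(values[1])}
        elif key == "SERVER":
            request[key] = values
        else:
            raise ValueError(f"unknown field: {key!r}")
    return request
```

For the command-line form, the same function would be called with `sep=","`. Note that splitting on `-` breaks if a server name itself contains a hyphen, so a real implementation would need a stricter grammar.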

  • Example 2:

I am asking for 4 jobs, each needing 2 GPUs and 128 GB of RAM, on specific servers

URL         http://server/ganglia/request/4/GPU=2-RAM=128G-SERVER=server3+server4+server6+server8

               {server3, gpu0, gpu1} {server3, gpu2, gpu4} {server6, gpu0, gpu1} {null}

BASH       $request --master server:10100 -r4 GPU=2,RAM=128G,SERVER=server3+server4+server6+server8

               {server3, gpu0, gpu1} {server3, gpu2, gpu4} {server6, gpu0, gpu1} {null}
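One possible way to serve a multi-job request like this is a greedy first-fit pass over the listed servers, reserving resources as each job is placed and returning `None` for any job that cannot be satisfied. The sketch below is an assumption about the behaviour, not the specified algorithm. The `allocate` function and the `free` dictionary layout are hypothetical.

```python
def allocate(jobs, gpus_per_job, ram_gb, candidates, free):
    """Greedy first-fit placement.

    free maps server name -> {'gpus': [free GPU ids], 'ram_gb': free RAM}
    and is updated in place as resources are reserved.
    Returns one (server, gpu ids) tuple per job, or None if no server fits.
    """
    results = []
    for _ in range(jobs):
        placed = None
        for name in candidates:
            state = free.get(name)
            if state is None:
                continue  # no live data for this server
            if len(state["gpus"]) >= gpus_per_job and state["ram_gb"] >= ram_gb:
                # Reserve the GPUs and RAM so the next job cannot reuse them.
                taken = state["gpus"][:gpus_per_job]
                state["gpus"] = state["gpus"][gpus_per_job:]
                state["ram_gb"] -= ram_gb
                placed = (name, taken)
                break
        results.append(placed)
    return results
```

With made-up availability data in which server3 can host two jobs, server6 one, and server4 and server8 none, a call for 4 jobs returns three placements followed by `None`, mirroring the `{null}` entry in the example output.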

  • Example 3:

Visual search with a dashboard

