EasyDMP is in production!

Dear NIRD Users,

It is with great excitement that we at UNINETT Sigma2 announce the launch of easyDMP, a new service that offers researchers with minimal experience in data management a simple way of creating a Data Management Plan (DMP). This is achieved by transforming any funding agency’s or institution’s data management guidelines and policies into a series of easy-to-answer questions, many with a short list of predefined answers to pick from. The resulting plan can be used as a blueprint for researchers to put in place the necessary elements that ensure their data are adequately managed. The plan can be edited and shared, and also duplicated to serve as a starting point for other datasets.

EasyDMP is free of charge and available to any researcher in Norway and in Europe:

https://easydmp.sigma2.no/

EasyDMP has been developed and is operated by Sigma2 in collaboration with the EUDAT2020 project. EasyDMP presently implements the EU H2020 recommendations, but the service has been designed to easily integrate other schemas, for example institution-specific recommendations. Please do not hesitate to contact us if you want to integrate easyDMP with your own tailored DMP questionnaire scheme.

Improvements to the tool will be driven by your needs. Thanks to continuous deployment, new functionality will be added to the easyDMP service on an ongoing basis. We can already anticipate that the next release will enable other services to make use of the plan output in compliance with the FAIR principles.

We are now working to establish an external reference group for the service that will include experts from user communities, librarians, curators and national service providers. We believe that the easyDMP service will benefit from a wide national pool of competence and stakeholders.

Please feel free to test it and start using it, and do not hesitate to give us feedback at support@easydmp.sigma2.no.

More info about easyDMP here:
https://www.sigma2.no/content/easydmp

2 days downtime starting on 25th of April

Update:

  • 2018-04-30 14:46 The file system issues on Fram are now solved and access is reopened. Jobs are temporarily on hold due to problems with the cooling system in the server room. As soon as that is sorted out, jobs will be permitted again.
  • 2018-04-30 10:15 We are still struggling with the /cluster file system. The problem has been escalated to the vendor. At the moment we do not have a time estimate for when Fram will be back online, but work is in progress to fix this as soon as possible, hopefully during the day.
  • 2018-04-27 18:44 Unfortunately there are still problems bringing up the Lustre file system on Fram. The issue is caused by an incompatibility affecting routing between IB networks/fabrics on the Lustre object storage servers. The vendor is now planning and working to carry out an emergency update on the system. We are sorry for the trouble.
  • 2018-04-27 16:49 Access to NIRD is reopened now.
  • 2018-04-26 22:50 We are having problems bringing up the Lustre file system on Fram. The issue has been reported to the vendor. Additionally, there are some minor issues which must be addressed on NIRD before opening it for production, but we expect to reopen access to both Fram and NIRD tomorrow.

 

Dear Fram and NIRD user,

A two-day downtime is scheduled for week 17. The scheduled maintenance will start on Wednesday, 25th of April, at 09:00 AM and will affect Fram, NIRD and the Service Platform.

During this time we will:
1. Extend NIRD storage space by ~1.1 PB.
– The new hardware will be coupled to NIRD and extra disks loaded into the system during these two days.
– Please note that the additional storage will not all be available at once. Storage space will be added gradually as the loaded disks are formatted and made available to the file system.
– One of our top priorities is to address the inode shortage on $HOME areas.
2. Address file system related bugs on NIRD by upgrading the relevant software and tuning some parameters on the servers.
3. Fix broken hardware on Fram.
4. Apply any outstanding patches to both Fram and NIRD.
5. Carry out maintenance work on the cooling system for Fram.

There is a job reservation in place on Fram starting at 08:45 AM on 25th of April. Jobs that cannot complete before that time will be left pending in the queue with the Reason “ReqNodeNotAvail” and an estimated start time of 2154. They will be started when the maintenance is over.
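As a rough illustration of the rule above, the following Python sketch checks whether a job would end before the reservation begins. It is a simplified model, not the scheduler's actual logic: it assumes the decision depends only on the job's requested wall time and the moment it could start. The function name `can_start_before_maintenance` is hypothetical; the reservation time is the one given in this announcement.

```python
from datetime import datetime, timedelta

# Start of the maintenance reservation, as stated in the announcement.
RESERVATION_START = datetime(2018, 4, 25, 8, 45)


def can_start_before_maintenance(now, walltime, reservation_start=RESERVATION_START):
    """Return True if a job starting at `now` with the requested `walltime`
    would end no later than the start of the reservation.

    Simplified model (an assumption): only the requested wall time counts,
    not how long the job actually runs.
    """
    return now + walltime <= reservation_start


if __name__ == "__main__":
    start = datetime(2018, 4, 24, 9, 0)
    # Ends 2018-04-25 08:00, before the reservation.
    print(can_start_before_maintenance(start, timedelta(hours=23)))  # True
    # Ends 2018-04-25 09:00, overlaps the reservation, so the job stays pending.
    print(can_start_before_maintenance(start, timedelta(hours=24)))  # False
```

In practice this means that requesting a shorter wall time, where your job allows it, may let the job run before the maintenance window instead of waiting until it is over.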

We will keep you updated via OpsLog/Twitter.

Thank you for your consideration!
Metacenter Operations

$HOME file system availability issues on Fram – FIXED

We are experiencing availability issues with the $HOME file system on Fram. The problem is currently under investigation and we are actively working on solving it.
Update 09:30:
The problem is now fixed.
One of the file servers exporting $HOME went down and the failover didn’t work as intended.

Thank you for your understanding!
Metacenter Operations

Issues on /cluster file system

We have identified a bug on the /cluster file system which can lead to random job crashes.

The bug is triggered on the Lustre file system when running Fortran code compiled with Intel MPI.

A bug report has been filed with the storage vendor.

We will keep you updated!

Update 06-04-2018: We have found and fixed a problem on the file servers, and in the tests we ran we can no longer reproduce the problem.

Thank you for your consideration!
Metacenter Operations

Change in defaults for job placement on islands

Fram has been in production for half a year now, and we’ve gathered enough data to see possible improvements to the defaults. One such improvement is related to how jobs are placed with regard to the island topology on Fram. The way Fram is built, the network bandwidth within an island is far better than between islands. For certain types of jobs spanning many compute nodes, being spread over multiple islands can have a negative impact on performance.

To limit this effect we have now changed the default setup so that each job will run within one island, if that does not delay the job too much, as described here: 

https://documentation.sigma2.no/jobs/framjobplacement.html 

Note that this may lead to longer waiting times in the queue, in particular for larger jobs. If your job does not depend on high network throughput, the above-mentioned document also describes how to override the new default.

Best regards,

Metacenter Operations

Home directory file permissions

In accordance with the Data handling and Storage policy we will shortly enable automatic enforcement of file permissions on your home directories. We expect this to take place after the next maintenance stop.

This means that you may no longer grant other users/groups read or write access to your home directory. Any sharing of data between users must be done through project or work directories.
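To see whether your home directory currently grants access to anyone else, a check along the following lines may help. This is a minimal sketch, not the enforcement mechanism itself, which is not described here: it only inspects the classic Unix permission bits for group and others and does not look at ACLs. The function name `open_to_others` is hypothetical.

```python
import os
import stat


def open_to_others(path):
    """Return True if `path` has any group or other permission bit set.

    Assumption: only the standard Unix mode bits are inspected; access
    control lists (ACLs), if any, are not considered.
    """
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


if __name__ == "__main__":
    home = os.path.expanduser("~")
    if open_to_others(home):
        print(f"{home} is accessible to group or others")
    else:
        print(f"{home} is private to its owner")
```

A directory with mode 700 is reported as private, while one with mode 750 or 755 is reported as accessible to others.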

We take this opportunity to remind you that your home directory contents are treated as private data by the Metacenter staff and will not be shared with other users, not even with your supervisor or project leader, without your prior written consent. Should you be unable to give consent, requests will be handled in accordance with applicable laws and regulations.

Please remember to share necessary data as required before changing jobs, going on leave, and so on.

Best regards,

the Metacenter security team

Issues with $HOME file system – resolved

We are experiencing problems with the $HOME (/nird/home) file system.
We are working on the problem and will try to fix it as soon as possible. We will get back with further information later.

Update:
A large number of files were generated on the $HOME file system by some users, consuming all the available inodes.
The problem was remediated around 09:50 in the morning.
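Since every file and directory consumes an inode, it can be useful to see which of your directories hold the most entries. The sketch below counts entries under a directory tree and lists the largest subdirectories. It is an approximation under stated assumptions: hard links are counted once per name, symbolic links are not followed, and the function names `count_entries` and `largest_subdirs` are hypothetical.

```python
import os


def count_entries(root):
    """Count files and directories below `root` (root itself excluded).

    Each entry is treated as one inode, which is an approximation:
    hard links to the same file are counted once per name.
    """
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (name, entry_count) pairs for the direct
    subdirectories of `root`, largest first."""
    counts = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.name] = count_entries(entry.path)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    for name, n in largest_subdirs(home):
        print(f"{n:10d}  {name}")
```

Directories that turn out to contain very large numbers of small files are usually better placed in a project or work area, or packed into an archive.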

Fram Downtime February 8th 2018 07:00 – POSTPONED

Due to a delay in the work done by the power company, the power outage will be postponed; no new time is currently scheduled. We are sorry for this and any trouble it may cause you. We estimate the downtime to be no longer than 4 hours.

A new system reservation will be made when the new outage is planned. We will need to re-queue any running jobs that are not finished by the time of the outage.

On behalf of the Fram HPC staff

Steinar Trædal-Henden