Quota enforcement on Fram

Dear Fram User,

We have fixed the broken quota indexes on the Fram /cluster file system.
Due to heavy disk usage on both Fram and NIRD, we need to enforce quotas on all relevant areas:

  • /cluster/home
  • /cluster/projects
  • /cluster/shared

To control disk usage in the areas listed above, group ownerships are enforced nightly.

To avoid job crashes, please make sure before submitting jobs that the Unix users and groups you are a member of have enough free quota, both block and inode.

This Wikipedia page gives a good explanation of the block and inode quota types.

To check quotas on Fram, you may use the dusage command, e.g.:

# list all user and group quotas
dusage -a
# for help, use
dusage -h
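
Since /cluster is a Lustre file system, the same block and inode numbers can also be inspected with the standard Lustre lfs tool. A minimal sketch, assuming /cluster is the mount point and using a hypothetical project group name:

# show block and inode quota for your user on /cluster
lfs quota -u $USER /cluster
# show quota for one of your groups (nn1234k is a placeholder, not a real group)
lfs quota -g nn1234k /cluster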

Thank you for your understanding!
Metacenter Operations

Enforcing standard quota on $HOME

Dear Fram and Saga user,

As you may know, we have a standard 20GB block quota on $HOME on the Fram and Saga HPC resources. Until now this was not enforced, but due to frequent overuse and backup limitations we are compelled to enforce it; enforcement takes effect on 04.11.2019.

Any project-related data should be moved to the /cluster/projects area, and unneeded data should be removed.

We have also implemented a new backup policy: any files placed under $HOME/nobackup or $HOME/tmp will be excluded from backups.
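
For example, project data can be moved and temporary files kept out of the backup scope as follows. This is a sketch only; mydata, scratch-files and the project directory nn1234k are hypothetical names:

# check how much space $HOME currently uses
du -sh $HOME
# move project-related data to the project area (nn1234k is a placeholder)
mv $HOME/mydata /cluster/projects/nn1234k/
# keep temporary files outside the backup scope
mkdir -p $HOME/nobackup
mv $HOME/scratch-files $HOME/nobackup/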

For more information, please check the documentation pages at https://documentation.sigma2.no/.

Thank you for your understanding!
Metacenter Operations

Saga: problems with logging in

We are currently experiencing problems with the /cluster file system on Saga. This prevents users from logging in.

We are investigating, and will update here when we know more.

Update 11:30: We have identified and solved the problem; the /cluster filesystem is back online.

NIRD and Service Platform downtime – 26.06.2019

  • 2019-07-02 08:00: All Service Platform services have resumed. Some services may not be working properly and may need to be restarted after the maintenance. If you experience any problem with your service, please do not hesitate to contact us as soon as possible.
  • 2019-06-27 19:58: NIRD filesystems are mounted on Fram again.
  • 2019-06-27 19:46: NIRD login nodes have been started again; you may log in and access your files stored on NIRD.
    The remaining Service Platform services will be started tomorrow morning.
  • 2019-06-27 09:54: We are bringing the file system back up and testing it now.
  • 2019-06-26 22:08: All hardware replacements are done, and the storage system is being monitored for any signs of instability. Bringing the filesystem back online is planned for tomorrow morning. We will keep you updated.
  • 2019-06-26 14:05: The vendor is meticulously checking each NIRD storage component and has decided to replace the main controller chassis.
    In the meantime, we are applying firmware updates on the Service Platform to improve stability and security.
  • 2019-06-26 08:15: Maintenance has started.

Dear NIRD and Service Platform User,

We have a planned downtime on Wednesday the 26th of June, next week, to replace some defective hardware. Systems will be taken offline starting at 08:00.

An engineer from the storage vendor will assist us from the very first hour.

We expect the maintenance to finish in one day.
We will keep you updated here.

Metacenter Operations

Fram MDS patched

Dear Fram User,

This morning around 09:05, the Fram metadata server crashed once again, likely impacting running jobs.

A mitigating patch was delivered by the vendor yesterday and we used this opportunity to apply it on our metadata servers.

We will keep the system closely monitored and cooperate with the vendor on further stabilizing the system.

Apologies for any inconvenience this may have caused!

Fram MDS crashed

Dear Fram User,

Once again the Fram metadata server has crashed, likely impacting running jobs.
We are in contact with the storage vendor about patching the file system.

Apologies for the inconvenience!

Clean-up of Fram filesystem needed

Dear Fram users,

The Fram filesystem, and most critically /cluster/work and /cluster/home, is running out of inodes; only 8% remain. If we run out of inodes, it will not be possible to create new files. To avoid loss of data and job crashes, we kindly ask all of you to delete files that you no longer need, if possible.
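
One way to see where your inodes go, assuming GNU coreutils on the login nodes (the directory name some-directory is a placeholder):

# count inodes (files and directories) per top-level entry in $HOME, largest first
du --inodes -s $HOME/* | sort -rn | head
# count the files under a specific directory before deciding what to delete
find $HOME/some-directory -type f | wc -l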

Best regards,

Jon, on behalf of the Operations Team

Missing mounts for Project NS2345K on NIRD

The NFS server exporting /tos-project1/NS2345K/FRAM to NIRD/FRAM crashed yesterday around 16:00. We have recovered the NFS server, and /tos-project1/NS2345K/FRAM is re-exported again. We are currently working on mounting the filesystem in the login containers, and in the meantime we are investigating the cause. We apologize for the inconvenience caused.