Maintenance on NIRD, NIRD Toolkit, Fram and Saga, 20th April - 24th April

Dear NIRD, NIRD Toolkit, Fram and Saga User,

We will have a four-day scheduled maintenance on NIRD, NIRD Toolkit, Fram and Saga, starting on the 20th of April at 09:00.

During the maintenance we will:

  • carry out software and firmware updates on all systems

Files stored on NIRD will be unavailable during the maintenance, and so will the services that depend on them. This will also affect the NIRD file systems available on Fram and Saga.

Login services to NIRD, NIRD Toolkit, Fram and Saga will be disabled during the maintenance.

Please note that backups taken from the Fram and Saga HPC clusters will also be affected and will be unavailable during this period.

Please accept our apologies for the inconvenience this downtime is causing.

Metacenter Operations

NIRD: file system problems

Dear NIRD user,
We have had serious problems with the GPFS file systems this afternoon and had to stop the storage and all the services.

The NIRD storage and the NIRD Toolkit are now back online.
Please notify Metacenter support if you notice any remaining issues.

We are very sorry for the inconvenience.

Update 10:00 24.02.2020: We still have a problem with the NIRD mount points on Fram. We are working on it and will keep users posted here.

Update 10:45 24.02.2020: The problem with the NIRD mount points on Fram is resolved.

NIRD project file systems mounted on Saga

Dear Saga User,

We are pleased to announce that we have met all the technical requirements and mounted the NIRD project file systems on the Saga login nodes.

You may find your projects in the

/nird/projects/nird

folder.

Please note that transferring a large number of files is slow and has a big impact on I/O performance. It is always better to transfer one large file than many small files.
As an example, the transfer of a folder with 70k entries and about 872MB took 18 minutes, while the same files archived into a single 904MB file took 3 seconds.

You can read more about the tar archiving command in its manual page. Type

man tar

in your Saga terminal.
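
As a rough sketch of the advice above, the following shell functions pack a folder into a single tar archive before a transfer and unpack it again afterwards. The function names pack_dir and unpack_archive and all paths are illustrative assumptions, not part of any Saga or NIRD tooling. Only the standard tar options -c, -x, -f and -C are used.

```shell
# Hypothetical helper: pack a folder with many small files into one archive.
pack_dir() {
    src_dir=$1   # folder to archive (placeholder path)
    archive=$2   # tar file to create
    # -c creates an archive, -f names the archive file.
    # -C switches to the parent folder so stored paths stay relative.
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Hypothetical helper: unpack an archive into a destination folder.
unpack_archive() {
    archive=$1   # tar file to extract
    dest_dir=$2  # folder to extract into (created if missing)
    mkdir -p "$dest_dir"
    # -x extracts, -C selects the folder to extract into.
    tar -xf "$archive" -C "$dest_dir"
}
```

For example, after running pack_dir "$HOME/results" "$HOME/results.tar" you would copy the single results.tar file to your project folder under /nird/projects/nird instead of copying the folder itself. Here results is a placeholder name.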

Metacenter Operations

Reorganized NIRD storage

Dear NIRD User,

During the last maintenance we reorganized the NIRD storage.

Projects now have a so-called primary site, which is either Tromsø or Trondheim. Previously we had a single primary site, Tromsø. This change was needed to prepare for coupling the NIRD storage with Saga and the upcoming Betzy HPC clusters.

While we are working on a final, seamless access solution regardless of the primary site for your data, please use the following temporary solution:


To work as close as possible to your data, connect to the login nodes located at the primary site of your project:

  • for Tromsø the address is unchanged and is login.nird.sigma2.no
  • for Trondheim the address is login-trd.nird.sigma2.no

To find out the primary site of your project, log in to a login node and type:

readlink /projects/NSxxxxK

It will print out a path starting either with /tos-project or /trd-project.
If it starts with “tos” then use login.nird.sigma2.no.
If it starts with “trd” then use login-trd.nird.sigma2.no.
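
The rule above can be sketched as a small shell function. The name login_host_for is a hypothetical helper, not an existing NIRD command. It only inspects the prefix of the path that readlink prints and assumes the two prefixes named above are the only valid ones.

```shell
# Hypothetical helper: map a resolved project path to its login address.
login_host_for() {
    case "$1" in
        /tos-project*) echo "login.nird.sigma2.no" ;;      # primary site Tromsø
        /trd-project*) echo "login-trd.nird.sigma2.no" ;;  # primary site Trondheim
        *)
            # Any other prefix is treated as an error in this sketch.
            echo "unknown primary site for path: $1" >&2
            return 1
            ;;
    esac
}
```

On a login node this could be used as login_host_for "$(readlink /projects/NSxxxxK)", with NSxxxxK replaced by your own project number.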

Metacenter Operations

Network outage

Update

  • 2020-01-13 14:54: Problems have been sorted out now and network is functional again.
  • 2020-01-13 14:40: Problems are unfortunately back again. Uninett’s network specialists are working on solving the problem as soon as possible.
  • 2020-01-13 14:22: Network is functional again. Apologies for the inconvenience it has caused.

We are currently experiencing a network outage on Saga and some parts of NIRD. The problem is under investigation.

Please check back here for an update on this matter.

Metacenter Operations

NIRD and NIRD Toolkit scheduled maintenance

Update:

  • 2020-01-23 17:30: Services are now being restarted progressively.
  • 2020-01-22 21:49: We detected file-system-level corruption and, to avoid data corruption, had to unmount and rescan all the file systems (about 18PB) on NIRD.
    We are currently working on restarting the services on NIRD Toolkit.
  • 2020-01-22 11:11: Software and firmware are now upgraded on NIRD Toolkit.
    Most of the fileset changes have also been carried out. We are currently working on the last bits and will keep you updated.
  • 2020-01-20 08:58: Maintenance has started. NIRD file systems are unmounted from Fram until maintenance is finished.

Dear NIRD and NIRD Toolkit User,

We will have a three-day scheduled maintenance on NIRD and NIRD Toolkit, starting on the 20th of January at 09:00.

During the maintenance we will:

  • carry out software and firmware updates,
  • change geo-locality for some of the projects,
  • replace synchronization mechanisms,
  • expand the storage and quotas, depending on part delivery times from the disk vendor.

Files stored on NIRD will be unavailable during the maintenance, and so will the services that depend on them. This will also affect the NIRD file systems available on Fram.

Please note that backups taken from the Fram and Saga HPC clusters will also be affected and will be unavailable during this period.

Please accept our apologies for the inconvenience this downtime is causing.

Metacenter Operations

NIRD crash.

The NIRD storage system crashed and was unavailable for a short period of time.
Because of this crash, users logged in to NIRD and Fram experienced problems.
The problem is resolved and the NIRD storage system is online again.

Please contact us if you still encounter problems.

Note: The export of NIRD to Fram is currently not working.

NIRD and service platform downtime on Thursday 22nd of August.

Update:

  • 2019-08-26 12:45: NIRD project areas are mounted on Fram login nodes.
  • 2019-08-25 14:15: Service Platform is up now. You can now log in to NIRD and access your files.
    NIRD project areas will be reconnected to Fram tomorrow.
  • 2019-08-23 18:42: The vendor started a forced health check on the system, which is taking more time than expected. We will re-open access to NIRD and Service Platform as soon as checks and rebuilds are finished.
  • 2019-08-23 08:05: Storage vendor has finished the hardware replacements and installation of new firmware on the storage system.
    We are currently monitoring the storage system together with the vendor.

Dear NIRD and Service Platform users,

We have a planned downtime on the 22nd of August to replace some defective hardware. Systems will be taken offline starting from 08:00 AM.

An engineer from the storage vendor will assist us from the very first hour.

We expect the maintenance to finish in one and a half days.

NIRD projects will still be accessible during the maintenance from login-trd.nird.sigma2.no, but in read-only mode.

We will keep you updated here.

Sorry for the short notice.

Metacenter Operations

NIRD and Service Platform downtime – 26.06.2019

  • 2019-07-02 08:00: All Service Platform services have resumed. Some services may not be working properly and may need to be restarted after the maintenance. If you experience any problems with your service, please do not hesitate to contact us as soon as possible.
  • 2019-06-27 19:58: NIRD file systems are mounted on Fram again.
  • 2019-06-27 19:46: NIRD login nodes are running again. You may log in and access your files stored on NIRD.
    The remaining Service Platform services will be started tomorrow morning.
  • 2019-06-27 09:54: We are now restarting and testing the file system.
  • 2019-06-26 22:08: All hardware replacements are done now and the storage system is being monitored for any signs of instability. Restarting the file system is planned for tomorrow morning. We will keep you updated.
  • 2019-06-26 14:05: The vendor is meticulously checking each NIRD storage component and has decided to replace the main controller chassis.
    In the meantime we are applying firmware updates on the Service Platform to improve stability and security.
  • 2019-06-26 08:15: Maintenance has started.

Dear NIRD and Service Platform User,

We have a planned downtime on the 26th of June, Wednesday next week, to replace some defective hardware. Systems will be taken offline starting from 08:00 AM.

An engineer from the storage vendor will assist us from the very first hour.

We expect the maintenance to finish in one day.
We will keep you updated here.

Metacenter Operations

Missing mounts for Project NS2345K on NIRD

The NFS server that exports /tos-project1/NS2345K/FRAM to NIRD/FRAM crashed yesterday around 16:00. We have recovered the NFS server and /tos-project1/NS2345K/FRAM is exported again. We are currently working on mounting the file system in the login containers while investigating the cause. We are sorry for the inconvenience caused.