Betzy pre-production

Dear HPC User,

We are pleased to announce that Betzy is open for pre-production as of Friday, 20 November.

Because this is close to the weekend, Betzy is being opened in steps: first to prior pilot projects, and then for general access on Tuesday, 24 November.

It has been a long journey, but we are happy to see good performance and stability on the system.

Please note that changes will be made to the queue system setup during the coming days, which could require cancelling running jobs.

Finally, support will be offered only from 24 November.

Thank you for your patience and we wish you happy computing!

Best regards,

Lorand Szentannai, on behalf of the preparations team

Updated information about Betzy production

Dear HPC User,

As mentioned last week, the validation benchmarks had been stable, and we were ready to run and evaluate the site acceptance test. Unfortunately, the interconnect stability issues have recurred.

We and the vendor have been running extensive tests since then. The interconnect vendor's R&D department released new firmware yesterday afternoon. It was applied yesterday evening, and stress tests started immediately. Several days of testing are needed to be sure that the problem is resolved.

Therefore, we have to postpone production yet again, by one week. The current production estimate is the end of week 47.

We can assure you that we are very eager to have the system 100% stabilized and in production, and that everybody involved in the project (be it from Sigma2, the Metacenter, or the vendor) is working intensively on this.

Thank you for your understanding!

Best regards,

Lorand Szentannai, on behalf of the preparations team

Information regarding Betzy production

Dear HPC User,

Our previous estimate of production on Betzy has proved to be somewhat optimistic. 

With the help of the vendor, we believe we have identified and fixed the cause of the interconnect stability problem on Betzy. The most recent validation benchmarks have been stable, and we will begin the site acceptance test (SAT) by Friday, 6 November. If the machine passes the SAT, it will be handed over to operations and opened for production.

The final preparations usually take 1-3 days. We therefore estimate that production will begin on Betzy during next week, week 46.

Best regards,

Lorand Szentannai, on behalf of the preparations team

Estimated production date for Betzy

Dear HPC user,

Our newest supercomputer – Betzy – is unfortunately delayed in entering production due to circumstances outside our control.

We have had significant delays in getting all the components in place because of logistics disruptions caused by the Covid pandemic. However, approximately 94% of the system capacity is now installed and configured. Work to prepare the remaining capacity will continue over the coming weeks.

Benchmarks and pilot testing on Betzy have revealed an intermittent stability problem with the node interconnect. The vendor has been investigating the issue for the past two weeks in order to identify its source. Our new best estimate is that Betzy will go into production in week 45.

This has consequences for the decommissioning of Vilje and Stallo, because we rely on Betzy to take over computational load from the other machines. Thus, the new decommissioning date for Vilje and Stallo is 1 December. We would like the machines to be fully utilized until they are decommissioned, and therefore encourage you to continue using Vilje and Stallo if you still have the opportunity.

Thank you for your understanding!

Best regards,
Lorand Szentannai, on behalf of the preparations team

Betzy access closed, preparing for production

UPDATE:

  • 08.10.2020: After extensive testing, the vendor found that stability issues are unfortunately still present. The problem has been escalated and is under investigation. We will get back to you with more information as soon as we get an update from the vendor.
  • 30.09.2020: The vendor will carry out firmware updates on Betzy today. As a consequence, we need to stop running jobs and run tests to make sure the system is stable.
    Access to the machine will be reopened as soon as the tests are complete. Please follow the progress here, on OpsLog.
  • 25.09.2020: We are temporarily reopening access over the weekend in order to allow further testing on the machine.
    The vendor is expected to do further work sometime next week. As a consequence, jobs will be terminated again and access closed while maintenance is ongoing.

Dear Betzy pilots,

We are pleased to announce that, despite logistics challenges caused by Covid-19, most of the outstanding issues have been sorted out. This unusual situation required a more dynamic approach from everyone involved, and the uncertainties and rapidly changing circumstances put pressure on communication. Because of this, setting and advertising a production date proved to be difficult.

We can now aim to put Betzy into production at the beginning of October. Before we can conclude and proceed with the preparations, we need to re-run several comprehensive tests.

Therefore, we will have to stop all jobs and close access to Betzy starting tomorrow, 17 September 2020, at 10 AM. Access to Betzy will be re-established as soon as all the tests are completed. Please be prepared for a more extensive maintenance period this time, which might take up to two and a half weeks.

The file system on Betzy is not going to be reformatted. That is, your data will not be removed intentionally. However, we cannot guarantee data integrity until backups are taken and the machine is placed into production. Therefore, we strongly advise you to back up your important data as a precaution.

Apologies for the short notice and for the inconvenience this causes you.

Best regards,

Lorand Szentannai, on behalf of the preparations team