Skip to content

[FIX] Rethrow takeVMSnapshot() exception#3546

Merged
yadvr merged 1 commit into
apache:masterfrom
onitake:fix/snapshot-throw-not-null
Dec 7, 2019
Merged

[FIX] Rethrow takeVMSnapshot() exception#3546
yadvr merged 1 commit into
apache:masterfrom
onitake:fix/snapshot-throw-not-null

Conversation

@onitake
Copy link
Copy Markdown
Contributor

@onitake onitake commented Aug 7, 2019

Description

This patch changes the behaviour of VMSnapshotManagerImpl.orchestrateCreateVMSnapshot(Long vmId, Long vmSnapshotId, Boolean quiescevm) to rethrow the exception when takeVMSnapshot() fails, instead of returning null.

If null is returned, this will lead to a NullPointerException in VMSnapshotManagerImpl.orchestrateCreateVMSnapshot(VmWorkCreateVMSnapshot work) in the snapshot.getId() call, and causes an incomplete snapshot entry to remain in the database.

Fixes: #3518

See there for more details.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

This change has not been tested yet.

At this point, it is unknown if it will cause unexpected side effects.

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 7, 2019

VMSnapshotManagerImpl.orchestrateCreateVMSnapshot(Long vmId, Long vmSnapshotId, Boolean quiescevm) is used in two places:

  • VMSnapshotManagerImpl.orchestrateCreateVMSnapshot(VmWorkCreateVMSnapshot work) where the null value will lead to the NPE and stray DB entry
  • VMSnapshotManagerImpl.createVMSnapshot(Long vmId, Long vmSnapshotId, Boolean quiescevm) where it is returned directly

createVMSnapshot in turn is used in two places:

  • CreateVMSnapshotCmd.execute(), where the null value is checked and a different exception is thrown (ServerApiException)
  • StorageSystemSnapshotStrategy.takeHypervisorSnapshot(VolumeInfo volumeInfo) where the null value is also checked and a CloudRuntimeException is thrown.

An alternative fix for the NPE would be to modify VMSnapshotManagerImpl.orchestrateCreateVMSnapshot(VmWorkCreateVMSnapshot work) in the snapshot.getId() to check for a null value before calling getId().

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 7, 2019

I suspect that orchestrateDeleteVMSnapshot(VmWorkDeleteVMSnapshot work) and orchestrateDeleteAllVMSnapshots(VmWorkDeleteAllVMSnapshots work) have similar problems.

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 7, 2019

Sorry, scratch that last remark. These function only get a boolean return value. No risk of an NPE there.

@yadvr yadvr added the type:bug label Aug 8, 2019
@yadvr yadvr added this to the 4.13.0.0 milestone Aug 8, 2019
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Aug 8, 2019

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 8, 2019

What do you think, @rhtyd ? Would a null value check in VMSnapshotManagerImpl.orchestrateCreateVMSnapshot(VmWorkCreateVMSnapshot work) be better?

Otherwise, the other null checks could be removed completely, as we would never return a null value.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Aug 8, 2019

The only thing I would check is if throwing an exception would cause any snapshot/item in the db to be set in incorrect state, i.e. any cleanup that may be necessary when the method throws the exception.

@yadvr yadvr modified the milestones: 4.13.0.0, 4.13.1.0 Aug 8, 2019
@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-235

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 9, 2019

The problem here is that this fairly difficult, as I don't have a reliable way to trigger throwing an exception, in a real life scenario.
As I'm also new to the CloudStack code base, I don't know if there is a good way to test the behaviour in a unit test.

A test scenario should (roughly) look like this:

  • prepare a test VM
  • prime takeVMSnapshot so it throws an exception
  • create a VM snapshot
  • verify that: no NPE occurs, the error is propagated back to the user and/or log, no snapshot was created, no invalid database entry for the snapshot remains (i.e. no DB entry at all, or a DB entry in error state)

Conversely, a positive test should also exist, that does the following:

  • prepare a test VM
  • create a VM snapshot
  • verify that: no NPE occurs, no error is generated, a snapshot was created, a valid DB entry for the snapshot exists

DaanHoogland
DaanHoogland previously approved these changes Aug 9, 2019
Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial first step

} catch (Exception e) {
s_logger.debug("Failed to create vm snapshot: " + vmSnapshotId, e);
return null;
throw new CloudRuntimeException("Failed to create vm snapshot: " + vmSnapshotId, e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting the snapshot to a good state should happen here. not sure what it would be, but probably 'Error'.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we'll have to find out in testing (or bad experience) I can not in good conscience hold this given your answers in the comment, @onitake . Let's be aware in case anything happens to the snapshot functionality

@DaanHoogland DaanHoogland dismissed their stale review August 9, 2019 09:21

should have marked it request changes

@DaanHoogland
Copy link
Copy Markdown
Contributor

@onitake can you add setting the snapshot state before throwing the exception?

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 9, 2019

There are other sources for exceptions, not just the part that did return null, so I would assume that there is already some sort of error handling and setting error state in place.
Is this not the case? I want to avoid duplicating something that's already done...

The NPE was not handled properly anywhere AFAIK, so nothing would properly set an error state.

@DaanHoogland
Copy link
Copy Markdown
Contributor

ok, fair point but than at this point it can also not deal with CloudRuntimeException expecting not snapshot being created in the first place or expect that no error state needs to set at that point.

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 12, 2019

@DaanHoogland Setting the snapshot state to "Error" would involve something like this:

            try {
                VMSnapshot.State.getStateMachine().transitTo(vmSnapshot, VMSnapshot.Event.OperationFailed, null, _vmSnapshotDao);
            } catch (NoTransitionException ex) {
                throw new CloudRuntimeException("Cannot transit snapshot to failed state", ex);
            }

I'm not sure putting this into the exception handler is a good idea.

Maybe it would be better to ensure the CloudRuntimeException is accounted for. There should already be code for setting the error state somewhere.

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Aug 13, 2019

So, after some more digging, I came to the conclusion that there are two major code paths how the snapshot creation can be reached:

  • via org.apache.cloudstack.storage.vmsnapshot.DefaultVMSnapshotStrategy#takeVMSnapshot: In this case, there is already a finally clause that ensures the snapshot transitions to Error state.
  • via com.cloud.vm.snapshot.VMSnapshotManagerImpl#orchestrateCreateVMSnapshot(com.cloud.vm.snapshot.VmWorkCreateVMSnapshot): This is where the NPE happens, and I have absolutely no idea how this method is called. There are no direct references to it, and the @ReflectionUse annotation indicates that something generates dynamic calls to it. @rhtyd Do you have any idea where this routine is called and how errors are handled?

} catch (Exception e) {
s_logger.debug("Failed to create vm snapshot: " + vmSnapshotId, e);
return null;
throw new CloudRuntimeException("Failed to create vm snapshot: " + vmSnapshotId, e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we'll have to find out in testing (or bad experience) I can not in good conscience hold this given your answers in the comment, @onitake . Let's be aware in case anything happens to the snapshot functionality

@DaanHoogland
Copy link
Copy Markdown
Contributor

please not that this still needs testing!

@yadvr yadvr modified the milestones: 4.13.1.0, 4.14.0.0 Dec 7, 2019
@yadvr yadvr merged commit 29e1bbc into apache:master Dec 7, 2019
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Dec 7, 2019

Smoketests/Travis LGTM, and OK.

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Dec 9, 2019

@rhtyd is this only going into 4.14? No backport to 4.13?

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Dec 9, 2019

The PR was opened for master so it will go into 4.14, if you'd want in 4.13 (4.13.1.0 milestone) please send a PR against 4.13 branch @onitake

@onitake
Copy link
Copy Markdown
Contributor Author

onitake commented Dec 10, 2019

All right, will do.

ustcweizhou pushed a commit to ustcweizhou/cloudstack that referenced this pull request Feb 28, 2020
…pache#3546)

Fixes NPE, and throws actual exception with the error stacktrace

Fixes: apache#3518
@onitake onitake deleted the fix/snapshot-throw-not-null branch July 15, 2021 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unhandled NPE in VMSnapShotManagerImpl.orchestrateCreateVMSnapshot produces snapshots that cannot be deleted

6 participants