summary refs log tree commit diff
path: root/fs/btrfs/tree-checker.c
AgeCommit message (Collapse)Author
2023-08-16btrfs: reject invalid reloc tree root keys with stack dumpQu Wenruo
commit 6ebcd021c92b8e4b904552e4d87283032100796d upstream. [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). That ASSERT() makes sure the reloc tree is properly pointed back by its subvolume tree. [CAUSE] After more debugging output, it turns out we had an invalid reloc tree: BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17 Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM, QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree. But reloc trees can only exist for subvolumes, as for non-subvolume trees, we just COW the involved tree block, no need to create a reloc tree since those tree blocks won't be shared with other trees. Only subvolumes tree can share tree blocks with other trees (thus they have BTRFS_ROOT_SHAREABLE flag). Thus this new debug output proves my previous assumption that corrupted on-disk data can trigger that ASSERT(). [FIX] Besides the dedicated fix and the graceful exit, also let tree-checker to check such root keys, to make sure reloc trees can only exist for subvolumes. CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17btrfs: tree-checker: check for overlapping extent itemsJosef Bacik
We're seeing a weird problem in production where we have overlapping extent items in the extent tree. It's unclear where these are coming from, and in debugging we realized there's no check in the tree checker for this sort of problem. Add a check to the tree-checker to make sure that the extents do not overlap each other. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: tree-checker: check extent buffer owner against owner rootidQu Wenruo
Btrfs doesn't check whether the tree block respects the root owner. This means, if a tree block referred by a parent in extent tree, but has owner of 5, btrfs can still continue reading the tree block, as long as it doesn't trigger other sanity checks. Normally this is fine, but combined with the empty tree check in check_leaf(), if we hit an empty extent tree, but the root node has csum tree owner, we can let such extent buffer to sneak in. Shrink the hole by: - Do extra eb owner check at tree read time - Make sure the root owner extent buffer exactly matches the root id. Unfortunately we can't yet completely patch the hole, there are several call sites can't pass all info we need: - For reloc/log trees Their owner is key::offset, not key::objectid. We need the full root key to do that accurate check. For now, we just skip the ownership check for those trees. - For add_data_references() of relocation That call site doesn't have any parent/ownership info, as all the bytenrs are all from btrfs_find_all_leafs(). - For direct backref items walk Direct backref items records the parent bytenr directly, thus unlike indirect backref item, we don't do a full tree search. Thus in that case, we don't have full parent owner to check. For the later two cases, they all pass 0 as @owner_root, thus we can skip those cases if @owner_root is 0. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: add support for multiple global rootsJosef Bacik
With extent tree v2 you will be able to create multiple csum, extent, and free space trees. They will be used based on the block group, which will now use the block_group_item->chunk_objectid to point to the set of global roots that it will use. When allocating new block groups we'll simply mod the gigabyte offset of the block group against the number of global roots we have and that will be the block groups global id. >From there we can take the bytenr that we're modifying in the respective tree, look up the block group and get that block groups corresponding global root id. From there we can get to the appropriate global root for that bytenr. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: tree-checker: don't fail on empty extent roots for extent tree v2Josef Bacik
For extent tree v2 we can definitely have empty extent roots, so skip this particular check if we have that set. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: tree-checker: use u64 for item data end to avoid overflowSu Yue
User reported there is an array-index-out-of-bounds access while mounting the crafted image: [350.411942 ] loop0: detected capacity change from 0 to 262144 [350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044) [350.428564 ] BTRFS info (device loop0): disk space caching is enabled [350.428568 ] BTRFS info (device loop0): has skinny extents [350.429589 ] [350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1 [350.429636 ] index 1048096 is out of range for type 'page *[16]' [350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 [350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs] [350.429772 ] Call Trace: [350.429774 ] <TASK> [350.429776 ] dump_stack_lvl+0x47/0x5c [350.429780 ] ubsan_epilogue+0x5/0x50 [350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70 [350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs] [350.429832 ] check_leaf+0x754/0x1a40 [btrfs] [350.429874 ] ? filemap_read+0x34a/0x390 [350.429878 ] ? load_balance+0x175/0xfc0 [350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs] [350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs] [350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs] [350.429969 ] ? newidle_balance+0x259/0x480 [350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs] [350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs] [350.430030 ] ? __schedule+0x2fb/0xa40 [350.430033 ] process_one_work+0x1f6/0x400 [350.430035 ] ? process_one_work+0x400/0x400 [350.430036 ] worker_thread+0x2d/0x3d0 [350.430037 ] ? process_one_work+0x400/0x400 [350.430038 ] kthread+0x165/0x190 [350.430041 ] ? set_kthread_struct+0x40/0x40 [350.430043 ] ret_from_fork+0x1f/0x30 [350.430047 ] </TASK> [350.430047 ] [350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2 btrfs check reports: corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected item end, have 4294971193 expect 3897 The first slot item offset is 4293005033 and the size is 1966160. In check_leaf, we use btrfs_item_end() to check item boundary versus extent_buffer data size. However, return type of btrfs_item_end() is u32. (u32)(4293005033 + 1966160) == 3897, overflow happens and the result 3897 equals to leaf data size reasonably. Fix it by use u64 variable to store item data end in check_leaf() to avoid u32 overflow. This commit does solve the invalid memory access showed by the stack trace. However, its metadata profile is DUP and another copy of the leaf is fine. So the image can be mounted successfully. But when umount is called, the ASSERT btrfs_mark_buffer_dirty() will be triggered because the only node in extent tree has 0 item and invalid owner. It's solved by another commit "btrfs: check extent buffer owner against the owner rootid". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299 Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-31btrfs: tree-checker: check item_size for dev_itemSu Yue
Check item size before accessing the device item to avoid out of bound access, similar to inode_item check. Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-31btrfs: tree-checker: check item_size for inode_itemSu Yue
while mounting the crafted image, out-of-bounds access happens: [350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1 [350.429636] index 1048096 is out of range for type 'page *[16]' [350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1 [350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs] [350.429772] Call Trace: [350.429774] <TASK> [350.429776] dump_stack_lvl+0x47/0x5c [350.429780] ubsan_epilogue+0x5/0x50 [350.429786] __ubsan_handle_out_of_bounds+0x66/0x70 [350.429791] btrfs_get_16+0xfd/0x120 [btrfs] [350.429832] check_leaf+0x754/0x1a40 [btrfs] [350.429874] ? filemap_read+0x34a/0x390 [350.429878] ? load_balance+0x175/0xfc0 [350.429881] validate_extent_buffer+0x244/0x310 [btrfs] [350.429911] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs] [350.429935] end_bio_extent_readpage+0x3af/0x850 [btrfs] [350.429969] ? newidle_balance+0x259/0x480 [350.429972] end_workqueue_fn+0x29/0x40 [btrfs] [350.429995] btrfs_work_helper+0x71/0x330 [btrfs] [350.430030] ? __schedule+0x2fb/0xa40 [350.430033] process_one_work+0x1f6/0x400 [350.430035] ? process_one_work+0x400/0x400 [350.430036] worker_thread+0x2d/0x3d0 [350.430037] ? process_one_work+0x400/0x400 [350.430038] kthread+0x165/0x190 [350.430041] ? set_kthread_struct+0x40/0x40 [350.430043] ret_from_fork+0x1f/0x30 [350.430047] </TASK> [350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2 check_leaf() is checking the leaf: corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912) leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE leaf 29396992 flags 0x1(WRITTEN) backref revision 1 fs uuid a62e00e8-e94e-4200-8217-12444de93c2e chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8 item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40 generation 0 transid 0 size 0 nbytes 17592186044416 block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136 sequence 100305 flags 0x2409000(none) atime 0.0 (1970-01-01 08:00:00) ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03) mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00) otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58) item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48 dev extent chunk_tree 3 chunk_objectid 256 chunk_offset 12582912 length 8388608 chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8 The corrupted leaf of device tree has an inode item. The leaf passed checksum and others checks in validate_extent_buffer until check_leaf_item(). Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we are in the device tree. Since the item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access is triggered. The item end vs leaf boundary check has been done before check_leaf_item(), so fix it by checking item size in check_inode_item() before access of the inode item in extent buffer. Other check functions except check_dev_item() in check_leaf_item() have their item size checks. The commit for check_dev_item() is followed. No regression observed during running fstests. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299 CC: stable@vger.kernel.org # 5.10+ CC: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: rename btrfs_item_end_nr to btrfs_item_data_endJosef Bacik
The name btrfs_item_end_nr() is a bit of a misnomer, as it's actually the offset of the end of the data the item points to. In fact all of the helpers that we use btrfs_item_end_nr() use data in their name, like BTRFS_LEAF_DATA_SIZE() and leaf_data(). Rename to btrfs_item_data_end() to make it clear what this helper is giving us. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: drop the _nr from the item helpersJosef Bacik
Now that all call sites are using the slot number to modify item values, rename the SETGET helpers to raw_item_*(), and then rework the _nr() helpers to be the btrfs_item_*() btrfs_set_item_*() helpers, and then rename all of the callers to the new helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: add ro compat flags to inodesBoris Burkov
Currently, inode flags are fully backwards incompatible in btrfs. If we introduce a new inode flag, then tree-checker will detect it and fail. This can even cause us to fail to mount entirely. To make it possible to introduce new flags which can be read-only compatible, like VERITY, we add new ro flags to btrfs without treating them quite so harshly in tree-checker. A read-only file system can survive an unexpected flag, and can be mounted. As for the implementation, it unfortunately gets a little complicated. The on-disk representation of the inode, btrfs_inode_item, has an __le64 for flags but the in-memory representation, btrfs_inode, uses a u32. David Sterba had the nice idea that we could reclaim those wasted 32 bits on disk and use them for the new ro_compat flags. It turns out that the tree-checker code which checks for unknown flags is broken, and ignores the upper 32 bits we are hoping to use. The issue is that the flags use the literal 1 rather than 1ULL, so the flags are signed ints, and one of them is specifically (1 << 31). As a result, the mask which ORs the flags is a negative integer on machines where int is 32 bit twos complement. When tree-checker evaluates the expression: btrfs_inode_flags(leaf, iitem) & ~BTRFS_INODE_FLAG_MASK) The mask is something like 0x80000abc, which gets promoted to u64 with sign extension to 0xffffffff80000abc. Negating that 64 bit mask leaves all the upper bits zeroed, and we can't detect unexpected flags. This suggests that we can't use those bits after all. Luckily, we have good reason to believe that they are zero anyway. Inode flags are metadata, which is always checksummed, so any bit flips that would introduce 1s would cause a checksum failure anyway (excluding the improbable case of the checksum getting corrupted exactly badly). Further, unless the 1 << 31 flag is used, the cast to u64 of the 32 bit inode flag should preserve its value and not add leading zeroes (at least for twos complement). The only place that flag (BTRFS_INODE_ROOT_ITEM_INIT) is used is in a special inode embedded in the root item, and indeed for that inode we see 0xffffffff80000000 as the flags on disk. However, that inode is never seen by tree checker, nor is it used in a context where verity might be meaningful. Theoretically, a future ro flag might cause trouble on that inode, so we should proactively clean up that mess before it does. With the introduction of the new ro flags, keep two separate unsigned masks and check them against the appropriate u32. Since we no longer run afoul of sign extension, this also stops writing out 0xffffffff80000000 in root_item inodes going forward. Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: tree-checker: add missing stripe checks for raid1c3/4 profilesDavid Sterba
The stripe checks for raid1c3/raid1c4 are missing in the sequence in btrfs_check_chunk_valid. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: tree-checker: use table values for stripe checksDavid Sterba
There are hardcoded values in several checks regarding chunks and stripe constraints. We have that defined in the raid table and ought to use it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set ↵Josef Bacik
improperly We need to validate that a data extent item does not have the FULL_BACKREF flag set on its flags. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-22btrfs: tree-checker: do not error out if extent ref hash doesn't matchJosef Bacik
The tree checker checks the extent ref hash at read and write time to make sure we do not corrupt the file system. Generally extent references go inline, but if we have enough of them we need to make an item, which looks like key.objectid = <bytenr> key.type = <BTRFS_EXTENT_DATA_REF_KEY|BTRFS_TREE_BLOCK_REF_KEY> key.offset = hash(tree, owner, offset) However if key.offset collide with an unrelated extent reference we'll simply key.offset++ until we get something that doesn't collide. Obviously this doesn't match at tree checker time, and thus we error while writing out the transaction. This is relatively easy to reproduce, simply do something like the following xfs_io -f -c "pwrite 0 1M" file offset=2 for i in {0..10000} do xfs_io -c "reflink file 0 ${offset}M 1M" file offset=$(( offset + 2 )) done xfs_io -c "reflink file 0 17999258914816 1M" file xfs_io -c "reflink file 0 35998517829632 1M" file xfs_io -c "reflink file 0 53752752058368 1M" file btrfs filesystem sync And the sync will error out because we'll abort the transaction. The magic values above are used because they generate hash collisions with the first file in the main subvol. The fix for this is to remove the hash value check from tree checker, as we have no idea which offset ours should belong to. Reported-by: Tuomas Lähdekorpi <tuomas.lahdekorpi@gmail.com> Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment] Signed-off-by: David Sterba <dsterba@suse.com>
2021-01-07btrfs: tree-checker: check if chunk item end overflowsSu Yue
While mounting a crafted image provided by user, kernel panics due to the invalid chunk item whose end is less than start. [66.387422] loop: module loaded [66.389773] loop0: detected capacity change from 262144 to 0 [66.427708] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 12 /dev/loop0 scanned by mount (613) [66.431061] BTRFS info (device loop0): disk space caching is enabled [66.431078] BTRFS info (device loop0): has skinny extents [66.437101] BTRFS error: insert state: end < start 29360127 37748736 [66.437136] ------------[ cut here ]------------ [66.437140] WARNING: CPU: 16 PID: 613 at fs/btrfs/extent_io.c:557 insert_state.cold+0x1a/0x46 [btrfs] [66.437369] CPU: 16 PID: 613 Comm: mount Tainted: G O 5.11.0-rc1-custom #45 [66.437374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [66.437378] RIP: 0010:insert_state.cold+0x1a/0x46 [btrfs] [66.437420] RSP: 0018:ffff93e5414c3908 EFLAGS: 00010286 [66.437427] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000 [66.437431] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff [66.437434] RBP: ffff93e5414c3938 R08: 0000000000000001 R09: 0000000000000001 [66.437438] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d72aa0 [66.437441] R13: ffff8ec78bc71628 R14: 0000000000000000 R15: 0000000002400000 [66.437447] FS: 00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000 [66.437451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66.437455] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0 [66.437460] PKRU: 55555554 [66.437464] Call Trace: [66.437475] set_extent_bit+0x652/0x740 [btrfs] [66.437539] set_extent_bits_nowait+0x1d/0x20 [btrfs] [66.437576] add_extent_mapping+0x1e0/0x2f0 [btrfs] [66.437621] read_one_chunk+0x33c/0x420 [btrfs] [66.437674] btrfs_read_chunk_tree+0x6a4/0x870 [btrfs] [66.437708] ? kvm_sched_clock_read+0x18/0x40 [66.437739] open_ctree+0xb32/0x1734 [btrfs] [66.437781] ? bdi_register_va+0x1b/0x20 [66.437788] ? super_setup_bdi_name+0x79/0xd0 [66.437810] btrfs_mount_root.cold+0x12/0xeb [btrfs] [66.437854] ? __kmalloc_track_caller+0x217/0x3b0 [66.437873] legacy_get_tree+0x34/0x60 [66.437880] vfs_get_tree+0x2d/0xc0 [66.437888] vfs_kern_mount.part.0+0x78/0xc0 [66.437897] vfs_kern_mount+0x13/0x20 [66.437902] btrfs_mount+0x11f/0x3c0 [btrfs] [66.437940] ? kfree+0x5ff/0x670 [66.437944] ? __kmalloc_track_caller+0x217/0x3b0 [66.437962] legacy_get_tree+0x34/0x60 [66.437974] vfs_get_tree+0x2d/0xc0 [66.437983] path_mount+0x48c/0xd30 [66.437998] __x64_sys_mount+0x108/0x140 [66.438011] do_syscall_64+0x38/0x50 [66.438018] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [66.438023] RIP: 0033:0x7f0138827f6e [66.438033] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [66.438040] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e [66.438044] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0 [66.438047] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001 [66.438050] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [66.438054] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440 [66.438078] irq event stamp: 18169 [66.438082] hardirqs last enabled at (18175): [<ffffffffb81154bf>] console_unlock+0x4ff/0x5f0 [66.438088] hardirqs last disabled at (18180): [<ffffffffb8115427>] console_unlock+0x467/0x5f0 [66.438092] softirqs last enabled at (16910): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20 [66.438097] softirqs last disabled at (16905): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20 [66.438103] ---[ end trace e114b111db64298b ]--- [66.438107] BTRFS error: found node 12582912 29360127 on insert of 37748736 29360127 [66.438127] BTRFS critical: panic in extent_io_tree_panic:679: locking error: extent tree was modified by another thread while locked (errno=-17 Object already exists) [66.441069] ------------[ cut here ]------------ [66.441072] kernel BUG at fs/btrfs/extent_io.c:679! [66.442064] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [66.443018] CPU: 16 PID: 613 Comm: mount Tainted: G W O 5.11.0-rc1-custom #45 [66.444538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 [66.446223] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs] [66.450878] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246 [66.451840] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000 [66.453141] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff [66.454445] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001 [66.455743] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0 [66.457055] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000 [66.458356] FS: 00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000 [66.459841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66.460895] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0 [66.462196] PKRU: 55555554 [66.462692] Call Trace: [66.463139] set_extent_bit.cold+0x30/0x98 [btrfs] [66.464049] set_extent_bits_nowait+0x1d/0x20 [btrfs] [66.490466] add_extent_mapping+0x1e0/0x2f0 [btrfs] [66.514097] read_one_chunk+0x33c/0x420 [btrfs] [66.534976] btrfs_read_chunk_tree+0x6a4/0x870 [btrfs] [66.555718] ? kvm_sched_clock_read+0x18/0x40 [66.575758] open_ctree+0xb32/0x1734 [btrfs] [66.595272] ? bdi_register_va+0x1b/0x20 [66.614638] ? super_setup_bdi_name+0x79/0xd0 [66.633809] btrfs_mount_root.cold+0x12/0xeb [btrfs] [66.652938] ? __kmalloc_track_caller+0x217/0x3b0 [66.671925] legacy_get_tree+0x34/0x60 [66.690300] vfs_get_tree+0x2d/0xc0 [66.708221] vfs_kern_mount.part.0+0x78/0xc0 [66.725808] vfs_kern_mount+0x13/0x20 [66.742730] btrfs_mount+0x11f/0x3c0 [btrfs] [66.759350] ? kfree+0x5ff/0x670 [66.775441] ? __kmalloc_track_caller+0x217/0x3b0 [66.791750] legacy_get_tree+0x34/0x60 [66.807494] vfs_get_tree+0x2d/0xc0 [66.823349] path_mount+0x48c/0xd30 [66.838753] __x64_sys_mount+0x108/0x140 [66.854412] do_syscall_64+0x38/0x50 [66.869673] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [66.885093] RIP: 0033:0x7f0138827f6e [66.945613] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [66.977214] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e [66.994266] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0 [67.011544] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001 [67.028836] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [67.045812] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440 [67.216138] ---[ end trace e114b111db64298c ]--- [67.237089] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs] [67.325317] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246 [67.347946] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000 [67.371343] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff [67.394757] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001 [67.418409] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0 [67.441906] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000 [67.465436] FS: 00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000 [67.511660] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [67.535047] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0 [67.558449] PKRU: 55555554 [67.581146] note: mount[613] exited with preempt_count 2 The image has a chunk item which has a logical start 37748736 and length 18446744073701163008 (-8M). The calculated end 29360127 overflows. EEXIST was caught by insert_state() because of the duplicate end and extent_io_tree_panic() was called. Add overflow check of chunk item end to tree checker so it can be detected early at mount time. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929 CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: tree-checker: annotate all error branches as unlikelyDavid Sterba
The tree checker is called many times as it verifies metadata at read/write time. The checks follow a simple pattern: if (error_condition) { report_error(); return -EUCLEAN; } All the error reporting functions are annotated as __cold that is supposed to hint the compiler to move the statement block out of the hot path. This does not seem to happen that often. As the error condition is expected to be false almost always, we can annotate it with 'unlikely' as this satisfies one of the few use cases for the annotation. The expected outcome is a stronger hint to compiler to reorder the checks test jump to exit test jump to exit ... which can be observed in asm of eg. check_dir_item, btrfs_check_chunk_valid, check_root_item or check_leaf. There's a measurable run time improvement reported by Josef, the testing workload went from 655 MiB/s to 677 MiB/s, which is about +3%. There should be no functional changes but some of the conditions have been rewritten to produce more readable result, some lines are longer than 80, for the sake of readability. Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: switch cached fs_info::csum_size from u16 to u32David Sterba
The fs_info value is 32bit, switch also the local u16 variables. This leads to a better assembly code generated due to movzwl. This simple change will shave some bytes on x86_64 and release config: text data bss dec hex filename 1090000 17980 14912 1122892 11224c pre/btrfs.ko 1089794 17980 14912 1122686 11217e post/btrfs.ko DELTA: -206 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use cached value of fs_info::csum_size everywhereDavid Sterba
btrfs_get_16 shows up in the system performance profiles (helper to read 16bit values from on-disk structures). This is partially because of the checksum size that's frequently read along with data reads/writes, other u16 uses are from item size or directory entries. Replace all calls to btrfs_super_csum_size by the cached value from fs_info. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: add set/get accessors for root_item::drop_levelDavid Sterba
The drop_level member is used directly unlike all the other int types in root_item. Add the definition and use it everywhere. The type is u8 so there's no conversion necessary and the helpers are properly inlined, this is for consistency. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-23btrfs: tree-checker: add missing returns after data_ref alignment checksDavid Sterba
There are sectorsize alignment checks that are reported but then check_extent_data_ref continues. This was not intended, wrong alignment is not a minor problem and we should return with error. CC: stable@vger.kernel.org # 5.4+ Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-11-13btrfs: tree-checker: add missing return after error in root_itemDaniel Xu
There's a missing return statement after an error is found in the root_item, this can cause further problems when a crafted image triggers the error. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210181 Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-26btrfs: tree-checker: validate number of chunk stripes and parityDaniel Xu
If there's no parity and num_stripes < ncopies, a crafted image can trigger a division by zero in calc_stripe_length(). The image was generated through fuzzing. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209587 Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07btrfs: tree-checker: fix false alert caused by legacy btrfs root itemQu Wenruo
Commit 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") introduced btrfs root item size check, however btrfs root item has two versions, the legacy one which just ends before generation_v2 member, is smaller than current btrfs root item size. This caused btrfs kernel to reject valid but old tree root leaves. Fix this problem by also allowing legacy root item, since kernel can already handle them pretty well and upgrade to newer root item format when needed. Reported-by: Martin Steigerwald <martin@lichtvoll.de> Fixes: 259ee7754b67 ("btrfs: tree-checker: Add ROOT_ITEM check") CC: stable@vger.kernel.org # 5.4+ Tested-By: Martin Steigerwald <martin@lichtvoll.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-27btrfs: tree-checker: fix the error message for transid errorQu Wenruo
The error message for inode transid is the same as for inode generation, which makes us unable to detect the real problem. Reported-by: Tyler Richmond <t.d.richmond@gmail.com> Fixes: 496245cac57e ("btrfs: tree-checker: Verify inode item") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25btrfs: tree-checker: remove duplicate definition of 'inode_item_err'Zheng Wei
Remove the duplicate definition of 'inode_item_err' in the file tree-checker.c that got there by accident in c23c77b097dc ("btrfs: tree-checker: Refactor inode key check into seperate function"). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Zheng Wei <wei.zheng@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Verify location key for DIR_ITEM/DIR_INDEXQu Wenruo
[PROBLEM] There is a user report in the mail list, showing the following corrupted tree blocks: item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74 location key (4065004 INODE_ITEM 1073741824) type FILE transid 21397 data_len 0 name_len 44 name: FILENAME Note that location key, its offset should be 0 for all INODE_ITEMS. This caused failed lookup of the inode. [CAUSE] That offending value, 1073741824, is 0x40000000. So this looks like a memory bit flip. [FIX] This patch will enhance tree-checker to check location key of DIR_INDEX/DIR_ITEM/XATTR_ITEM. There are several different combinations needs to check: - item_key.type == DIR_INDEX/DIR_ITEM * location_key.type == BTRFS_INODE_ITEM_KEY This location_key should follow the check in inode_item check. * location_key.type == BTRFS_ROOT_ITEM_KEY Despite the existing check, DIR_INDEX/DIR_ITEM can only points to subvolume trees. * All other keys are not allowed. - item_key.type == XATTR_ITEM location_key should be all 0. Reported-by: Mike Gilbert <floppymaster@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Refactor root key check into separate functionQu Wenruo
ROOT_ITEM key check itself is not as simple as single line check, and will be reused for both ROOT_ITEM and DIR_ITEM/DIR_INDEX location key check, so refactor such check into check_root_key(). Also since we are here, fix a comment error about ROOT_ITEM offset, which is transid of snapshot creation, not some "older kernel behavior". Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Refactor inode key check into seperate functionQu Wenruo
Inode key check is not as easy as several lines, and it will be called in more than one location (INODE_ITEM check and DIR_ITEM/DIR_INDEX/XATTR_ITEM location key check). So here refactor such check into check_inode_key(). And add extra checks for XATTR_ITEM. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Clean up fs_info parameter from error message wrapperQu Wenruo
The @fs_info parameter can be extracted from extent_buffer structure, and there are already some wrappers getting rid of the @fs_info parameter. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20btrfs: tree-checker: Check leaf chunk item sizeQu Wenruo
Inspired by btrfs-progs github issue #208, where chunk item in chunk tree has invalid num_stripes (0). Although that can already be caught by current btrfs_check_chunk_valid(), that function doesn't really check item size as it needs to handle chunk item in super block sys_chunk_array(). This patch will add two extra checks for chunk items in chunk tree: - Basic chunk item size If the item is smaller than btrfs_chunk (which already contains one stripe), exit right now as reading num_stripes may even go beyond eb boundary. - Item size check against num_stripes If item size doesn't match with calculated chunk size, then either the item size or the num_stripes is corrupted. Error out anyway. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13Btrfs: make tree checker detect checksum items with overlapping rangesFilipe Manana
Having checksum items, either on the checksums tree or in a log tree, that represent ranges that overlap each other is a sign of a corruption. Such case confuses the checksum lookup code and can result in not being able to find checksums or find stale checksums. So add a check for such case. This is motivated by a recent fix for a case where a log tree had checksum items covering ranges that overlap each other due to extent cloning, and resulted in missing checksums after replaying the log tree. It also helps detect past issues such as stale and outdated checksums due to overlapping, commit 27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and outdated checksums"). CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13btrfs: tree-checker: Fix error format string for size_tAndreas Färber
Argument BTRFS_FILE_EXTENT_INLINE_DATA_START is defined as offsetof(), which returns type size_t, so we need %zu instead of %lu. This fixes a build warning on 32-bit ARM: ../fs/btrfs/tree-checker.c: In function 'check_extent_data_item': ../fs/btrfs/tree-checker.c:230:43: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=] 230 | "invalid item size, have %u expect [%lu, %u)", | ~~^ | long unsigned int | %u Fixes: 153a6d299956 ("btrfs: tree-checker: Check item size before reading file extent type") Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andreas Färber <afaerber@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Check item size before reading file extent typeQu Wenruo
In check_extent_data_item(), we read file extent type without verifying if the item size is valid. Add such check to ensure the file extent type we read is correct. The check is not as accurate as we need to cover both inline and regular extents, so it only checks if the item size is larger or equal to inline header. So the existing size checks on inline/regular extents are still needed. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: rename block_group_item on-stack accessors to follow namingDavid Sterba
All accessors defined by BTRFS_SETGET_STACK_FUNCS contain _stack_ in the name, the block group ones were not following that scheme, so let's switch them. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: use better definition of number of compression typeChengguang Xu
The compression type upper limit constant is the same as the last value and this is confusing. In order to keep coding style consistent, use BTRFS_NR_COMPRESS_TYPES as the total number that follows the idom of 'NR' being one more than the last value. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: use enum for extent type definesChengguang Xu
Use enum to replace macro definitions of extent types. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Refactor prev_key check for ino into a functionQu Wenruo
Refactor the check for prev_key->objectid of the following key types into one function, check_prev_ino(): - EXTENT_DATA - INODE_REF - DIR_INDEX - DIR_ITEM - XATTR_ITEM Also add the check of prev_key for INODE_REF. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: use has_single_bit_set for clarityDavid Sterba
Replace is_power_of_2 with the helper that is self-documenting and remove the open coded call in alloc_profile_is_valid. Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Add check for INODE_REFQu Wenruo
For INODE_REF we will check: - Objectid (ino) against previous key To detect missing INODE_ITEM. - No overflow/padding in the data payload Much like DIR_ITEM, but with less members to check. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18btrfs: tree-checker: Try to detect missing INODE_ITEMQu Wenruo
For the following items, key->objectid is inode number: - DIR_ITEM - DIR_INDEX - XATTR_ITEM - EXTENT_DATA - INODE_REF So in the subvolume tree, such items must have its previous item share the same objectid, e.g.: (257 INODE_ITEM 0) (257 DIR_INDEX xxx) (257 DIR_ITEM xxx) (258 INODE_ITEM 0) (258 INODE_REF 0) (258 XATTR_ITEM 0) (258 EXTENT_DATA 0) But if we have the following sequence, then there is definitely something wrong, normally some INODE_ITEM is missing, like: (257 INODE_ITEM 0) (257 DIR_INDEX xxx) (257 DIR_ITEM xxx) (258 XATTR_ITEM 0) <<< objecitd suddenly changed to 258 (258 EXTENT_DATA 0) So just by checking the previous key for above inode based key types, we can detect a missing inode item. For INODE_REF key type, the check will be added along with INODE_REF checker. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-25btrfs: tree-checker: Fix wrong check on max devidQu Wenruo
[BUG] The following script will cause false alert on devid check. #!/bin/bash dev1=/dev/test/test dev2=/dev/test/scratch1 mnt=/mnt/btrfs umount $dev1 &> /dev/null umount $dev2 &> /dev/null umount $mnt &> /dev/null mkfs.btrfs -f $dev1 mount $dev1 $mnt _fail() { echo "!!! FAILED !!!" exit 1 } for ((i = 0; i < 4096; i++)); do btrfs dev add -f $dev2 $mnt || _fail btrfs dev del $dev1 $mnt || _fail dev_tmp=$dev1 dev1=$dev2 dev2=$dev_tmp done [CAUSE] Tree-checker uses BTRFS_MAX_DEVS() and BTRFS_MAX_DEVS_SYS_CHUNK() as upper limit for devid. But we can have devid holes just like above script. So the check for devid is incorrect and could cause false alert. [FIX] Just remove the whole devid check. We don't have any hard requirement for devid assignment. Furthermore, even devid could get corrupted by a bitflip, we still have dev extents verification at mount time, so corrupted data won't sneak in. This fixes fstests btrfs/194. Reported-by: Anand Jain <anand.jain@oracle.com> Fixes: ab4ba2e13346 ("btrfs: tree-checker: Verify dev item") CC: stable@vger.kernel.org # 5.2+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: Detect unbalanced tree with empty leaf before crashing btree operationsQu Wenruo
[BUG] With crafted image, btrfs will panic at btree operations: kernel BUG at fs/btrfs/ctree.c:3894! invalid opcode: 0000 [#1] SMP PTI CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9 RIP: 0010:__push_leaf_left+0x6b6/0x6e0 RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000 RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000 RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240 R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0 FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0 Call Trace: ? _cond_resched+0x1a/0x50 push_leaf_left+0x179/0x190 btrfs_del_items+0x316/0x470 btrfs_del_csums+0x215/0x3a0 __btrfs_free_extent.isra.72+0x5a7/0xbe0 __btrfs_run_delayed_refs+0x539/0x1120 btrfs_run_delayed_refs+0xdb/0x1b0 btrfs_commit_transaction+0x52/0x950 ? start_transaction+0x94/0x450 transaction_kthread+0x163/0x190 kthread+0x105/0x140 ? btrfs_cleanup_transaction+0x560/0x560 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x35/0x40 Modules linked in: ---[ end trace c2425e6e89b5558f ]--- [CAUSE] The offending csum tree looks like this: checksum tree key (CSUM_TREE ROOT_ITEM 0) node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE ... key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17 key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<< key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17 ... leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8 range start 85975040 end 85983232 length 8192 ... leaf 29642752 items 0 free space 3995 generation 17 owner 0 ^ empty leaf invalid owner ^ leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368 range start 92274688 end 95723520 length 3448832 So we have a corrupted csum tree where one tree leaf is completely empty, causing unbalanced btree, thus leading to unexpected btree balance error. [FIX] For this particular case, we handle it in two directions to catch it: - Check if the tree block is empty through btrfs_verify_level_key() So that invalid tree blocks won't be read out through btrfs_search_slot() and its variants. - Check 0 tree owner in tree checker NO tree is using 0 as its tree owner, detect it and reject at tree block read time. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202821 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add EXTENT_DATA_REF checkQu Wenruo
EXTENT_DATA_REF is a little like DIR_ITEM which contains hash in its key->offset. This patch will check the following contents: - Key->objectid Basic alignment check. - Hash Hash of each extent_data_ref item must match key->offset. - Offset Basic alignment check. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add simple keyed refs checkQu Wenruo
For TREE_BLOCK_REF, SHARED_DATA_REF and SHARED_BLOCK_REF we need to check: | TREE_BLOCK_REF | SHARED_BLOCK_REF | SHARED_BLOCK_REF --------------+----------------+-----------------+------------------ key->objectid | Alignment | Alignment | Alignment key->offset | Any value | Alignment | Alignment item_size | 0 | 0 | sizeof(le32) (*) *: sizeof(struct btrfs_shared_data_ref) So introduce a check to check all these 3 key types together. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add EXTENT_ITEM and METADATA_ITEM checkQu Wenruo
This patch introduces the ability to check extent items. This check involves: - key->objectid check Basic alignment check. - key->type check Against btrfs_extent_item::type and SKINNY_METADATA feature. - key->offset alignment check for EXTENT_ITEM - key->offset check for METADATA_ITEM - item size check Both against minimal size and stepping check. - btrfs_extent_item check Checks its flags and generation. - btrfs_extent_inline_ref checks Against 4 types inline ref. Checks bytenr alignment and tree level. - btrfs_extent_item::refs check Check against total refs found in inline refs. This check would be the most complex single item check due to its nature of inlined items. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: tree-checker: Add ROOT_ITEM checkQu Wenruo
This patch will introduce ROOT_ITEM check, which includes: - Key->objectid and key->offset check Currently only some easy check, e.g. 0 as rootid is invalid. - Item size check Root item size is fixed. - Generation checks Generation, generation_v2 and last_snapshot should not be greater than super generation + 1 - Level and alignment check Level should be in [0, 7], and bytenr must be aligned to sector size. - Flags check Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203261 Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01btrfs: tree-checker: Check if the file extent end overflowsQu Wenruo
Under certain conditions, we could have strange file extent item in log tree like: item 18 key (69599 108 397312) itemoff 15208 itemsize 53 extent data disk bytenr 0 nr 0 extent data offset 0 nr 18446744073709547520 ram 18446744073709547520 The num_bytes + ram_bytes overflow 64 bit type. For num_bytes part, we can detect such overflow along with file offset (key->offset), as file_offset + num_bytes should never go beyond u64. For ram_bytes part, it's about the decompressed size of the extent, not directly related to the size. In theory it is OK to have a large value, and put extra limitation on RAM bytes may cause unexpected false alerts. So in tree-checker, we only check if the file offset and num bytes overflow. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-16Btrfs: tree-checker: detect file extent items with overlapping rangesFilipe Manana
Having file extent items with ranges that overlap each other is a serious issue that leads to all sorts of corruptions and crashes (like a BUG_ON() during the course of __btrfs_drop_extents() when it traims file extent items). Therefore teach the tree checker to detect such cases. This is motivated by a recently fixed bug (race between ranged full fsync and writeback or adjacent ranges). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: tree-checker: Allow error injection for tree-checkerQu Wenruo
Allowing error injection for btrfs_check_leaf_full() and btrfs_check_node() is useful to test the failure path of btrfs write time tree check. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>