summary refs log tree commit diff
path: root/fs
AgeCommit message (Collapse)Author
2023-09-15Merge branch 6.1/features/mm-soft-dirty-fileCristian Ciocaltea
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
2023-09-06nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuseRyusuke Konishi
commit cdaac8e7e5a059f9b5e816cda257f08d0abffacd upstream. A syzbot stress test using a corrupted disk image reported that mark_buffer_dirty() called from __nilfs_mark_inode_dirty() or nilfs_palloc_commit_alloc_entry() may output a kernel warning, and can panic if the kernel is booted with panic_on_warn. This is because nilfs2 keeps buffer pointers in local structures for some metadata and reuses them, but such buffers may be forcibly discarded by nilfs_clear_dirty_page() in some critical situations. This issue is reported to appear after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only"), but the issue has potentially existed before. Fix this issue by checking the uptodate flag when attempting to reuse an internally held buffer, and reloading the metadata instead of reusing the buffer if the flag was lost. Link: https://lkml.kernel.org/r/20230818131804.7758-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+cdfcae656bac88ba0e2d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000003da75f05fdeffd12@google.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi
commit f83913f8c5b882a312e72b7669762f8a5c9385e4 upstream. A syzbot stress test reported that create_empty_buffers() called from nilfs_lookup_dirty_data_buffers() can cause a general protection fault. Analysis using its reproducer revealed that the back reference "mapping" from a page/folio has been changed to NULL after dirty page/folio gang lookup in nilfs_lookup_dirty_data_buffers(). Fix this issue by excluding pages/folios from being collected if, after acquiring a lock on each page/folio, its back reference "mapping" differs from the pointer to the address space struct that held the page/folio. Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0ad741797f4565e7e2d2@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06ksmbd: reduce descriptor size if remaining bytes is less than request sizeNamjae Jeon
commit e628bf939aafb61fbc56e9bdac8795cea5127e25 upstream. Create 3 kinds of files to reproduce this problem. dd if=/dev/urandom of=127k.bin bs=1024 count=127 dd if=/dev/urandom of=128k.bin bs=1024 count=128 dd if=/dev/urandom of=129k.bin bs=1024 count=129 When copying files from ksmbd share to windows or cifs.ko, The following error message happen from windows client. "The file '129k.bin' is too large for the destination filesystem." We can see the error logs from ksmbd debug prints [48394.611537] ksmbd: RDMA r/w request 0x0: token 0x669d, length 0x20000 [48394.612054] ksmbd: smb_direct: RDMA write, len 0x20000, needed credits 0x1 [48394.612572] ksmbd: filename 129k.bin, offset 131072, len 131072 [48394.614189] ksmbd: nbytes 1024, offset 132096 mincount 0 [48394.614585] ksmbd: Failed to process 8 [-22] And we can reproduce it with cifs.ko, e.g. dd if=129k.bin of=/dev/null bs=128KB count=2 This problem is that ksmbd rdma return error if remaining bytes is less than Length of Buffer Descriptor V1 Structure. smb_direct_rdma_xmit() ... if (desc_buf_len == 0 || total_length > buf_len || total_length > t->max_rdma_rw_size) return -EINVAL; This patch reduce descriptor size with remaining bytes and remove the check for total_length and buf_len. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06ksmbd: replace one-element array with flex-array member in struct smb2_ea_infoNamjae Jeon
commit 0ba5439d9afa2722e7728df56f272c89987540a4 upstream. UBSAN complains about out-of-bounds array indexes on 1-element arrays in struct smb2_ea_info. UBSAN: array-index-out-of-bounds in fs/smb/server/smb2pdu.c:4335:15 index 1 is out of range for type 'char [1]' CPU: 1 PID: 354 Comm: kworker/1:4 Not tainted 6.5.0-rc4 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020 Workqueue: ksmbd-io handle_ksmbd_work [ksmbd] Call Trace: <TASK> __dump_stack linux/lib/dump_stack.c:88 dump_stack_lvl+0x48/0x70 linux/lib/dump_stack.c:106 dump_stack+0x10/0x20 linux/lib/dump_stack.c:113 ubsan_epilogue linux/lib/ubsan.c:217 __ubsan_handle_out_of_bounds+0xc6/0x110 linux/lib/ubsan.c:348 smb2_get_ea linux/fs/smb/server/smb2pdu.c:4335 smb2_get_info_file linux/fs/smb/server/smb2pdu.c:4900 smb2_query_info+0x63ae/0x6b20 linux/fs/smb/server/smb2pdu.c:5275 __process_request linux/fs/smb/server/server.c:145 __handle_ksmbd_work linux/fs/smb/server/server.c:213 handle_ksmbd_work+0x348/0x10b0 linux/fs/smb/server/server.c:266 process_one_work+0x85a/0x1500 linux/kernel/workqueue.c:2597 worker_thread+0xf3/0x13a0 linux/kernel/workqueue.c:2748 kthread+0x2b7/0x390 linux/kernel/kthread.c:389 ret_from_fork+0x44/0x90 linux/arch/x86/kernel/process.c:145 ret_from_fork_asm+0x1b/0x30 linux/arch/x86/entry/entry_64.S:304 </TASK> Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06ksmbd: fix slub overflow in ksmbd_decode_ntlmssp_auth_blob()Namjae Jeon
commit 4b081ce0d830b684fdf967abc3696d1261387254 upstream. If authblob->SessionKey.Length is bigger than session key size(CIFS_KEY_SIZE), slub overflow can happen in key exchange codes. cifs_arc4_crypt copy to session key array from SessionKey from client. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21940 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06ksmbd: fix wrong DataOffset validation of create contextNamjae Jeon
commit 17d5b135bb720832364e8f55f6a887a3c7ec8fdb upstream. If ->DataOffset of create context is 0, DataBuffer size is not correctly validated. This patch change wrong validation code and consider tag length in request. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21824 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06erofs: ensure that the post-EOF tails are all zeroedGao Xiang
commit e4c1cf523d820730a86cae2c6d55924833b6f7ac upstream. This was accidentally fixed up in commit e4c1cf523d82 but we can't take the full change due to other dependancy issues, so here is just the actual bugfix that is needed. [Background] keltargw reported an issue [1] that with mmaped I/Os, sometimes the tail of the last page (after file ends) is not filled with zeroes. The root cause is that such tail page could be wrongly selected for inplace I/Os so the zeroed part will then be filled with compressed data instead of zeroes. A simple fix is to avoid doing inplace I/Os for such tail parts, actually that was already fixed upstream in commit e4c1cf523d82 ("erofs: tidy up z_erofs_do_read_page()") by accident. [1] https://lore.kernel.org/r/3ad8b469-25db-a297-21f9-75db2d6ad224@linux.alibaba.com Reported-by: keltargw <keltar.gw@gmail.com> Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30nfsd: use vfs setgid helperChristian Brauner
commit 2d8ae8c417db284f598dffb178cc01e7db0f1821 upstream. We've aligned setgid behavior over multiple kernel releases. The details can be found in commit cf619f891971 ("Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") and commit 426b4ca2d6a5 ("Merge tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux"). Consistent setgid stripping behavior is now encapsulated in the setattr_should_drop_sgid() helper which is used by all filesystems that strip setgid bits outside of vfs proper. Usually ATTR_KILL_SGID is raised in e.g., chown_common() and is subject to the setattr_should_drop_sgid() check to determine whether the setgid bit can be retained. Since nfsd is raising ATTR_KILL_SGID unconditionally it will cause notify_change() to strip it even if the caller had the necessary privileges to retain it. Ensure that nfsd only raises ATR_KILL_SGID if the caller lacks the necessary privileges to retain the setgid bit. Without this patch the setgid stripping tests in LTP will fail: > As you can see, the problem is S_ISGID (0002000) was dropped on a > non-group-executable file while chown was invoked by super-user, while [...] > fchown02.c:66: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700 [...] > chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700 With this patch all tests pass. Reported-by: Sherry Yang <sherry.yang@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Harshit: backport to 6.1.y: Use init_user_ns instead of nop_mnt_idmap as we don't have commit abf08576afe3 ("fs: port vfs_*() helpers to struct mnt_idmap")] Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30nfs: use vfs setgid helperChristian Brauner
commit 4f704d9a8352f5c0a8fcdb6213b934630342bd44 upstream. We've aligned setgid behavior over multiple kernel releases. The details can be found in the following two merge messages: cf619f891971 ("Merge tag 'fs.ovl.setgid.v6.2') 426b4ca2d6a5 ("Merge tag 'fs.setgid.v6.0') Consistent setgid stripping behavior is now encapsulated in the setattr_should_drop_sgid() helper which is used by all filesystems that strip setgid bits outside of vfs proper. Switch nfs to rely on this helper as well. Without this patch the setgid stripping tests in xfstests will fail. Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230313-fs-nfs-setgid-v2-1-9a59f436cfc0@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> [ Harshit: backport to 6.1.y: fs/internal.h -- minor conflict due to code change differences. include/linux/fs.h -- Used struct user_namespace *mnt_userns instead of struct mnt_idmap *idmap fs/nfs/inode.c -- Used init_user_ns instead of nop_mnt_idmap ] Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30nfsd: Fix race to FREE_STATEID and cl_revokedBenjamin Coddington
commit 3b816601e279756e781e6c4d9b3f3bd21a72ac67 upstream. We have some reports of linux NFS clients that cannot satisfy a linux knfsd server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though those clients repeatedly walk all their known state using TEST_STATEID and receive NFS4_OK for all. Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then nfsd4_free_stateid() finds the delegation and returns NFS4_OK to FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to cl_revoked. This would produce the observed client/server effect. Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID and move to cl_revoked happens within the same cl_lock. This will allow nfsd4_free_stateid() to properly remove the delegation from cl_revoked. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575 Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v4.17+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30NFS: Fix a use after free in nfs_direct_join_group()Trond Myklebust
commit be2fd1560eb57b7298aa3c258ddcca0d53ecdea3 upstream. Be more careful when tearing down the subrequests of an O_DIRECT write as part of a retransmission. Reported-by: Chris Mason <clm@fb.com> Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30NFSv4: Fix dropped lock for racing OPEN and delegation returnBenjamin Coddington
commit 1cbc11aaa01f80577b67ae02c73ee781112125fd upstream. Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") attempted to solve this problem by using nfs4's generic async error handling, but introduced a regression where v4.0 lock recovery would hang. The additional complexity introduced by overloading that error handling is not necessary for this case. This patch expects that commit to be reverted. The problem as originally explained in the above commit is: There's a small window where a LOCK sent during a delegation return can race with another OPEN on client, but the open stateid has not yet been updated. In this case, the client doesn't handle the OLD_STATEID error from the server and will lose this lock, emitting: "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024". Fix this by using the old_stateid refresh helpers if the server replies with OLD_STATEID. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30jbd2: fix a race when checking checkpoint buffer busyZhang Yi
[ Upstream commit 46f881b5b1758dc4a35fba4a643c10717d0cf427 ] Before removing checkpoint buffer from the t_checkpoint_list, we have to check both BH_Dirty and BH_Lock bits together to distinguish buffers have not been or were being written back. But __cp_buffer_busy() checks them separately, it first check lock state and then check dirty, the window between these two checks could be raced by writing back procedure, which locks buffer and clears buffer dirty before I/O completes. So it cannot guarantee checkpointing buffers been written back to disk if some error happens later. Finally, it may clean checkpoint transactions and lead to inconsistent filesystem. jbd2_journal_forget() and __journal_try_to_free_buffer() also have the same problem (journal_unmap_buffer() escape from this issue since it's running under the buffer lock), so fix them through introducing a new helper to try holding the buffer lock and remove really clean buffer. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Cc: stable@vger.kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30jbd2: remove journal_clean_one_cp_list()Zhang Yi
[ Upstream commit b98dba273a0e47dbfade89c9af73c5b012a4eabb ] journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost the same, so merge them into journal_shrink_one_cp_list(), remove the nr_to_scan parameter, always scan and try to free the whole checkpoint list. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30jbd2: remove t_checkpoint_io_listZhang Yi
[ Upstream commit be22255360f80d3af789daad00025171a65424a5 ] Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint() now, it's time to remove the whole t_checkpoint_io_list logic. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30NFSv4: fix out path in __nfs4_get_acl_uncachedFedor Pchelkin
[ Upstream commit f4e89f1a6dab4c063fc1e823cc9dddc408ff40cf ] Another highly rare error case when a page allocating loop (inside __nfs4_get_acl_uncached, this time) is not properly unwound on error. Since pages array is allocated being uninitialized, need to free only lower array indices. NULL checks were useful before commit 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had been initialized to zero on stack. Found by Linux Verification Center (linuxtesting.org). Fixes: 62a1573fcf84 ("NFSv4 fix acl retrieval over krb5i/krb5p mounts") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30NFSv4.2: fix error handling in nfs42_proc_getxattrFedor Pchelkin
[ Upstream commit 4e3733fd2b0f677faae21cf838a43faf317986d3 ] There is a slight issue with error handling code inside nfs42_proc_getxattr(). If page allocating loop fails then we free the failing page array element which is NULL but __free_page() can't deal with NULL args. Found by Linux Verification Center (linuxtesting.org). Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-27Revert "f2fs: fix to do sanity check on direct node in truncate_dnode()"Greg Kroah-Hartman
This reverts commit a78a8bcdc26de5ef3a0ee27c9c6c512e54a6051c which is commit a6ec83786ab9f13f25fb18166dee908845713a95 upstream. Something is currently broken in the f2fs code, Guenter has reported boot problems with it for a few releases now, so revert the most recent f2fs changes in the hope to get this back to a working filesystem. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b392e1a8-b987-4993-bd45-035db9415a6e@roeck-us.net Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27Revert "f2fs: fix to set flush_merge opt and show noflush_merge"Greg Kroah-Hartman
This reverts commit 6ba0594a81f91d6fd8ca9bd4ad23aa1618635a0f which is commit 967eaad1fed5f6335ea97a47d45214744dc57925 upstream. Something is currently broken in the f2fs code, Guenter has reported boot problems with it for a few releases now, so revert the most recent f2fs changes in the hope to get this back to a working filesystem. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b392e1a8-b987-4993-bd45-035db9415a6e@roeck-us.net Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Yangtao Li <frank.li@vivo.com> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-27Revert "f2fs: don't reset unchangable mount option in f2fs_remount()"Greg Kroah-Hartman
This reverts commit e2fb24ce37caeaecff08af4e9967c8462624312b which is commit 458c15dfbce62c35fefd9ca637b20a051309c9f1 upstream. Something is currently broken in the f2fs code, Guenter has reported boot problems with it for a few releases now, so revert the most recent f2fs changes in the hope to get this back to a working filesystem. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/b392e1a8-b987-4993-bd45-035db9415a6e@roeck-us.net Cc: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23cifs: Release folio lock on fscache read hit.Russell Harmon via samba-technical
commit 69513dd669e243928f7450893190915a88f84a2b upstream. Under the current code, when cifs_readpage_worker is called, the call contract is that the callee should unlock the page. This is documented in the read_folio section of Documentation/filesystems/vfs.rst as: > The filesystem should unlock the folio once the read has completed, > whether it was successful or not. Without this change, when fscache is in use and cache hit occurs during a read, the page lock is leaked, producing the following stack on subsequent reads (via mmap) to the page: $ cat /proc/3890/task/12864/stack [<0>] folio_wait_bit_common+0x124/0x350 [<0>] filemap_read_folio+0xad/0xf0 [<0>] filemap_fault+0x8b1/0xab0 [<0>] __do_fault+0x39/0x150 [<0>] do_fault+0x25c/0x3e0 [<0>] __handle_mm_fault+0x6ca/0xc70 [<0>] handle_mm_fault+0xe9/0x350 [<0>] do_user_addr_fault+0x225/0x6c0 [<0>] exc_page_fault+0x84/0x1b0 [<0>] asm_exc_page_fault+0x27/0x30 This requires a reboot to resolve; it is a deadlock. Note however that the call to cifs_readpage_from_fscache does mark the page clean, but does not free the folio lock. This happens in __cifs_readpage_from_fscache on success. Releasing the lock at that point however is not appropriate as cifs_readahead also calls cifs_readpage_from_fscache and *does* unconditionally release the lock after its return. This change therefore effectively makes cifs_readpage_worker work like cifs_readahead. Signed-off-by: Russell Harmon <russ@har.mn> Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23btrfs: fix BUG_ON condition in btrfs_cancel_balancexiaoshoukui
commit 29eefa6d0d07e185f7bfe9576f91e6dba98189c2 upstream. Pausing and canceling balance can race to interrupt balance lead to BUG_ON panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance does not take this race scenario into account. However, the race condition has no other side effects. We can fix that. Reproducing it with panic trace like this: kernel BUG at fs/btrfs/volumes.c:4618! RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0 Call Trace: <TASK> ? do_nanosleep+0x60/0x120 ? hrtimer_nanosleep+0xb7/0x1a0 ? sched_core_clone_cookie+0x70/0x70 btrfs_ioctl_balance_ctl+0x55/0x70 btrfs_ioctl+0xa46/0xd20 __x64_sys_ioctl+0x7d/0xa0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Race scenario as follows: > mutex_unlock(&fs_info->balance_mutex); > -------------------- > .......issue pause and cancel req in another thread > -------------------- > ret = __btrfs_balance(fs_info); > > mutex_lock(&fs_info->balance_mutex); > if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) { > btrfs_info(fs_info, "balance: paused"); > btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED); > } CC: stable@vger.kernel.org # 4.19+ Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23btrfs: fix incorrect splitting in btrfs_drop_extent_map_rangeJosef Bacik
commit c962098ca4af146f2625ed64399926a098752c9c upstream. In production we were seeing a variety of WARN_ON()'s in the extent_map code, specifically in btrfs_drop_extent_map_range() when we have to call add_extent_mapping() for our second split. Consider the following extent map layout PINNED [0 16K) [32K, 48K) and then we call btrfs_drop_extent_map_range for [0, 36K), with skip_pinned == true. The initial loop will have start = 0 end = 36K len = 36K we will find the [0, 16k) extent, but since we are pinned we will skip it, which has this code start = em_end; if (end != (u64)-1) len = start + len - em_end; em_end here is 16K, so now the values are start = 16K len = 16K + 36K - 16K = 36K len should instead be 20K. This is a problem when we find the next extent at [32K, 48K), we need to split this extent to leave [36K, 48k), however the code for the split looks like this split->start = start + len; split->len = em_end - (start + len); In this case we have em_end = 48K split->start = 16K + 36K // this should be 16K + 20K split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K and now we have an invalid extent_map in the tree that potentially overlaps other entries in the extent map. Even in the non-overlapping case we will have split->start set improperly, which will cause problems with any block related calculations. We don't actually need len in this loop, we can simply use end as our end point, and only adjust start up when we find a pinned extent we need to skip. Adjust the logic to do this, which keeps us from inserting an invalid extent map. We only skip_pinned in the relocation case, so this is relatively rare, except in the case where you are running relocation a lot, which can happen with auto relocation on. Fixes: 55ef68990029 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23cifs: fix potential oops in cifs_oplock_breakSteve French
[ Upstream commit e8f5f849ffce24490eb9449e98312b66c0dba76f ] With deferred close we can have closes that race with lease breaks, and so with the current checks for whether to send the lease response, oplock_response(), this can mean that an unmount (kill_sb) can occur just before we were checking if the tcon->ses is valid. See below: [Fri Aug 4 04:12:50 2023] RIP: 0010:cifs_oplock_break+0x1f7/0x5b0 [cifs] [Fri Aug 4 04:12:50 2023] Code: 7d a8 48 8b 7d c0 c0 e9 02 48 89 45 b8 41 89 cf e8 3e f5 ff ff 4c 89 f7 41 83 e7 01 e8 82 b3 03 f2 49 8b 45 50 48 85 c0 74 5e <48> 83 78 60 00 74 57 45 84 ff 75 52 48 8b 43 98 48 83 eb 68 48 39 [Fri Aug 4 04:12:50 2023] RSP: 0018:ffffb30607ddbdf8 EFLAGS: 00010206 [Fri Aug 4 04:12:50 2023] RAX: 632d223d32612022 RBX: ffff97136944b1e0 RCX: 0000000080100009 [Fri Aug 4 04:12:50 2023] RDX: 0000000000000001 RSI: 0000000080100009 RDI: ffff97136944b188 [Fri Aug 4 04:12:50 2023] RBP: ffffb30607ddbe58 R08: 0000000000000001 R09: ffffffffc08e0900 [Fri Aug 4 04:12:50 2023] R10: 0000000000000001 R11: 000000000000000f R12: ffff97136944b138 [Fri Aug 4 04:12:50 2023] R13: ffff97149147c000 R14: ffff97136944b188 R15: 0000000000000000 [Fri Aug 4 04:12:50 2023] FS: 0000000000000000(0000) GS:ffff9714f7c00000(0000) knlGS:0000000000000000 [Fri Aug 4 04:12:50 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Fri Aug 4 04:12:50 2023] CR2: 00007fd8de9c7590 CR3: 000000011228e000 CR4: 0000000000350ef0 [Fri Aug 4 04:12:50 2023] Call Trace: [Fri Aug 4 04:12:50 2023] <TASK> [Fri Aug 4 04:12:50 2023] process_one_work+0x225/0x3d0 [Fri Aug 4 04:12:50 2023] worker_thread+0x4d/0x3e0 [Fri Aug 4 04:12:50 2023] ? process_one_work+0x3d0/0x3d0 [Fri Aug 4 04:12:50 2023] kthread+0x12a/0x150 [Fri Aug 4 04:12:50 2023] ? set_kthread_struct+0x50/0x50 [Fri Aug 4 04:12:50 2023] ret_from_fork+0x22/0x30 [Fri Aug 4 04:12:50 2023] </TASK> To fix this change the ordering of the checks before sending the oplock_response to first check if the openFileList is empty. Fixes: da787d5b7498 ("SMB3: Do not send lease break acknowledgment if all file handles have been closed") Suggested-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23btrfs: fix use-after-free of new block group that became unusedFilipe Manana
[ Upstream commit 0657b20c5a76c938612f8409735a8830d257866e ] If a task creates a new block group and that block group becomes unused before we finish its creation, at btrfs_create_pending_block_groups(), then when btrfs_mark_bg_unused() is called against the block group, we assume that the block group is currently in the list of block groups to reclaim, and we move it out of the list of new block groups and into the list of unused block groups. This has two consequences: 1) We move it out of the list of new block groups associated to the current transaction. So the block group creation is not finished and if we attempt to delete the bg because it's unused, we will not find the block group item in the extent tree (or the new block group tree), its device extent items in the device tree etc, resulting in the deletion to fail due to the missing items; 2) We don't increment the reference count on the block group when we move it to the list of unused block groups, because we assumed the block group was on the list of block groups to reclaim, and in that case it already has the correct reference count. However the block group was on the list of new block groups, in which case no extra reference was taken because it's local to the current task. This later results in doing an extra reference count decrement when removing the block group from the unused list, eventually leading the reference count to 0. This second case was caught when running generic/297 from fstests, which produced the following assertion failure and stack trace: [589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299 [589.559] ------------[ cut here ]------------ [589.559] kernel BUG at fs/btrfs/block-group.c:4299! [589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI [589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs] [589.561] Code: 68 62 da c0 (...) [589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246 [589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000 [589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff [589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50 [589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00 [589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100 [589.563] FS: 00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000 [589.563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0 [589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [589.564] Call Trace: [589.564] <TASK> [589.565] ? __die_body+0x1b/0x60 [589.565] ? die+0x39/0x60 [589.565] ? do_trap+0xeb/0x110 [589.565] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs] [589.566] ? do_error_trap+0x6a/0x90 [589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs] [589.566] ? exc_invalid_op+0x4e/0x70 [589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs] [589.567] ? asm_exc_invalid_op+0x16/0x20 [589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs] [589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs] [589.567] close_ctree+0x35d/0x560 [btrfs] [589.568] ? fsnotify_sb_delete+0x13e/0x1d0 [589.568] ? dispose_list+0x3a/0x50 [589.568] ? evict_inodes+0x151/0x1a0 [589.568] generic_shutdown_super+0x73/0x1a0 [589.569] kill_anon_super+0x14/0x30 [589.569] btrfs_kill_super+0x12/0x20 [btrfs] [589.569] deactivate_locked_super+0x2e/0x70 [589.569] cleanup_mnt+0x104/0x160 [589.570] task_work_run+0x56/0x90 [589.570] exit_to_user_mode_prepare+0x160/0x170 [589.570] syscall_exit_to_user_mode+0x22/0x50 [589.570] ? __x64_sys_umount+0x12/0x20 [589.571] do_syscall_64+0x48/0x90 [589.571] entry_SYSCALL_64_after_hwframe+0x72/0xdc [589.571] RIP: 0033:0x7f497ff0a567 [589.571] Code: af 98 0e (...) [589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567 [589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0 [589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100 [589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0 [589.573] </TASK> [589.574] Modules linked in: dm_snapshot dm_thin_pool (...) [589.576] ---[ end trace 0000000000000000 ]--- Fix this by adding a runtime flag to the block group to tell that the block group is still in the list of new block groups, and therefore it should not be moved to the list of unused block groups, at btrfs_mark_bg_unused(), until the flag is cleared, when we finish the creation of the block group at btrfs_create_pending_block_groups(). Fixes: a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23btrfs: convert btrfs_block_group::seq_zone to runtime flagDavid Sterba
[ Upstream commit 961f5b8bf48a463ac5fe5b13143426d79eb41817 ] In zoned mode the sequential status of zone can be also tracked in the runtime flags of block group. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23btrfs: convert btrfs_block_group::needs_free_space to runtime flagDavid Sterba
[ Upstream commit 0d7764ff58b4b45c39eb03f2c74a819c1a88fa7b ] We already have flags in block group to track various status bits, convert needs_free_space as well and reduce size of btrfs_block_group. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23btrfs: move out now unused BG from the reclaim listNaohiro Aota
[ Upstream commit a9f189716cf15913c453299d72f69c51a9b0f86b ] An unused block group is easy to remove to free up space and should be reclaimed fast. Such block group can often already be a target of the reclaim process. As we check list_empty(&bg->bg_list), we keep it in the reclaim list. That block group is never reclaimed until the file system is filled e.g. up to 75%. Instead, we can move unused block group to the unused list and delete it fast. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23fs/ntfs3: Mark ntfs dirty when on-disk struct is corruptedKonstantin Komarov
[ Upstream commit e0f363a98830e8d7d70fbaf91c07ae0b7c57aafe ] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23fs: ntfs3: Fix possible null-pointer dereferences in mi_read()Jia-Ju Bai
[ Upstream commit 97498cd610c0d030a7bd49a7efad974790661162 ] In a previous commit 2681631c2973 ("fs/ntfs3: Add null pointer check to attr_load_runs_vcn"), ni can be NULL in attr_load_runs_vcn(), and thus it should be checked before being used. However, in the call stack of this commit, mft_ni in mi_read() is aliased with ni in attr_load_runs_vcn(), and it is also used in mi_read() at two places: mi_read() rw_lock = &mft_ni->file.run_lock -> No check attr_load_runs_vcn(mft_ni, ...) ni (namely mft_ni) is checked in the previous commit attr_load_runs_vcn(..., &mft_ni->file.run) -> No check Thus, to avoid possible null-pointer dereferences, the related checks should be added. These bugs are reported by a static analysis tool implemented by myself, and they are found by extending a known bug fixed in the previous commit. Thus, they could be theoretical bugs. Signed-off-by: Jia-Ju Bai <baijiaju@buaa.edu.cn> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23fs/ntfs3: Enhance sanity check while generating attr_listEdward Lo
[ Upstream commit fdec309c7672cbee4dc0229ee4cbb33c948a1bdd ] ni_create_attr_list uses WARN_ON to catch error cases while generating attribute list, which only prints out stack trace and may not be enough. This repalces them with more proper error handling flow. [ 59.666332] BUG: kernel NULL pointer dereference, address: 000000000000000e [ 59.673268] #PF: supervisor read access in kernel mode [ 59.678354] #PF: error_code(0x0000) - not-present page [ 59.682831] PGD 8000000005ff1067 P4D 8000000005ff1067 PUD 7dee067 PMD 0 [ 59.688556] Oops: 0000 [#1] PREEMPT SMP KASAN PTI [ 59.692642] CPU: 0 PID: 198 Comm: poc Tainted: G B W 6.2.0-rc1+ #4 [ 59.698868] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 59.708795] RIP: 0010:ni_create_attr_list+0x505/0x860 [ 59.713657] Code: 7e 10 e8 5e d0 d0 ff 45 0f b7 76 10 48 8d 7b 16 e8 00 d1 d0 ff 66 44 89 73 16 4d 8d 75 0e 4c 89 f7 e8 3f d0 d0 ff 4c 8d8 [ 59.731559] RSP: 0018:ffff88800a56f1e0 EFLAGS: 00010282 [ 59.735691] RAX: 0000000000000001 RBX: ffff88800b7b5088 RCX: ffffffffb83079fe [ 59.741792] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffbb7f9fc0 [ 59.748423] RBP: ffff88800a56f3a8 R08: ffff88800b7b50a0 R09: fffffbfff76ff3f9 [ 59.754654] R10: ffffffffbb7f9fc7 R11: fffffbfff76ff3f8 R12: ffff88800b756180 [ 59.761552] R13: 0000000000000000 R14: 000000000000000e R15: 0000000000000050 [ 59.768323] FS: 00007feaa8c96440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000 [ 59.776027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.781395] CR2: 00007f3a2e0b1000 CR3: 000000000a5bc000 CR4: 00000000000006f0 [ 59.787607] Call Trace: [ 59.790271] <TASK> [ 59.792488] ? __pfx_ni_create_attr_list+0x10/0x10 [ 59.797235] ? kernel_text_address+0xd3/0xe0 [ 59.800856] ? unwind_get_return_address+0x3e/0x60 [ 59.805101] ? __kasan_check_write+0x18/0x20 [ 59.809296] ? preempt_count_sub+0x1c/0xd0 [ 59.813421] ni_ins_attr_ext+0x52c/0x5c0 [ 59.817034] ? __pfx_ni_ins_attr_ext+0x10/0x10 [ 59.821926] ? __vfs_setxattr+0x121/0x170 [ 59.825718] ? __vfs_setxattr_noperm+0x97/0x300 [ 59.829562] ? __vfs_setxattr_locked+0x145/0x170 [ 59.833987] ? vfs_setxattr+0x137/0x2a0 [ 59.836732] ? do_setxattr+0xce/0x150 [ 59.839807] ? setxattr+0x126/0x140 [ 59.842353] ? path_setxattr+0x164/0x180 [ 59.845275] ? __x64_sys_setxattr+0x71/0x90 [ 59.848838] ? do_syscall_64+0x3f/0x90 [ 59.851898] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 59.857046] ? stack_depot_save+0x17/0x20 [ 59.860299] ni_insert_attr+0x1ba/0x420 [ 59.863104] ? __pfx_ni_insert_attr+0x10/0x10 [ 59.867069] ? preempt_count_sub+0x1c/0xd0 [ 59.869897] ? _raw_spin_unlock_irqrestore+0x2b/0x50 [ 59.874088] ? __create_object+0x3ae/0x5d0 [ 59.877865] ni_insert_resident+0xc4/0x1c0 [ 59.881430] ? __pfx_ni_insert_resident+0x10/0x10 [ 59.886355] ? kasan_save_alloc_info+0x1f/0x30 [ 59.891117] ? __kasan_kmalloc+0x8b/0xa0 [ 59.894383] ntfs_set_ea+0x90d/0xbf0 [ 59.897703] ? __pfx_ntfs_set_ea+0x10/0x10 [ 59.901011] ? kernel_text_address+0xd3/0xe0 [ 59.905308] ? __kernel_text_address+0x16/0x50 [ 59.909811] ? unwind_get_return_address+0x3e/0x60 [ 59.914898] ? __pfx_stack_trace_consume_entry+0x10/0x10 [ 59.920250] ? arch_stack_walk+0xa2/0x100 [ 59.924560] ? filter_irq_stacks+0x27/0x80 [ 59.928722] ntfs_setxattr+0x405/0x440 [ 59.932512] ? __pfx_ntfs_setxattr+0x10/0x10 [ 59.936634] ? kvmalloc_node+0x2d/0x120 [ 59.940378] ? kasan_save_stack+0x41/0x60 [ 59.943870] ? kasan_save_stack+0x2a/0x60 [ 59.947719] ? kasan_set_track+0x29/0x40 [ 59.951417] ? kasan_save_alloc_info+0x1f/0x30 [ 59.955733] ? __kasan_kmalloc+0x8b/0xa0 [ 59.959598] ? __kmalloc_node+0x68/0x150 [ 59.963163] ? kvmalloc_node+0x2d/0x120 [ 59.966490] ? vmemdup_user+0x2b/0xa0 [ 59.969060] __vfs_setxattr+0x121/0x170 [ 59.972456] ? __pfx___vfs_setxattr+0x10/0x10 [ 59.976008] __vfs_setxattr_noperm+0x97/0x300 [ 59.981562] __vfs_setxattr_locked+0x145/0x170 [ 59.986100] vfs_setxattr+0x137/0x2a0 [ 59.989964] ? __pfx_vfs_setxattr+0x10/0x10 [ 59.993616] ? __kasan_check_write+0x18/0x20 [ 59.997425] do_setxattr+0xce/0x150 [ 60.000304] setxattr+0x126/0x140 [ 60.002967] ? __pfx_setxattr+0x10/0x10 [ 60.006471] ? __virt_addr_valid+0xcb/0x140 [ 60.010461] ? __call_rcu_common.constprop.0+0x1c7/0x330 [ 60.016037] ? debug_smp_processor_id+0x1b/0x30 [ 60.021008] ? kasan_quarantine_put+0x5b/0x190 [ 60.025545] ? putname+0x84/0xa0 [ 60.027910] ? __kasan_slab_free+0x11e/0x1b0 [ 60.031483] ? putname+0x84/0xa0 [ 60.033986] ? preempt_count_sub+0x1c/0xd0 [ 60.036876] ? __mnt_want_write+0xae/0x100 [ 60.040738] ? mnt_want_write+0x8f/0x150 [ 60.044317] path_setxattr+0x164/0x180 [ 60.048096] ? __pfx_path_setxattr+0x10/0x10 [ 60.052096] ? strncpy_from_user+0x175/0x1c0 [ 60.056482] ? debug_smp_processor_id+0x1b/0x30 [ 60.059848] ? fpregs_assert_state_consistent+0x6b/0x80 [ 60.064557] __x64_sys_setxattr+0x71/0x90 [ 60.068892] do_syscall_64+0x3f/0x90 [ 60.072868] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 60.077523] RIP: 0033:0x7feaa86e4469 [ 60.080915] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088 [ 60.097353] RSP: 002b:00007ffdbd8311e8 EFLAGS: 00000286 ORIG_RAX: 00000000000000bc [ 60.103386] RAX: ffffffffffffffda RBX: 9461c5e290baac00 RCX: 00007feaa86e4469 [ 60.110322] RDX: 00007ffdbd831fe0 RSI: 00007ffdbd831305 RDI: 00007ffdbd831263 [ 60.116808] RBP: 00007ffdbd836180 R08: 0000000000000001 R09: 00007ffdbd836268 [ 60.123879] R10: 000000000000007d R11: 0000000000000286 R12: 0000000000400500 [ 60.130540] R13: 00007ffdbd836260 R14: 0000000000000000 R15: 0000000000000000 [ 60.136553] </TASK> [ 60.138818] Modules linked in: [ 60.141839] CR2: 000000000000000e [ 60.144831] ---[ end trace 0000000000000000 ]--- [ 60.149058] RIP: 0010:ni_create_attr_list+0x505/0x860 [ 60.153975] Code: 7e 10 e8 5e d0 d0 ff 45 0f b7 76 10 48 8d 7b 16 e8 00 d1 d0 ff 66 44 89 73 16 4d 8d 75 0e 4c 89 f7 e8 3f d0 d0 ff 4c 8d8 [ 60.172443] RSP: 0018:ffff88800a56f1e0 EFLAGS: 00010282 [ 60.176246] RAX: 0000000000000001 RBX: ffff88800b7b5088 RCX: ffffffffb83079fe [ 60.182752] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffbb7f9fc0 [ 60.189949] RBP: ffff88800a56f3a8 R08: ffff88800b7b50a0 R09: fffffbfff76ff3f9 [ 60.196950] R10: ffffffffbb7f9fc7 R11: fffffbfff76ff3f8 R12: ffff88800b756180 [ 60.203671] R13: 0000000000000000 R14: 000000000000000e R15: 0000000000000050 [ 60.209595] FS: 00007feaa8c96440(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000 [ 60.216299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 60.222276] CR2: 00007f3a2e0b1000 CR3: 000000000a5bc000 CR4: 00000000000006f0 Signed-off-by: Edward Lo <loyuantsung@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23ceph: try to dump the msgs when decoding failsXiubo Li
[ Upstream commit 8b0da5c549ae63ba1debd92a350f90773cb4bfe7 ] When the msgs are corrupted we need to dump them and then it will be easier to dig what has happened and where the issue is. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23gfs2: Fix possible data races in gfs2_show_options()Tuo Li
[ Upstream commit 6fa0a72cbbe45db4ed967a51f9e6f4e3afe61d20 ] Some fields such as gt_logd_secs of the struct gfs2_tune are accessed without holding the lock gt_spin in gfs2_show_options(): val = sdp->sd_tune.gt_logd_secs; if (val != 30) seq_printf(s, ",commit=%d", val); And thus can cause data races when gfs2_show_options() and other functions such as gfs2_reconfigure() are concurrently executed: spin_lock(&gt->gt_spin); gt->gt_logd_secs = newargs->ar_commit; To fix these possible data races, the lock sdp->sd_tune.gt_spin is acquired before accessing the fields of gfs2_tune and released after these accesses. Further changes by Andreas: - Don't hold the spin lock over the seq_printf operations. Reported-by: BassCheck <bass@buaa.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23cifs: fix session state check in reconnect to avoid use-after-free issueWinston Wen
[ Upstream commit 99f280700b4cc02d5f141b8d15f8e9fad0418f65 ] Don't collect exiting session in smb2_reconnect_server(), because it will be released soon. Note that the exiting session will stay in server->smb_ses_list until it complete the cifs_free_ipc() and logoff() and then delete itself from the list. Signed-off-by: Winston Wen <wentao@uniontech.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23smb: client: fix warning in cifs_smb3_do_mount()Paulo Alcantara
[ Upstream commit 12c30f33cc6769bf411088a2872843c4f9ea32f9 ] This fixes the following warning reported by kernel test robot fs/smb/client/cifsfs.c:982 cifs_smb3_do_mount() warn: possible memory leak of 'cifs_sb' Link: https://lore.kernel.org/all/202306170124.CtQqzf0I-lkp@intel.com/ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-16btrfs: set cache_block_group_error if we find an errorJosef Bacik
commit 92fb94b69c6accf1e49fff699640fa0ce03dc910 upstream. We set cache_block_group_error if btrfs_cache_block_group() returns an error, this is because we could end up not finding space to allocate and mistakenly return -ENOSPC, and which could then abort the transaction with the incorrect errno, and in the case of ENOSPC result in a WARN_ON() that will trip up tests like generic/475. However there's the case where multiple threads can be racing, one thread gets the proper error, and the other thread doesn't actually call btrfs_cache_block_group(), it instead sees ->cached == BTRFS_CACHE_ERROR. Again the result is the same, we fail to allocate our space and return -ENOSPC. Instead we need to set cache_block_group_error to -EIO in this case to make sure that if we do not make our allocation we get the appropriate error returned back to the caller. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16btrfs: reject invalid reloc tree root keys with stack dumpQu Wenruo
commit 6ebcd021c92b8e4b904552e4d87283032100796d upstream. [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). That ASSERT() makes sure the reloc tree is properly pointed back by its subvolume tree. [CAUSE] After more debugging output, it turns out we had an invalid reloc tree: BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17 Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM, QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree. But reloc trees can only exist for subvolumes, as for non-subvolume trees, we just COW the involved tree block, no need to create a reloc tree since those tree blocks won't be shared with other trees. Only subvolumes tree can share tree blocks with other trees (thus they have BTRFS_ROOT_SHAREABLE flag). Thus this new debug output proves my previous assumption that corrupted on-disk data can trigger that ASSERT(). [FIX] Besides the dedicated fix and the graceful exit, also let tree-checker to check such root keys, to make sure reloc trees can only exist for subvolumes. CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16btrfs: exit gracefully if reloc roots don't matchQu Wenruo
commit 05d7ce504545f7874529701664c90814ca645c5d upstream. [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). [CAUSE] The root cause of the triggered ASSERT() is we can have a race between quota tree creation and relocation. This leads us to create a duplicated quota tree in the btrfs_read_fs_root() path, and since it's treated as fs tree, it would have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. The bug itself is fixed by a dedicated patch for it, but this already taught us the ASSERT() is not something straightforward for developers. [ENHANCEMENT] Instead of using an ASSERT(), let's handle it gracefully and output extra info about the mismatch reloc roots to help debug. Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside merge_reloc_roots() later. Also replace those ASSERT(0)s with WARN_ON()s. CC: stable@vger.kernel.org # 5.15+ Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16btrfs: properly clear end of the unreserved range in cow_file_rangeChristoph Hellwig
commit 12b2d64e591652a2d97dd3afa2b062ca7a4ba352 upstream. When the call to btrfs_reloc_clone_csums in cow_file_range returns an error, we jump to the out_unlock label with the extent_reserved variable set to false. The cleanup at the label will then call extent_clear_unlock_delalloc on the range from start to end. But we've already added cur_alloc_size to start before the jump, so there might no range be left from the newly incremented start to end. Move the check for 'start < end' so that it is reached by also for the !extent_reserved case. CC: stable@vger.kernel.org # 6.1+ Fixes: a315e68f6e8b ("Btrfs: fix invalid attempt to free reserved space on failure to cow range") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16btrfs: don't stop integrity writeback too earlyChristoph Hellwig
commit effa24f689ce0948f68c754991a445a8d697d3a8 upstream. extent_write_cache_pages stops writing pages as soon as nr_to_write hits zero. That is the right thing for opportunistic writeback, but incorrect for data integrity writeback, which needs to ensure that no dirty pages are left in the range. Thus only stop the writeback for WB_SYNC_NONE if nr_to_write hits 0. This is a port of write_cache_pages changes in commit 05fe478dd04e ("mm: write_cache_pages integrity fix"). Note that I've only trigger the problem with other changes to the btrfs writeback code, but this condition seems worthwhile fixing anyway. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ updated comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16btrfs: wait for actual caching progress during allocationJosef Bacik
commit fc1f91b9231a28fba333f931a031bf776bc6ef0e upstream. Recently we've been having mysterious hangs while running generic/475 on the CI system. This turned out to be something like this: Task 1 dmsetup suspend --nolockfs -> __dm_suspend -> dm_wait_for_completion -> dm_wait_for_bios_completion -> Unable to complete because of IO's on a plug in Task 2 Task 2 wb_workfn -> wb_writeback -> blk_start_plug -> writeback_sb_inodes -> Infinite loop unable to make an allocation Task 3 cache_block_group ->read_extent_buffer_pages ->Waiting for IO to complete that can't be submitted because Task 1 suspended the DM device The problem here is that we need Task 2 to be scheduled completely for the blk plug to flush. Normally this would happen, we normally wait for the block group caching to finish (Task 3), and this schedule would result in the block plug flushing. However if there's enough free space available from the current caching to satisfy the allocation we won't actually wait for the caching to complete. This check however just checks that we have enough space, not that we can make the allocation. In this particular case we were trying to allocate 9MiB, and we had 10MiB of free space, but we didn't have 9MiB of contiguous space to allocate, and thus the allocation failed and we looped. We specifically don't cycle through the FFE loop until we stop finding cached block groups because we don't want to allocate new block groups just because we're caching, so we short circuit the normal loop once we hit LOOP_CACHING_WAIT and we found a caching block group. This is normally fine, except in this particular case where the caching thread can't make progress because the DM device has been suspended. Fix this by not only waiting for free space to >= the amount of space we want to allocate, but also that we make some progress in caching from the time we start waiting. This will keep us from busy looping when the caching is taking a while but still theoretically has enough space for us to allocate from, and fixes this particular case by forcing us to actually sleep and wait for forward progress, which will flush the plug. With this fix we're no longer hanging with generic/475. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iputRyusuke Konishi
commit f8654743a0e6909dc634cbfad6db6816f10f3399 upstream. During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously, nilfs_evict_inode() could cause use-after-free read for nilfs_root if inodes are left in "garbage_list" and released by nilfs_dispose_list at the end of nilfs_detach_log_writer(), and this bug was fixed by commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"). However, it turned out that there is another possibility of UAF in the call path where mark_inode_dirty_sync() is called from iput(): nilfs_detach_log_writer() nilfs_dispose_list() iput() mark_inode_dirty_sync() __mark_inode_dirty() nilfs_dirty_inode() __nilfs_mark_inode_dirty() nilfs_load_inode_block() --> causes UAF of nilfs_root struct This can happen after commit 0ae45f63d4ef ("vfs: add support for a lazytime mount option"), which changed iput() to call mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME flag and i_nlink is non-zero. This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only") when using the syzbot reproducer, but the issue has potentially existed before. Fix this issue by adding a "purging flag" to the nilfs structure, setting that flag while disposing the "garbage_list" and checking it in __nilfs_mark_inode_dirty(). Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"), this patch does not rely on ns_writer to determine whether to skip operations, so as not to break recovery on mount. The nilfs_salvage_orphan_logs routine dirties the buffer of salvaged data before attaching the log writer, so changing __nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL will cause recovery write to fail. The purpose of using the cleanup-only flag is to allow for narrowing of such conditions. Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 4.0+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()Namjae Jeon
commit 79ed288cef201f1f212dfb934bcaac75572fb8f6 upstream. There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of current smb2_ea_info. ksmbd need to validate buffer length Before accessing the next ea. ksmbd should check buffer length using buf_len, not next variable. next is the start offset of current ea that got from previous ea. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16ksmbd: validate command request sizeLong Li
commit 5aa4fda5aa9c2a5a7bac67b4a12b089ab81fee3c upstream. In commit 2b9b8f3b68ed ("ksmbd: validate command payload size"), except for SMB2_OPLOCK_BREAK_HE command, the request size of other commands is not checked, it's not expected. Fix it by add check for request size of other commands. Cc: stable@vger.kernel.org Fixes: 2b9b8f3b68ed ("ksmbd: validate command payload size") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Long Li <leo.lilong@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11exfat: check if filename entries exceeds max filename lengthNamjae Jeon
[ Upstream commit d42334578eba1390859012ebb91e1e556d51db49 ] exfat_extract_uni_name copies characters from a given file name entry into the 'uniname' variable. This variable is actually defined on the stack of the exfat_readdir() function. According to the definition of the 'exfat_uni_name' type, the file name should be limited 255 characters (+ null teminator space), but the exfat_get_uniname_from_ext_entry() function can write more characters because there is no check if filename entries exceeds max filename length. This patch add the check not to copy filename characters when exceeding max filename length. Cc: stable@vger.kernel.org Cc: Yuezhang Mo <Yuezhang.Mo@sony.com> Reported-by: Maxim Suhanov <dfirblog@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11f2fs: don't reset unchangable mount option in f2fs_remount()Chao Yu
[ Upstream commit 458c15dfbce62c35fefd9ca637b20a051309c9f1 ] syzbot reports a bug as below: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942 Call Trace: lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691 __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline] _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300 __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100 f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116 f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664 f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838 vfs_fallocate+0x54b/0x6b0 fs/open.c:324 ksys_fallocate fs/open.c:347 [inline] __do_sys_fallocate fs/open.c:355 [inline] __se_sys_fallocate fs/open.c:353 [inline] __x64_sys_fallocate+0xbd/0x100 fs/open.c:353 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is race condition as below: - since it tries to remount rw filesystem, so that do_remount won't call sb_prepare_remount_readonly to block fallocate, there may be race condition in between remount and fallocate. - in f2fs_remount(), default_options() will reset mount option to default one, and then update it based on result of parse_options(), so there is a hole which race condition can happen. Thread A Thread B - f2fs_fill_super - parse_options - clear_opt(READ_EXTENT_CACHE) - f2fs_remount - default_options - set_opt(READ_EXTENT_CACHE) - f2fs_fallocate - f2fs_insert_range - f2fs_drop_extent_tree - __drop_extent_tree - __may_extent_tree - test_opt(READ_EXTENT_CACHE) return true - write_lock(&et->lock) access NULL pointer - parse_options - clear_opt(READ_EXTENT_CACHE) Cc: <stable@vger.kernel.org> Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11f2fs: fix to set flush_merge opt and show noflush_mergeYangtao Li
[ Upstream commit 967eaad1fed5f6335ea97a47d45214744dc57925 ] Some minor modifications to flush_merge and related parameters: 1.The FLUSH_MERGE opt is set by default only in non-ro mode. 2.When ro and merge are set at the same time, an error is reported. 3.Display noflush_merge mount opt. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: 458c15dfbce6 ("f2fs: don't reset unchangable mount option in f2fs_remount()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11f2fs: fix to do sanity check on direct node in truncate_dnode()Chao Yu
commit a6ec83786ab9f13f25fb18166dee908845713a95 upstream. syzbot reports below bug: BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574 Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000 CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351 print_report mm/kasan/report.c:462 [inline] kasan_report+0x11c/0x130 mm/kasan/report.c:572 f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574 truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944 f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154 f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721 f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749 f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799 f2fs_truncate include/linux/fs.h:825 [inline] f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006 notify_change+0xb2c/0x1180 fs/attr.c:483 do_truncate+0x143/0x200 fs/open.c:66 handle_truncate fs/namei.c:3295 [inline] do_open fs/namei.c:3640 [inline] path_openat+0x2083/0x2750 fs/namei.c:3791 do_filp_open+0x1ba/0x410 fs/namei.c:3818 do_sys_openat2+0x16d/0x4c0 fs/open.c:1356 do_sys_open fs/open.c:1372 [inline] __do_sys_creat fs/open.c:1448 [inline] __se_sys_creat fs/open.c:1442 [inline] __x64_sys_creat+0xcd/0x120 fs/open.c:1442 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is, inodeA references inodeB via inodeB's ino, once inodeA is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's node page, it traverse mapping data from node->i.i_addr[0] to node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access. This patch fixes to add sanity check on dnode page in truncate_dnode(), so that, it can help to avoid triggering such issue, and once it encounters such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE error into superblock, later fsck can detect such issue and try repairing. Also, it removes f2fs_truncate_data_blocks() for cleanup due to the function has only one caller, and uses f2fs_truncate_data_blocks_range() instead. Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11btrfs: remove BUG_ON()'s in add_new_free_space()Filipe Manana
commit d8ccbd21918fd7fa6ce3226cffc22c444228e8ad upstream. At add_new_free_space() we have these BUG_ON()'s that are there to deal with any failure to add free space to the in memory free space cache. Such failures are mostly -ENOMEM that should be very rare. However there's no need to have these BUG_ON()'s, we can just return any error to the caller and all callers and their upper call chain are already dealing with errors. So just make add_new_free_space() return any errors, while removing the BUG_ON()'s, and returning the total amount of added free space to an optional u64 pointer argument. Reported-by: syzbot+3ba856e07b7127889d8c@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/000000000000e9cb8305ff4e8327@google.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>