opensuse:kernel.git
4 years ago libceph: close socket directly from ceph_con_close()
Sage Weil [Fri, 20 Jul 2012 23:45:49 +0000 (16:45 -0700)]
libceph: close socket directly from ceph_con_close()

(cherry picked from commit ee76e0736db8455e3b11827d6899bd2a4e1d0584)

It is simpler to do this immediately, since we already hold the con mutex.
It also avoids the need to deal with a not-quite-CLOSED socket in con_work.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: drop gratuitous socket close calls in con_work
Sage Weil [Fri, 20 Jul 2012 22:40:04 +0000 (15:40 -0700)]
libceph: drop gratuitous socket close calls in con_work

(cherry picked from commit 2e8cb10063820af7ed7638e3fd9013eee21266e7)

If the state is CLOSED or OPENING, we shouldn't have a socket.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: move ceph_con_send() closed check under the con mutex
Sage Weil [Fri, 20 Jul 2012 22:34:04 +0000 (15:34 -0700)]
libceph: move ceph_con_send() closed check under the con mutex

(cherry picked from commit a59b55a602b6c741052d79c1e3643f8440cddd27)

Take the con mutex before checking whether the connection is closed to
avoid racing with someone else closing it.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: move msgr clear_standby under con mutex protection
Sage Weil [Fri, 20 Jul 2012 22:33:04 +0000 (15:33 -0700)]
libceph: move msgr clear_standby under con mutex protection

(cherry picked from commit 00650931e52e97fe64096bec167f5a6780dfd94a)

Avoid dropping and retaking con->mutex in the ceph_con_send() case by
leaving locking up to the caller.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: fix fault locking; close socket on lossy fault
Sage Weil [Fri, 20 Jul 2012 22:22:53 +0000 (15:22 -0700)]
libceph: fix fault locking; close socket on lossy fault

(cherry picked from commit 3b5ede07b55b52c3be27749d183d87257d032065)

If we fault on a lossy connection, we should still close the socket
immediately, and do so under the con mutex.

We should also take the con mutex before printing out the state bits in
the debug output.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: reset connection retry on successfully negotiation
Sage Weil [Mon, 30 Jul 2012 23:22:05 +0000 (16:22 -0700)]
libceph: reset connection retry on successfully negotiation

(cherry picked from commit 85effe183dd45854d1ad1a370b88cddb403c4c91)

We exponentially back off when we encounter connection errors.  If several
errors accumulate, we will eventually wait ages before even trying to
reconnect.

Fix this by resetting the backoff counter after a successful negotiation/
connection with the remote node.  Fixes ceph issue #2802.
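
The backoff behaviour described above can be sketched in user-space C as
follows.  This is only an illustration, not the messenger code: the struct,
the function names and the delay bounds are all hypothetical.

```c
/* Hypothetical bounds, chosen only for illustration. */
#define BASE_DELAY_MS 250UL
#define MAX_DELAY_MS  (5UL * 60UL * 1000UL)

struct conn_sketch {
	unsigned long delay_ms;   /* current backoff delay; 0 means no backoff yet */
};

/* Called after a connection error: double the delay, up to a cap. */
static unsigned long backoff_on_error(struct conn_sketch *c)
{
	if (c->delay_ms == 0)
		c->delay_ms = BASE_DELAY_MS;
	else if (c->delay_ms < MAX_DELAY_MS)
		c->delay_ms *= 2;
	if (c->delay_ms > MAX_DELAY_MS)
		c->delay_ms = MAX_DELAY_MS;
	return c->delay_ms;
}

/* Called once negotiation succeeds: forget the accumulated errors. */
static void backoff_on_connected(struct conn_sketch *c)
{
	c->delay_ms = 0;
}
```

Without the reset, a later error would resume doubling from wherever the
previous run of failures left off.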

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: protect ceph_con_open() with mutex
Sage Weil [Mon, 30 Jul 2012 23:21:40 +0000 (16:21 -0700)]
libceph: protect ceph_con_open() with mutex

(cherry picked from commit 5469155f2bc83bb2c88b0a0370c3d54d87eed06e)

Take the con mutex while we are initiating a ceph open.  This is necessary
because the connection may have previously been in use and then closed, which could
result in a racing workqueue running con_work().

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: (re)initialize bio_iter on start of message receive
Sage Weil [Mon, 30 Jul 2012 23:20:25 +0000 (16:20 -0700)]
libceph: (re)initialize bio_iter on start of message receive

(cherry picked from commit a4107026976f06c9a6ce8cc84a763564ee39d901)

Previously, we were opportunistically initializing the bio_iter if it
appeared to be uninitialized in the middle of the read path.  The problem
is that a sequence like the following can then occur:

 - start reading message
 - initialize bio_iter
 - read half a message
 - messenger fault, reconnect
 - restart reading message
 - ** bio_iter now non-NULL, not reinitialized **
 - read past end of bio, crash

Instead, initialize the bio_iter unconditionally when we allocate/claim
the message for read.
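
The difference between the two initialization strategies can be shown with a
small user-space model.  Everything here is hypothetical: a plain byte buffer
stands in for the bio chain and a char pointer stands in for bio_iter.

```c
#include <stddef.h>

struct rx_msg_sketch {
	const char *data;   /* stands in for the bio chain */
	size_t len;
	const char *iter;   /* stands in for bio_iter; NULL when never set */
};

/* Old behaviour: initialize the cursor only if it looks uninitialized. */
static void start_read_opportunistic(struct rx_msg_sketch *m)
{
	if (!m->iter)
		m->iter = m->data;
}

/* New behaviour: always rewind when a message is (re)claimed for reading. */
static void start_read_unconditional(struct rx_msg_sketch *m)
{
	m->iter = m->data;
}

/* Consume n bytes; returns how many bytes remain before the end. */
static size_t read_some(struct rx_msg_sketch *m, size_t n)
{
	m->iter += n;
	return (size_t)((m->data + m->len) - m->iter);
}
```

After a fault halfway through, the opportunistic variant leaves the cursor in
the middle of the buffer, so a full-length re-read would run past the end.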

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: resubmit linger ops when pg mapping changes
Sage Weil [Mon, 30 Jul 2012 23:19:28 +0000 (16:19 -0700)]
libceph: resubmit linger ops when pg mapping changes

(cherry picked from commit 6194ea895e447fdf4adfd23f67873a32bf4f15ae)

The linger op registration (i.e., watch) modifies the object state.  As
such, the OSD will reply with success if it has already applied the op,
without performing the associated side-effects (setting up the watch session state).
If we lose the ACK and resubmit, we will see success but the watch will not
be correctly registered and we won't get notifies.

To fix this, always resubmit the linger op with a new tid.  We accomplish
this by re-registering as a linger (i.e., 'registered') if we are not yet
registered.  Then the second loop will treat this just like a normal
case of re-registering.

This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and
ceph bug #2796.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: fix mutex coverage for ceph_con_close
Sage Weil [Mon, 30 Jul 2012 23:24:37 +0000 (16:24 -0700)]
libceph: fix mutex coverage for ceph_con_close

(cherry picked from commit 8c50c817566dfa4581f82373aac39f3e608a7dc8)

Hold the mutex while twiddling all of the state bits to avoid possible
races.  While we're here, make note of why we cannot close the socket
directly.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: report socket read/write error message
Sage Weil [Mon, 30 Jul 2012 23:24:21 +0000 (16:24 -0700)]
libceph: report socket read/write error message

(cherry picked from commit 3a140a0d5c4b9e35373b016e41dfc85f1e526bdb)

We need to set error_msg to something useful before calling ceph_fault();
do so here for try_{read,write}().  This is more informative than

    libceph: osd0 192.168.106.220:6801 (null)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: prevent the race of incoming work during teardown
Guanjun He [Mon, 9 Jul 2012 02:50:33 +0000 (19:50 -0700)]
libceph: prevent the race of incoming work during teardown

(cherry picked from commit a2a3258417eb6a1799cf893350771428875a8287)

Add an atomic variable 'stopping' as a flag in struct ceph_messenger and
set it to 1 in ceph_destroy_client().  ceph_data_ready() now tests the
flag and, if it is set, simply returns.
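
A user-space sketch of the same pattern, using C11 atomics in place of the
kernel's atomic_t.  The struct and function names below are hypothetical and
the work counter exists only to make the effect observable.

```c
#include <stdatomic.h>

struct messenger_sketch {
	atomic_int stopping;   /* nonzero once teardown has begun */
	int work_queued;       /* counts queued work items, for illustration */
};

/* Teardown path: raise the flag before anything is torn down. */
static void messenger_stop(struct messenger_sketch *m)
{
	atomic_store(&m->stopping, 1);
}

/* Socket callback: ignore incoming data once teardown has started. */
static void data_ready(struct messenger_sketch *m)
{
	if (atomic_load(&m->stopping))
		return;
	m->work_queued++;
}
```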

Signed-off-by: Guanjun He <gjhe@suse.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: initialize msgpool message types
Sage Weil [Mon, 9 Jul 2012 21:22:34 +0000 (14:22 -0700)]
libceph: initialize msgpool message types

(cherry picked from commit d50b409fb8698571d8209e5adfe122e287e31290)

Initialize the type field for messages in a msgpool.  The caller was doing
this for osd ops, but not for the reply messages.

Reported-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: allow sock transition from CONNECTING to CLOSED
Sage Weil [Wed, 27 Jun 2012 19:31:02 +0000 (12:31 -0700)]
libceph: allow sock transition from CONNECTING to CLOSED

(cherry picked from commit fbb85a478f6d4cce6942f1c25c6a68ec5b1e7e7f)

It is possible to close a socket that is in the OPENING state.  For
example, it can happen if ceph_con_close() is called on the con before
the TCP connection is established.  con_work() will come around and shut
down the socket.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: initialize mon_client con only once
Sage Weil [Wed, 27 Jun 2012 19:24:34 +0000 (12:24 -0700)]
libceph: initialize mon_client con only once

(cherry picked from commit 735a72ef952d42a256f79ae3e6dc1c17a45c041b)

Do not re-initialize the con on every connection attempt.  When we
call ceph_con_close(), there may still be work queued on the socket (e.g., to
close it), and re-initializing will clobber the work_struct state.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: set peer name on con_open, not init
Sage Weil [Wed, 27 Jun 2012 19:24:08 +0000 (12:24 -0700)]
libceph: set peer name on con_open, not init

(cherry picked from commit b7a9e5dd40f17a48a72f249b8bbc989b63bae5fd)

The peer name may change on each open attempt, even when the connection is
reused.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: add some fine ASCII art
Alex Elder [Thu, 21 Jun 2012 02:53:53 +0000 (21:53 -0500)]
libceph: add some fine ASCII art

(cherry picked from commit bc18f4b1c850ab355e38373fbb60fd28568d84b5)

Sage liked the state diagram I put in my commit description so
I'm putting it in with the code.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: small changes to messenger.c
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: small changes to messenger.c

(cherry picked from commit 5821bd8ccdf5d17ab2c391c773756538603838c3)

This patch gathers a few small changes in "net/ceph/messenger.c":
  out_msg_pos_next()
    - small logic change that mostly affects indentation
  write_partial_msg_pages()
    - use a local variable trail_off to represent the offset into
      a message of the trail portion of the data (if present)
    - once we are in the trail portion we will always be there, so we
      don't always need to check against our data position
    - avoid computing len twice after we've reached the trail
    - get rid of the variable tmpcrc, which is not needed
    - trail_off and trail_len never change so mark them const
    - update some comments
  read_partial_message_bio()
    - bio_iovec_idx() will never return an error, so don't bother
      checking for it

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: distinguish two phases of connect sequence
Alex Elder [Thu, 24 May 2012 16:55:03 +0000 (11:55 -0500)]
libceph: distinguish two phases of connect sequence

(cherry picked from commit 7593af920baac37752190a0db703d2732bed4a3b)

Currently a ceph connection enters a "CONNECTING" state when it
begins the process of (re-)connecting with its peer.  Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information exchange has begun.  The
CONNECTING bit/state continues to be set during this phase.

Rather than have the CONNECTING state continue while the NEGOTIATING
bit is set, interpret these two phases as distinct states.  In other
words, when NEGOTIATING is set, clear CONNECTING.  That way only
one of them will be active at a time.
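
As a rough illustration of treating the two phases as mutually exclusive
states, consider the following bit manipulation.  The constant and function
names are hypothetical and do not match the identifiers in messenger.c.

```c
/* Hypothetical state bits, for illustration only. */
enum { ST_CONNECTING = 1u << 0, ST_NEGOTIATING = 1u << 1 };

/* Start of the connect sequence: banner and address exchange. */
static unsigned int begin_connect(unsigned int state)
{
	return state | ST_CONNECTING;
}

/* Banner exchange done: leave CONNECTING as NEGOTIATING is entered. */
static unsigned int begin_negotiate(unsigned int state)
{
	state &= ~(unsigned int)ST_CONNECTING;
	return state | ST_NEGOTIATING;
}
```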

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: separate banner and connect writes
Alex Elder [Thu, 31 May 2012 16:37:29 +0000 (11:37 -0500)]
libceph: separate banner and connect writes

(cherry picked from commit ab166d5aa3bc036fba7efaca6e4e43a7e9510acf)

There are two phases in the process of linking together the two ends
of a ceph connection.  The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.

When initiating a connection, the client side now queues to send
its information for both phases of this process at the same time.
This is probably a bit more efficient, but it is slightly messier
from a layering perspective in the code.

So rearrange things so that the client doesn't send the connection
information until it has received and processed the response in the
initial banner phase (in process_banner()).

Move the code (in the (con->sock == NULL) case in try_write()) that
prepares for writing the connection information, delaying doing that
until the banner exchange has completed.  Move the code that begins
the transition to this second "NEGOTIATING" phase out of
process_banner() and into its caller, so preparing to write the
connection information and preparing to read the response are
adjacent to each other.

Finally, preparing to write the connection information now requires
the output kvec to be reset in all cases, so move that into the
prepare_write_connect() and delete it from all callers.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: define and use an explicit CONNECTED state
Alex Elder [Wed, 23 May 2012 19:35:23 +0000 (14:35 -0500)]
libceph: define and use an explicit CONNECTED state

(cherry picked from commit e27947c767f5bed15048f4e4dad3e2eb69133697)

There is no state explicitly defined when a ceph connection is fully
operational.  So define one.

It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.

Be a little more careful when examining the old state when a socket
disconnect event is reported.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: clear NEGOTIATING when done
Alex Elder [Wed, 23 May 2012 19:35:23 +0000 (14:35 -0500)]
libceph: clear NEGOTIATING when done

(cherry picked from commit 3ec50d1868a9e0493046400bb1fdd054c7f64ebd)

A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server).  But that bit
is not cleared again--at least not until another connection attempt
is initiated.

Instead, clear it as soon as the connection is fully established.
Also, clear it when a socket connection gets prematurely closed
in the midst of establishing a ceph connection (in case we had
reached the point where it was set).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: clear CONNECTING in ceph_con_close()
Alex Elder [Thu, 21 Jun 2012 02:53:53 +0000 (21:53 -0500)]
libceph: clear CONNECTING in ceph_con_close()

(cherry picked from commit bb9e6bba5d8b85b631390f8dbe8a24ae1ff5b48a)

A connection that is closed will no longer be connecting.  So
clear the CONNECTING state bit in ceph_con_close().  Similarly,
if the socket has been closed we no longer are in connecting
state (a new connect sequence will need to be initiated).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: don't touch con state in con_close_socket()
Alex Elder [Thu, 21 Jun 2012 02:53:53 +0000 (21:53 -0500)]
libceph: don't touch con state in con_close_socket()

(cherry picked from commit 456ea46865787283088b23a8a7f69244513b95f0)

In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.

Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing being done on a socket being shut down.  We no longer set
SOCK_CLOSED in the socket event routine conditionally, so setting
that bit here no longer provides whatever benefit it might have
provided before.

A race condition could still leave the SOCK_CLOSED bit set even
after we've issued the call to con_close_socket(), so we still clear
that bit after shutting the socket down.  Add a comment explaining
the reason for this.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: just set SOCK_CLOSED when state changes
Alex Elder [Thu, 21 Jun 2012 02:53:53 +0000 (21:53 -0500)]
libceph: just set SOCK_CLOSED when state changes

(cherry picked from commit d65c9e0b9eb43d14ece9dd843506ccba06162ee7)

When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED
connection flag bit is set, and if it had not been previously set
queue_con() is called to ensure con_work() will get a chance to
handle the changed state.

con_work() atomically checks--and if set, clears--the SOCK_CLOSED
bit.  This means that even if the bit were set
repeatedly, the related processing in con_work() only gets called
once per transition of the bit from 0 to 1.

What's important then is that we ensure con_work() gets called *at
least* once when a socket close event occurs, not that it gets
called *exactly* once.

The work queue mechanism already takes care of queueing work
only if it is not already queued, so there's no need for us
to call queue_con() conditionally.

So this patch just makes it so the SOCK_CLOSED flag gets set
unconditionally in ceph_sock_state_change().
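
The reasoning above relies on an atomic test-and-clear on the consumer side.
A minimal user-space sketch with C11 atomics follows; the kernel uses its own
bit operations, and every name here is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_SOCK_CLOSED 1u

struct con_flags_sketch {
	atomic_uint flags;
	int close_handled;   /* how many times the worker handled a close */
};

/* Event side: set the flag unconditionally; repeats are harmless. */
static void sock_state_change_closed(struct con_flags_sketch *c)
{
	atomic_fetch_or(&c->flags, FLAG_SOCK_CLOSED);
}

/* Worker side: atomically test and clear, acting only if it was set. */
static bool worker_check_closed(struct con_flags_sketch *c)
{
	unsigned int old = atomic_fetch_and(&c->flags, ~FLAG_SOCK_CLOSED);

	if (!(old & FLAG_SOCK_CLOSED))
		return false;
	c->close_handled++;
	return true;
}
```

Setting the flag twice before the worker runs still yields a single pass
through the close handling, which is why the conditional is not needed.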

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: don't change socket state on sock event
Alex Elder [Thu, 21 Jun 2012 02:53:53 +0000 (21:53 -0500)]
libceph: don't change socket state on sock event

(cherry picked from commit 188048bce311ee41e5178bc3255415d0eae28423)

Currently the socket state change event handler records an error
message on a connection to distinguish a close while connecting from
a close while a connection was already established.

Changing connection information during handling of a socket event is
not very clean, so instead move this assignment inside con_work(),
where it can be done during normal connection-level processing (and
under protection of the connection mutex as well).

Move the handling of a socket closed event up to the top of the
processing loop in con_work(); there's no point in handling backoff
etc. if we have a newly-closed socket to take care of.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: SOCK_CLOSED is a flag, not a state
Alex Elder [Thu, 21 Jun 2012 02:53:53 +0000 (21:53 -0500)]
libceph: SOCK_CLOSED is a flag, not a state

(cherry picked from commit a8d00e3cdef4c1c4f194414b72b24cd995439a05)

The following commit changed it so the SOCK_CLOSED bit was stored in
a connection's new "flags" field rather than its "state" field.

    libceph: start separating connection flags from state
    commit 928443cd

That bit is used in con_close_socket() to protect against setting an
error message more than once in the socket event handler function.

Unfortunately, the field being operated on in that function was not
updated to be "flags" as it should have been.  This fixes that
error.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: don't use bio_iter as a flag
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: don't use bio_iter as a flag

(cherry picked from commit abdaa6a849af1d63153682c11f5bbb22dacb1f6b)

Recently a bug was fixed in which the bio_iter field in a ceph
message was not being properly re-initialized when a message got
re-transmitted:
    commit 43643528cce60ca184fe8197efa8e8da7c89a037
    Author: Yan, Zheng <zheng.z.yan@intel.com>
    rbd: Clear ceph_msg->bio_iter for retransmitted message

We are now only initializing the bio_iter field when we are about to
start to write message data (in prepare_write_message_data()),
rather than every time we are attempting to write any portion of the
message data (in write_partial_msg_pages()).  This means we no
longer need to use the msg->bio_iter field as a flag.

So just don't do that any more.  Trust prepare_write_message_data()
to ensure msg->bio_iter is properly initialized, every time we are
about to begin writing (or re-writing) a message's bio data.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: move init of bio_iter
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: move init of bio_iter

(cherry picked from commit 572c588edadaa3da3992bd8a0fed830bbcc861f8)

If a message has a non-null bio pointer, its bio_iter field is
initialized in write_partial_msg_pages() if this has not been done
already.  This is really a one-time setup operation for sending a
message's (bio) data, so move that initialization code into
prepare_write_message_data() which serves that purpose.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: move init_bio_*() functions up
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: move init_bio_*() functions up

(cherry picked from commit df6ad1f97342ebc4270128222e896541405eecdb)

Move init_bio_iter() and iter_bio_next() up in their source file so
they'll be defined before they're needed.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: don't mark footer complete before it is
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: don't mark footer complete before it is

(cherry picked from commit fd154f3c75465abd83b7a395033e3755908a1e6e)

This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE
flag before the CRC for the data portion (recorded in the footer)
has been completely computed.  Hold off setting the complete flag
until we've decided it's ready to send.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: encapsulate advancing msg page
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: encapsulate advancing msg page

(cherry picked from commit 84ca8fc87fcf4ab97bb8acdb59bf97bb4820cb14)

In write_partial_msg_pages(), once all the data from a page has been
sent we advance to the next one.  Put the code that takes care of
this into its own function.

While modifying write_partial_msg_pages(), make its local variable
"in_trail" be Boolean, and use the local variable "msg" (which is
just the connection's current out_msg pointer) consistently.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: encapsulate out message data setup
Alex Elder [Mon, 11 Jun 2012 19:57:13 +0000 (14:57 -0500)]
libceph: encapsulate out message data setup

(cherry picked from commit 739c905baa018c99003564ebc367d93aa44d4861)

Move the code that prepares to write the data portion of a message
into its own function.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: drop ceph_con_get/put helpers and nref member
Sage Weil [Thu, 21 Jun 2012 19:49:23 +0000 (12:49 -0700)]
libceph: drop ceph_con_get/put helpers and nref member

(cherry picked from commit d59315ca8c0de00df9b363f94a2641a30961ca1c)

These are no longer used.  Every ceph_connection instance is embedded in
another structure, and refcounts are manipulated via the get/put ops.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: use con get/put methods
Sage Weil [Thu, 21 Jun 2012 19:47:08 +0000 (12:47 -0700)]
libceph: use con get/put methods

(cherry picked from commit 36eb71aa57e6a33d61fd90a2fd87f00c6844bc86)

The ceph_con_get/put() helpers manipulate the embedded con ref
count, which isn't used now that ceph_connections are embedded in
other structures.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: fix NULL dereference in reset_connection()
Dan Carpenter [Tue, 19 Jun 2012 13:52:33 +0000 (08:52 -0500)]
libceph: fix NULL dereference in reset_connection()

(cherry picked from commit 26ce171915f348abd1f41da1ed139d93750d987f)

We dereference "con->in_msg" on the line after it was set to NULL.
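
The class of bug is easy to reproduce in miniature.  The sketch below is not
the reset_connection() code; the types and names are hypothetical and only
show the safe ordering: use the pointer first, clear it last.

```c
#include <stddef.h>

struct owner_sketch { int nref; };
struct in_msg_sketch { struct owner_sketch *con; };
struct rx_sketch { struct in_msg_sketch *in_msg; };

/* Drop the in-flight message: read through the pointer, then clear it. */
static void drop_in_msg(struct rx_sketch *rx)
{
	struct in_msg_sketch *msg = rx->in_msg;

	if (!msg)
		return;
	msg->con->nref--;     /* uses msg while it is still valid */
	msg->con = NULL;
	rx->in_msg = NULL;    /* only now forget the message */
}
```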

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: transition socket state prior to actual connect
Sage Weil [Sat, 9 Jun 2012 21:19:21 +0000 (14:19 -0700)]
libceph: transition socket state prior to actual connect

(cherry picked from commit 89a86be0ce20022f6ede8bccec078dbb3d63caaa)

Once we call ->connect(), we are racing against the actual
connection, and a subsequent transition from CONNECTING ->
CONNECTED.  Set the state to CONNECTING before that, under the
protection of the mutex, to avoid the race.

This was introduced in 928443cd9644e7cfd46f687dbeffda2d1a357ff9,
with the original socket state code.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: fix overflow in osdmap_apply_incremental()
Xi Wang [Thu, 7 Jun 2012 00:35:55 +0000 (19:35 -0500)]
libceph: fix overflow in osdmap_apply_incremental()

(cherry picked from commit a5506049500b30dbc5edb4d07a3577477c1f3643)

On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)'
and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad).
It would also overflow the subsequent kmalloc() size, leading to
out-of-bounds write.
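
To make the wraparound concrete, the sketch below forces the multiplication
into 32 bits with uint32_t, mimicking a 32-bit size_t.  It is not the actual
patch; both helper names are hypothetical, and the second one shows one
common way to reject an unrepresentable count before multiplying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsafe: the product is computed in 32 bits and may wrap around. */
static bool need_unsafe(uint32_t avail, uint32_t count)
{
	uint32_t bytes = count * (uint32_t)sizeof(uint32_t);

	return bytes <= avail;
}

/* Safe: reject counts whose byte size cannot be represented. */
static bool need_safe(uint32_t avail, uint32_t count)
{
	if (count > UINT32_MAX / sizeof(uint32_t))
		return false;
	return count * (uint32_t)sizeof(uint32_t) <= avail;
}
```

With count = 0x40000001 the unsafe product wraps to 4, so a 16-byte buffer
appears to be large enough even though it is not.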

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: fix overflow in osdmap_decode()
Xi Wang [Thu, 7 Jun 2012 00:35:55 +0000 (19:35 -0500)]
libceph: fix overflow in osdmap_decode()

(cherry picked from commit e91a9b639a691e0982088b5954eaafb5a25c8f1c)

On 32-bit systems, a large `n' would overflow `n * sizeof(u32)' and bypass
the check ceph_decode_need(p, end, n * sizeof(u32), bad).  It would also
overflow the subsequent kmalloc() size, leading to out-of-bounds write.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: fix overflow in __decode_pool_names()
Xi Wang [Thu, 7 Jun 2012 00:35:55 +0000 (19:35 -0500)]
libceph: fix overflow in __decode_pool_names()

(cherry picked from commit ad3b904c07dfa88603689bf9a67bffbb9b99beb5)

`len' is read from network and thus needs validation.  Otherwise a
large `len' would cause out-of-bounds access via the memcpy() call.
In addition, len = 0xffffffff would overflow the kmalloc() size,
leading to out-of-bounds write.

This patch adds a check of `len' via ceph_decode_need().  Also use
kstrndup rather than kmalloc/memcpy.

[elder@inktank.com: added -ENOMEM return for null kstrndup() result]
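
The validate-then-copy approach can be sketched in user space as below.  This
is an assumption-laden illustration rather than the kernel code: the function
name is hypothetical, malloc() stands in for kstrndup(), and the length is
read in host byte order for brevity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode one length-prefixed string from [*p, end).  Returns a newly
 * allocated, NUL-terminated copy, or NULL if the buffer is too short
 * or the allocation fails.
 */
static char *decode_string(const unsigned char **p, const unsigned char *end)
{
	uint32_t len;
	char *s;

	if ((size_t)(end - *p) < sizeof(len))
		return NULL;
	memcpy(&len, *p, sizeof(len));
	*p += sizeof(len);

	if ((size_t)(end - *p) < len)   /* validate before touching the payload */
		return NULL;

	s = malloc((size_t)len + 1);
	if (!s)
		return NULL;
	memcpy(s, *p, len);
	s[len] = '\0';
	*p += len;
	return s;
}
```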

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: make ceph_con_revoke_message() a msg op
Alex Elder [Fri, 1 Jun 2012 19:56:43 +0000 (14:56 -0500)]
libceph: make ceph_con_revoke_message() a msg op

(cherry picked from commit 8921d114f5574c6da2cdd00749d185633ecf88f3)

ceph_con_revoke_message() is passed both a message and a ceph
connection.  A ceph_msg allocated for incoming messages on a
connection always has a pointer to that connection, so there's no
need to provide the connection when revoking such a message.

Note that the existing logic does not preclude the message supplied
being a null/bogus message pointer.  The only user of this interface
is the OSD client, and the only value an osd client passes is a
request's r_reply field.  That is always non-null (except briefly in
an error path in ceph_osdc_alloc_request(), and that drops the
only reference so the request won't ever have a reply to revoke).
So we can safely assume the passed-in message is non-null, but add a
BUG_ON() to make it very obvious we are imposing this restriction.

Rename the function ceph_msg_revoke_incoming() to reflect that it is
really an operation on an incoming message.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: make ceph_con_revoke() a msg operation
Alex Elder [Fri, 1 Jun 2012 19:56:43 +0000 (14:56 -0500)]
libceph: make ceph_con_revoke() a msg operation

(cherry picked from commit 6740a845b2543cc46e1902ba21bac743fbadd0dc)

ceph_con_revoke() is passed both a message and a ceph connection.
Now that any message associated with a connection holds a pointer
to that connection, there's no need to provide the connection when
revoking a message.

This has the added benefit of precluding the possibility of
providing the wrong connection pointer.  If the message's connection
pointer is null, it is not being tracked by any connection, so
revoking it is a no-op.  This is supported as a convenience for
upper layers, so they can revoke a message that is not actually
"in flight."

Rename the function ceph_msg_revoke() to reflect that it is really
an operation on a message, not a connection.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: have messages take a connection reference
Alex Elder [Mon, 4 Jun 2012 19:43:33 +0000 (14:43 -0500)]
libceph: have messages take a connection reference

(cherry picked from commit 92ce034b5a740046cc643a21ea21eaad589e0043)

There are essentially two types of ceph messages: incoming and
outgoing.  Outgoing messages are always allocated via ceph_msg_new(),
and at the time of their allocation they are not associated with any
particular connection.  Incoming messages are always allocated via
ceph_con_in_msg_alloc(), and they are initially associated with the
connection from which incoming data will be placed into the message.

When an outgoing message gets sent, it becomes associated with a
connection and remains that way until the message is successfully
sent.  The association of an incoming message goes away at the point
it is sent to an upper layer via a con->ops->dispatch method.

This patch implements reference counting for all ceph messages, such
that every message holds a reference (and a pointer) to a connection
if and only if it is associated with that connection (as described
above).

For background, here is an explanation of the ceph message
lifecycle, emphasizing when an association exists between a message
and a connection.

Outgoing Messages
An outgoing message is "owned" by its allocator, from the time it is
allocated in ceph_msg_new() up to the point it gets queued for
sending in ceph_con_send().  Prior to that point the message's
msg->con pointer is null; at the point it is queued for sending, that
pointer is assigned to refer to the connection.  At that
time the message is inserted into a connection's out_queue list.

When a message on the out_queue list has been sent to the socket
layer to be put on the wire, it is transferred out of that list and
into the connection's out_sent list.  At that point it is still owned
by the connection, and will remain so until an acknowledgement is
received from the recipient that indicates the message was
successfully transferred.  When such an acknowledgement is received
(in process_ack()), the message is removed from its list (in
ceph_msg_remove()), at which point it is no longer associated with
the connection.

So basically, any time a message is on one of a connection's lists,
it is associated with that connection.  Reference counting outgoing
messages can thus be done at the points a message is added to the
out_queue (in ceph_con_send()) and the point it is removed from
either of its two lists (in ceph_msg_remove())--at which point its
connection pointer becomes null.

Incoming Messages
When an incoming message on a connection is getting read (in
read_partial_message()) and there is no message in con->in_msg,
a new one is allocated using ceph_con_in_msg_alloc().  At that
point the message is associated with the connection.  Once that
message has been completely and successfully read, it is passed to
upper layer code using the connection's con->ops->dispatch method.
At that point the association between the message and the connection
no longer exists.

Reference counting of connections for incoming messages can be done
by taking a reference to the connection when the message gets
allocated, and releasing that reference when it gets handed off
using the dispatch method.

We should never fail to get a connection reference for a
message, since the caller should already hold one.
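
The "reference if and only if associated" rule can be sketched roughly
as below.  This is a simplified userspace model under assumed names
(sketch_con, sketch_msg_attach(), sketch_msg_detach() are all
hypothetical); the real code uses the connection's own get/put
mechanism and proper locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; a plain counter models the refcount. */
struct sketch_con {
	int refs;
};

struct sketch_msg {
	struct sketch_con *con;  /* non-NULL exactly while associated */
};

/* Associate a message with a connection and take a reference. */
static void sketch_msg_attach(struct sketch_msg *msg, struct sketch_con *con)
{
	assert(msg->con == NULL);   /* must not already be associated */
	assert(con->refs > 0);      /* the caller already holds a reference */
	con->refs++;
	msg->con = con;
}

/* Drop the association and the reference that went with it. */
static void sketch_msg_detach(struct sketch_msg *msg)
{
	assert(msg->con != NULL);
	msg->con->refs--;
	msg->con = NULL;
}
```

In this model an outgoing message would be attached when queued and
detached when removed from its list, while an incoming message would
be attached at allocation and detached at dispatch.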

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: have messages point to their connection
Alex Elder [Fri, 1 Jun 2012 19:56:43 +0000 (14:56 -0500)]
libceph: have messages point to their connection

(cherry picked from commit 38941f8031bf042dba3ced6394ba3a3b16c244ea)

When a ceph message is queued for sending it is placed on a list of
pending messages (ceph_connection->out_queue).  When they are
actually sent over the wire, they are moved from that list to
another (ceph_connection->out_sent).  When acknowledgement for the
message is received, it is removed from the sent messages list.

During that entire time the message is "in the possession" of a
single ceph connection.  Keep track of that connection in the
message.  This will be used in the next patch (and is a helpful
bit of information for debugging anyway).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: tweak ceph_alloc_msg()
Alex Elder [Mon, 4 Jun 2012 19:43:32 +0000 (14:43 -0500)]
libceph: tweak ceph_alloc_msg()

(cherry picked from commit 1c20f2d26795803fc4f5155fe4fca5717a5944b6)

The function ceph_alloc_msg() is only used to allocate a message
that will be assigned to a connection's in_msg pointer.  Rename the
function so this implied usage is more clear.

In addition, make that assignment inside the function (again, since
that's precisely what it's intended to be used for).  This allows us
to return what is now provided via the passed-in address of a "skip"
variable.  The return type is now Boolean to be explicit that there
are only two possible outcomes.

Make sure the result of an ->alloc_msg method call always sets the
value of *skip properly.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: fully initialize connection in con_init()
Alex Elder [Sun, 27 May 2012 04:26:43 +0000 (23:26 -0500)]
libceph: fully initialize connection in con_init()

(cherry picked from commit 1bfd89f4e6e1adc6a782d94aa5d4c53be1e404d7)

Move the initialization of a ceph connection's private pointer,
operations vector pointer, and peer name information into
ceph_con_init().  Rearrange the arguments so the connection pointer
is first.  Hide the byte-swapping of the peer entity number inside
ceph_con_init()

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: init monitor connection when opening
Alex Elder [Sun, 27 May 2012 04:26:43 +0000 (23:26 -0500)]
libceph: init monitor connection when opening

(cherry picked from commit 20581c1faf7b15ae1f8b80c0ec757877b0b53151)

Hold off initializing a monitor client's connection until just
before it gets opened for use.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: drop connection refcounting for mon_client
Sage Weil [Fri, 1 Jun 2012 03:27:50 +0000 (20:27 -0700)]
libceph: drop connection refcounting for mon_client

(cherry picked from commit ec87ef4309d33bd9c87a53bb5152a86ae7a65f25)

All references to the embedded ceph_connection come from the msgr
workqueue, which is drained prior to mon_client destruction.  That
means we can ignore con refcounting entirely.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: embed ceph connection structure in mon_client
Alex Elder [Sun, 27 May 2012 04:26:43 +0000 (23:26 -0500)]
libceph: embed ceph connection structure in mon_client

(cherry picked from commit 67130934fb579fdf0f2f6d745960264378b57dc8)

A monitor client has a pointer to a ceph connection structure in it.
This is the only one of the three ceph client types that does it this
way; the OSD and MDS clients embed the connection into their main
structures.  There is always exactly one ceph connection for a
monitor client, so there is no need to allocate it separate from the
monitor client structure.

So switch the ceph_mon_client structure to embed its
ceph_connection structure.
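
A minimal sketch of what embedding buys, assuming hypothetical
sketch_* names rather than the real structure layouts: no separate
allocation is needed, and the containing structure can be recovered
from a pointer to the embedded member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the real structures. */
struct sketch_con {
	void *private;           /* back-pointer to the owner */
};

struct sketch_mon_client {
	int cur_mon;
	struct sketch_con con;   /* embedded, not a pointer */
};

/* One allocation (or stack object) covers both structures. */
static void sketch_monc_init(struct sketch_mon_client *monc)
{
	assert(monc != NULL);
	monc->cur_mon = -1;
	monc->con.private = monc;
}

/* Recover the owner from the embedded member (container_of style). */
static struct sketch_mon_client *sketch_con_to_monc(struct sketch_con *con)
{
	return (struct sketch_mon_client *)
		((char *)con - offsetof(struct sketch_mon_client, con));
}
```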

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: set CLOSED state bit in con_init
Alex Elder [Tue, 29 May 2012 16:04:58 +0000 (11:04 -0500)]
libceph: set CLOSED state bit in con_init

(cherry picked from commit a5988c490ef66cb04ea2f610681949b25c773b3c)

Once a connection is fully initialized, it is really in a CLOSED
state, so make that explicit by setting the bit in its state field.

It is possible for a connection in NEGOTIATING state to get a
failure, leading to ceph_fault() and ultimately ceph_con_close().
Clear that bit if it is set in that case, to reflect that the
connection truly is closed and is no longer participating in a
connect sequence.

Issue a warning if ceph_con_open() is called on a connection that
is not in CLOSED state.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: provide osd number when creating osd
Alex Elder [Sun, 27 May 2012 04:26:43 +0000 (23:26 -0500)]
libceph: provide osd number when creating osd

(cherry picked from commit e10006f807ffc4d5b1d861305d18d9e8145891ca)

Pass the osd number to the create_osd() routine, and move the
initialization of fields that depend on it therein.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: start tracking connection socket state
Alex Elder [Wed, 23 May 2012 03:15:49 +0000 (22:15 -0500)]
libceph: start tracking connection socket state

(cherry picked from commit ce2c8903e76e690846a00a0284e4bd9ee954d680)

Start explicitly keeping track of the state of a ceph connection's
socket, separate from the state of the connection itself.  Create
placeholder functions to encapsulate the state transitions.

    --------
    | NEW* |  transient initial state
    --------
        | con_sock_state_init()
        v
    ----------
    | CLOSED |  initialized, but no socket (and no
    ----------  TCP connection)
     ^      \
     |       \ con_sock_state_connecting()
     |        ----------------------
     |                              \
     + con_sock_state_closed()       \
     |\                               \
     | \                               \
     |  -----------                     \
     |  | CLOSING |  socket event;       \
     |  -----------  await close          \
     |       ^                            |
     |       |                            |
     |       + con_sock_state_closing()   |
     |      / \                           |
     |     /   ---------------            |
     |    /                   \           v
     |   /                    --------------
     |  /    -----------------| CONNECTING |  socket created, TCP
     |  |   /                 --------------  connect initiated
     |  |   | con_sock_state_connected()
     |  |   v
    -------------
    | CONNECTED |  TCP connection established
    -------------

Make the socket state an atomic variable, reinforcing that it's a
distinct transition with no possible "intermediate/both" states.
This is almost certainly overkill at this point, though the
transitions into CONNECTED and CLOSING state do get called via
socket callback (the rest of the transitions occur with the
connection mutex held).  We can back out the atomicity later.
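
For readers who want the diagram in code form, here is a rough
userspace sketch using C11 atomics.  The sketch_* names and the
exchange-style update are assumptions made for illustration; the
kernel placeholders are the con_sock_state_*() functions named in the
diagram and may differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>

enum sketch_sock_state {
	SKETCH_SOCK_NEW,         /* transient initial state */
	SKETCH_SOCK_CLOSED,      /* initialized, but no socket */
	SKETCH_SOCK_CONNECTING,  /* socket created, connect initiated */
	SKETCH_SOCK_CONNECTED,   /* TCP connection established */
	SKETCH_SOCK_CLOSING,     /* socket event; awaiting close */
};

struct sketch_con {
	atomic_int sock_state;
};

/* Swap in the new state in one step and return the previous one. */
static int sketch_sock_state_set(struct sketch_con *con, int new_state)
{
	return atomic_exchange(&con->sock_state, new_state);
}

static int sketch_sock_state_init(struct sketch_con *con)
{
	return sketch_sock_state_set(con, SKETCH_SOCK_CLOSED);
}

static int sketch_sock_state_connecting(struct sketch_con *con)
{
	return sketch_sock_state_set(con, SKETCH_SOCK_CONNECTING);
}

static int sketch_sock_state_connected(struct sketch_con *con)
{
	return sketch_sock_state_set(con, SKETCH_SOCK_CONNECTED);
}

static int sketch_sock_state_closing(struct sketch_con *con)
{
	return sketch_sock_state_set(con, SKETCH_SOCK_CLOSING);
}

static int sketch_sock_state_closed(struct sketch_con *con)
{
	return sketch_sock_state_set(con, SKETCH_SOCK_CLOSED);
}
```

Returning the old state lets each transition helper check (or warn
about) the state it was entered from.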

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: start separating connection flags from state
Alex Elder [Tue, 22 May 2012 16:41:43 +0000 (11:41 -0500)]
libceph: start separating connection flags from state

(cherry picked from commit 928443cd9644e7cfd46f687dbeffda2d1a357ff9)

A ceph_connection holds a mixture of connection state (as in "state
machine" state) and connection flags in a single "state" field.  To
make the distinction more clear, define a new "flags" field and use
it rather than the "state" field to hold Boolean flag values.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: embed ceph messenger structure in ceph_client
Alex Elder [Sun, 27 May 2012 04:26:43 +0000 (23:26 -0500)]
libceph: embed ceph messenger structure in ceph_client

(cherry picked from commit 15d9882c336db2db73ccf9871ae2398e452f694c)

A ceph client has a pointer to a ceph messenger structure in it.
There is always exactly one ceph messenger for a ceph client, so
there is no need to allocate it separate from the ceph client
structure.

Switch the ceph_client structure to embed its ceph_messenger
structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: rename kvec_reset and kvec_add functions
Alex Elder [Wed, 23 May 2012 19:35:23 +0000 (14:35 -0500)]
libceph: rename kvec_reset and kvec_add functions

(cherry picked from commit e22004235a900213625acd6583ac913d5a30c155)

The functions ceph_con_out_kvec_reset() and ceph_con_out_kvec_add()
are entirely private functions, so drop the "ceph_" prefix in their
name to make them slightly more wieldy.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: rename socket callbacks
Alex Elder [Tue, 22 May 2012 16:41:43 +0000 (11:41 -0500)]
libceph: rename socket callbacks

(cherry picked from commit 327800bdc2cb9b71f4b458ca07aa9d522668dde0)

Change the names of the three socket callback functions to make it
more obvious they're specifically associated with a connection's
socket (not the ceph connection that uses it).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: kill bad_proto ceph connection op
Alex Elder [Wed, 30 May 2012 02:47:38 +0000 (21:47 -0500)]
libceph: kill bad_proto ceph connection op

(cherry picked from commit 6384bb8b8e88a9c6bf2ae0d9517c2c0199177c34)

No code sets a bad_proto method in its ceph connection operations
vector, so just get rid of it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: eliminate connection state "DEAD"
Alex Elder [Tue, 22 May 2012 16:41:43 +0000 (11:41 -0500)]
libceph: eliminate connection state "DEAD"

(cherry picked from commit e5e372da9a469dfe3ece40277090a7056c566838)

The ceph connection state "DEAD" is never set and is therefore not
needed.  Eliminate it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: check PG_Private flag before accessing page->private
Yan, Zheng [Mon, 28 May 2012 06:44:30 +0000 (14:44 +0800)]
ceph: check PG_Private flag before accessing page->private

(cherry picked from commit 28c0254ede13ab575d2df5c6585ed3d4817c3e6b)

I got lots of NULL pointer dereference Oopses when compiling a kernel on ceph.
The bug occurs because the kernel page migration routine replaces some pages
in the page cache with new pages, and these new pages' private can be non-zero.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agorbd: Fix ceph_snap_context size calculation
Yan, Zheng [Wed, 6 Jun 2012 14:15:33 +0000 (09:15 -0500)]
rbd: Fix ceph_snap_context size calculation

(cherry picked from commit f9f9a1904467816452fc70740165030e84c2c659)

ceph_snap_context->snaps is a u64 array
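
As a hedged illustration of why the element type matters (the struct
below is a hypothetical, reduced stand-in and is not the real
ceph_snap_context layout): the allocation size has to be computed
from the size of a 64-bit element, otherwise the array is
under-allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced layout with a trailing array of 64-bit ids. */
struct sketch_snap_context {
	uint32_t num_snaps;
	uint64_t seq;
	uint64_t snaps[];        /* one 64-bit id per snapshot */
};

/* Bytes needed for a context holding snap_count snapshot ids. */
static size_t sketch_snapc_size(uint32_t snap_count)
{
	size_t size = sizeof(struct sketch_snap_context) +
		      (size_t)snap_count * sizeof(uint64_t);

	assert(size >= sizeof(struct sketch_snap_context));
	return size;
}
```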

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agorbd: store snapshot id instead of index
Josh Durgin [Mon, 21 Nov 2011 21:04:42 +0000 (13:04 -0800)]
rbd: store snapshot id instead of index

(cherry picked from commit 77dfe99fe3cb0b2b0545e19e2d57b7a9134ee3c0)

When a device was open at a snapshot, and snapshots were deleted or
added, data from the wrong snapshot could be read. Instead of
assuming the snap context is constant, store the actual snap id when
the device is initialized, and rely on the OSDs to signal an error
if we try reading from a snapshot that was deleted.
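
The hazard can be sketched as follows.  The sketch_* names and the
newest-first ordering of the id array are assumptions made for this
example only; the point is that a remembered index silently refers to
a different snapshot once the array changes, while a remembered id
does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced snapshot context. */
struct sketch_snapc {
	size_t num_snaps;
	const uint64_t *snaps;   /* snapshot ids (order assumed newest first) */
};

/* Return the current index of snap_id, or -1 if it no longer exists. */
static long sketch_snap_index(const struct sketch_snapc *snapc,
			      uint64_t snap_id)
{
	assert(snapc != NULL);
	for (size_t i = 0; i < snapc->num_snaps; i++)
		if (snapc->snaps[i] == snap_id)
			return (long)i;
	return -1;
}
```

With ids {30, 20, 10}, a device opened at index 1 refers to id 20;
after a snapshot with id 40 is added, index 1 refers to id 30, but
looking up the stored id 20 still finds the right entry.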

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agorbd: protect read of snapshot sequence number
Josh Durgin [Mon, 5 Dec 2011 18:47:13 +0000 (10:47 -0800)]
rbd: protect read of snapshot sequence number

(cherry picked from commit 403f24d3d51760a8b9368d595fa5f48c309f1a0f)

This is updated whenever a snapshot is added or deleted, and the
snapc pointer is changed with every refresh of the header.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agorbd: don't hold spinlock during messenger flush
Alex Elder [Wed, 4 Apr 2012 18:35:44 +0000 (13:35 -0500)]
rbd: don't hold spinlock during messenger flush

(cherry picked from commit cd9d9f5df6098c50726200d4185e9e8da32785b3)

A recent change protected updates to the rbd_client_list with
a spinlock.  Unfortunately in rbd_put_client(), the lock is taken
before possibly dropping the last reference to an rbd_client, and
dropping the last reference eventually calls flush_workqueue(), which
can sleep.

The problem was flagged by a debug spinlock warning:
    BUG: spinlock wrong CPU on CPU#3, rbd/27814

The solution is to move the spinlock acquisition and release inside
rbd_client_release(), which is the spot where it's really needed for
protecting the removal of the rbd_client from the client list.
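
The shape of the fix can be modeled in userspace as below.  Everything
here is a hypothetical stand-in: a flag models the spinlock, a
counter models the client list, and sketch_flush() models the
operation that may sleep and therefore must not run with the lock
held.

```c
#include <assert.h>

/* Toy lock: just records whether it is currently held. */
struct sketch_lock {
	int held;
};

static struct sketch_lock sketch_list_lock;
static int sketch_list_len;      /* models the client list */

struct sketch_client {
	int refs;
	int flushed;
};

static void sketch_lock_acquire(struct sketch_lock *lock)
{
	assert(!lock->held);
	lock->held = 1;
}

static void sketch_lock_release(struct sketch_lock *lock)
{
	assert(lock->held);
	lock->held = 0;
}

/* Stands in for work that may sleep: illegal under a spinlock. */
static void sketch_flush(struct sketch_client *client)
{
	assert(!sketch_list_lock.held);
	client->flushed = 1;
}

/* The lock covers only the list removal, not the flush. */
static void sketch_client_release(struct sketch_client *client)
{
	sketch_lock_acquire(&sketch_list_lock);
	sketch_list_len--;
	sketch_lock_release(&sketch_list_lock);

	sketch_flush(client);
}

/* Dropping a reference no longer takes the list lock itself. */
static void sketch_put_client(struct sketch_client *client)
{
	assert(client->refs > 0);
	if (--client->refs == 0)
		sketch_client_release(client);
}
```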

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: fix messenger retry
Sage Weil [Tue, 10 Jul 2012 18:53:34 +0000 (11:53 -0700)]
libceph: fix messenger retry

(cherry picked from commit 5bdca4e0768d3e0f4efa43d9a2cc8210aeb91ab9)

In ancient times, the messenger could both initiate and accept connections.
An artifact of that was data structures to store/process an incoming
ceph_msg_connect request and send an outgoing ceph_msg_connect_reply.
Sadly, the negotiation code was referencing those structures and ignoring
important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends
RETRY_SESSION and we (the client) retry with the same connect_seq as last
time.  This bug is pretty easily triggered by injecting socket failures on the
MDS and running some fs workload like workunits/direct_io/test_sync_io.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: flush msgr queue during mon_client shutdown
Sage Weil [Mon, 11 Jun 2012 03:43:56 +0000 (20:43 -0700)]
libceph: flush msgr queue during mon_client shutdown

(cherry picked from commit f3dea7edd3d449fe7a6d402c1ce56a294b985261)
(cherry picked from commit 642c0dbde32f34baa7886e988a067089992adc8f)

We need to flush the msgr workqueue during mon_client shutdown to
ensure that any work affecting our embedded ceph_connection is
finished so that we can be safely destroyed.

Previously, we were flushing the work queue after osd_client
shutdown and before mon_client shutdown to ensure that any osd
connection refs to authorizers are flushed.  Remove the redundant
flush, and document in the comment that the mon_client flush is
needed to cover that case as well.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agorbd: Clear ceph_msg->bio_iter for retransmitted message
Yan, Zheng [Thu, 7 Jun 2012 00:35:55 +0000 (19:35 -0500)]
rbd: Clear ceph_msg->bio_iter for retransmitted message

(cherry picked from commit 43643528cce60ca184fe8197efa8e8da7c89a037)
(cherry picked from commit b132cf4c733f91bb4dd2277ea049243cf16e8b66)

The bug can cause NULL pointer dereference in write_partial_msg_pages

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: use con get/put ops from osd_client
Sage Weil [Fri, 1 Jun 2012 03:22:18 +0000 (20:22 -0700)]
libceph: use con get/put ops from osd_client

(cherry picked from commit 0d47766f14211a73eaf54cab234db134ece79f49)

There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c.  This is a bug since those ops aren't defined to
be ceph_con_get/put.

This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.

The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.

[elder@inktank.com: added that last sentence]

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 88ed6ea0b295f8e2383d599a04027ec596cdf97b)

4 years agolibceph: osd_client: don't drop reply reference too early
Alex Elder [Mon, 4 Jun 2012 19:43:32 +0000 (14:43 -0500)]
libceph: osd_client: don't drop reply reference too early

(cherry picked from commit ab8cb34a4b2f60281a4b18b1f1ad23bc2313d91b)

In ceph_osdc_release_request(), a reference to the r_reply message
is dropped.  But just after that, that same message is revoked if it
was in use to receive an incoming reply.  Reorder these so we are
sure we hold a reference until we're actually done with the message.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 680584fab05efff732b5ae16ad601ba994d7b505)

4 years agolibceph: fix pg_temp updates
Sage Weil [Mon, 21 May 2012 16:45:23 +0000 (09:45 -0700)]
libceph: fix pg_temp updates

(cherry picked from commit 6bd9adbdf9ca6a052b0b7455ac67b925eb38cfad)

Usually, we are adding pg_temp entries or removing them.  Occasionally they
update.  In that case, osdmap_apply_incremental() was failing because the
rbtree entry already exists.

Fix by removing the existing entry before inserting a new one.
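
A minimal model of the remove-then-insert approach, using a small
fixed array in place of the rbtree.  All sketch_* names are
hypothetical, and the -EEXIST return for a duplicate insert is an
assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_ENTRIES 8

struct sketch_entry {
	int used;
	unsigned int pgid;
	int osd;
};

/* A tiny array stands in for the rbtree of pg_temp entries. */
struct sketch_map {
	struct sketch_entry slots[SKETCH_MAX_ENTRIES];
};

static struct sketch_entry *sketch_lookup(struct sketch_map *map,
					  unsigned int pgid)
{
	assert(map != NULL);
	for (size_t i = 0; i < SKETCH_MAX_ENTRIES; i++)
		if (map->slots[i].used && map->slots[i].pgid == pgid)
			return &map->slots[i];
	return NULL;
}

/* Insert refuses to overwrite an existing key. */
static int sketch_insert(struct sketch_map *map, unsigned int pgid, int osd)
{
	if (sketch_lookup(map, pgid) != NULL)
		return -EEXIST;
	for (size_t i = 0; i < SKETCH_MAX_ENTRIES; i++) {
		if (!map->slots[i].used) {
			map->slots[i] = (struct sketch_entry){ 1, pgid, osd };
			return 0;
		}
	}
	return -ENOMEM;
}

static void sketch_remove(struct sketch_map *map, unsigned int pgid)
{
	struct sketch_entry *entry = sketch_lookup(map, pgid);

	if (entry != NULL)
		entry->used = 0;
}

/* An update removes any existing entry first, then inserts. */
static int sketch_update(struct sketch_map *map, unsigned int pgid, int osd)
{
	sketch_remove(map, pgid);
	return sketch_insert(map, pgid, osd);
}
```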

Fixes http://tracker.newdream.net/issues/2446

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agolibceph: avoid unregistering osd request when not registered
Sage Weil [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
libceph: avoid unregistering osd request when not registered

(cherry picked from commit 35f9f8a09e1e88e31bd34a1e645ca0e5f070dd5c)

There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request().  If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.

Simply check if the request is already unregistered; if so,
return immediately and do nothing.
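
In outline, the check makes unregistration idempotent.  The sketch
below uses hypothetical reduced structures and leaves out the locking
that serializes the two callers.

```c
#include <assert.h>

/* Hypothetical reduced stand-ins for the client and request. */
struct sketch_osdc {
	int num_requests;
};

struct sketch_req {
	int registered;
};

static void sketch_register(struct sketch_osdc *osdc, struct sketch_req *req)
{
	assert(!req->registered);
	req->registered = 1;
	osdc->num_requests++;
}

/* Safe to call from both the reply path and the timeout path. */
static void sketch_unregister(struct sketch_osdc *osdc,
			      struct sketch_req *req)
{
	if (!req->registered)
		return;          /* the other caller got here first */

	req->registered = 0;
	osdc->num_requests--;
}
```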

Fixes http://tracker.newdream.net/issues/2420

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: add auth buf in prepare_write_connect()
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: add auth buf in prepare_write_connect()

(cherry picked from commit 3da54776e2c0385c32d143fd497a7f40a88e29dd)

Move the addition of the authorizer buffer to a connection's
out_kvec out of get_connect_authorizer() and into its caller.  This
way, the caller--prepare_write_connect()--can avoid adding the
connect header to out_kvec before it has been fully initialized.

Prior to this patch, it was possible for a connect header to be
sent over the wire before the authorizer protocol or buffer length
fields were initialized.  An authorizer buffer associated with that
header could also be queued to send only after the connection header
that describes it was on the wire.

Fixes http://tracker.newdream.net/issues/2424

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: rename prepare_connect_authorizer()
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: rename prepare_connect_authorizer()

(cherry picked from commit dac1e716c60161867a47745bca592987ca3a9cb2)

Change the name of prepare_connect_authorizer().  The next
patch is going to make this function no longer add anything to the
connection's out_kvec, so it will no longer fit the pattern of
the rest of the prepare_connect_*() functions.

In addition, pass the address of a variable that will hold the
authorization protocol to use.  Move the assignment of that to the
connection's out_connect structure into prepare_write_connect().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: return pointer from prepare_connect_authorizer()
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: return pointer from prepare_connect_authorizer()

(cherry picked from commit 729796be9190f57ca40ccca315e8ad34a1eb8fef)

Change prepare_connect_authorizer() so it returns a pointer (or
pointer-coded error).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: use info returned by get_authorizer
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: use info returned by get_authorizer

(cherry picked from commit 8f43fb53894079bf0caab6e348ceaffe7adc651a)

Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: have get_authorizer methods return pointers
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: have get_authorizer methods return pointers

(cherry picked from commit a3530df33eb91d787d08c7383a0a9982690e42d0)

Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value.  This is to pave the way for making use of the
returned value in an upcoming patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: ensure auth ops are defined before use
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: ensure auth ops are defined before use

(cherry picked from commit a255651d4cad89f1a606edd36135af892ada4f20)

In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced.  There is no
obvious guarantee that this pointer has been assigned.  And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.

Add checks in both routines to make sure they are defined (non-null)
before use.  Add similar checks in a few other spots in these files
while we're at it.
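
The kind of check being added looks roughly like this.  The types are
hypothetical reductions of the real ones, and returning -EINVAL when
a method is missing is an assumption of the sketch, not necessarily
what the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reduced operations vector. */
struct sketch_auth_ops {
	int (*create_authorizer)(int *authorizer_out);
};

struct sketch_auth_client {
	const struct sketch_auth_ops *ops;   /* may be unset */
};

/* Example method, used only to exercise the success path. */
static int sketch_dummy_create(int *authorizer_out)
{
	*authorizer_out = 42;
	return 0;
}

/* Check both the vector and the method before calling through. */
static int sketch_create_authorizer(struct sketch_auth_client *ac,
				    int *authorizer_out)
{
	assert(ac != NULL);
	if (ac->ops == NULL || ac->ops->create_authorizer == NULL)
		return -EINVAL;

	return ac->ops->create_authorizer(authorizer_out);
}
```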

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: messenger: reduce args to create_authorizer
Alex Elder [Wed, 16 May 2012 20:16:39 +0000 (15:16 -0500)]
ceph: messenger: reduce args to create_authorizer

(cherry picked from commit 74f1869f76d043bad12ec03b4d5f04a8c3d1f157)

Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizer method in
ceph_auth_client_ops.  Use a local variable of that type as a
shorthand in the get_authorizer method definitions.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: define ceph_auth_handshake type
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: define ceph_auth_handshake type

(cherry picked from commit 6c4a19158b96ea1fb8acbe0c1d5493d9dcd2f147)

The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers."  Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.

Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: messenger: check return from get_authorizer
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: check return from get_authorizer

(cherry picked from commit ed96af646011412c2bf1ffe860db170db355fae5)

In prepare_connect_authorizer(), a connection's get_authorizer
method is called but its return value is ignored.  This method can
return an error, so check for it and return it if that ever occurs.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: messenger: rework prepare_connect_authorizer()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: rework prepare_connect_authorizer()

(cherry picked from commit b1c6b9803f5491e94041e6da96bc9dec3870e792)

Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.

Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: messenger: check prepare_write_connect() result
Alex Elder [Thu, 17 May 2012 02:51:59 +0000 (21:51 -0500)]
ceph: messenger: check prepare_write_connect() result

(cherry picked from commit 5a0f8fdd8a0ebe320952a388331dc043d7e14ced)

prepare_write_connect() can return an error, but only one of its
callers checks for it.  All the rest are in functions that already
return errors, so it should be fine to return the error if one
gets returned.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: don't set WRITE_PENDING too early
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: don't set WRITE_PENDING too early

(cherry picked from commit e10c758e4031a801ea4d2f8fb39bf14c2658d74b)

prepare_write_connect() prepares a connect message, then sets
WRITE_PENDING on the connection.  Then *after* this, it calls
prepare_connect_authorizer(), which updates the content of the
connection buffer already queued for sending.  It's also possible it
will result in prepare_write_connect() returning -EAGAIN despite the
WRITE_PENDING bit getting set.

Fix this by preparing the connect authorizer first, setting the
WRITE_PENDING bit only after that is done.
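
The corrected ordering can be sketched as below.  The structure, the
flag value and the "auth_ready" field are hypothetical; only the
order of operations is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_WRITE_PENDING (1UL << 0)

/* Hypothetical reduced connection. */
struct sketch_con {
	unsigned long flags;
	int auth_ready;          /* models whether the authorizer is available */
};

/* May fail, e.g. with -EAGAIN, before anything is queued. */
static int sketch_prepare_authorizer(const struct sketch_con *con)
{
	return con->auth_ready ? 0 : -EAGAIN;
}

/* Prepare the authorizer first; set WRITE_PENDING only on success. */
static int sketch_prepare_write_connect(struct sketch_con *con)
{
	int ret;

	assert(con != NULL);
	ret = sketch_prepare_authorizer(con);
	if (ret != 0)
		return ret;      /* the flag is left untouched on error */

	con->flags |= SKETCH_WRITE_PENDING;
	return 0;
}
```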

Partially addresses http://tracker.newdream.net/issues/2424

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoceph: drop msgr argument from prepare_write_connect()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: drop msgr argument from prepare_write_connect()

(cherry picked from commit e825a66df97776d30a48a187e3a986736af43945)

In all cases, the value passed as the msgr argument to
prepare_write_connect() is just con->msgr.  Just get the msgr
value from the ceph connection and drop the unneeded argument.

The only msgr passed to prepare_write_banner() is also therefore
just the one from con->msgr, so change that function to drop the
msgr argument as well.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago ceph: messenger: send banner in process_connect()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: send banner in process_connect()

(cherry picked from commit 41b90c00858129f52d08e6a05c9cfdb0f2bd074d)

prepare_write_connect() has an argument indicating whether a banner
should be sent out before sending out a connection message.  It's
only ever set in one of its callers, so move the code that arranges
to send the banner into that caller and drop the "include_banner"
argument from prepare_write_connect().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago ceph: messenger: reset connection kvec caller
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
ceph: messenger: reset connection kvec caller

(cherry picked from commit 84fb3adf6413862cff51d8af3fce5f0b655586a2)

Reset a connection's kvec fields in the caller rather than in
prepare_write_connect().  This repeats a few lines of code, but it
improves the separation between distinct operations on the
connection, which we can take advantage of later.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago libceph: don't reset kvec in prepare_write_banner()
Alex Elder [Wed, 16 May 2012 20:16:38 +0000 (15:16 -0500)]
libceph: don't reset kvec in prepare_write_banner()

(cherry picked from commit d329156f16306449c273002486c28de3ddddfd89)

Move the kvec reset for a connection out of prepare_write_banner and
into its only caller.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago ceph: messenger: change read_partial() to take "end" arg
Alex Elder [Thu, 10 May 2012 15:29:50 +0000 (10:29 -0500)]
ceph: messenger: change read_partial() to take "end" arg

(cherry picked from commit fd51653f78cf40a0516e521b6de22f329c5bad8d)

Make the second argument to read_partial() be the ending input byte
position rather than the beginning offset it now represents.  This
amounts to moving the addition "to + size" into the caller.
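
The calling convention described above might look roughly like the following userspace sketch. It is an illustration under assumptions, not the messenger code: struct sketch_in, sketch_recv() and sketch_read_partial() are hypothetical names, and an in-memory buffer stands in for the socket. The caller keeps a running "end" position and adds each object's size to it before the call.

```c
#include <string.h>

/* Hypothetical in-memory stand-in for a connection's input stream. */
struct sketch_in {
	const char *data;	/* bytes "arriving" on the wire */
	int avail;		/* bytes that have arrived so far */
	int consumed;		/* bytes already handed out */
	int in_base_pos;	/* progress through the current multi-part read */
};

/* Copy up to len arrived bytes; returns the count copied, 0 if none. */
static int sketch_recv(struct sketch_in *in, void *buf, int len)
{
	int n = in->avail - in->consumed;

	if (n > len)
		n = len;
	if (n <= 0)
		return 0;
	memcpy(buf, in->data + in->consumed, n);
	in->consumed += n;
	return n;
}

/*
 * "end" is the input position at which this object is complete.
 * Returns 1 when the object is fully read, 0 if more input is needed.
 */
static int sketch_read_partial(struct sketch_in *in, int end, int size,
			       void *object)
{
	while (in->in_base_pos < end) {
		int left = end - in->in_base_pos;	/* bytes still missing */
		int have = size - left;			/* bytes already stored */
		int ret = sketch_recv(in, (char *)object + have, left);

		if (ret <= 0)
			return ret;
		in->in_base_pos += ret;
	}
	return 1;
}
```

A caller reading a 4-byte header followed by a 3-byte body would do "end += 4" before the first call and "end += 3" before the second; a call that returns 0 can simply be repeated with the same "end" once more input has arrived.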

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago ceph: messenger: update "to" in read_partial() caller
Alex Elder [Thu, 10 May 2012 15:29:50 +0000 (10:29 -0500)]
ceph: messenger: update "to" in read_partial() caller

(cherry picked from commit e6cee71fac27c946a0bbad754dd076e66c4e9dbd)

read_partial() always increases whatever "to" value is supplied by
adding the requested size to it, and that's the only thing it does
with that pointed-to value.

Do that pointer advance in the caller (and then only when the
updated value will be subsequently used), and change the "to"
parameter to be an in-only and non-pointer value.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago ceph: messenger: use read_partial() in read_partial_message()
Alex Elder [Thu, 10 May 2012 15:29:50 +0000 (10:29 -0500)]
ceph: messenger: use read_partial() in read_partial_message()

(cherry picked from commit 57dac9d1620942608306d8c17c98a9d1568ffdf4)

There are two blocks of code in read_partial_message()--those that
read the header and footer of the message--that can be replaced by a
call to read_partial().  Do that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago ceph: osd_client: fix endianness bug in osd_req_encode_op()
Alex Elder [Fri, 20 Apr 2012 20:49:43 +0000 (15:49 -0500)]
ceph: osd_client: fix endianness bug in osd_req_encode_op()

(cherry picked from commit 065a68f9167e20f321a62d044cb2c3024393d455)

From Al Viro <viro@zeniv.linux.org.uk>

Al Viro noticed that we were using a non-cpu-encoded value in
a switch statement in osd_req_encode_op().  The result would
clearly not work correctly on a big-endian machine.
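
To illustrate the class of bug (this is a hedged sketch, not the osd_client code): a value stored in little-endian wire order has to be converted to CPU order before it is compared against opcode constants. The opcode values and all sketch_* names below are hypothetical, and the byte-wise decode is just one portable way to get a CPU-order value.

```c
#include <stdint.h>

/* Hypothetical opcodes, chosen only for illustration. */
enum { SKETCH_OP_READ = 0x0001, SKETCH_OP_WRITE = 0x0002 };

/* Decode a 16-bit little-endian wire value into CPU byte order. */
static uint16_t sketch_le16_to_cpu(const uint8_t wire[2])
{
	return (uint16_t)(wire[0] | (wire[1] << 8));
}

/* Switch on the CPU-order value, never on the raw wire bytes. */
static const char *sketch_op_name(const uint8_t wire_op[2])
{
	switch (sketch_le16_to_cpu(wire_op)) {
	case SKETCH_OP_READ:
		return "read";
	case SKETCH_OP_WRITE:
		return "write";
	default:
		return "unknown";
	}
}
```

Because the decode works byte by byte, the result is the same on little-endian and big-endian hosts. Switching on the raw 16-bit field would match the constants only on a little-endian CPU.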

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago crush: fix memory leak when destroying tree buckets
Sage Weil [Mon, 7 May 2012 22:37:05 +0000 (15:37 -0700)]
crush: fix memory leak when destroying tree buckets

(cherry picked from commit 6eb43f4b5a2a74599b4ff17a97c03a342327ca65)

Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago crush: fix tree node weight lookup
Sage Weil [Mon, 7 May 2012 22:36:49 +0000 (15:36 -0700)]
crush: fix tree node weight lookup

(cherry picked from commit f671d4cd9b36691ac4ef42cde44c1b7a84e13631)

Fix the node weight lookup for tree buckets by using a correct accessor.

Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago crush: be more tolerant of nonsensical crush maps
Sage Weil [Mon, 7 May 2012 22:35:24 +0000 (15:35 -0700)]
crush: be more tolerant of nonsensical crush maps

(cherry picked from commit a1f4895be8bf1ba56c2306b058f51619e9b0e8f8)

If we get a map that doesn't make sense, error out or ignore the badness
instead of BUGging out.  This reflects the ceph.git commits
9895f0bff7dc68e9b49b572613d242315fb11b6c and
8ded26472058d5205803f244c2f33cb6cb10de79.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago crush: adjust local retry threshold
Sage Weil [Mon, 7 May 2012 22:35:09 +0000 (15:35 -0700)]
crush: adjust local retry threshold

(cherry picked from commit c90f95ed46393e29d843686e21947d1c6fcb1164)

This small adjustment reflects a change that was made in ceph.git commit
af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago.  An N-1
search is not exhaustive.  Fixed ceph.git bug #1594.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago crush: clean up types, const-ness
Sage Weil [Mon, 7 May 2012 22:38:35 +0000 (15:38 -0700)]
crush: clean up types, const-ness

(cherry picked from commit 8b12d47b80c7a34dffdd98244d99316db490ec58)

Move various types from int -> __u32 (or similar), and add const as
appropriate.

This reflects changes that have been present in the userland implementation
for some time.

Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago selinux: fix sel_netnode_insert() suspicious rcu dereference
Dave Jones [Fri, 9 Nov 2012 00:09:27 +0000 (16:09 -0800)]
selinux: fix sel_netnode_insert() suspicious rcu dereference

commit 88a693b5c1287be4da937699cb82068ce9db0135 upstream.

===============================
[ INFO: suspicious RCU usage. ]
3.5.0-rc1+ #63 Not tainted
-------------------------------
security/selinux/netnode.c:178 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by trinity-child1/8750:
 #0:  (sel_netnode_lock){+.....}, at: [<ffffffff812d8f8a>] sel_netnode_sid+0x16a/0x3e0

stack backtrace:
Pid: 8750, comm: trinity-child1 Not tainted 3.5.0-rc1+ #63
Call Trace:
 [<ffffffff810cec2d>] lockdep_rcu_suspicious+0xfd/0x130
 [<ffffffff812d91d1>] sel_netnode_sid+0x3b1/0x3e0
 [<ffffffff812d8e20>] ? sel_netnode_find+0x1a0/0x1a0
 [<ffffffff812d24a6>] selinux_socket_bind+0xf6/0x2c0
 [<ffffffff810cd1dd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810cdb55>] ? lock_release_holdtime.part.9+0x15/0x1a0
 [<ffffffff81093841>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff812c9536>] security_socket_bind+0x16/0x20
 [<ffffffff815550ca>] sys_bind+0x7a/0x100
 [<ffffffff816c03d5>] ? sysret_check+0x22/0x5d
 [<ffffffff810d392d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff8133b09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816c03a9>] system_call_fastpath+0x16/0x1b

The patch below does what Paul McKenney suggested in the previous thread.

Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago reiserfs: Protect reiserfs_quota_write() with write lock
Jan Kara [Tue, 13 Nov 2012 17:25:38 +0000 (18:25 +0100)]
reiserfs: Protect reiserfs_quota_write() with write lock

commit 361d94a338a3fd0cee6a4ea32bbc427ba228e628 upstream.

Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with the write lock. We remove the write lock around calls
to high-level quota code in the next patch, so these paths would
suddenly become unprotected.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago reiserfs: Move quota calls out of write lock
Jan Kara [Tue, 13 Nov 2012 16:05:14 +0000 (17:05 +0100)]
reiserfs: Move quota calls out of write lock

commit 7af11686933726e99af22901d622f9e161404e6b upstream.

Calls into high-level quota code cannot happen under the write lock. These
calls take dqio_mutex, which ranks above the write lock. So drop the write
lock before calling back into quota code.
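
The ordering rule can be illustrated with a toy rank checker. This is a sketch under assumptions, not reiserfs code: the rank values, the sketch_task bookkeeping, and all sketch_* functions are hypothetical. A lock that "ranks above" another must be taken first, so taking dqio_mutex while the write lock is held counts as a violation.

```c
/* Hypothetical lock ranks: a higher-ranked lock must be taken first. */
enum sketch_rank { RANK_WRITE_LOCK = 1, RANK_DQIO_MUTEX = 2 };

struct sketch_task {
	int held[3];	/* held[rank] is nonzero while that lock is held */
};

/* Take a lock; fail with -1 if a lower-ranked lock is already held. */
static int sketch_lock(struct sketch_task *t, enum sketch_rank r)
{
	for (int low = 1; low < (int)r; low++)
		if (t->held[low])
			return -1;	/* ordering violation */
	t->held[r] = 1;
	return 0;
}

static void sketch_unlock(struct sketch_task *t, enum sketch_rank r)
{
	t->held[r] = 0;
}

/* Stand-in for a high-level quota call that takes dqio_mutex itself. */
static int sketch_quota_call(struct sketch_task *t)
{
	if (sketch_lock(t, RANK_DQIO_MUTEX))
		return -1;
	sketch_unlock(t, RANK_DQIO_MUTEX);
	return 0;
}

/* The pattern from the text: drop the write lock, call, then retake it. */
static int sketch_quota_call_unlocked(struct sketch_task *t)
{
	int ret;

	sketch_unlock(t, RANK_WRITE_LOCK);
	ret = sketch_quota_call(t);
	sketch_lock(t, RANK_WRITE_LOCK);
	return ret;
}
```

Calling sketch_quota_call() with the write lock held reports a violation, while sketch_quota_call_unlocked() succeeds and leaves the write lock held on return.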

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago reiserfs: Protect reiserfs_quota_on() with write lock
Jan Kara [Tue, 13 Nov 2012 15:34:17 +0000 (16:34 +0100)]
reiserfs: Protect reiserfs_quota_on() with write lock

commit b9e06ef2e8706fe669b51f4364e3aeed58639eb2 upstream.

In reiserfs_quota_on() we do quite a bit of work, for example unpacking
the tail of a quota file. Thus we have to hold the write lock until the
moment we call back into the quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years ago reiserfs: Fix lock ordering during remount
Jan Kara [Tue, 13 Nov 2012 13:55:52 +0000 (14:55 +0100)]
reiserfs: Fix lock ordering during remount

commit 3bb3e1fc47aca554e7e2cc4deeddc24750987ac2 upstream.

When remounting reiserfs, dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex, which ranks above the write lock, so we
have to drop the write lock before calling into quota code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>