Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SftpClient Enumerates Rather Than Accumulates Directory Items #395

Open
znamenap opened this issue Mar 4, 2018 · 2 comments
Open

SftpClient Enumerates Rather Than Accumulates Directory Items #395

znamenap opened this issue Mar 4, 2018 · 2 comments

Comments

@znamenap
Copy link

znamenap commented Mar 4, 2018

Hi,
Thank you for really great library! Thank you once again!

I've a proposal to enhance (fix) some behavior which leads to delayed processing and huge memory footprints. They are actually our current production issues caused by 65 thousand directory items in single parent directory. Our consumer service has to list all of them and then process one by one or in batches. The current SftpClient code basically reads all the items and make SftpFile objects of it. That generates a peak in memory usage which is still acceptable. However, the criteria or behavior which is not acceptable is the time required to make the list of the folder items. In our scenario, it takes 1 minute and 45 seconds. The distance is between London and New York for production environment. The service we maintain and which uses SSH.NET has to check the remote directory as frequent as possible at max by 1 minute. In my test environment between West Europe and West US the listing operation lasts 2 minutes and 6 seconds over 75 thousand folder items.

            var files = _sftpSession.RequestReadDir(handle);

            while (files != null)
            {
                result.AddRange(from f in files
                                select new SftpFile(_sftpSession, string.Format(CultureInfo.InvariantCulture, "{0}{1}", basePath, f.Key), f.Value));

                //  Call callback to report number of files read
                if (listCallback != null)
                {
                    //  Execute callback on different thread
                    ThreadAbstraction.ExecuteThread(() => listCallback(result.Count));
                }

                files = _sftpSession.RequestReadDir(handle);
            }

There is nothing actually to do with the principle on listing the directory into items but how the library handles result data. Currently, it accumulates all items and then passes them to consumer as one big list. The enhancement should be about to introduce a new method which does real enumeration. This would merge/join the time spent on listing and consuming the items and pushes it directly to consumer. The consumer's logic may decide - ah, we've got enough items to process so we can stop enumerating (i.e. in the middle or at the first 150 items). This scenario actually shorten the processing time to acceptable minimum.

Obviously, I've looked into the source code, have done the change, tested it and thought about multithreading scenarios. I wish it to be merged in if you agree on.

Many thanks,
Pavel

Mine proposal is about to add dedicated new methods into SftpClient which does enumeration:

        /// <summary>
        /// Enumerates files and directories in remote directory.
        /// </summary>
        /// <remarks>
        /// This method differs to <see cref="ListDirectory(string, Action{int})"/> in the way how the items are returned. 
        /// It yields the items to the last moment for the enumerator to decide if it needs to continue or stop enumerating the items. 
        /// It is handy in case of really huge directory contents at remote server - meaning really huge 65 thousand files and more.
        /// It also decrease the memory footprint and avoids LOH allocation as happen per call to <see cref="ListDirectory(string, Action{int})"/> method.
        /// There aren't asynchronous counterpart methods to this because enumerating should happen in your specific asynchronous block.
        /// </remarks>
        /// <param name="path">The path.</param>
        /// <param name="listCallback">The list callback.</param>
        /// <returns>
        /// An <see cref="System.Collections.Generic.IEnumerable{SftpFile}"/> of files and directories ready to be enumerated.
        /// </returns>
        /// <exception cref="ArgumentNullException"><paramref name="path" /> is <b>null</b>.</exception>
        /// <exception cref="SshConnectionException">Client is not connected.</exception>
        /// <exception cref="SftpPermissionDeniedException">Permission to list the contents of the directory was denied by the remote host. <para>-or-</para> A SSH command was denied by the server.</exception>
        /// <exception cref="SshException">A SSH error where <see cref="Exception.Message" /> is the message from the remote host.</exception>
        /// <exception cref="ObjectDisposedException">The method was called after the client was disposed.</exception>
        public IEnumerable<SftpFile> EnumerateDirectory(string path, Action<int> listCallback = null)`

        /// <summary>
        /// Internals the list directory.
        /// </summary>
        /// <param name="path">The path.</param>
        /// <param name="listCallback">The list callback.</param>
        /// <returns>
        /// A list of files in the specified directory.
        /// </returns>
        /// <exception cref="ArgumentNullException"><paramref name="path" /> is <b>null</b>.</exception>
        /// <exception cref="SshConnectionException">Client not connected.</exception>
        private IEnumerable<SftpFile> InternalEnumerateDirectory(string path, Action<int> listCallback)
znamenap added a commit to znamenap/SSH.NET that referenced this issue Mar 4, 2018
znamenap added a commit to znamenap/SSH.NET that referenced this issue Mar 4, 2018
@znamenap
Copy link
Author

znamenap commented Mar 4, 2018

Hi Gert,

Can you review and approve this issue and pull-request #396 ?
Not sure what is the policy in this repo and have seen just your commits though thinking of you as moderator.

Thank you.
Pavel

@znamenap
Copy link
Author

Hi @drieseng, I'd like you to bring also your attention to this issue. Please, can you review and approve this pull-request #396 for this issue?

Thank you.
Pavel

znamenap added a commit to znamenap/SSH.NET that referenced this issue Sep 1, 2020
znamenap added a commit to znamenap/SSH.NET that referenced this issue Sep 1, 2020
znamenap added a commit to znamenap/SSH.NET that referenced this issue Sep 20, 2023
znamenap added a commit to znamenap/SSH.NET that referenced this issue Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant