Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documenting memory behaviours and leak detections #1518

Open
wcandillon opened this issue Sep 13, 2024 · 20 comments
Open

Documenting memory behaviours and leak detections #1518

wcandillon opened this issue Sep 13, 2024 · 20 comments

Comments

@wcandillon
Copy link

Regarding React Native Skia and WebGPU, I have often been confused about the observable memory allocations in these libraries. But even worse, we are genuinely struggling to explain these behaviors to our clients (see #982), who believe that our libraries are leaking memory.

I think it’s fair to say that many developers, in various situations, are under the impression that Hermes is leaking memory, even though it is not.

The goal of this issue is to create a documentation page that explains the current observable memory behavior (so we can share that information with our clients) and shows how to factually check whether a library/app is leaking memory.

Is there interest in such an initiative?

Here’s a recent example: As mentioned in #982, I was under the impression that RN WebGPU was leaking memory. I spent a substantial amount of time debugging, only to confirm that all objects are eventually deleted. Yet, the allocated memory stays the same and keeps growing. The demo on iOS in dev mode goes like this:
We allocate a bunch of objects, and after a certain time, they are all deleted. However, the allocated memory in the performance monitor remains the same. In the second cycle, many more objects are allocated before deletion (thus, allocated memory increases), and this exact pattern continues, creating an almost exponential memory allocation over time.
I assumed this behavior in dev mode is "okay." However, our clients are experiencing very similar behavior, and I’m not sure what to tell them since I’m not even sure myself what’s going on.

I hope this makes sense.

@tmikov
Copy link
Contributor

tmikov commented Sep 15, 2024

We are definitely interested in providing documentation about the observable memory behavior.

However what you are describing doesn't really align with that - we would like to understand the behavior you are describing. None of these things are supposed to happen: memory should not remain the same data is freed. Also, allocation should not accumulate exponentially. We have not actually observed what you are describing and don't have an explanation for it.

In #982 we asked for some data. It would be very useful to see the values returned by getInstrumentedStats() in sequence, in each frame and how they change, along with the number of alive host objects at the same time.

BTW, there is documentation on "Memory Profiles", although some minor details might be out of date: https://github.com/facebook/hermes/blob/main/doc/MemoryProfilers.md

@wcandillon
Copy link
Author

This is a great question. in #982 once I noticed that they were no leaks indeed, I wasn't sure which useful data points I could provide to you that could be useful. would a lot of getInstrumentedStats() as a JSON export or CSV be good?

Please let me know what kind of scenario, input, output data would be perfect for you.
I hope we can fix this gap about perceived behaviour eventually.

@neildhar
Copy link
Contributor

The behaviour you're observing is pretty unexpected, so we're eager to learn more.

  1. Could you confirm which tool you're using to see the growing memory consumption, and the numbers that you are seeing?
  2. Is the memory being reported for the entire app, or memory that is being specifically managed by Hermes?
  3. Are you seeing this only in dev mode, or are you also seeing elevated memory consumption in release builds?
  4. Sharing the result of HermesInternal.getInstrumentedStats() over the period you're seeing the memory growth would be very useful. It would allow us to see the breakdown of the portion of the heap that is being managed by Hermes. A JSON file would be good.

We have also previously seen the allocator on macOS/iOS retain a large amount of excess memory even after allocations are freed, but I would not expect memory consumption to be increasing exponentially over time.

@wcandillon
Copy link
Author

Thank you for following up on this.

    1. I am looking at the performance monitor of the react native app and the instrument panel on XCode.
  1. This is in dev mode only, it sounds like it might be worth trying in release mode? If this would be a dev-mode behaviour only, we would still need a way to explain to our user but it sounds like it could be fair.
  2. Should I just run a demo and log HermesInternal.getInstrumentedStats() every minute or so? and have it as a CSV file?

@neildhar
Copy link
Contributor

looking at the performance monitor of the react native app and the instrument panel on XCode

To confirm, you're seeing elevated memory consumption in both the performance monitor and in the instruments panel?

sounds like it might be worth trying in release mode?

It would be interesting to see the release mode results. A disparity between dev and release modes would definitely be an interesting data point.

Should I just run a demo and log HermesInternal.getInstrumentedStats() every minute or so? and have it as a CSV file?

Yes, that would be fine. Depending on the time scale over which you see the rise in memory consumption, you may want to log more frequently than that so we have enough visibility into the memory pattern. Having extra data will only help.

@dev-john-nguyen
Copy link

dev-john-nguyen commented Oct 4, 2024

Hoping this helps. I believe this issue is related to our app's frequent crashes. Shopify/react-native-skia#2675

HermesInternal.getInstrumentedStats() every second before crash.
hermes_stats.txt

Crash logs
image
image

@neildhar
Copy link
Contributor

neildhar commented Oct 4, 2024

Hey @dev-john-nguyen, according to the Hermes stats, Hermes is only managing 20MB of memory. Could you share more about why you believe Hermes is responsible for the elevated memory consumption?

@neildhar
Copy link
Contributor

neildhar commented Oct 4, 2024

@dev-john-nguyen If your app has multiple Hermes runtime instances, could you share the output of getInstrumentedStats from each instance? The attached logs show that Hermes is using fairly little memory, but it is possible another Hermes instance in the app is consuming more.

@dev-john-nguyen
Copy link

@neildhar I'm not entirely sure if Hermes is the cause of the elevated memory consumption, but I’d like to help identify the cause and hopefully help us work towards a solution.

As far as I know, the app doesn’t spin up multiple Hermes runtime instances, unless it’s happening dynamically in certain scenarios I’m unaware of. Is there a way to confirm whether multiple Hermes runtime instances are running in the app?

@tmikov
Copy link
Contributor

tmikov commented Oct 4, 2024

@dev-john-nguyen there are two possible ways that I can think of this being related to Hermes:

  1. If native memory associated with JS objects is not reported using setExternalMemoryPressure(), the Hermes GC wouldn't know about it and it wouldn't attempt to collect it.
  2. Hypothetically a second Hermes runtime could be the owner of all of this memory. If that was the case, we still have to investigate why it isn't being freed, but at least we would be looking at the correct runtime. Is this likely? I don't know. We know that many apps these days use a second runtime with Reanimated, for example.

One way to know when a Hermes runtime is created is to put a breakpoint with a native debugger in facebook::hermes::makeHermesRuntime().

@dev-john-nguyen
Copy link

dev-john-nguyen commented Oct 4, 2024

So it looks like reanimated is spinning up another Hermes runtime instance.

image

Here's getInstrumentedStats for both instances until crash. (Edited posted the right stats)

reanimated_hermes_status.txt
hermes_stats.txt

@tmikov
Copy link
Contributor

tmikov commented Oct 4, 2024

@dev-john-nguyen thanks for collecting this data so quickly! Unfortunately, it looks like neither of the runtimes owns a lot of memory. You can look at js_allocatedBytes and js_heapSize for the JS memory, and at js_externalBytes for the native memory known to the VM. As you can see, the values are pretty small.

Ignoring the possibility of a catastrophic bug in Hermes (this is always theoretically possible, but I don't really see evidence for it), the only other explanation is that there is native memory that is not reported to the VM.

I think you can use Instruments to see where this memory is being allocated. See this link (and particularly the "check the Malloc Stack box in the Diagnostics area" part): https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use .

@tmikov
Copy link
Contributor

tmikov commented Oct 5, 2024

@dev-john-nguyen another experiment you should run is to invoke gc() periodically (every second?). If it helps, it would prove pretty conclusively that these are not leaks, but just external memory not being reported. If it makes no difference, these are actual leaks.

You should probably do it separately on the two runtimes.

@fatiherdogan01
Copy link

Hi, @dev-john-nguyen
I can conclude that this crash is related to the glog package. I had 11K crashes in production.
I manually deleted the LOG(INFO)s in node_modules/react-native/ReactCommon/jsiexecutor/jsireact/JSIExecutor.cpp. And now 0 crashes in production.
Of course, there are memory leaks in various packages. But now I have to solve the problem with the glog package.
I've been trying to solve this crash for about 2 weeks. The problem was actually simple but I focused on other things.

Can you verify me?

@dev-john-nguyen
Copy link

@tmikov I generated a memory graph report, and it appears to have detected a few leaks related to Hermes. I'm not very familiar with the tool, but let me know if you need more information, and I can try to provide it.

image
image

@dev-john-nguyen
Copy link

@fatiherdogan01
This provides a temporary fix for the bad access issue I encountered, but it doesn't address the underlying problem. The app isn't releasing memory, so it will eventually be terminated once it consumes too much.

@dev-john-nguyen
Copy link

dev-john-nguyen commented Oct 7, 2024

@tmikov I invoked gc every second on both runtimes and it didn't help much. Pretty much still has the same behavior. Memory usage drops slightly and then stagnates.

image

@tmikov
Copy link
Contributor

tmikov commented Oct 7, 2024

@dev-john-nguyen So, let's recap:

  • You reported that your app's memory goes up to 1.9GB of memory usage before it crashes soon after.
  • At the same time we saw that the memory directly managed by Hermes is about 20MB.
  • Lastly, you tried running gc() periodically with no change.

I think this is very strong evidence that this is not memory managed or leaked by Hermes, directly or indirectly.

So, if Hermes is not allocating the 1.9GB, you need to identify who is. 1.9GB is quite a lot of data, so hopefully it should not be that difficult.

By looking at all allocations before the crash, you should be able to get some idea of what the majority are. Are they malloc() for example?

(Again, there is always a theoretical possibility that Hermes has a catastrophic bug, so we can never completely exclude that, but either way we need to find out where the data is allocated)

@trungledangAxonActive
Copy link

trungledangAxonActive commented Oct 9, 2024

My app is using almost 1GB (from 400MB to 1GB) but the js_allocatedBytes and js_heapSize for the JS memory, and at js_externalBytes seem like to be allocated too much. I just hide and show the same markers on the map (I'm using Google map and there's also a memory leak of Google Map but I resolved it, but the memory still increases though), I still see RAM increases.
image

{
  "js_VMExperiments": 0,
  "js_allocatedBytes": 168924512,
  "js_externalBytes": 10627489,
  "js_gcCPUTime": 41.120000000000665,
  "js_gcTime": 44.91600000000184,
  "js_heapSize": 188743680,
  "js_mallocSizeEstimate": 0,
  "js_markStackOverflows": 0,
  "js_numGCs": 7547,
  "js_totalAllocatedBytes": 29481105976,
  "js_vaSize": 188743680
}

One thing I notice in Xcode is the CoreFoundation's memory increases the more I use the app

First time using After a while using
image Screenshot 2024-10-09 at 16 39 12

@wcandillon
Copy link
Author

The comments in this issue seem to support the idea that it would be more scalable to build comprehensive documentation to support library developers, rather than trying to address every question individually.
It might also be more scalable to document that library developers should use Nitro modules for building JSI-based APIs, since Nitro modules aim to encompass all the best practices on that topic. It would be hard, if not impossible, for another library developer to catch up with these best practices, in my opinion (e.g native state, setExternalMemoryPressure).

On my side, I am in the process of migrating to Nitro modules and will report back on the observable behaviors I encounter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants