Optimizing App Startup Time
https://developer.apple.com/videos/play/wwdc2016/406/
[Music] Good morning and welcome to Session 406, Optimizing App Startup Time.
My name is Nick Kledzik, and today my colleague Louis and I are going to take you on a guided tour of how a process launches.
Now you may be wondering, is this topic right for me?
So we had our crack marketing team do some research, and they determined there are three groups that will benefit from listening to this talk.
The first is app developers who have an app that launches too slowly.
The second group is app developers who don't want to be in the first group [laughter].
And lastly, anyone who's just really curious about how the OS operates.
So this talk is going to be divided into two sections; the first is more theory and the second more practical, and I'll be doing the first, theory part.
In it I'll be walking you through all the steps that happen, all the way up to main.
But in order for you to understand and appreciate all those steps, I first need to give you a crash course on Mach-O and virtual memory.
So first, quickly, some Mach-O terminology.
Mach-O is a set of file types for different runtime executables.
The first is an executable: that's the main binary in an app, and it's also the main binary in an app extension.
A dylib is a dynamic library; on other platforms you may know those as DSOs or DLLs.
Our platform also has another kind of thing called a bundle.
Now a bundle is a special kind of dylib that you cannot link against; all you can do is load it at runtime via dlopen, and that's used on macOS for plug-ins.
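Since a bundle can only be brought in at runtime, loading one is just a dlopen call. Here's a hedged sketch using Python's ctypes, which wraps dlopen on Unix; the plug-in path shown in the comment is hypothetical.

```python
# Sketch: loading code at runtime the way dlopen does.
# ctypes.CDLL wraps dlopen on Unix; passing None returns a handle
# to the running process itself, which keeps the demo portable.
import ctypes

handle = ctypes.CDLL(None)   # like dlopen(NULL, ...): the main program
# For a real plug-in you would pass its path instead, e.g.
#   plugin = ctypes.CDLL("/path/to/MyPlugin.bundle/MyPlugin")  # hypothetical
assert handle is not None
```

Symbols in the loaded image can then be looked up by name, which is what dlsym does under the hood.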
Last is the term image.
Image refers to any of these three types, and I'll be using that term a lot.
And lastly, the term framework is very overloaded in our industry, but in this context a framework is a dylib with a special directory structure around it to hold files needed by that dylib.
So let's dive right into the Mach-O image format.
A Mach-O image is divided into segments; by convention, all segment names use uppercase letters.
Now, each segment is always a multiple of the page size; in this example, TEXT is three pages, and DATA and LINKEDIT are each one page.
The page size is determined by the hardware: for arm64 the page size is 16KB, and for everything else it's 4KB.
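You can ask the OS for that hardware page size yourself; a small sketch using Python's standard library:

```python
# Query the hardware page size the VM system uses.
import mmap
import resource

page = resource.getpagesize()
print(page)                     # commonly 4096; 16384 on arm64 devices
assert page == mmap.PAGESIZE    # mmap exposes the same value
assert page & (page - 1) == 0   # page sizes are always a power of two
```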
Now, another way to look at an image is sections.
Sections are something the compiler emits.
But sections are really just a subrange of a segment; they don't have the constraint of being page-sized, but they are non-overlapping.
Now, the most common segment names are TEXT, DATA, and LINKEDIT; in fact, almost every binary has exactly those three segments.
You can add custom ones, but it usually doesn't add any value.
So what are these used for? Well, TEXT is at the start of the file; it contains the Mach header, any machine instructions, as well as any read-only constants such as C strings.
The DATA segment is read-write; the DATA segment contains all your global variables.
And lastly is the LINKEDIT. Now, the LINKEDIT doesn't contain your functions or global variables; the LINKEDIT contains information about your functions and variables, such as their names and addresses.
You may have also heard of universal files, so what are they? Well, suppose you build an iOS app for 64-bit, and now you have this Mach-O file. So what happens in Xcode when you say you also want to build it for 32-bit devices? When you rebuild, Xcode will build another, separate Mach-O file, this one built for 32-bit armv7.
And then those two files are merged into a third file, called a Mach-O universal file.
That has a header at the start, and the header has a list of all the architectures and their offsets in the file.
And that header is also one page in size.
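That fat header is a simple fixed layout, mirroring the structs in mach-o/fat.h. Here's a sketch that builds a synthetic two-architecture header and parses it back; the slice offsets, sizes, and alignments are invented for the demo.

```python
# Sketch: the layout of a Mach-O universal ("fat") header, mirroring
# the structs in <mach-o/fat.h>. All fields are big-endian 32-bit ints.
import struct

FAT_MAGIC      = 0xCAFEBABE
CPU_TYPE_ARM   = 0x0000000C          # 32-bit arm (armv7)
CPU_TYPE_ARM64 = 0x0100000C          # arm | CPU_ARCH_ABI64

def fat_arch(cputype, cpusubtype, offset, size, align):
    return struct.pack(">5I", cputype, cpusubtype, offset, size, align)

# fat_header: magic + number of architectures, then one fat_arch per slice.
blob = struct.pack(">2I", FAT_MAGIC, 2)
blob += fat_arch(CPU_TYPE_ARM,   0, 0x4000, 0x8000, 14)   # offsets invented
blob += fat_arch(CPU_TYPE_ARM64, 0, 0xC000, 0x9000, 14)

magic, nfat = struct.unpack_from(">2I", blob, 0)
assert magic == FAT_MAGIC and nfat == 2
# Each slice record tells you where that architecture lives in the file:
cputype, _, offset, size, _ = struct.unpack_from(">5I", blob, 8)
print(hex(cputype), hex(offset))     # the first slice is the armv7 one
```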
Now you may be wondering: why are the segments multiples of the page size? Why is the header a whole page? Isn't that wasting a lot of space?
Well, the reason everything is page-based has to do with our next topic, which is virtual memory.
So what is virtual memory? Some of you may know the adage in software engineering that every problem can be solved by adding a level of indirection.
The problem that virtual memory solves is: how do you manage all your physical RAM when you have all these processes? So they added a level of indirection.
Every process is a logical address space which gets mapped to some physical pages of RAM.
Now, this mapping does not have to be one-to-one; you can have logical addresses that map to no physical RAM, and you can have multiple logical addresses that map to the same physical RAM.
This offers lots of opportunities.
So what can you do with VM? Well, first, if you have a logical address that does not map to any physical RAM, when you access that address in your process, a page fault happens.
At that point the kernel stops that thread and tries to figure out what needs to happen.
The next thing is, if you have two processes with different logical addresses mapping to the same physical page, those two processes are now sharing the same bit of RAM.
You now have sharing between processes.
Another interesting feature is file-backed mapping.
Rather than actually reading an entire file into RAM, you can tell the VM system, through the mmap call, that you want this slice of this file mapped to this address range in your process.
So why would you do that? Well, with that mapping set up, rather than having to read the entire file, you just access the addresses as if you had read the file into memory; each time you access an address that hasn't been accessed before, a page fault happens and the kernel reads just that one page.
And that gives you lazy reading of your file.
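Here's a minimal sketch of that file-backed, lazy mapping using Python's mmap module as a stand-in for the C mmap call; the three-page demo file is invented.

```python
# Sketch: mapping a slice of a file instead of reading it all, as dyld
# does for each segment. Pages are only read when first touched.
import mmap
import os
import tempfile

page = mmap.PAGESIZE
path = tempfile.NamedTemporaryFile(delete=False).name
with open(path, "wb") as f:          # build a 3-"page" demo file
    f.write(b"T" * page + b"D" * page + b"L" * page)

with open(path, "rb") as f:
    # Map only the last two pages; nothing is read from disk yet.
    m = mmap.mmap(f.fileno(), 2 * page, offset=page, access=mmap.ACCESS_READ)
    first = m[0:1]                   # first touch: this is the page fault
    m.close()
os.remove(path)
assert first == b"D"                 # the slice starts at the second page
```

Note that the offset handed to mmap must itself be page-aligned, which is another reason segments are laid out on page boundaries.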
Now we can put all these features together with what I told you about Mach-O: you now realize that the TEXT segment of any dylib or image can be mapped into multiple processes, it will be read lazily, and all those pages can be shared between those processes.
What about the DATA segment? The DATA segment is read-write, so for that we have a trick called copy-on-write; it's kind of similar to the cloning seen in the Apple file system. What copy-on-write does is optimistically share the DATA pages between all the processes.
As long as the processes are only reading from the global variables, that sharing works.
But as soon as one process actually tries to write to its DATA page, the copy-on-write happens.
The copy-on-write causes the kernel to make a copy of that page into other physical RAM and redirect the mapping to point to it.
So that one process now has its own copy of that page.
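You can watch copy-on-write behavior at the mapping level with a private file mapping; this sketch uses Unix-only mmap flags, and the scratch file is created just for the demo.

```python
# Sketch: copy-on-write at the mapping level. MAP_PRIVATE shares the
# file's pages until we write; the write dirties a private copy, and
# the file on disk is untouched.
import mmap
import os
import tempfile

page = mmap.PAGESIZE
path = tempfile.NamedTemporaryFile(delete=False).name
with open(path, "wb") as f:
    f.write(b"A" * page)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), page, flags=mmap.MAP_PRIVATE,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    m[0:1] = b"Z"            # triggers the copy; this page is now dirty
    dirty = m[0:1]
    m.close()

with open(path, "rb") as f:  # the clean page on disk never changed
    on_disk = f.read(1)
os.remove(path)
assert dirty == b"Z" and on_disk == b"A"
```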
Which brings us to clean versus dirty pages.
That copy is considered a dirty page.
A dirty page is a page that contains process-specific information.
A clean page is one the kernel can regenerate later if needed, such as by rereading it from disk.
So dirty pages are much more expensive than clean pages.
And the last thing is that permissions are set on page boundaries.
By that I mean you can mark a page readable, writable, or executable, or any combination of those.
So let's put this all together. I've talked about the Mach-O format and something about virtual memory; let's see how they play together.
Now I'm going to skip ahead and talk a little about how dyld operates; in a few moments I'll actually walk you through it, but for now I just want to show you how this maps between Mach-O and virtual memory.
So we have a dylib file here, and rather than reading it into memory, we've mapped it into memory.
In memory, this dylib would have taken eight pages.
The savings, the reason it's different, is these zero-fill pages.
It turns out most global variables are zero initially.
So the static linker makes an optimization that moves all the zero-initialized global variables to the end, where they take up no disk space.
Instead, we use a VM feature to tell the VM that the first time each of those pages is accessed, it should be filled with zeros.
So it requires no reading.
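That zero-fill trick is the same thing you get from an anonymous mapping; a quick sketch:

```python
# Sketch: zero-fill pages. An anonymous mapping (fileno -1) asks the VM
# for memory that is zero-filled on first touch, with no disk reads;
# this is the same trick used for zeroed globals.
import mmap

page = mmap.PAGESIZE
m = mmap.mmap(-1, page)          # anonymous: backed by zero-fill pages
zeroed = m[:] == b"\x00" * page  # first touch faults in an all-zero page
m.close()
assert zeroed
```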
So the first thing dyld has to do is look at the Mach header, in memory, in this process.
It'll be looking at the top box in memory; when that happens, there's nothing there, no mapping to a physical page, so a page fault happens.
At that point the kernel realizes this address is mapped to a file, so it reads the first page of the file, places it into physical RAM, and sets up the mapping to it.
Now dyld can actually start reading through the Mach header.
It reads through the Mach header, and the Mach header says, oh, there's some information in the LINKEDIT segment you need to look at.
So again, dyld jumps down to what's in the bottom box in process one.
Which again causes a page fault.
The kernel services it by reading the LINKEDIT into another physical page of RAM.
Now dyld can inspect the LINKEDIT.
In process one, the LINKEDIT tells dyld that it needs to make some fix-ups to this DATA page to make this dylib runnable.
So the same thing happens: dyld reads some data from the DATA page, but there's something different here.
dyld is actually going to write something back; it's actually going to change that DATA page, and at this point a copy-on-write happens.
And this page becomes dirty. If I had just malloced eight pages and then read the file into them, I would have had eight pages of dirty RAM.
But now I only have one page of dirty RAM and two clean pages.
So what happens when a second process loads the same dylib?
In the second process, dyld goes through the same steps.
First it looks at the Mach header, but this time the kernel says, ah, I already have that page in RAM somewhere, so it simply redirects the mapping to reuse that page; no I/O was done.
The same thing happens with the LINKEDIT; it's much faster.
Now we get to the DATA page. At this point the kernel has to look to see if the clean copy of the DATA page still exists in RAM somewhere; if it does, it can reuse it, and if not, it has to reread it.
And now, in this process, dyld will dirty that page of RAM.
The last step is that the LINKEDIT is only needed while dyld is doing its operations.
So once it's done, dyld can hint to the kernel that it doesn't really need these LINKEDIT pages anymore, and the kernel can reclaim them when someone else needs RAM.
The result is that we now have two processes sharing this dylib. Each one on its own would have been eight dirty pages, a total of 16 dirty pages, but now we only have two dirty pages and one clean, shared page.
Two other minor things I want to go over are how security affects dyld; there are two big security technologies that have impacted dyld.
One is ASLR, address space layout randomization. This is technology that's a decade or two old, where basically you randomize the load address.
The second is code signing. Many of you have had to deal with code signing in Xcode, and you may think of code signing as running a cryptographic hash over the entire file and then signing it with your signature.
Well, in order to validate that at runtime, the entire file would have to be re-read.
So instead, what actually happens at build time is that every single page of your Mach-O file gets its own individual cryptographic hash.
And all those hashes are stored in the LINKEDIT.
This allows each page to be validated at page-in time: that it hasn't been tampered with and was signed by you.
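The core idea is just page-granular hashing; here's a simplified sketch. Real code signatures involve much more machinery, and the "segment" bytes below are invented.

```python
# Sketch of per-page signing: hash each page individually so a single
# page can be validated at page-in time without re-reading the whole
# file. This only shows the page-granular hashing idea.
import hashlib
import mmap

page = mmap.PAGESIZE
binary = b"\x90" * page + b"\xcc" * page          # a fake 2-page TEXT segment
hashes = [hashlib.sha256(binary[i:i + page]).digest()
          for i in range(0, len(binary), page)]    # stored in LINKEDIT

# Later, when page 1 faults in, only that one page is checked:
ok = hashlib.sha256(binary[page:2 * page]).digest() == hashes[1]
assert ok
```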
Okay, so we've finished the crash course; now I'm going to walk you from exec to main.
So what is exec? Exec is a system call.
You trap into the kernel and basically say, I want to replace this process with this new program.
The kernel wipes the entire address space and maps in the executable you specified.
Now, for ASLR, it maps it in at a random address.
The next thing it does is, from that random address back down to zero, it marks that whole region inaccessible; by that I mean it's marked not readable, not writable, not executable.
The size of that region is at least 4KB for 32-bit processes and at least 4GB for 64-bit processes.
This catches any NULL pointer dereferences and, for 64-bit processes, it also catches any pointer truncations.
Now, life was easy for the first couple of decades of Unix, because all exec had to do was map in a program, set the PC into it, and start it running.
And then shared libraries were invented. So who loads the dylibs? People quickly realized that this got really complicated fast, and the kernel folks didn't want the kernel to do it, so instead a helper program was created.
On our platform it's called dyld.
On other Unixes you may know it as ld.so.
So when the kernel's done mapping a process, it then maps another Mach-O, called dyld, into that process at another random address.
It sets the PC into dyld and lets dyld finish launching the process.
So now dyld is running in-process, and its job is to load all the dylibs that you depend on and get everything prepared and running.
So let's walk through those steps.
There are a whole bunch of steps here, with a sort of timeline along the bottom; as we go through the steps, we'll move along the timeline.
The first thing dyld has to do is map all the dependent dylibs.
Well, what are the dependent dylibs? To find those, dyld first reads the header of the main executable, which the kernel already mapped in; in that header is a list of all the dependent libraries.
So it has to parse that out.
Then it has to find each dylib.
And once it's found each dylib, it has to open and read the start of each file; it needs to make sure it is a Mach-O file, validate it, find its code signature, and register that code signature with the kernel.
And then it can actually call mmap on each segment in that dylib.
Okay, so that's pretty simple.
The kernel maps in your app and dyld; dyld then says, oh, this app depends on dylib A and dylib B, loads those two, and we're done.
Well, it gets more complicated, because A.dylib and B.dylib themselves could depend upon other dylibs.
So dyld has to do the same thing over again for each of those dylibs, and each of those may depend on something that's already loaded or something new; so it has to determine whether each one is already loaded, and if not, load it.
So this continues on and on.
And eventually everything is loaded.
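The loading loop described above can be sketched as a simple recursive walk with an "already loaded" check; the dependency table here is entirely invented for illustration.

```python
# Sketch of dyld's dependent-dylib walk: each image lists its own
# dependents, and dyld loads each one only if it isn't loaded already.
deps = {
    "app":     ["A.dylib", "B.dylib"],
    "A.dylib": ["libsystem.dylib"],
    "B.dylib": ["libsystem.dylib", "C.dylib"],   # libsystem is shared
    "C.dylib": [],
    "libsystem.dylib": [],
}

loaded = []

def load(image):
    if image in loaded:          # already mapped: nothing to do
        return
    loaded.append(image)         # "open, validate, sign, mmap segments..."
    for dep in deps[image]:      # then recurse into its dependents
        load(dep)

load("app")
assert loaded.count("libsystem.dylib") == 1   # loaded once, shared by all
```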
Now, if you look at the average process in our system, it loads anywhere between 1 and 400 dylibs, so that's a lot of dylibs to load.
Luckily, most of those are OS dylibs, and we do a lot of work when building the OS to precompute and pre-cache a lot of the work that dyld has to do to load them.
So OS dylibs load very, very quickly.
So now we've loaded all the dylibs, but they're all sitting there floating, independent of each other, and now we actually have to bind them together.
That's called fix-ups.
But one thing about fix-ups: we've learned that, because of code signing, we can't actually alter instructions.
So how does one dylib call into another dylib if you can't change its call instructions? Well, we call back our old friend, and we add a level of indirection.
Our code generation is called dynamic PIC.
It's position-independent code, meaning the code can be loaded at any address, and it's dynamic, meaning things are addressed indirectly. What that means is that, to call from one thing to another, the code-gen actually creates a pointer in the DATA segment, and that pointer points to what you want to call.
The code loads that pointer and jumps to it.
So all dyld is doing is fixing up pointers and data.
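Here's a toy model of that indirection: the "call" goes through a slot in the DATA segment, and the fix-up is just a pointer write. All the names are invented.

```python
# Sketch of the dynamic-PIC idea: code never jumps to another image
# directly; it jumps through a pointer slot in its own DATA segment,
# and all dyld has to patch is that pointer.
def real_malloc(size):               # pretend this lives in another dylib
    return f"allocated {size}"

data_segment = {"malloc_ptr": None}  # the indirection slot

def call_malloc(size):               # code: load the pointer, jump to it
    return data_segment["malloc_ptr"](size)

# dyld's "fix-up" is just a pointer write; no instructions change:
data_segment["malloc_ptr"] = real_malloc
result = call_malloc(16)
assert result == "allocated 16"
```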
Now, there are two main categories of fix-ups, rebasing and binding, so what's the difference? Rebasing is when you have a pointer that points within your image and needs adjusting.
Binding is when you're pointing at something outside your image.
They each need to be fixed up differently, so I'll go through the steps.
But first, if you're curious, there's a command, dyldinfo, with a bunch of options on it.
You can run it on any binary, and you'll see all the fix-ups that dyld will have to do to prepare that binary.
So, rebasing.
Well, in the old days you could specify a preferred load address for each dylib, and the static linker and dyld worked together such that, if a dylib was loaded at its preferred load address, all the pointers in its data that pointed internally were correct, and dyld wouldn't have to do any fix-ups.
But these days, with ASLR, your dylib gets loaded at a random address.
It's slid to some other address, which means all those pointers in its data still point to the old addresses.
So in order to fix those up, we need to calculate the slide, which is how much the image has moved, and then, for each of those interior pointers, add the slide value to it.
So rebasing means going through all your interior data pointers and adding the slide to them.
The concept is very simple: read, add, write; read, add, write.
But where are those data pointers? The locations of those pointers within your segments are encoded in the LINKEDIT segment.
Now, at this point all we've done is map everything in, so when we start rebasing, we're actually causing page faults to page in all the DATA pages.
And then we're causing copy-on-writes as we change them.
So rebasing can sometimes be expensive because of all the I/O.
But one trick we use is to do it sequentially, so from the kernel's point of view, it sees the page faults happen sequentially.
And when it sees that, the kernel reads ahead for us, which makes the I/O less costly.
So next is binding. Binding is for pointers that point outside your dylib.
They're actually bound by name; there is literally a string, in this case "malloc", stored in the LINKEDIT that says this data pointer needs to point at malloc.
So at runtime, dyld needs to actually find the implementation of that symbol, which requires a lot of computation, looking through symbol tables.
Once it's found, that value is stored in the data pointer.
So this is way more computationally complex than rebasing is.
But there's very little I/O, because rebasing has already done most of the I/O.
Next, ObjC has a bunch of data structures: the class data structure has a pointer to its methods, a pointer to its superclass, and so forth. Almost all of those are fixed up via rebasing or binding.
But there are a few extra things the ObjC runtime requires.
The first is that ObjC is a dynamic language, and you can request that a class be instantiated by name.
So that means the ObjC runtime has to maintain a table of all the class names and which class they map to.
So every time you load something that defines a class, its name needs to be registered in that global table.
Next, in C++ you may have heard of the fragile ivar problem, sorry,
the fragile base class problem.
We don't have that problem in ObjC, because one of the fix-ups we do is to change the offsets of all the ivars dynamically at load time.
Next, in ObjC you can define categories that change the methods of another class.
Sometimes those categories are for classes that are not in your image but in another dylib, and those method fix-ups have to be applied at this point.
And lastly, ObjC messaging is based on selectors being unique, so we need to unique the selectors.
So now we've done all the data fix-ups that can basically be described statically.
So now it's our chance to do dynamic data fix-ups.
So in C++, you can have an initializer; you can say a global equals whatever expression you want.
That arbitrary expression needs to be run, and it's run at this point.
So the C++ compiler generates initializers for these arbitrary data initializations.
In ObjC, there's something called the +load method.
Now, the +load method is deprecated; we recommend that you don't use it.
We recommend you use +initialize instead.
But if you have one, it's run at this point.
So now we have this big graph: your main executable at the top, and all the dylibs it depends on below, and we have to run the initializers in this huge graph.
What order do we run them in? Well, we run them bottom up.
And the reason is that when an initializer runs, it may need to call into some dylib, and you want to make sure that dylib is already ready to be called.
So by running the initializers from the bottom all the way up to the app, you're always safe to call into something you depend on.
So once all the initializers are done, dyld now finally gets to call main.
So you survived the theory part; you're now all experts on how processes start. You now know that dyld is a helper program: it loads all the dependent libraries, fixes up all the DATA pages, runs the initializers, and then jumps to main.
So now, to put all this theory you've learned to use, I'd like to hand it over to Louis, who will give you some practical tips.
Thanks, Nick.
We've all had that experience where we pull our phone out of our pocket, press the home button, and then tap on an application we want to run.
And then tap, and tap, and tap again on some button because it's not responding.
When that happens to me, it's really frustrating, and I want to delete the app.
I'm Louis Gerbarg, I work on dyld, and today we're going to discuss how to make your app launch instantly, so your users are delighted. So first off, let's go over what we'll cover in this part of the talk.
We're going to discuss how fast you actually need to launch so that your users have a good experience.
And how to measure that launch time.
Because that can be very difficult.
The standard ways you measure your application don't apply before your code can run.
We're going to go through a list of the common reasons your launch can be slow.
And finally, we're going to go through ways to fix those slowdowns.
So I'm going to give you a little spoiler for the rest of my talk.
You need to do less stuff [laughter].
Now, I don't mean your app should have fewer features; I'm saying your app has to do fewer things before it's running.
We want you to figure out how to defer some of your launch work so it's initialized just before it's actually needed.
So, let's discuss the goals: how fast do we want to launch?
Well, the launch times for the various platforms are different.
But a good rule of thumb is that 400 milliseconds is a good launch time.
Now, the reason for that is that we have launch animations on the phone to give a sense of continuity between the home screen and your application.
And those animations take time, and those animations give you a chance to hide your launch time.
Obviously that may differ in different contexts; your app extensions are also applications that have to launch, and they have different time budgets.
And a phone, a TV, and a watch are different things, but 400 milliseconds is a good target.
You can never take longer than 20 seconds to launch.
If you take longer than 20 seconds, the OS will kill your app, assuming it's stuck in an infinite loop, and we've all had that experience.
Where you tap an app, it comes up to a home screen, it doesn't respond, and then it just goes away; that's usually what's happening there.
Finally, it's very important to test on your slowest supported device.
Those timers are constant values across all supported devices on our platforms.
So if you're just barely hitting 400 milliseconds on the iPhone 6s you're using for testing right now, you're probably not going to hit it on an iPhone 5.
So let's do a recap of Nick's part of the talk.
What do we have to do to launch? We have to parse images, map images, rebase images, bind images, run image initializers, and then call main.
If that sounds like a lot, it is; I'm exhausted just saying it.
And then after that, we have to call UIApplicationMain, which you'll see in your ObjC apps, or handled implicitly in your Swift apps.
That does some other things, including running the framework initializers and loading your nibs.
And then finally you get a callback in your application delegate.
I mention these last two because they count against the 400 milliseconds I just mentioned.
But we're not going to discuss them in this talk. If you want a better view of what goes on there, there's a talk from 2012, iOS App Performance: Responsiveness.
I highly recommend you go back and view the video.
But that's the last we're going to speak of them right now.
So let's move on; one more thing I want to talk about: warm versus cold launches.
So when you launch an app, we talk about warm and cold launches.
A warm launch is one where the application is already in memory, either because it was launched and quit previously and is still sitting in the disk cache in the kernel, or because you just copied it over.
A cold launch is one where it's not in the disk cache.
And a cold launch is generally the more important one to measure.
The reason is that a cold launch is what your user gets when launching an app after rebooting the phone, or for the first time in a long time, and that's when you really want it to be instant.
In order to measure those, you really need to reboot between measurements.
Having said that, if you work on improving your warm launches, your cold launches will tend to improve too.
So you can do rapid development cycles on warm launches, but then, every so often, test with a cold launch.
So, how do we measure the time before main? Well, we have a built-in measurement system in dyld; you can access it by setting an environment variable,
DYLD_PRINT_STATISTICS.
It's actually been available in shipping OSes, but it printed a lot of internal debugging information that's not particularly useful, and it was missing some information you probably want.
And we're fixing that today.
So it's significantly improved on the new OSes.
It will print much more relevant information that should give you actionable ways to improve your launch times.
And it will be available in seed 2.
So, one other thing I want to mention: the debugger has to pause the launch on every single dylib load in order to parse the symbols from your app and set your breakpoints, and over a USB cable that can be very time consuming.
But dyld knows about that, and it subtracts the debugger time out of the numbers it reports.
So you don't have to worry about it, but you'll notice it, because dyld will report much smaller numbers than you'd observe by looking at the clock on the wall.
That's expected and understood; everything's going correctly if you see that, but I just wanted to make note of it.
So let's move on. To set an environment variable in Xcode, you just go to the scheme editor and add it like this.
Once you do that, you'll get the new output logged to the console.
And what does that look like? Well, this is what the output looks like, with a time bar on the bottom representing its different parts.
And let's add one more thing.
Let's add an indicator for that 400 millisecond target, which this app I'm working on is not hitting.
So if you look at this, these are basically, in order, the steps Nick discussed for launching an app, so let's go through them in order. So, dylib loading: the big thing to understand about dylib loading, and the slowdown you'll see from it, is that embedded dylibs can be expensive.
So Nick said an average app loads 100 to 400 dylibs.
But OS dylibs are fast, because when we build the OS, we have ways of pre-calculating a lot of that data.
But we don't have every app's dylibs when we're building the OS.
We can't pre-calculate anything for the dylibs you embed with your app, so we have to go through a much slower process as we load those.
And the solution for this is that we just need to use fewer dylibs, and that can be rough.
And I'm not saying you can't use any, but there are a couple of options here. You can merge existing dylibs.
Or you can use static archives and link them into your app that way.
And you have the option to lazy load, which is to use dlopen, but dlopen causes some subtle performance and correctness issues, and it actually results in doing more work later on; it's just deferred.
So it's a viable option, but you should think long and hard about it, and I would discourage it if at all possible.
So, I have an app here that currently has 26 dylibs, and it's taking 240 milliseconds just to load those; but if I change it and merge those dylibs into two dylibs, it only takes 20 milliseconds to load them.
So I can still have dylibs, I can still use them to share functionality between my app and my extension, but limiting them will be very useful.
And I understand this is a tradeoff you're making between your development convenience and your users' launch time.
Because the more dylibs you have, the easier it is to rebuild and re-link your app, and the faster your development cycles are.
So you absolutely can and should use some, but it's good to target a limited number; offhand, I'd say a good target is about half a dozen.
So now that we've fixed up our dylib count, let's move on to the next place where we're seeing a slowdown:
350 milliseconds in rebasing and binding.
So as Nick mentioned, rebasing tends to be slow due to I/O, and binding tends to be computationally expensive, but by the time binding runs, the I/O has already been done.
That I/O serves both of them, so their timings are commingled and reported together.
So if we go in and look at that, all of it is fixing up pointers in the DATA segment.
So what we have to do is just fix up fewer pointers.
Nick showed you a tool you can run to see what pointers are being fixed up in the DATA segment: dyldinfo.
And it shows which segments and sections things are in, so that will give you a good idea of what's being fixed up.
For instance, if you see a lot of symbols for ObjC classes in the ObjC sections, that probably means you have a bunch of ObjC classes.
So one of the things you can do is reduce the number of ObjC classes, objects, and ivars you have.
Now, there are a number of coding styles that encourage very small classes that may only have one or two methods. And those particular patterns may result in gradual slowdowns of your application as you add more and more of them.
So you should be careful about those.
Now, having 100 or 1,000 classes isn't a problem, but we've seen apps with 5, 10, 15, 20,000 classes.
And in those cases that can add 700 or 800 milliseconds to your launch time just for the kernel to page them in.
Another thing you can do is reduce your use of C++ virtual functions.
Virtual functions create what we call vtables, which are like ObjC metadata in the sense that they create structures in the DATA section that have to be fixed up.
They're smaller than ObjC metadata, but they're still significant for some applications.
You can use Swift structs.
Swift tends to use less data that needs pointer fix-ups of this sort.
And Swift is more inlinable, with better codegen that avoids a lot of this, so migrating to Swift is a great way to improve here.
And one other thing: you should be careful about machine-generated code. We've seen instances where you describe some structures in a DSL or some custom language and then have a program that generates code from it.
And if that generated code has a lot of pointers in it, it can become very expensive, because a code generator can emit very, very large structures.
We've seen cases where this produces megabytes and megabytes of data.
But the upside is you usually have a lot of control, because you can just change the code generator to use something that isn't pointers, for instance offset-based structures.
And that will be a big win.
So in this case, let's look at what's going on with my load time.
I have at least 10,000 classes; I actually have 20,000, so many they scrolled off the slide.
And if I cut that down to 1,000 classes, I cut my time in this part of the launch from 350 to 20 milliseconds.
So now everything but the initializers is actually below that 400 millisecond mark, so we're doing pretty well.
So for ObjC setup, well, Nick mentioned everything it has to do.
It has to do class registration, it has to deal with the non-fragile ivars, it has to do category registration, and it has to do selector uniquing.
And I'm not going to spend much time on this one at all, and the reason is that we already solved it by reducing the rebasing and binding work before.
All the reductions there are the same things you'd want to do here.
So we just get a little bit of a free win here; it's small.
It's 8 milliseconds.
But we didn't do anything explicit for it.
And now finally, we’re going to look at my initializers, which are the big 10 seconds here.
So I’m going to go a little more in depth on this than Nick did.
There are two types of initializers; explicit initializers are things like +load.
As Nick said, we recommend replacing that with +initialize, which will cause the ObjC runtime to initialize your code when the classes are first instantiated instead of when the file is loaded. Or, in C/C++ there’s an attribute that can be put onto functions which will cause them to be generated as initializers — so that’s an explicit initializer that we’d just rather you didn’t use.
We’d rather you replace them with call-site initializers.
So by call-site initializers I mean things like dispatch_once.
Or if you’re in cross-platform code, pthread_once.
Or if you’re in C++ code, std::call_once.
All these functions have basically the same sort of functionality, where any code in one of these blocks will be executed the first time it’s hit, and only then.
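As a rough sketch of what that pattern looks like in portable C++ (the names here are my own illustration, not code from the session), a std::call_once call-site initializer defers the expensive setup until the first time any thread actually needs it:

```cpp
#include <mutex>
#include <string>

// Hypothetical expensive setup that we want to run lazily, exactly once,
// instead of in a static initializer before main().
static std::string* gConfig = nullptr;
static std::once_flag gConfigOnce;

static const std::string& sharedConfig() {
    // The lambda runs the first time any thread reaches this line,
    // and never again; subsequent calls are nearly free.
    std::call_once(gConfigOnce, [] {
        gConfig = new std::string("loaded");
    });
    return *gConfig;
}
```

The same shape works with dispatch_once on Apple platforms or pthread_once in plain C; only the once-token type and the callable change.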
Dispatch_once is very, very optimized in our system.
After the first execution, it’s basically equivalent to a no-op running past it, so I highly recommend using these instead of explicit initializers.
So let’s move on to implicit initializers.
So implicit initializers are what Nick described — mostly C++ globals with non-trivial initializers, with non-trivial constructors.
And one option is you can replace those with call-site initializers like we just mentioned.
There are certainly places where you can replace globals with non-global structures, or with pointers to objects that you will initialize.
Another option is to make sure you don’t have non-trivial initializers.
So in C++ there’s a kind of data called a POD — plain old data.
And if your objects are just plain old data, the static linker will pre-calculate all the data for the DATA section and lay it out as just data; it doesn’t have to be run, it doesn’t have to be fixed up.
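To illustrate the difference (this is my own minimal example, not code from the slides), compare a POD global, which the linker can emit as raw bytes in the data section with no code run at launch, against a global whose constructor must execute before main:

```cpp
#include <string>

// POD: plain old data. The static linker writes these bytes directly
// into the data section; no initializer code runs before main().
struct Point { int x; int y; };
static Point gOrigin = {10, 20};

// Non-trivial: std::string has a real constructor, so the compiler must
// emit a static initializer that runs at load time.
// -Wglobal-constructors would flag this declaration.
static std::string gName = "expensive";

int podSum() { return gOrigin.x + gOrigin.y; }
std::size_t nameLength() { return gName.size(); }
```

Keeping globals in the first shape means launch pays nothing for them; the second shape adds work to every launch.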
Finally, it can be really hard to find these, because they’re implicit, but we have a warning in the compiler, -Wglobal-constructors, and if you enable it, it will give you warnings whenever you’re generating one of these.
So it’s good to add that to the flags your compiler uses.
Another option is just to rewrite them in Swift.
And the reason is, Swift has global variables, and they’re guaranteed to be initialized before you use them.
But the way it does it, instead of using an initializer, is that behind the scenes it uses dispatch_once for you.
It uses one of those call-site initializers.
So moving to Swift will take care of this for you, so I highly encourage that as an option.
Finally, in your initializers please don’t call dlopen; that will be a big performance hit for a bunch of reasons.
When dyld is running, it’s before the app has started, and we can do things like turn off our locking, because we’re single threaded.
As soon as a dlopen happens in those situations, the graph of how our initializers have to run changes, we could have multiple threads, and we have to turn on locking — it’s just going to be a big performance mess.
You can also have subtle deadlocks and undefined behavior.
Also, please don’t start threads in your initializers, basically for the same reason.
You can set up a mutex if you have to — mutexes even have predefined static values that you can set them up with that run no code. But actually starting a thread in your initializer is potentially a big performance and correctness issue.
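For example (a sketch of my own, not from the slides), a pthread mutex can be initialized from a predefined static value, so the global is plain data and no initializer code has to run at load time:

```cpp
#include <pthread.h>

// PTHREAD_MUTEX_INITIALIZER is a compile-time constant, so this global
// is laid out as data: no static initializer function runs before main().
static pthread_mutex_t gLock = PTHREAD_MUTEX_INITIALIZER;
static int gCounter = 0;

int bumpCounter() {
    pthread_mutex_lock(&gLock);
    int value = ++gCounter;
    pthread_mutex_unlock(&gLock);
    return value;
}
```

std::mutex behaves similarly in practice, since its constructor is constexpr; either way the lock itself costs nothing at launch.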
So here we have some code: I have a C++ class with a non-trivial initializer.
I’m having trouble with the connection.
Please try again in a moment.
Well, thank you Siri.
I have a non-trivial initializer.
And I guess I had it in for debugging; with it all commented out, okay, I’m down to 50 milliseconds total.
I have plenty of time to initialize my nibs and do everything else — we’re in very good shape.
So now that we’ve gone through all that — it was really long and pretty dense — let’s talk about what you should take away.
The first is: please use DYLD_PRINT_STATISTICS to measure your times, and add it to your performance regression suites.
That way you can track how your app is performing over time, so you catch problems while you’re actively working on something rather than finding them months later and having trouble debugging them.
You can improve your app launch time by reducing the number of dylibs you have, reducing the number of ObjC classes you have, and eliminating your static initializers.
And you can improve in general by using more Swift, because it just does the right things.
Finally, dlopen usage is discouraged; it causes subtle performance issues that are hard to diagnose.
For more information you can see the URL up on screen.
There are several related sessions later in the week, and again, there’s the app performance session from 2012 that goes into the other parts of app launch, which I highly recommend you watch if you’re interested.
Thank you for coming everybody, have a great week.