Mobile health and privacy: cross sectional study. BMJ 2021;373 doi: https://doi.org/10.1136/bmj.n1248 (Published 17 June 2021). Cite this as: BMJ 2021;373:n1248
- Gioacchino Tangari, postdoctoral research fellow1,
- Muhammad Ikram, lecturer1,
- Kiran Ijaz, postdoctoral research fellow2,
- Mohamed Ali Kaafar, professor1,
- Shlomo Berkovsky, professor2
- 1Department of Computing, Macquarie University, Sydney, NSW, Australia
- 2Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Correspondence to: M Ikram (@midkhan on Twitter)
- Accepted 16 May 2021
Objectives To investigate whether and what user data are collected by health related mobile applications (mHealth apps), to characterise the privacy conduct of all the available mHealth apps on Google Play, and to gauge the associated risks to privacy.
Design Cross sectional study.
Setting Health related apps developed for the Android mobile platform, available in the Google Play store in Australia and belonging to the medical and health and fitness categories.
Participants Users of 20 991 mHealth apps (8074 medical and 12 917 health and fitness) found in the Google Play store; in-depth analysis was done on 15 838 apps that did not require a download or subscription fee, compared with 8468 baseline non-mHealth apps.
Conclusions This analysis found serious problems with privacy and inconsistent privacy practices in mHealth apps. Clinicians should be aware of these and articulate them to patients when determining the benefits and risks of mHealth apps.
With the improved accessibility of smartphone devices, mobile applications (or apps) available through a variety of marketplaces have grown exponentially. As of 2021, almost 2.87 million apps were available on the Google Play store alone.1 Two popular categories of apps are medical and health and fitness. Referred to collectively as mobile health or mHealth apps, such apps encompass a wide range of functions, from the management of health conditions and symptom checking to step and calorie counters and menstruation trackers.2 Mobile health is a booming market that targets not only patients and clinicians but also those with an interest in health and fitness.
Although the potential of mHealth apps to improve access to real time monitoring and health care resources is well established,34 they pose problems concerning data privacy because of the sensitive information they can access, the use of a business model that is centred on selling subscriptions or sharing user data,5 and the lack of enforcement of privacy standards around the world. For example, the European Union General Data Protection Regulation6 (GDPR) defines eight rights of individual users, and several rules implemented under the US Health Insurance Portability and Accountability Act7 (HIPAA) establish a baseline of privacy protection and patient rights.
In line with the HIPAA, the US Food and Drug Administration released guidance for the postmarket management of cybersecurity in medical devices in 2016.8 The FDA recommended that manufacturers of medical devices (ie, app developers) should incorporate risk management into the life cycle of their products and implement controls to ensure that the devices were secure and protected patients. Specifically, the guidance covers cybersecurity and privacy factors and stipulates risk management programmes that “address vulnerabilities which may permit the unauthorized access, modification, misuse, or the unauthorized use of information that is stored, accessed, or transferred from a medical device to an external recipient, and may result in patient harm.”
However, regulation and guidance are difficult to enforce in practice. Several recent episodes have highlighted the problem of app data being collected and shared in an unauthorised manner. For example, a Norwegian not-for-profit organisation found that 10 popular apps, including one on health and fitness, shared data with advertising companies without informed user consent, in a clear breach of GDPR.9 Forty one popular apps, some developed by leading technology companies, have been called out by the Chinese Ministry of Industry and Information Technology for illegal data collection.10 A 2019 decision by CNIL, the French data protection authority, found Google to be in breach of the principle of transparency11 because the information on the use of personal data was presented in a vague manner that was difficult to understand.
Because of the inadequate privacy disclosures of top mHealth apps,412 we used a suite of app collection and analysis tools to carry out a large scale privacy analysis of mHealth apps and performed a privacy audit of more than 20 000 mHealth apps available in the Google Play store, the largest mobile app marketplace.13
Since 2015, app marketplaces such as Google Play and Apple Store have grown by about 38% and are expected to generate $111.1 billion in revenue by 2025.19 The number of mHealth apps available in app stores continues to increase.20 Of the 2.8 million apps on Google Play and the 1.96 million apps on Apple Store, an estimated 99 366 belong to the medical and to the health and fitness categories. These apps account for 2% (47 890) of those available through Google Play and 3% (51 476) of those available through the Apple store.2122 Our analysis focused on Google Play, the largest app store; the mHealth apps accessible from Australia, which cover virtually all Google Play mHealth apps, served as a proxy for the worldwide Google Play marketplace.
mHealth app dataset
Google Play does not provide a complete list of mHealth apps, and its search functionality does not show all the available apps. To overcome this problem and to detect as many mHealth apps as possible, we developed a crawler that interacted directly with the app store’s interface.23 Starting from the top 100 medical and health and fitness apps on Google Play, the crawler systematically searched through other apps considered similar by Google Play. For each app, the crawler collected several metadata fields: app category and price, locations where the app is available, app description, number of installs, developer information, user reviews, and app rating. From 1 October to 15 November 2019, the crawler searched through more than 1.7 million apps.
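The crawler’s traversal can be sketched as a breadth-first walk over the store’s “similar apps” links. The sketch below is a minimal stand-in, not the authors’ implementation: the graph and metadata are stubbed (all app identifiers and values are hypothetical), whereas the real crawler issued requests to Google Play’s web interface for each lookup.

```python
from collections import deque

# Stubbed "similar apps" graph and store metadata; in the real crawler each
# lookup is a request to Google Play's interface. All values are hypothetical.
SIMILAR = {
    "app.seed": ["app.a", "app.b"],
    "app.a": ["app.b", "app.c"],
    "app.b": [],
    "app.c": ["app.seed"],
}
METADATA = {
    "app.seed": {"category": "MEDICAL", "price": 0.0, "installs": 10000},
    "app.a": {"category": "HEALTH_AND_FITNESS", "price": 0.0, "installs": 500},
    "app.b": {"category": "TOOLS", "price": 1.99, "installs": 200},
    "app.c": {"category": "MEDICAL", "price": 0.0, "installs": 50},
}

def crawl(seeds):
    """Breadth-first walk over similar-apps links, collecting metadata."""
    seen, queue, results = set(seeds), deque(seeds), {}
    while queue:
        app_id = queue.popleft()
        results[app_id] = METADATA[app_id]
        for neighbour in SIMILAR.get(app_id, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return results
```

Starting from the seed apps, the walk reaches every app that the store lists as similar to an already visited one, which is why seeding from the top 100 apps in each category can surface a much larger population.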
We selected apps belonging to the medical and health and fitness categories on Google Play. Overall, we identified 20 991 mHealth apps, of which 15 893 (75.7%) were free to download, 3228 (15.4%) could be purchased in-store, and 1872 (8.9%) were geoblocked (that is, could not be downloaded in Australia). In addition, we used the crawler to sample a random set of popular non-mHealth apps to be used as a baseline comparator. This set contained 8468 apps from the tools, communication, personality, and productivity categories. Table 1 shows the dataset characteristics.
We analysed the mHealth app files and source code (static analysis), investigated the network traffic generated during execution of the app (dynamic analysis), and inspected reviews provided by users of the apps (fig 1).
Third party presence in app resources—to retrieve and classify all third party libraries included in the app, we performed a dictionary based search of the folder containing the decoded app files and embedded libraries. To achieve this, we used a comprehensive dictionary of third party libraries,25 which comprises 338 third parties, including adverts (eg, GoogleAds); analytics (eg, GoogleAnalytics); utilities (eg, Github); and other social, banking, and gaming services (eg, Facebook or PayPal).
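A dictionary based search of this kind amounts to matching known package-name prefixes against the paths of the decoded app files. The excerpt below is illustrative only (the study’s dictionary covers 338 third parties; the entries here are a hypothetical sample):

```python
# Illustrative excerpt of a third party library dictionary, mapping package
# prefixes (as they appear in decoded app file paths) to (name, category).
LIBRARY_DICT = {
    "com/google/ads": ("GoogleAds", "advertising"),
    "com/google/analytics": ("GoogleAnalytics", "analytics"),
    "com/facebook": ("Facebook", "social"),
    "com/paypal": ("PayPal", "payment"),
}

def classify_libraries(decoded_paths):
    """Return the third party libraries found among decoded app file paths."""
    found = set()
    for path in decoded_paths:
        for prefix, (name, category) in LIBRARY_DICT.items():
            if path.startswith(prefix + "/"):
                found.add((name, category))
    return found
```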
Data collection operations in the app code—we extracted the set of Android operating system functions associated with access to users’ personal data. For example, the presence of the function android.telephony.TelephonyManager.getLine1Number in the app code indicates the retrieval of the user’s contact phone number. In addition, we extracted the set of permissions requested by the app to access components of the operating system such as the contact list or global positioning system (GPS) location. Using the permissions, we checked whether each data collection function had all the required authorisations for execution; functions lacking the required permissions were discarded. The final set of functions represented all the potential data collection in the app: in practice, it is a superset of the actual user data collection, because some parts of the app code might rarely (or never) be triggered during execution of the app.
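The permission check can be sketched as a filter: keep a data collection call only if the permission it requires appears in the app’s manifest. The function-to-permission map below is a small hypothetical excerpt, not the study’s full mapping:

```python
# Hypothetical excerpt mapping Android API calls found in app code to the
# Android permission each one requires; the real analysis uses a larger map.
FUNCTION_PERMISSIONS = {
    "android.telephony.TelephonyManager.getLine1Number": "READ_PHONE_STATE",
    "android.location.LocationManager.getLastKnownLocation": "ACCESS_FINE_LOCATION",
    "android.accounts.AccountManager.getAccounts": "GET_ACCOUNTS",
}

def effective_collection(functions_in_code, granted_permissions):
    """Keep only data collection calls whose required permission is requested
    by the app; calls without the needed authorisation are discarded."""
    return [
        fn for fn in functions_in_code
        if FUNCTION_PERMISSIONS.get(fn) in granted_permissions
    ]
```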
Traffic analysis—we intercepted and analysed all the network traffic generated by the apps during the execution of automated app testing.28 To achieve this, we built a dedicated testbed composed of a smartphone that connects to the internet through a computer configured as a WiFi access point, which runs a tool29 intercepting all the traffic transmitted to the internet. Each of the 15 893 downloaded free apps was individually tested (apps purchased in-store or geoblocked were excluded): for each app, on average we performed 35 different activities (eg, opened app, opened menu, clicked on button) in a 180 second test session.
The intercepted traffic was analysed as follows:
Adverts and trackers in app traffic—we extracted the communications with external advert and tracking services—most likely third party recipients of personal data.30 To isolate the traffic components associated with adverts and trackers, we used two comprehensive filter lists: EasyList,31 an advert block list, and EasyPrivacy,32 a supplementary block list for tracking.
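In essence, this step labels each intercepted request by matching it against the filter lists. The sketch below is heavily simplified: real EasyList and EasyPrivacy rules use a richer syntax with options and exception rules, and the domain lists here are only a tiny illustrative sample.

```python
# Heavily simplified filter matching in the spirit of EasyList (adverts)
# and EasyPrivacy (trackers); the real lists contain thousands of rules.
AD_FILTERS = ["googlesyndication.com", "doubleclick.net"]
TRACKER_FILTERS = ["google-analytics.com", "flurry.com"]

def label_request(url):
    """Label an intercepted request as advert, tracker, or other traffic."""
    if any(pattern in url for pattern in AD_FILTERS):
        return "advert"
    if any(pattern in url for pattern in TRACKER_FILTERS):
        return "tracker"
    return "other"
```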
Personal data transmission in app traffic—we identified the transmissions of user data performed by the apps during testing. A machine learning method33 was used to find personally identifiable information in the app traffic considered to be the specific device identifier (eg, Android ID), user identifier (eg, name or email), credentials (eg, password), or location. The machine learning was trained on a large public dataset of annotated mobile app traffic flows34 and yielded a validation accuracy of 97%, with 97% precision and 96% recall. The result only includes data collection practices that are actually performed when the app is used; this set is, however, not complete owing to coverage limitations of dynamic app testing—which might not trigger some menus, views, or functionalities of the app. For this reason, we studied the user data collection in mHealth apps by leveraging both the app code and the app traffic.
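As a rough intuition for what the detector looks for, a rule based stand-in (not the machine learning method itself) can flag query parameters whose key names a known identifier type, or whose value equals a known device identifier; the parameter names below are hypothetical examples:

```python
import re

# Simplified, rule based stand-in for the machine learning detector: flag
# query parameters whose key or value looks like personal data. The key
# names are illustrative, not an exhaustive or authoritative list.
PII_KEYS = {"android_id", "imei", "mac", "email", "password", "lat", "lon"}

def find_pii(url, device_ids):
    """Return (key, value) query pairs in a URL that look like personal data."""
    hits = []
    for key, value in re.findall(r"[?&]([^=&]+)=([^&]*)", url):
        if key.lower() in PII_KEYS or value in device_ids:
            hits.append((key, value))
    return hits
```

The trained classifier generalises well beyond such fixed rules (eg, to obfuscated or renamed parameters), which is why a learning based method was preferred at this scale.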
Secure transmission of user data—we measured the fractions of user data transmissions performed over the HTTP and HTTPS protocols. Whereas HTTP based communications are unencrypted, HTTPS encrypts all messages to protect app users from malicious data interception and content tampering. In the light of recent reports of widespread internet surveillance35 and legislation permitting internet service providers to sell user information extracted from network traffic,36 the adoption of the HTTPS protocol is essential to protect users’ privacy.30
App review analysis—to obtain the complete list of reviews for each app we downloaded the content of the app’s page in the Google Play store. After excluding those reviews with no text, we obtained a dataset of 2 130 684 reviews for 6 938 mHealth apps, of which 366 198 (17.2%) referred to medical apps and 1 764 486 (82.8%) to health and fitness apps. We categorised these reviews as positive (4 or 5 stars), negative (1 or 2 stars), or neutral (3 stars), resulting in 1 788 463 (83.9%) positive reviews and 235 210 (11.0%) negative reviews.
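The star-to-sentiment bucketing described above is straightforward to express directly; this sketch simply restates the categorisation rule used in the analysis:

```python
def categorise_review(stars):
    """Map a star rating (1-5) to the sentiment buckets used in the study:
    4-5 stars positive, 1-2 stars negative, 3 stars neutral."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"

def count_sentiment(ratings):
    """Tally a list of star ratings into the three buckets."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for stars in ratings:
        counts[categorise_review(stars)] += 1
    return counts
```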
Patient and public involvement
No patients or members of the public were directly involved in the study. The subject of the study was mHealth mobile apps publicly available on Google Play. The data collection and analysis methods leveraged an automated testing platform designed by the authors, not requiring the involvement of mHealth app users or developers. Likewise, we analysed public app reviews from Google Play, which were voluntarily contributed by the app users. To raise awareness of privacy risks in mHealth, we plan on sharing the collected datasets, the analysis library, and our findings with clinicians, patients, app developers, and the public.
Personal data collection practices
The analysis of app files and code identified 65 068 data collection operations; on average four for each app. This result provided the broad set of all information that the apps can potentially access and share with third parties. At the same time, analysis of app traffic identified 3148 transmissions of user data across 616 (3.9%) different apps. The main types of data collected by mHealth apps include contact information, user location, and several device identifiers. Some of these identifiers (specifically, international mobile equipment identity (IMEI), a unique identifier used for fingerprinting mobile phones; media access control (MAC), a unique identifier of the network interface in the user’s device; and international mobile subscriber identity (IMSI), a number that uniquely identifies every user of a cellular network) are unique and persistent (ie, they are immutable and cannot be changed or replaced) and can be used by third parties to track users across networks and applications. Supplementary appendix A provides further details about the collected data types.
Most of the mHealth apps included codes for collecting the MAC identifiers (67.0% (14 064) of apps) and app cookies (64.0% (13 434) of apps; fig 2)—that is, small text files used for customising web browsing and app experience, but also for generating online user profiles. Other common types of data were the user’s email address and current cell tower location (33.0% (6927) and 25.0% (5248) of apps, respectively). User data transmissions were observed in 3.9% (616) of mHealth apps, mostly for health and fitness apps (fig 3). This percentage is substantial and should be taken as a lower bound for the real data transmissions performed by the apps, because some transmissions might not be triggered in automated app testing. The most common transmissions were for contact (user’s first or full name) and location (eg, zipcode; fig 3). When compared with baseline (non-mHealth) apps, mHealth apps, especially medical ones, were considerably less likely to collect personal data (fig 2).
Third parties that can access the personal data were also studied by distinguishing between collection on behalf of the first party (app’s own entities and domains) and collection on behalf of third party services (eg, external adverts, analytics, and tracking providers). The results show a predominant role of third parties (fig 4); 54 155 of 61 920 data collection operations in the app codes (87.5%, fig 4) were related to third party services—that is, they originated from third party libraries embedded in the apps. The result might in part overestimate the actual role of these services, as some embedded libraries may never be used. The strong presence of third parties, however, was confirmed by the apps’ traffic, where 1756 of 3148 detected transmissions of user data (55.8%, fig 5) were towards third party servers.
Third party data recipients
Overall, 665 unique third party entities were identified, of which a small set of prominent third parties (the top 50) was responsible for most data collection operations in app code and data transmissions in app traffic (68.0% (2140), collectively).
Third party presence—in general, a strong integration (in app code and files) and interaction (in app traffic) with third parties indicated an increased collection of user data by these services. This is crucial, as these entities might also share personal information with commercial partners or transfer the information as a business asset.
To quantify the third parties in the app code, the number of third party libraries for each app was measured across the different app categories. Although 63.0% (13 224) of mHealth apps embedded at least one third party service, this proportion was substantially lower than for non-mHealth apps (table 2). In particular, only 6.0% (1260) of mHealth apps included six or more third party libraries compared with 43.0% (3641) of non-mHealth apps. Although medical and health and fitness categories showed similar trends, health and fitness apps integrated slightly more third party libraries. This difference could explain why data collection operations were less common in medical apps (fig 2).
Table 2 also reports the fractions of communications with third party services in the app traffic, focusing on advert and tracking services (other third party services (eg, social, widgets) have negligible presence in the intercepted traffic). mHealth apps tended to have fewer interactions with advert and tracking services than non-mHealth apps. For example, advert related traffic was observed for only 5.3% (1103) of mHealth apps compared with 18.0% (1526) of non-mHealth apps. Supplementary appendix C shows the top 10 mHealth apps for presence of adverts, along with popular health and fitness apps.
Most common third parties—third party libraries Google Ads (adverts) and Google Analytics (analytics) were detected in mHealth app code and files in 45.3% (3659) of medical apps and almost 50.0% (6453) of health and fitness apps (fig 6). Results were mainly consistent across the two mHealth app categories, although mHealth apps incorporated fewer Facebook widgets. Similarly, compared with non-mHealth apps, mHealth apps adopted SquareApp payment and Amazon services less often. The most common advert and tracking services contacted by the apps were Google ads (domains googlesyndication.com and doubleclick.net, which indicate the use of Google AdSense or Google Ad Manager for loading and managing adverts) and trackers (domain google-analytics.com) (fig 7).
Third party data collection in app code—a substantial fraction (34.0% (7137)) of the data collection operations in the app code were associated with Google services, and there was also a significant presence of Facebook (14.0% (2939) of apps embedded Facebook cookies), Flurry analytics (6.3% (1322) of apps), and PayPal payment service (table 3). The services most included in the app resources (eg, Google and Facebook libraries) were also prevalent in the data collection operations identified in the app code. Contact data were mainly shared with analytics services (eg, Google’s crashalytics.com), whereas the location and device ID transmissions were mainly towards adverts (eg, Liftoff app marketing) and smartphone notification services (eg, Pushwoosh).
Privacy conduct issues
Insecure transmission of user data—as much as 23.0% (724) of transmissions took place over unencrypted HTTP traffic, with unencrypted transmissions being particularly common for sensitive data such as passwords and GPS location. Supplementary appendix E provides a detailed breakdown of insecure data transmission by user data type.
User complaints in app reviews
The main complaints raised by mHealth app users were extracted from negative app reviews (ratings of one or two stars). Supplementary appendix F lists 41 keywords mapped to six complaint categories that were searched through the review texts. For example, the keyword “crash” was mapped to the complaint category “bugs,” whereas the keyword “private” was mapped to “privacy.” A scan of the 235 210 negative reviews yielded a set of 288 238 user complaints, of which 58 349 referred to medical apps and 229 889 to health and fitness apps.
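The keyword scan can be sketched as a lookup of review words in the keyword-to-category map. The entries below are an illustrative excerpt only (the study’s map covers 41 keywords across six categories, listed in supplementary appendix F):

```python
# Illustrative excerpt of the keyword-to-category map; the study's full map
# covers 41 keywords across six complaint categories.
COMPLAINT_KEYWORDS = {
    "crash": "bugs", "freeze": "bugs",
    "private": "privacy", "permission": "privacy",
    "ads": "adverts", "spam": "adverts",
}

def extract_complaints(review_text):
    """Return the complaint categories whose keywords appear in a review."""
    words = review_text.lower().split()
    return {COMPLAINT_KEYWORDS[w] for w in words if w in COMPLAINT_KEYWORDS}
```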
When those apps targeted by adverts, trackers, and privacy complaints were investigated further, a correlation was observed between the presence of the complaints and the actual behaviour of the app. Specifically, apps associated with complaints about adverts or trackers embedded more third party libraries, which suggests a greater penetration of adverts and trackers. When reviews included direct complaints about privacy, the apps had more personal data collection operations incorporated in their code (supplementary appendix G provides further details).
Our analysis, performed on a set of 20 991 mHealth apps, showed that most of the apps (88.0%, 18 472) could access and potentially share personal data. The transmission of user information in the app traffic was detected for 3.9% (616) of apps; however, the transmission obtained in automated app testing was a lower bound of the real data sharing by the apps. We also observed that, compared with baseline non-mHealth apps, the mHealth apps included fewer data collection operations in their code, transmitted fewer user data, and showed a reduced penetration of third party services. Health and fitness apps were generally more likely to collect and share user information than medical apps, and integration of adverts and tracking services was also more pronounced (fig 6 and fig 7). Among the data that mHealth apps could collect, we found an important presence of persistent device identifiers and user contact information. The persistent device identifiers allowed individuals to be tracked over time and across different services, whereas the contact information directly affected an individual’s privacy.
The role of third parties was predominant—more than 87.0% (54 155) of data collection practices were carried out on behalf of external services. Notably, 50 prominent services were responsible for roughly 70.0% (43 344) of the data collection operations in apps code and the data transmissions in apps traffic. In the analysed app set, Google owned services were the most common. This probably relates to the dominant position of Google’s analytics and advert services and reflects the choice of Google Play store as the source of our app dataset. Android apps leverage support tools (eg, for reporting bugs) that directly report to Google, which might share additional information on devices. Hence, we would expect a slightly less pronounced role of Google for mHealth apps in the Apple store.
Strengths and limitations of this study
Strengths of our study included the sample size and the comparison between the behaviour of mHealth apps and that of non-mHealth (baseline) apps. We also determined the type of user information mHealth apps can retrieve and share, with our analysis building on both static app resources (application code and files) and dynamically generated app traffic.
To scale up the study and cope with a large number of mHealth apps, we leveraged automated analysis tools as well as modern machine learning techniques. Although the accuracy of these techniques was high (>96% for both the detection of user data transmissions and the disclosure of privacy practices), they might still generate a small number of false positives. To deal with the scale of the app set, our live testing of mHealth apps relied heavily on extensive randomised interactions as opposed to hand crafted app usage patterns and profiles, with the drawback that some parts of the applications (eg, tabs, views, menus) might not have been triggered during testing. Owing to the number of available apps, we restricted our analyses to free apps. This restriction might have introduced a bias, because the business models of in-store purchasable apps depend less on selling user data,5 and such apps might therefore retrieve fewer user data and include fewer adverts and trackers. However, we believe that this should not have affected the generalisability of our findings, because up to 15.4% (3228) of mHealth apps found on Google Play could be purchased (table 1).
Comparison with previous studies
mHealth apps and associated privacy risks have received much attention from the research community. Huckvale et al investigated the privacy of 79 health and wellness mobile apps accredited by the UK’s national health service15 and found that most of the apps (78%, 62) that transmitted user information did not describe their data collection practices in the privacy policies. When the researchers assessed the privacy practices of 36 top ranked apps for smoking cessation and depression, they found that only a small fraction (12 of 29) disclosed the transmission of data to Facebook or Google in their privacy policies.4 While these studies focused on consistency between the data collection practices and privacy policies of mHealth apps, the study by Grundy et al focused on the recipients of user information collected by 24 medical apps.14 Their findings on the prevalence of analytics and advert services among user data recipients is in line with our results.
Our study analysed more than 20 000 mHealth apps on Google Play, 15 838 in detail, rather than the tens of apps assessed in previous studies.4121415 The only other study to analyse a comparable range of mHealth apps was conducted in 2015.18 That study, however, only categorised mHealth apps into classes of potential risk (low, medium, high risk of privacy leaks), while not providing any results on the type of user information collected, recipients of the information, and consistency of the app practices with the disclosed privacy policies.
Our study presents a broad assessment of mHealth apps compared with previous studies. In previous studies, the analysis was generally restricted to the data transmitted by mHealth apps14 or to the consistency of the apps with their privacy policies.1215 We analysed the privacy risks associated with mHealth apps by considering the information the apps transmit or can access through their code, the potential recipients of this information, and the correct disclosure of data sharing practices.
Considering the concentration of user data transmission towards dominant third party services, our findings on mHealth apps are aligned with recent large scale analyses of tracking and data sharing ecosystem in mobile apps.394041 An analysis of 959 426 apps found that most trackers embedded in the apps were linked to a small number of commercial entities, with Google the most prominent.39 Similarly, traffic analysis of 14 599 Android apps found that despite owning just 3.9% (616) of all third party tracking services, Google was present in 50.8% (10 657) of the analysed apps.40
Our results show that the collection of personal user information is a pervasive practice in mHealth apps, and not always transparent and secure. Patients should be informed about the privacy practices of these apps and the associated privacy risks before installation and use. Clinicians should understand the main privacy aspects of mHealth apps in their specialist area, along with their key functionalities, and be able to articulate these to patients in lay language. This is important because of the scarcity of app privacy auditing tools and the substantial lack of information on the user data flows in the apps—neither the Google Play store nor the Apple store currently provides such auditing functionalities.
Of the more than 20 000 medical and health and fitness apps analysed, most could collect and potentially share data with third parties, including advertising and tracking services. The apps collected user data on behalf of hundreds of third parties, with a small number of service providers accounting for most of the collected data. The analysis also revealed that mHealth apps were far from transparent when dealing with user data, with only about half being compliant with their declared privacy policies (if available at all).
Mobile apps are fast becoming sources of information and decision support tools for both clinicians and patients. Such privacy risks should be articulated to patients and could be made part of app usage consent. We believe the trade-off between the benefits and risks of mHealth apps should be considered for any technical and policy discussion surrounding the services provided by such apps.
What is already known on this topic
Mobile applications (apps) often collect user data and share it with developer controlled servers as well as external third party commercial entities
Mobile health (mHealth) apps pose concerns about privacy owing to the sensitive user information they can access
Inadequate privacy disclosures have been repeatedly identified for top mHealth apps, preventing users from making informed choices around the data
What this study adds
88% of the 20 991 mHealth apps included in this study could access and potentially share personal data
mHealth apps collected less user data than other types of mobile apps
Data collection in mHealth apps was found to be far from transparent and secure, and often exceeded what is publicly disclosed by app developers
Contributors: GT designed the study, led the data analysis, and wrote the first draft of the manuscript. MI secured funding, designed the study, led the data collection, and analysed the data. MI is the guarantor. KI collected the data and analysed the user reviews. MAK helped to design the study and acquired funding. SB designed the study and acquired funding. All the authors critically revised the manuscript drafts and approved the submission. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding: This work was funded by Optus Macquarie University Cyber Security Hub; the research was also supported by the National Health and Medical Research Council (NHMRC) grant APP1134919 (Centre for Research Excellence in Digital Health). GT and KI were supported by a postdoctoral fellowship from Macquarie University. Optus Macquarie University Cyber Security Hub and the NHMRC Centre of Research Excellence in Digital Health had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from the Optus Macquarie University Cyber Security Hub and the National Health and Medical Research Council Centre of Research Excellence in Digital Health for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: Technical appendix, statistical code, and dataset available from the corresponding author at https://mhealthapps2020.github.io/.
The manuscript’s guarantor (MI) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned have been explained.
Dissemination to participants and related patient and public communities: We will release all our datasets and analysis scripts for further research at https://mhealthapps2020.github.io/.
Provenance and peer review: Not commissioned; externally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.