The Making of Our Crypto Wallet Security Ranking

UPDATE: It’s here! Coinspect’s Wallet Security Ranking has officially launched, and we’re genuinely excited to share the insights we’ve uncovered. Dive in and see how your favorite wallets score!

Coinspect Wallet Security Ranking is designed to help wallet developers, security researchers, and web3 users better understand and improve crypto wallet security. Whether you’re a developer looking to enhance your product’s defenses or a user navigating the world of decentralized apps, our ranking system offers insights that will help you make informed decisions.

In this post, we’ll walk you through the crypto wallet security scoring methodology we created, the process behind it, the challenges we encountered, and what’s next.

Screenshot of Coinspect web3 wallet testing dashboard.

Why a Wallet Security Ranking?

If you follow Coinspect, you know that we’ve dedicated significant research resources to tackling crypto scams and threats aimed at web3 users. While much of the web3 security focus has been on smart contracts and protecting DeFi vaults from hacks, phishing remains a persistent and under-addressed issue.

Crypto wallet security features serve as the first line of defense against phishing attacks, compromised dApps, and fraud. A scalable approach to evaluating the security of software wallets allows us to maintain an objective, independent ranking that helps users make informed decisions and raises the bar for web3 security standards.

See the Ranking!

The Evolution of the Crypto Wallet Attack Surface

In Bitcoin’s early stages, the basic crypto wallet had a simple transaction validation interface and a limited attack surface; its security primarily depended on private key generation and management. Coinspect’s first blog post, published a decade ago, exposed a vulnerability in the most innovative and user-friendly multisig Bitcoin wallet at the time. Since then, we have assessed the security of numerous wallets as part of our crypto security services for clients, and reported vulnerabilities discovered through our research to improve web3 security for users.

As we began reviewing multisig interfaces and mobile, web, and browser extension wallets, new attack vectors emerged. It became common for us to find critical remote code execution vulnerabilities delivered through channels such as JavaScript injection via payment links (e.g., bitcoin: URI handlers), malicious server responses from third-party blockchain APIs, or integrated social features like P2P exchange chats.
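As a minimal illustration of the link-injection class, the Python sketch below parses a BIP 21-style payment link and HTML-escapes every field before it is rendered. The function names are hypothetical and the snippet is not taken from any audited wallet; it only shows the principle that URI parameters such as `label` are attacker-controlled input.

```python
import html
from urllib.parse import parse_qs, urlsplit


def parse_bitcoin_uri(uri: str) -> dict[str, str]:
    """Parse a BIP 21-style payment URI; every returned field is untrusted."""
    parts = urlsplit(uri)
    if parts.scheme != "bitcoin":  # urlsplit lowercases the scheme
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # BIP 21 links look like "bitcoin:<address>?...", but some apps emit
    # "bitcoin://<address>", where urlsplit puts the address in netloc.
    address = parts.netloc or parts.path
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # Set the address last so a query parameter cannot override it.
    return {**params, "address": address}


def render_payment_prompt(uri: str) -> str:
    """Build the HTML for a payment prompt, escaping every attacker-controlled field."""
    fields = parse_bitcoin_uri(uri)
    amount = html.escape(fields.get("amount", "unspecified"))
    address = html.escape(fields["address"])
    label = html.escape(fields.get("label", ""))
    return f"<p>Pay {amount} BTC to {address} ({label})</p>"
```

With this approach, a crafted link whose `label` decodes to `<script>alert(1)</script>` is displayed as inert text instead of being executed by the wallet’s UI.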

dApps as a Web3 Wallet Attack Vector

As the Ethereum network grew, the number of dApps surged, with many web3 front-ends being traditional web2 applications. These front-ends inherit the same cybersecurity risks (e.g., DNS hijacking) and are vulnerable to phishing attacks via cloned websites on similar-looking domains.

To bridge the gap between web front-ends and blockchain applications, web browsers with integrated wallets and wallet browser extensions emerged. Wallets began exposing an API to websites through a JavaScript object injected into web pages (commonly window.ethereum), allowing dApps to interact with users’ web3 wallets. This practice was later standardized as the Ethereum Provider JavaScript API in EIP-1193.

As a result, the attack surface of crypto wallets expanded significantly, as malicious websites could now directly interact with wallets via the Wallet API.
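To make the expanded attack surface concrete: under EIP-1193, any page can call `request({ method, params })` on the injected provider, so the wallet must decide, per origin and per method, what happens next. The Python sketch below models that decision. The method names are real Ethereum JSON-RPC and wallet methods, but the risk tiers, the `needs_user_confirmation` helper, and the policy itself are hypothetical and do not describe any particular wallet.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any


class Risk(IntEnum):
    READ_ONLY = 0   # answered without user interaction
    CONNECT = 1     # reveals the user's accounts to the requesting origin
    CONFIG = 2      # changes wallet configuration (network, tracked tokens)
    SIGN = 3        # hands the requesting site a signature it can use elsewhere
    SEND = 4        # broadcasts a transaction that can move funds or approvals


# Hypothetical policy table. The method names are standard Ethereum
# JSON-RPC / wallet methods; the tiers are illustrative only.
METHOD_RISK: dict[str, Risk] = {
    "eth_chainId": Risk.READ_ONLY,
    "eth_accounts": Risk.READ_ONLY,
    "eth_call": Risk.READ_ONLY,
    "eth_requestAccounts": Risk.CONNECT,
    "wallet_addEthereumChain": Risk.CONFIG,
    "wallet_switchEthereumChain": Risk.CONFIG,
    "wallet_watchAsset": Risk.CONFIG,
    "personal_sign": Risk.SIGN,
    "eth_signTypedData_v4": Risk.SIGN,
    "eth_sendTransaction": Risk.SEND,
}


@dataclass
class RequestArguments:
    """Same shape as EIP-1193's RequestArguments: a method plus optional params."""
    method: str
    params: list[Any] = field(default_factory=list)


def needs_user_confirmation(origin: str, req: RequestArguments,
                            connected_origins: set[str]) -> bool:
    """Decide whether a page's request must be shown to the user before it runs."""
    risk = METHOD_RISK.get(req.method)
    if risk is None:
        return True  # unknown method: never execute silently
    if risk == Risk.READ_ONLY:
        return False
    if risk == Risk.CONNECT:
        # Re-requesting accounts from an already-connected site is silent.
        return origin not in connected_origins
    # Configuration changes, signatures, and transactions always need a prompt.
    return True
```

Every branch that returns `True` is a point where the wallet’s UI, rather than the page, must explain to the user what is being requested, which is exactly where phishing dApps try to confuse users.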

Auditing One Versus Dozens of Crypto Wallets At a Time

The goal of continuously testing as many crypto wallets as possible to provide a security score and ranking requires a different approach than the one we use for conducting wallet security audits for clients.

How Do We Assess Wallet Security Individually?

Gray-box Penetration Testing

When a client hires us for a wallet audit, we typically receive access to the source code and conduct a gray-box penetration test. In this scenario, our consultants combine black-box penetration testing techniques with the insights gained from source code and documentation.

This approach is highly effective, as it allows us to thoroughly examine the code for high-risk, hard-to-detect vulnerabilities while also testing the wallet from a user’s perspective. By interacting with malicious web3 apps, we can identify user-interface (UI/UX) weaknesses that scammers often exploit.

Checklists for Individual Wallet Testing

Over the years, we’ve developed a comprehensive threat model and checklist for conducting crypto wallet security audits. While following these predefined checklists is part of our process, they serve as reminders for auditors rather than a rigid methodology. Our penetration testing remains largely manual and creative.

The goal of a time-boxed wallet penetration test is to simulate real-world attackers aiming for the highest impact with the least effort. Therefore, the objective of auditing an individual crypto wallet is to uncover as many vulnerabilities as possible within the given timeframe.

How Do We Assess Wallet Security At Scale?

Our initial effort to elevate crypto wallet security began with refining our internal checklist and publishing it as the wallet security standard, inviting the community to help improve it. However, the feedback we received pointed to the need for an “L2Beat for wallets”: a public, user-facing comparison of wallet security, similar to how L2Beat compares the risks of Ethereum layer-2 networks. This led us to brainstorm how to address this gap.

After numerous iterations, we developed a methodology that starts as a manual process but is designed to be fully automated over time, ensuring the project’s long-term sustainability. Just as important, this approach allows us to conduct consistent and objective tests by defining the passing criteria upfront.

Interactive Testing

We quickly realized that focusing on wallet testing from the user’s perspective (i.e., black-box testing) was more effective than source code analysis for several reasons:

Many popular, good-quality wallets are not open source.
The vulnerabilities that can be identified reliably through automated source code analysis did not justify the effort.
Interactive testing is especially crucial when phishing is the main attack vector, as a well-designed user interface is often the best defense.
We also envisioned a way to automate this interactive black-box testing, making it scalable and more efficient.

Checklist for Testing Wallets at Scale

To rank multiple wallets efficiently, our checklist focuses on key security features that can be tested in a black-box scenario, targeting high-impact vulnerabilities. The wallet ranking checklists prioritize scalable, repeatable, and automatable checks, ensuring fair comparison across different wallet types. The objective of the security ranking process is not to uncover as many vulnerabilities as possible but to provide actionable recommendations to users and developers.

Developing a Wallet Security Testing Framework

The iterative process of developing our wallet security testing framework led us to focus on the following goals:

Ensure Repeatable, Objective, and Accurate Testing
Enable Result Traceability and Facilitate Peer Reviews
Simplify Testing and Prepare for Automation
Establish a Fair Scoring System
Communicate Results Clearly to All Stakeholders

Beyond designing the checklist and testing procedures, these goals guided us in developing the following four key building blocks of our framework:

Guided Testing Wizard and Results Management
Wallet Penetration Testing dApp
A Checklist-based Scoring System
User-Friendly Web App for Communication of Results

In the following sections, we explore the benefits and challenges of each component in detail.

Guided Testing Wizard and Results Management

To ensure tests are repeatable and objective, we realized that testers needed a comprehensive guide that specifies how to perform each check, the expected correct outcomes, and which outcomes are considered invalid. As testers work through this guide, they upload evidence such as notes and screenshots for each result. By centralizing all results in a single repository, we can track and organize key information to improve the accuracy and traceability of the final results. This system also enables thorough peer reviews, ensuring the highest quality of testing. Additionally, the application allows for side-by-side comparisons of results for the same wallet from multiple independent testers.
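A minimal sketch of the results-management side, assuming each result records the wallet, the check, the tester, a pass/fail outcome, and a list of evidence items. The `Result` type and the helper names are hypothetical; the point is that once results live in one place, flagging checks where independent testers disagree, or results submitted without evidence, becomes a simple query for peer reviewers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Result:
    """One tester's outcome for one checklist item on one wallet."""
    wallet: str
    check_id: str
    tester: str
    passed: bool
    evidence: list[str] = field(default_factory=list)  # notes, screenshot paths


def disagreements(results: list[Result]) -> dict[tuple[str, str], dict[str, bool]]:
    """Return (wallet, check) pairs where independent testers reached different outcomes."""
    outcomes: dict[tuple[str, str], dict[str, bool]] = defaultdict(dict)
    for r in results:
        outcomes[(r.wallet, r.check_id)][r.tester] = r.passed
    return {key: by_tester for key, by_tester in outcomes.items()
            if len(set(by_tester.values())) > 1}


def missing_evidence(results: list[Result]) -> list[Result]:
    """Results that cannot be peer-reviewed because nothing was attached."""
    return [r for r in results if not r.evidence]
```

Keying outcomes by (wallet, check) and then by tester is what makes the side-by-side comparison cheap: a disagreement is simply a key whose tester outcomes are not all equal.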

Wallet Penetration Testing dApp

Alongside the testing wizard, we also developed a dApp to simulate various attack scenarios and ensure wallets are tested under consistent conditions. Given that testing a single wallet takes 2 to 4 hours on average, the dApp streamlines the process by simulating common attacks, particularly those related to phishing. This allows us to easily recreate scenarios where wallets interact with compromised or malicious dApps, centralizing all testing functionality on a single screen. Furthermore, the custom dApp lets us fine-tune parameters to trigger specific testing scenarios efficiently.
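One scenario a testing dApp like this can generate on demand is an unlimited ERC-20 token approval, a common phishing payload. The sketch below builds the `eth_sendTransaction` parameters a malicious page would submit; the wallet under test should then make the spender and the unlimited amount obvious to the user. The helper names are hypothetical and this is not the code of our dApp; `0x095ea7b3` is the standard function selector for `approve(address,uint256)`, and the spender and amount are the kind of parameters a tester could tune.

```python
MAX_UINT256 = 2**256 - 1
APPROVE_SELECTOR = "095ea7b3"  # first 4 bytes of keccak256("approve(address,uint256)")


def encode_approve(spender: str, amount: int) -> str:
    """ABI-encode an ERC-20 approve(address,uint256) call as a hex string."""
    addr = spender.lower().removeprefix("0x")
    if len(addr) != 40 or any(c not in "0123456789abcdef" for c in addr):
        raise ValueError("spender must be a 20-byte hex address")
    if not 0 <= amount <= MAX_UINT256:
        raise ValueError("amount out of uint256 range")
    # Each static argument is left-padded with zeros to a 32-byte word.
    return "0x" + APPROVE_SELECTOR + addr.rjust(64, "0") + format(amount, "064x")


def unlimited_approval_tx(victim: str, token: str, spender: str) -> dict[str, str]:
    """eth_sendTransaction params for a phishing-style unlimited approval."""
    return {
        "from": victim,
        "to": token,  # the ERC-20 contract, not the attacker
        "data": encode_approve(spender, MAX_UINT256),
        "value": "0x0",
    }
```

Because the transaction sends no ETH and targets the legitimate token contract, a wallet that only shows the `to` address and the value gives the user no hint that the spender gains unlimited access to the token.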

Screenshot of Coinspect penetration testing dApp.

A Scoring System for Wallet Security Features (or Lack Thereof)

As mentioned in a previous blog post, some security checks are more relevant than others for evaluating the overall security of a crypto wallet. For example, a vulnerability that could lead to a remote wallet-draining scenario is far more critical than a missing manual lock mechanism. Therefore, we needed a structured approach to determine the relative importance of each check, prioritize the checks fairly, and reduce bias.

To achieve this, we adopted the Analytic Hierarchy Process (AHP). This method allowed us to assign a numerical weight to each check by conducting direct pairwise comparisons within the same category (see our Wallet Security Ranking Methodology). The criteria used for these comparisons were based on security risk, specifically the likelihood and impact of an exploit resulting from either a vulnerability or the absence of a key security feature.
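For readers unfamiliar with AHP, the sketch below shows the standard computation: a reciprocal pairwise comparison matrix on Saaty’s 1 to 9 scale, weights taken as its normalized principal eigenvector (approximated here by power iteration), and Saaty’s consistency ratio to catch contradictory judgments. The three example checks and their comparison values are made up for illustration and are not taken from the ranking’s methodology.

```python
# Saaty's random consistency index, indexed by matrix size n (1..10).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix: list[list[float]], iterations: int = 100) -> list[float]:
    """Priority weights = normalized principal eigenvector, via power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_ratio(matrix: list[list[float]], w: list[float]) -> float:
    """Saaty's CR = CI / RI; judgments with CR below about 0.1 are usually accepted."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical comparison of three checks within one category, on Saaty's
# 1-9 scale: entry [i][j] says how much more important check i is than j.
CHECKS = ["remote drain", "phishing warning", "manual lock"]
EXAMPLE = [
    [1.0, 3.0, 9.0],
    [1 / 3, 1.0, 5.0],
    [1 / 9, 1 / 5, 1.0],
]
```

With these made-up judgments, the weights come out at roughly 0.67, 0.27, and 0.06, and the consistency ratio is well under 0.1. Only the pairwise judgments are subjective; the resulting weights follow mechanically from them.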

Effective User-Friendly Communication of Results

The last essential component, which was in its final development phase at the time of writing, is a website to publish our results. Given the diverse range of users in the web3 space, from beginners to highly technical experts, it’s important to present the results in a way that clearly illustrates the overall security score while also providing insights into how each wallet performed across different categories.
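As a hedged sketch of how such a page could derive both views from the same data, assume each check has a weight within its category and each category has a weight in the overall score (both hypothetical here, for example produced by AHP). A wallet’s category score is then the weighted share of checks it passed, and the overall score is the weighted average of its category scores. This aggregation rule is an illustrative assumption, not the published formula.

```python
def category_scores(check_weights: dict[str, dict[str, float]],
                    passed: dict[str, bool]) -> dict[str, float]:
    """Score each category from 0 to 100 as the weighted share of passed checks."""
    scores = {}
    for category, weights in check_weights.items():
        # A check with no recorded result counts as failed.
        earned = sum(w for check, w in weights.items() if passed.get(check, False))
        scores[category] = 100 * earned / sum(weights.values())
    return scores


def overall_score(category_weights: dict[str, float],
                  scores: dict[str, float]) -> float:
    """Weighted average of category scores (0 to 100)."""
    total = sum(category_weights.values())
    return sum(w * scores[c] for c, w in category_weights.items()) / total
```

Computing the overall score from the category scores, rather than directly from individual checks, keeps the headline number and the per-category breakdown consistent with each other.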

Phishing Resistance as an Indicator of Overall Wallet Security

While this methodology may not directly test every security aspect, such as cryptographic robustness, our black-box tests serve as a proxy for assessing the wallet’s overall security posture. Similar to how Google’s PageRank uses links as a measurable proxy for content relevance, our interactive tests reveal deeper insights into the security of a web3 wallet.
By thoroughly testing how well a crypto wallet defends against phishing attacks, we indirectly evaluate the developer’s alignment with web3 user safety and security.
The tests in our checklist were chosen with this approach in mind. For example, the absence of basic, easily testable security features (e.g., minimum password length) often signals a development process where security is not prioritized, serving as an indicator of potential broader weaknesses. The clarity of the information presented in a token spend approval dialog is also a good indicator of the alignment of a web3 wallet with user safety.
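To illustrate a check whose passing criterion is fixed before testing starts, here is a sketch of a minimum password length check. It assumes a hypothetical `accepts` callback that reports whether the wallet’s password form accepted a candidate (entered by a tester today, or by an automation script later). The 8-character threshold and all names are illustrative assumptions, not the ranking’s actual criterion.

```python
from typing import Callable, Optional

MIN_PASSWORD_LENGTH = 8  # hypothetical threshold, fixed before any wallet is tested


def probe_password(length: int) -> str:
    """Mixed-character probe of the given length, to reduce the chance that
    complexity rules mask length rules."""
    return ("Aa1!" * length)[:length]


def shortest_accepted_length(accepts: Callable[[str], bool],
                             max_length: int = 64) -> Optional[int]:
    """Try increasing lengths and return the first one the wallet accepts."""
    for length in range(1, max_length + 1):
        if accepts(probe_password(length)):
            return length
    return None


def password_length_check(accepts: Callable[[str], bool]) -> bool:
    """Pass only if no password shorter than the threshold is accepted."""
    shortest = shortest_accepted_length(accepts)
    return shortest is not None and shortest >= MIN_PASSWORD_LENGTH
```

Because the threshold and the probing procedure are defined upfront, two testers (or a tester and a script) running this check on the same wallet version should reach the same verdict.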

Avoiding Score Manipulation

We need to anticipate and prevent wallet developers from gaming the scoring system by optimizing their wallets just enough to meet the minimum criteria on our security checklist, without truly enhancing overall security for users.
As previously mentioned, the ranking should not be seen as an absolute measure of security but as a reflection of the vendor’s commitment to security.

Evolving Strategies for Web3 Security Testing

While this blog post may make the journey sound straightforward, we encountered numerous obstacles that directly impacted our testing efforts and affected the viability of this project. We anticipate these challenges will continue to shape our work moving forward.

Adapting Security Checks to the Evolving Web3 Experience

Throughout the process, we realized that different web3 wallets use different approaches to achieve similar outcomes, and these approaches are not always directly comparable. We had to carefully determine what is acceptable to mitigate specific risks and consider the range of possible countermeasures to a given threat. We anticipate that smart wallets, account abstraction, MPC services, and other proposals to improve the web3 experience will require adapting our current approach.

Increasing Effort

Ensuring the accuracy of test results for an increasing number of products and features in a sustainable way requires significant effort, not only in the testing itself but also in research and software development to systematize our process.

We tested 17 browser extension wallets and 26 mobile wallets across different platforms (counting each mobile platform separately), for a total of 69 tested wallet versions. Testing involves not just performing manual tests by multiple human testers but also collecting evidence and peer-reviewing results to ensure accuracy.

Periodic Retesting: As web3 wallets continue to improve and aim for better scores, we will need to retest new wallet versions. A significant challenge will be balancing the number of wallets periodically retested against the ranking’s update frequency.

Automation: To reduce manual steps and increase efficiency, we plan to evolve our testing system towards automation. Given the variety of platforms and test characteristics, this will be no easy task.

Conclusion

Developing our comprehensive crypto wallet security ranking methodology and executing extensive tests has been challenging yet rewarding. Despite the significant investment of time and resources, we are proud to contribute a valuable tool to elevate security standards across the web3 ecosystem. Our conviction that this resource is unique and necessary justifies our commitment to maintaining and enhancing this project.

See the Ranking!

Attention Wallet Vendors: Join Us in Elevating Web3 Security

We invite developers to connect with Coinspect to raise the standard of wallet security. Our ranking is an objective assessment designed to highlight strengths and identify areas for improvement.

By collaborating on its design and improving your wallet’s score, you can:

Showcase your commitment to user safety
Differentiate your wallet in a competitive market
Contribute to a safer web3 ecosystem for everyone

Take the Next Step

Follow Coinspect on social media and Discord to be the first to receive updates, get behind-the-scenes insights, and contribute your ideas to this project.
